Proposal: A More Dynamic Active Election Container

This proposal is for making the active elections container more dynamic with regard to how and when it drops blocks. I believe the need for a more dynamic way of managing this container has arisen with the reduction in its size and the removal of PoW prioritization.

There are two parameters that come into play here:

TSA: Time since the block was added to the active elections container
VC: The current vote count of the block's election

First, a quick review of how it works now: a block is added to the active elections container after being processed by the election scheduler. Once added, it requests votes from other representatives and stays in this container for 5 minutes. If it has not reached enough votes to be confirmed after 5 minutes, it is dropped and restarted after 30 minutes, or on some other specific criteria which I won't go through now.

My proposal is to make this container more dynamic, to avoid it filling up during spam with blocks that will never be confirmed because other representative nodes are not voting on the same blocks. For clarity, the current size of the container is 5k.

I suggest that a node be triggered to start dropping blocks from the active elections container when it is 90% full. This is to make sure it is rarely completely full, so the election scheduler can still add new blocks to it.

The criteria for dropping a block would be based on TSA and VC. Once the container reaches 90% full, it would look for blocks that have been in the container longer than 10-30 seconds and drop the ones with the lowest vote count. Perhaps there should be a threshold here: if a block is less than 10% away from reaching quorum, we do not drop it. A block would still be dropped if it has stayed longer than 5 minutes, regardless of VC.
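
To make this concrete, here is a rough sketch of the drop policy I have in mind. All type and function names (`election`, `pick_drops`, `weight_fraction`) are illustrative, not the node's actual API, and the 20-second age threshold is just one value from the suggested 10-30 second window:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

struct election
{
	std::chrono::steady_clock::time_point started; // for TSA
	std::size_t vote_count{ 0 }; // VC
	double weight_fraction{ 0.0 }; // fraction of quorum reached, 0.0-1.0
};

// Once the container passes 90% of capacity, pick up to `needed` drops:
// blocks older than the age window, lowest vote count first, sparing
// anything within 10% of quorum. Blocks older than 5 minutes are
// candidates regardless of VC.
std::vector<std::size_t> pick_drops (std::vector<election> const & active, std::size_t capacity, std::size_t needed)
{
	using namespace std::chrono_literals;
	std::vector<std::size_t> candidates;
	if (active.size () < capacity * 9 / 10)
	{
		return candidates; // below the 90% trigger, drop nothing
	}
	auto const now = std::chrono::steady_clock::now ();
	for (std::size_t i = 0; i < active.size (); ++i)
	{
		auto const age = now - active[i].started;
		bool const expired = age > 5min; // hard TSA cap
		bool const eligible = age > 20s && active[i].weight_fraction < 0.9;
		if (expired || eligible)
		{
			candidates.push_back (i);
		}
	}
	// Lowest vote count goes first.
	std::sort (candidates.begin (), candidates.end (),
	[&] (std::size_t a, std::size_t b) { return active[a].vote_count < active[b].vote_count; });
	if (candidates.size () > needed)
	{
		candidates.resize (needed);
	}
	return candidates;
}
```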

Perhaps, together with this, we should also reduce the wait time before a dropped block is picked up again from 30 minutes to 5, or even 2 minutes. The election scheduler should in any case be responsible for picking which blocks to add again, based on LRU and balance.

I have not considered how this might affect bootstrapping, if it does at all, so any input on that would be nice.

Any feedback is welcome.

Instead of vote count, I'd use total weight, as that's more relevant to confirmation and avoids Sybil attacks.
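
Roughly, the drop ordering would change like this; a minimal sketch with illustrative types, not the node's API (real weights are 128-bit, a plain integer stands in here):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct election
{
	std::uint64_t tallied_weight{ 0 }; // summed rep weight seen voting
	std::size_t vote_count{ 0 }; // the raw count proposed above
};

// Drop the weakest-by-weight first: many votes from negligible-weight
// representatives (a Sybil set) no longer protect a block.
void order_drop_candidates (std::vector<election> & candidates)
{
	std::sort (candidates.begin (), candidates.end (),
	[] (election const & a, election const & b) { return a.tallied_weight < b.tallied_weight; });
}
```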

Thanks, good input, and I agree.

I think if you do this but still have a bucket backlog of millions of blocks (all with similar LRUs) waiting to get into the container, you're still overly swamped, and you're going to end up with every node holding different blocks in the container at any given time (with some nodes dropping blocks out just as other nodes move those same blocks into play). Although it certainly improves the current situation to introduce some mechanism that cycles new blocks into the container.
I still believe you need a firmer way of knowing what blocks go into the container so nodes don't desync.

I agree that a mechanism is needed to make sure that the majority of blocks in the election container are the same across all nodes, to ensure efficient confirmation. Rather than using timeouts on the election scheduler for that, I suggest the election scheduler interact with the election container in a dynamic way. Timeouts for outliers should still be in place, but the problem of low overlap between the contents of the election containers of different nodes would only be solved very slowly by timeouts.

How about each bucket "keeps" blocks in its sorted table until they are confirmed (marked as "in_election")? Then during each RoundRobin pass, new blocks are put into the active election container as usual, and in addition any block with the "in_election" tag that is found in the lower half (or lowest XX%) of the bucket table is instantly dropped from the active election container. This way the order within the buckets is transferred to the active election container, which should make its content very similar across all nodes and therefore elections within it very fast.
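
A rough sketch of one such pass, with made-up types (a `std::set` of active hashes stands in for the "in_election" tag, and the priority key is whatever the bucket already sorts on):

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <set>

using block_hash = std::uint64_t; // stand-in for a real 256-bit hash

struct bucket
{
	// Sorted by priority (balance / LRU key), highest first; blocks stay
	// in the table until confirmed.
	std::multimap<std::uint64_t, block_hash, std::greater<>> table;
};

// One round-robin pass: activate from the upper half while there is
// budget, and evict any active block that has fallen into the lower half,
// so the bucket ordering decides the container's content on every node.
void round_robin_pass (bucket const & b, std::set<block_hash> & active, std::size_t activate_budget)
{
	std::size_t index = 0;
	std::size_t const half = b.table.size () / 2;
	for (auto const & [priority, hash] : b.table)
	{
		(void)priority; // ordering is already applied by the multimap
		bool const is_active = active.count (hash) > 0;
		if (index < half)
		{
			if (!is_active && activate_budget > 0)
			{
				active.insert (hash);
				--activate_budget;
			}
		}
		else if (is_active)
		{
			active.erase (hash); // lower half and in_election: drop it
		}
		++index;
	}
}
```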

The problem, as I'm reducing it, is trying to churn through a near-infinite supply of transactions with leaderless consensus.

Here's a thought experiment illustrating the problem I see without leaders:
You and I have a deck of shuffled playing cards. If we can hold only 2 cards in each of our hands, and if you and I both hold any identical card at once, we can put those in the confirmed pile. We put any non-matching cards back at the bottom of our respective decks. We repeat this until the decks are gone. Our hands are the active election containers, the decks are some bucket being spammed. As you can imagine, this would take a very long time to play out; maybe it would end up in deadlock.
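
Purely to illustrate, here is a small simulation of that card game (a sketch, nothing from the node). Because both decks rotate in lockstep, a full cycle with no match means a match can never happen, which is the deadlock case:

```cpp
#include <algorithm>
#include <deque>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Play one game with 2-card hands; returns rounds needed to clear the
// decks, or -1 if the game stalls (a full rotation passes with no match).
long play (std::mt19937 & rng)
{
	std::vector<int> cards (52);
	std::iota (cards.begin (), cards.end (), 0);
	std::deque<int> deck_a (cards.begin (), cards.end ());
	std::deque<int> deck_b (cards.begin (), cards.end ());
	std::shuffle (deck_a.begin (), deck_a.end (), rng);
	std::shuffle (deck_b.begin (), deck_b.end (), rng);
	long rounds = 0;
	long since_match = 0;
	while (!deck_a.empty ())
	{
		if (since_match > static_cast<long> (deck_a.size ()))
		{
			return -1; // alignment repeats with no match: deadlock
		}
		++rounds;
		++since_match;
		std::vector<int> hand_a, hand_b; // the "active election containers"
		for (int i = 0; i < 2 && !deck_a.empty (); ++i)
		{
			hand_a.push_back (deck_a.front ());
			deck_a.pop_front ();
			hand_b.push_back (deck_b.front ());
			deck_b.pop_front ();
		}
		for (int card : hand_a)
		{
			auto it = std::find (hand_b.begin (), hand_b.end (), card);
			if (it != hand_b.end ())
			{
				hand_b.erase (it); // both hold it: confirmed pile
				since_match = 0;
			}
			else
			{
				deck_a.push_back (card); // back to the bottom of the deck
			}
		}
		for (int card : hand_b)
		{
			deck_b.push_back (card);
		}
	}
	return rounds;
}

int main ()
{
	std::mt19937 rng{ 42 };
	int cleared = 0, stalled = 0;
	long total = 0;
	for (int trial = 0; trial < 1000; ++trial)
	{
		long r = play (rng);
		if (r < 0)
		{
			++stalled;
		}
		else
		{
			++cleared;
			total += r;
		}
	}
	std::cout << "cleared: " << cleared << ", stalled: " << stalled << '\n';
	if (cleared > 0)
	{
		std::cout << "average rounds when cleared: " << total / cleared << '\n';
	}
}
```

On my reading of the rules, some shuffles appear to never finish at all, which is exactly the deadlock I mean.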

I would suggest adding a form of leadership: prioritize by total vote weight.

When an incoming election is received that this PR is not currently voting on, consider the total vote weight of that election. If that total vote weight exceeds the total vote weight of any "old" election in this PR's active elections, then, using OP's method, replace the old election having the lowest total vote weight.
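
A sketch of that replacement rule, with hypothetical types (`total_weight` being the tallied representative weight behind an election, per the earlier suggestion, not a vote count):

```cpp
#include <cstdint>
#include <vector>

struct active_election
{
	std::uint64_t root; // stand-in for the election's root/hash
	std::uint64_t total_weight; // rep weight observed voting on it
	bool old_enough; // past the minimum age at which replacement is fair
};

// On a vote for an election we are not running: if its observed weight
// exceeds the weakest "old" election we hold, swap it in. Returns true
// when a replacement happened.
bool maybe_replace (std::vector<active_election> & active, std::uint64_t incoming_root, std::uint64_t incoming_weight)
{
	auto weakest = active.end ();
	for (auto it = active.begin (); it != active.end (); ++it)
	{
		if (it->old_enough && (weakest == active.end () || it->total_weight < weakest->total_weight))
		{
			weakest = it;
		}
	}
	if (weakest == active.end () || incoming_weight <= weakest->total_weight)
	{
		return false; // nothing old and weaker than the incoming election
	}
	*weakest = active_election{ incoming_root, incoming_weight, false };
	return true;
}
```

Since weight is what confirms an election anyway, following the heaviest elections should pull all PRs toward the same working set.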

Elections must continue being confirmed. PRs can't just keep rotating elections through the spammed bucket, because the number of transactions sent with a similar LRU could be immense. We must assume it is infinitely larger than the active elections container, and so PRs may never find agreement. That is why I suggest using vote weight as the leader.

Why not eliminate the 5k (or 50k) cap and allow all blocks meeting the following rule to be voted on:

Block is the successor to the previous block listed in the ledger (and that previous block is confirmed)

I do agree, some level of TTL should exist on blocks. However, if all nodes adhere to the rule above, then a TTL breach points to a slower-performing network or node. In that case, eliminate blocks (and their dependents in the queue, given gaps) based on the TTL value. I'm not sure what implications this has, but it might remove the need to focus on queue uniformity between nodes.
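
Something like this, as a sketch with made-up types: `voteable` checks the successor rule against the account's confirmed frontier, and `evict_expired` cascades a TTL breach to everything queued on top of the expired block:

```cpp
#include <chrono>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using hash_t = std::uint64_t; // stand-in for a real block hash
using account_t = std::uint64_t;

struct queued_block
{
	hash_t hash;
	hash_t previous; // the block this one claims as predecessor
	account_t account;
	std::chrono::steady_clock::time_point queued_at;
};

// The proposed rule: voteable only if `previous` is the account's
// confirmed frontier, i.e. the block directly succeeds a confirmed block.
bool voteable (queued_block const & b, std::unordered_map<account_t, hash_t> const & confirmed_frontier)
{
	auto it = confirmed_frontier.find (b.account);
	return it != confirmed_frontier.end () && it->second == b.previous;
}

// TTL breach: evict the expired block and, cascading, its dependents,
// since blocks building on an evicted one can no longer become voteable.
void evict_expired (std::vector<queued_block> & queue, std::chrono::seconds ttl)
{
	auto const now = std::chrono::steady_clock::now ();
	std::unordered_set<hash_t> evicted;
	bool changed = true;
	while (changed) // iterate to a fixpoint so whole chains are removed
	{
		changed = false;
		for (auto it = queue.begin (); it != queue.end ();)
		{
			bool const expired = now - it->queued_at > ttl;
			bool const orphaned = evicted.count (it->previous) > 0;
			if (expired || orphaned)
			{
				evicted.insert (it->hash);
				it = queue.erase (it);
				changed = true;
			}
			else
			{
				++it;
			}
		}
	}
}
```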