Consensus improvement draft

I've been doing research on documenting, isolating, and improving nano's consensus algorithm. The primary driver for this is to provide a formal specification that can be better analyzed and formally proven.

I've been making revisions, building a reference implementation, and seeking out people with a background in this type of research to collaborate with. This is a highly specialized discipline, and our search so far has not yielded anyone able to do an informed review, despite targeted outreach. Recent visibility around this area is why I'm putting this research out as-is rather than in a completed state.

Attached is the latest copy of the draft and this is a link to the latest reference implementation.

I will continue to update the draft, specifically the parts around "slates" which have been expanded on in the reference implementation. In short, slates are replaced with a confidence metric that can be adjusted.

I've done initial testing on the confidence metric, which essentially amounts to the number of successive votes that need to be in agreement for confirmation to complete. Below is a table showing the number of successive votes and the associated failure rate under worst-case Byzantine behavior, as tested so far:
  • 1 vote - 0.3%
  • 2 votes - 0.000002%
  • 3 votes - (unmeasurable)
The results so far are positive and show that with very few additional successive votes, the failure rate drops off exponentially.
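As a minimal sketch of the confirmation rule described above (function and variable names are illustrative, not taken from the reference implementation):

```python
# Sketch of the successive-vote confidence rule: confirmation completes
# once `threshold` consecutive voting rounds agree on the same winner.
# All names are hypothetical; the reference implementation may differ.

def confirmed(round_winners, threshold):
    """Return True once `threshold` successive rounds agree.

    `round_winners` is the winning value of each voting round, in order.
    A change of winner resets the streak.
    """
    streak = 0
    last = None
    for value in round_winners:
        if value == last:
            streak += 1
        else:
            last = value
            streak = 1
        if streak >= threshold:
            return True
    return False

# With a threshold of 3, three agreeing rounds in a row are required.
assert confirmed(["A", "B", "B", "B"], 3)
assert not confirmed(["A", "B", "A", "B"], 3)
```

Raising the threshold only delays confirmation by a round or two in the common case, which is consistent with the exponential drop in failure rate shown in the table.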

There are a number of existing consensus algorithms with varying degrees of research behind them, though many of them don't work in the way required for a platform that scales as nano does. The main categories in which existing algorithms fail for nano's use case are:

  • Algorithms that require leader selection, e.g. Paxos. Paxos is probably the most robust algorithm available, though it's designed for database clusters within organizations rather than an open, scalable network.
  • Algorithms that are not Byzantine fault tolerant. Raft occupies a similar market vertical to Paxos, and it is not designed to tolerate Byzantine behavior.
  • Algorithms that require interactive request/acknowledge exchanges with validators. As the number of non-validators in the network increases, the load these validators must handle grows to unsustainable levels, presenting a denial-of-service attack vector simply by adding more non-validators to the network.

An ideal algorithm would provide consensus using network broadcasts which can be efficiently implemented using network topology techniques such as multicast or overlays that calculate efficient minimum spanning trees.
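As a toy illustration of why spanning trees make broadcast efficient (the topology, node names, and tree construction here are all hypothetical, and a BFS tree is used as a stand-in for whatever overlay technique the network would actually employ):

```python
from collections import deque

# Toy illustration: broadcasting along a spanning tree instead of
# flooding every link. Topology and names are hypothetical.

def spanning_tree(adjacency, root):
    """Return the edges of a BFS spanning tree rooted at `root`."""
    visited = {root}
    edges = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                edges.append((node, neighbor))
                queue.append(neighbor)
    return edges

# Four fully connected nodes have 6 links, but a broadcast only needs
# the 3 tree edges to reach everyone.
topology = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
tree = spanning_tree(topology, "A")
assert len(tree) == len(topology) - 1
```

A spanning tree over n nodes always has n - 1 edges, so each broadcast costs one message per node rather than one per link.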

Since this thread is likely to have a lot of interaction, please keep comments on-topic and brief so the discussion doesn't get fragmented.


Very interesting. Could you please explain why it is necessary to know the timeframe of the vote? I thought that non-Byzantine validators never vote twice; is that incorrect somehow?

Multiple votes are needed in order to resolve conflicting proposals and obtain a quorum. The reason the vote duration is specified is so no one can interpret subsequent votes as overlapping. If a vote is at time 1000 and duration 50, they must wait until time 1050 in order to issue a new, conflicting vote.


I understand the proposition; I just don't understand why they would need to publish a conflicting vote. Why not just wait until the 65% quorum is reached without reissuing any votes?

Are you considering any other alternatives that are leaderless Byzantine safe consensus algorithms?

Eg


Do final votes reduce the need/benefits from this proposal? Since final votes essentially force a two vote minimum, and nodes won't confirm transactions if they don't see enough final votes?

EDIT:

Thinking about it some more, I guess I can still see how knowing the number of pulses up front and returning a no-value ∅ state for data loss (vs. sticking with an older, possibly incorrect vote/pulse) could lead to more efficient voting, but it's hard for me to tell how much more efficient that is vs. the current system with final votes.

Thank you for your draft, Colin; it seems the documentation really does need an update at some point. Anyway, I am a bit surprised at how advanced the consensus in Nano is.
It is our advantage, compared to other cryptos, that a double-spend only affects the account-chain it is in, not the whole blockchain. For this reason, I am pretty sure we could implement a simpler consensus protocol than the voting system. Below is an example.

Protocol: As double-spends in Nano are always due to malicious intent or bad programming, we could close the user's account when one happens.
In that case, a node that sees a double-spend simply replaces the transactions that refer to the same "previous" field with a closing block.
The closing block is a special block type (as "open" was in earlier versions) containing the previous and block_height fields. It is not sent to the rest of the network, but the transactions involved in the double-spend are sent as usual. This way, we don't increase the network load and we get a simpler protocol than voting.
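A rough sketch of the proposed rule, assuming a simplified block shape (the dict fields and function name are illustrative, not an actual nano block format):

```python
# Rough sketch of the proposed closing-block rule: when two blocks
# reference the same `previous` hash (a double-spend fork), both are
# replaced by a single closing block that seals the chain at that point.
# Block fields and ledger shape are simplified and hypothetical.

def resolve_forks(blocks):
    """Replace any group of blocks sharing a `previous` with a close block."""
    by_previous = {}
    for block in blocks:
        by_previous.setdefault(block["previous"], []).append(block)
    resolved = []
    for previous, group in by_previous.items():
        if len(group) > 1:  # same predecessor referenced twice: double-spend
            resolved.append({
                "type": "close",
                "previous": previous,
                "block_height": group[0]["block_height"],
            })
        else:
            resolved.append(group[0])
    return resolved

blocks = [
    {"type": "send", "previous": "h1", "block_height": 2},
    {"type": "send", "previous": "h1", "block_height": 2},  # conflicting fork
]
assert resolve_forks(blocks) == [
    {"type": "close", "previous": "h1", "block_height": 2}
]
```

Note this sketch only shows the local replacement step; it does not address how nodes that never observe both conflicting blocks would learn the account is closed, which is the objection raised in the reply below.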

This won't work. For instance, if someone creates a fork of a transaction they made a year ago, new network joiners will see a closed account while others will see a year's worth of confirmed transaction history.

It's a particularity of Nano that there is no block confirmation, but to me it is more a burden than an advantage. In my opinion, we should confirm a block after a duration of ~3 seconds (maximum transmission time plus propagation time between two nano nodes). During this period, if there is a fork on an unconfirmed new block, we just replace all unconfirmed blocks with the closing block. When a new node initializes, only confirmed blocks are transmitted. This way, it is strictly impossible for a user to fork old confirmed transactions.

There is block confirmation, that's the voting process.

It sounds like you might be operating under the original whitepaper understanding, where voting only occurs when a fork is detected. That hasn't been the case for years; nodes now vote on all transactions up front. IIRC, accidental forks are still possible if users broadcast transactions from multiple wallets ("Natrium didn't work, let me try Nault"), or if someone tries sending transactions while bootstrapping a new node without being fully synced. Official implementation guidance is already to explicitly check transactions for confirmation.


You are right, I was basing my analysis on the 2017 whitepaper. Anyway, I think we had better resume this discussion after the new draft has been improved. After my second reading, the main problems I've noted are:

  • it needs a "context" paragraph at the beginning, explaining in which situation voting is involved (each transaction or fork?), how blocks are confirmed and what is the maximum size of unconfirmed blocks in the system.
  • figure 3 is missing (that's a copy of fig.1)
  • paragraph 4 (Consensus) really needs a section where types are explained (is a time slice an integer? If so, how is the integer related to local time? Is a vote a structure containing hashes of unconfirmed transactions and a boolean? etc.)

If we get a good draft, the community could propose exploits or improvements without necessarily analysing the source code.
