Bridged Regional Mesh

I'm interested in tackling the bandwidth problem facing Principal Representatives (PRs) by applying network topologies that will reduce each PR's number of connected Peers.

Nano's Current Peer Network: Fully Connected Mesh

Currently Nano utilizes a Fully Connected Mesh in which all PRs are peered with each other PR. During an election, each PR broadcasts its vote to every other PR. Then other PRs sign their vote to the block and broadcast again. As the number of PRs grows, the amount of traffic seen by any one PR grows linearly. The bandwidth requirements for PR consensus are one of the largest hurdles for growing the network's CPS, so we should aim to improve on this topology.

Star Tree Network STN

My first thought was a Star Tree Network (STN); a network in which "Leaders" are chosen to lead each "Star" and convey election information to the other Leaders. Leaders could be selected by highest vote weight within the Star, and Stars could be formed by a relatively low-latency group of PRs which have a total vote weight of approximately Online Vote Weight divided by Desired Shard Count. Desired Shard Count could be something simple like square root of online PR count. Leaders communicate to other Leaders within a fully connected mesh (similar to PRs right now,) and PRs within Stars communicate election info only to their respective Leader. If a Leader goes offline or for whatever reason falls behind on voting, the next highest vote weight PR within the Star becomes the Leader.

Bridged Region Mesh BRM

Now I've worked out another network I'm calling a Bridged Region Mesh (BRM). The network is subdivided into Regions determined by latency and containing approximately equal numbers of PRs. Each Region is its own fully connected mesh. To connect all the regions, each PR additionally communicates with one other PR in each other Region (these are the Bridges.) This could be established by a PR broadcasting to the network for a partner, and accepting the first response from each Region. The problems facing BRM are determining region size or count, and establishing how PRs should choose which Region they should belong to.

Bandwidth

Let's do some math, and assume there are 100 online PRs. I will just discuss the highest traffic any one PR will see on the network, as that is the bottleneck. I likely misunderstand a lot about how elections work in Nano, so please correct my math and I will update this post.
In today's network, each PR would see 2 x 99 = 198 packets per election.
In STN, with 5 Stars, each Leader would have 2 x 4 + 2 x 19 = 46 packets per election.
In BRM, with 20 Regions, each PR would have 2 x 4 + 2 x 19 = 46 packets per election.

While I didn't try to min-max to get the best numbers for STN and BRM, we can see from this simple example that both STN and BRM would be a huge improvement over the current network, reducing bandwidth requirements by up to 75%

Latency

Currently, latency is as low as possible since all PRs communicate.
In STN, the latency within a star is assumed to be as low as possible, while the latency between Leaders is likely to be larger. Latency is also likely to be worsened due to the voting occurring in multiple steps: Star -> Leader -> Leader -> Star
In BRM, the latency within a region is low, but the time to complete a vote should be considerably lower than STN seeing as each region has multiple connections to each other region, and there are fewer steps to complete a vote.

Conclusion

BRM seems the best option I've come up with, as it avoids the problems associated with choosing a leader, while potentially reducing latency. It's also simpler to describe, which is always a plus in my book.

3 Likes

I really like this idea. Have you considered any attack vectors to your proposal? Could a Star be compromised if enough PR collude?

I have a limited understanding of how vote stapling utilizes musig and whether any aspects of it would have to be modified. Assuming musig requires no modification and protects from forged votes, changing the topology of the network should not open up collusion attack vectors.

However, the Star topology does create a reliance on having Leaders and may make DDoS attacks more dangerous. The Star would benefit from rapid health checks and replacing an unhealthy Leader ASAP.

Perhaps borrowing ideas from STP can help you with your design.