Network representative quality analysis

While a ranking of representatives may not be desirable, being able to recognize outliers and bad performers among the top representatives is still useful.

So far, the community has explored average vote latency as a quantitative score for representatives, along with qualitative categories derived from it, such as "Fast" and "Slow".

This measure does unfortunately come with several drawbacks:

  • It is affected by network latency between the measuring node and the representative, which makes it unfair. This can be mitigated by using many measuring nodes ("antennas") across multiple locations.

  • It is supposed to measure how representatives perform under load. However, that would require the measuring nodes themselves to keep up with the spam perfectly.
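
As a rough illustration of the multi-antenna idea, here is a minimal sketch that averages vote latency per antenna and then takes the median across antennas, so that a single badly placed vantage point does not dominate the score. The function names, the input layout, and the thresholds are all assumptions made for this example; they are not taken from mynano.ninja or any other existing tool.

```python
from statistics import median

# Hypothetical thresholds in milliseconds, chosen only for illustration.
FAST_MS = 500
SLOW_MS = 2000


def latency_score(samples_by_antenna):
    """Median of the per-antenna mean vote latencies (ms), or None if no data.

    samples_by_antenna maps an antenna name to a list of latency samples
    observed for one representative from that location.
    """
    per_antenna = [sum(s) / len(s) for s in samples_by_antenna.values() if s]
    if not per_antenna:
        return None
    return median(per_antenna)


def categorize(score_ms):
    """Map a quantitative score onto a qualitative label."""
    if score_ms is None:
        return "Unknown"
    if score_ms <= FAST_MS:
        return "Fast"
    if score_ms <= SLOW_MS:
        return "Normal"
    return "Slow"


samples = {"eu": [100, 200], "us": [400, 600], "asia": [900, 1100]}
print(categorize(latency_score(samples)))
```

This only reduces the dependence on one location. It does nothing about the second drawback, since every antenna still has to keep up with the load itself.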

Despite the drawbacks, it has been helpful in some cases through its implementation on mynano.ninja, e.g., https://mynano.ninja/account/nanocenter.

The Nano Foundation doesn't consider it our place to define the measures used to score representatives. However, our knowledge and experience with the protocol can help in finding a better approach.

We know of some current experimentation with scoring representatives by selectively requesting votes for blocks and seeing if the representative can keep up with the request rate. This can then be used to analyze how many confirmation requests were dropped because the representative could not keep up.

This can be done, for example, during a spam event, by measuring across all reps and all blocks. On average, we should see the slowest representatives perform consistently worse on this measure. Alternatively, the node could be changed to repeatedly request confirmation even for blocks that are already confirmed. As long as there is no per-channel rate limiting, this should be feasible.

Initial results using this measure on the beta network during a spam event show that the slowest representatives do indeed have a larger percentage of dropped packets (per-channel tcp_write_drops).
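
To make the measure concrete, here is a minimal sketch of how per-representative drop percentages could be computed and outliers flagged. It assumes that, for each representative, you already have a count of confirmation requests sent and a count of drops (for example derived from the per-channel tcp_write_drops counter). The data layout, the function names, and the outlier rule (a multiple of the median, with a small floor) are assumptions for illustration, not something the node provides.

```python
from statistics import median


def drop_rates(stats):
    """stats maps rep -> (requests_sent, requests_dropped); returns rep -> ratio."""
    rates = {}
    for rep, (sent, dropped) in stats.items():
        if sent > 0:  # skip reps we never sent a request to
            rates[rep] = dropped / sent
    return rates


def flag_outliers(rates, factor=3.0, floor=0.01):
    """Return reps whose drop rate exceeds max(factor * median rate, floor).

    The floor avoids flagging everyone with a nonzero rate when the
    median happens to be zero.
    """
    if not rates:
        return []
    threshold = max(factor * median(rates.values()), floor)
    return sorted(rep for rep, rate in rates.items() if rate > threshold)


# Hypothetical counts collected during a spam event.
stats = {"rep_a": (1000, 10), "rep_b": (1000, 20), "rep_c": (1000, 500)}
print(flag_outliers(drop_rates(stats)))
```

Comparing each representative against the median of its peers, rather than against a fixed number, keeps the rule usable at different load levels, though the factor itself would need tuning against real data.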

Are there any thoughts on this approach? One related question is whether this would be compatible with the DHT in the future.


Interesting idea. Any idea how much network bandwidth and/or processing power the repeated confirmation requests would take? If all nodes are requesting this from all their peers and/or PRs all the time, it seems like it would add up pretty quickly (especially under saturation scenarios).

During heavy times nodes handle millions of votes/blocks/requests, so some random, extra repeated vote requests shouldn't have a material impact. It does need to be considered, though. I could also see it as an optional setting used only by nodes that want elevated tracking of rep strength.

Perhaps the original creator of this idea can weigh in, @Dotcom :)

Just a few thoughts.

In a decentralized payment network, there are many reasons why someone might have to run a node part time.

So the question could be: how resilient is the system to part-time representatives, and how can it be made more resilient without further centralization?

Should everyone set a list of possible representatives, all of which may run part time, and use some kind of failover when a representative is absent?

One reason that made me stay away from Bitcoin and become interested in Nano is energy consumption. For this reason, I don't plan on running my own server 24/7, nor renting a dedicated server or a VPS, as that would be something of a paradox for an ecological cryptocurrency.

I understand the need for quality in the network, but I feel there is a higher risk of concentration with cloud-based nodes than with self-hosted ones that could be run part time (e.g. on a personal computer).

I assume dealing with part-time availability could be very complex.

However, a way for a node to communicate whether it is a "part time node" or a "full time node", and in the first case whether it is currently "on" or "off", could be useful.

Another idea is a "Time to Live", so that a node could communicate "I am available for approximately 2 hours".
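
To illustrate how the failover list and the "Time to Live" idea could fit together, here is a minimal sketch. Everything in it is hypothetical: the Availability record, its fields, and pick_representative are invented for this example and do not correspond to any existing protocol message or node feature.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Availability:
    """Hypothetical announcement: "I am available for about ttl_seconds"."""
    account: str
    announced_at: float  # Unix timestamp of the announcement
    ttl_seconds: float   # how long the node expects to stay online

    def expected_online(self, now: float) -> bool:
        return self.announced_at <= now < self.announced_at + self.ttl_seconds


def pick_representative(
    preferences: List[str],
    announcements: Dict[str, Availability],
    now: float,
) -> Optional[str]:
    """Return the first preferred representative that is expected to be online."""
    for account in preferences:
        ann = announcements.get(account)
        if ann is not None and ann.expected_online(now):
            return account
    return None


announcements = {
    "rep_a": Availability("rep_a", announced_at=0, ttl_seconds=2 * 3600),
    "rep_b": Availability("rep_b", announced_at=0, ttl_seconds=24 * 3600),
}
# One hour in, the first choice is still within its announced window.
print(pick_representative(["rep_a", "rep_b"], announcements, now=3600))
```

A TTL is only a promise made by the node itself, so a real design would also have to decide what happens when a node goes offline before its announced window ends.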

A further source of inspiration could be RAID 5 and RAID 6, in the sense of dealing with redundancy and fault tolerance (unavailability of a representative, in our case).

Lastly, could there be some kind of machine learning about node availability (e.g. learning that a node is usually available during business hours in some country)?

Cheers