There is currently no way for nodes to communicate metrics for monitoring network status. Metrics such as the following are being considered:
Cemented block count
Set bandwidth cap
Protocol version number
Node vendor version
We are looking to see if there are any others that node operators may want.
The reason for doing this is that, even though we are connected to many peers, we don't actually share much information about the node state. It can therefore be difficult to know how far along the upgrade or bootstrap process we are, or whether an error has been encountered. Telemetry will enable the node to adjust automatically to these conditions.
We'd like the telemetry to be available over the regular peer-to-peer network protocol. However, we are considering MQTT (or other message brokers) for our callback mechanisms, in addition to HTTP and websockets, for their guaranteed delivery.
Now that we are adding the voter count into election information for websocket/RPC, I wonder if tracking and reporting to others the average voters per block seen over a time period could help give a better picture of the decentralization actually being seen across the network. Do we think there is value in that? https://github.com/nanocurrency/nano-node/pull/2414
Given feedback, we currently have 10 pieces of telemetry data available:
node vendor version
genesis block hash
More can be added in future node versions, and the message format is both forwards and backwards compatible. Nodes running newer versions will of course not receive newly added data from older nodes that do not understand it, but the messages will still be valid. The node does not yet use the new data for anything useful (https://github.com/nanocurrency/nano-node/pull/2446), but it does allow requests to specific nodes and to a random selection of nodes through the new "node_telemetry" RPC. I anticipate this will be ready for DB4.
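To illustrate how a decoder can stay compatible in both directions, here is a minimal sketch. The field layout is purely hypothetical (the names come from the metrics listed earlier in this thread, the sizes and ordering are assumptions) and is not the actual nano-node wire format. It assumes new fields are only ever appended to the end of the payload.

```python
import struct

# HYPOTHETICAL layout for illustration only, not the real nano-node format.
# Each entry is (field name, struct format); new fields are assumed to be
# appended at the end in later versions.
KNOWN_FIELDS = [
    ("cemented_count", ">Q"),    # assumed 64-bit big-endian
    ("bandwidth_cap", ">Q"),     # assumed 64-bit big-endian
    ("protocol_version", ">B"),  # assumed 8-bit
]


def decode_telemetry(payload: bytes) -> dict:
    """Decode the fields this node knows about.

    Trailing bytes from a newer sender are ignored; fields missing from an
    older sender's shorter payload are reported as None.
    """
    result = {}
    offset = 0
    truncated = False
    for name, fmt in KNOWN_FIELDS:
        size = struct.calcsize(fmt)
        if truncated or offset + size > len(payload):
            # Older sender: this field and all later ones are absent.
            truncated = True
            result[name] = None
            continue
        (result[name],) = struct.unpack_from(fmt, payload, offset)
        offset += size
    return result
```

With this approach a message from a newer node decodes cleanly (the extra bytes are skipped), and a message from an older node simply yields `None` for anything it did not send.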
That's amazing! To make full use of this for NanoTicker, retrieve data from ALL nodes, and completely remove the use of nodeMonitors, I suggest adding:
Confirmation time (at least the median for the past 2048 blocks, but more stats such as the average and the 90th and 99th percentiles would be nice). The full history can't be sent, so some kind of aggregate is needed here.
Weight of the representative. Maybe not possible, since you need to know which account is the rep, but perhaps the rep address and a custom rep name could optionally be entered in the config and included in the response. The weight could then be calculated manually from that. I know the weight can be obtained from the "confirmation_quorum" RPC with "peer_details" by matching the IP, but is that reliable enough? A custom rep name would be nice regardless. I can't really use telemetry on NanoTicker until the weight can be obtained, because I separate PRs from non-PRs by weight.
Store vendor version
Other useful stats not currently on NanoTicker:
Average bandwidth used in/out over the past minute, or the current bandwidth load. A running total since the node restarted is probably better, since the average can then be calculated elsewhere as needed.
Current node load as a percentage of maximum possible CPU capacity. Or average over some time.
Current node memory use in MB. Or average over some time.
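As a rough sketch of the "total since restart" idea for bandwidth: if telemetry only exposes cumulative byte counters, a consumer can derive an average rate from any two samples. The `CounterSample` type and its field names below are hypothetical, not part of any existing telemetry response.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CounterSample:
    """Hypothetical snapshot of cumulative counters since node start."""
    timestamp: float  # seconds, e.g. from time.time()
    bytes_in: int
    bytes_out: int


def average_rates(earlier: CounterSample,
                  later: CounterSample) -> Optional[Tuple[float, float]]:
    """Return (bytes/s in, bytes/s out) between two samples.

    Returns None if a counter went backwards, which is taken to mean the
    node restarted between the two samples.
    """
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if later.bytes_in < earlier.bytes_in or later.bytes_out < earlier.bytes_out:
        return None
    return ((later.bytes_in - earlier.bytes_in) / elapsed,
            (later.bytes_out - earlier.bytes_out) / elapsed)
```

The node then only has to maintain two counters, and each consumer can pick whatever averaging interval it wants.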
For some things we can use a moving average to stay O(1). With window size N and new sample a: A := A + (1/N)*a. Once the queue has N items, and the oldest item is z, it's simply A := A + (1/N)*a - (1/N)*z = A + (1/N)(a-z). Percentiles are harder since they would have to be computed on demand, so they are only feasible if we cache the telemetry response for some time.
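The moving-average update above could look roughly like this. It is a minimal sketch (the class name is made up), and it follows the formula literally: the divisor is always N, so the value under-reports the true mean until the window has filled.

```python
from collections import deque


class WindowedAverage:
    """Moving average over the last `size` samples with O(1) updates."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.samples = deque()
        self.average = 0.0

    def add(self, value: float) -> float:
        # Until the window is full nothing is evicted, so z is treated as 0.
        if len(self.samples) == self.size:
            oldest = self.samples.popleft()
        else:
            oldest = 0.0
        self.samples.append(value)
        # A := A + (1/N)(a - z)
        self.average += (value - oldest) / self.size
        return self.average
```

One caveat: with floating-point samples, the incremental update can accumulate rounding error over a long run. Keeping an integer running sum and dividing on read is an alternative with the same O(1) cost.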
Getting load and memory usage cross-platform is quite messy.