Automated Network Upgrades

This thread is to discuss improvements to how we upgrade the network and add new features. So far this has been done using epoch and canary blocks, as well as phased upgrades and hardcoded dates.

Each of these has its drawbacks and benefits, summarized here.

In my talks with various people, a concern has also been raised that the more automated something becomes, the more it relies on centralization to support itself.

This topic should also slot in nicely with any discussion on node telemetry, since knowing which upgrades nodes have performed could be useful in determining when the network has reached quorum with regard to a feature's usability.

It seems like there would need to be a few different mechanisms to help prevent manipulation of any automatic upgrade processes. Some features require the ability to interpret new messages but not send them until a certain amount of the network is upgraded (vote-by-hash for instance). Others benefit from having definitive points in account-chains to switch over (state block versions for example). As mentioned above these are covered in our docs here. This means various flavors of automated upgrades may be needed.
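To make the two flavors above concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `MessageFeature`, `FeatureState` and `uses_new_block_version` are made-up names, not anything in the node, and the rule that the new version applies to blocks after a marker height is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FeatureState(Enum):
    UNDERSTOOD = "understood"  # node can interpret the new message but must not send it
    ACTIVE = "active"          # node may both interpret and send the new message


@dataclass
class MessageFeature:
    """Hypothetical 'interpret now, send later' feature (vote-by-hash style)."""
    name: str
    state: FeatureState = FeatureState.UNDERSTOOD

    def can_receive(self) -> bool:
        # An upgraded node always understands the message, even before activation.
        return True

    def can_send(self) -> bool:
        # Sending is held back until enough of the network is upgraded.
        return self.state is FeatureState.ACTIVE


def uses_new_block_version(height: int, upgrade_height: Optional[int]) -> bool:
    """Hypothetical per-account switchover (state block version style).

    Assumes an account-chain records the height of its upgrade marker;
    every block after that height uses the new version.
    """
    return upgrade_height is not None and height > upgrade_height
```

The first flavor is a network-wide switch that something has to flip, while the second is decided per account-chain from data already in the ledger, which is why one automated mechanism may not cover both.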

Below are some considerations for any automated approach, a bit of a random dump:

  • Would need to be triggered based on non-forgeable data points - primarily upgraded voting weight?
  • How can you tell a node is really upgraded? Proper response to version-specific messages?
  • Given the differences between server clocks, and to avoid gaming of the upgrade process, there may need to be a buffer between when the network appears ready to enable the feature and actually turning it on - this could help prevent someone from forcing an early upgrade to put pressure on services that haven't yet upgraded
  • If using voting weights, may need to monitor the amount of weight on that version to make sure it stays above a threshold for a certain amount of time before considering the network ready
  • Could you use a voting-weight based trigger and then a set time in the future after the trigger is hit to accomplish this?
  • Nodes that haven't observed enough upgraded voting weight, even though enough nodes have in fact upgraded, would need to be able to activate the feature once enough voting weight began signaling its use or using it directly (is this really a concern?) - this could require caching attempts at using the new feature for short periods if the upgrade hasn't been observed yet
  • How do bootstrapping nodes know whether a particular feature can be used or not while they are pulling in the ledger from scratch?
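As a rough illustration of the voting-weight points above (threshold, stability window, then a set delay before activation), here is a minimal Python sketch. The class `UpgradeTrigger`, its fields and the numbers used are all hypothetical, and it assumes the node already has some trustworthy figure for upgraded vs. total voting weight, which is exactly the open question in the first two bullets. Making the trigger sticky once hit is also just an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpgradeTrigger:
    threshold: float         # required fraction of voting weight on the new version (assumed)
    stable_for: float        # seconds the fraction must stay at or above the threshold
    activation_delay: float  # buffer in seconds between the trigger and turning the feature on
    above_since: Optional[float] = None
    triggered_at: Optional[float] = None

    def observe(self, now: float, upgraded_weight: float, total_weight: float) -> None:
        """Record one sample of upgraded vs. total voting weight at time `now`."""
        if self.triggered_at is not None:
            return  # assumption: once triggered, the decision is not reversed
        if total_weight <= 0 or upgraded_weight / total_weight < self.threshold:
            self.above_since = None  # dipped below the threshold: restart the window
            return
        if self.above_since is None:
            self.above_since = now
        if now - self.above_since >= self.stable_for:
            self.triggered_at = now

    def activation_time(self) -> Optional[float]:
        """Time at which the feature turns on, or None if not yet triggered."""
        if self.triggered_at is None:
            return None
        return self.triggered_at + self.activation_delay

    def is_active(self, now: float) -> bool:
        activation = self.activation_time()
        return activation is not None and now >= activation


trigger = UpgradeTrigger(threshold=0.8, stable_for=100, activation_delay=50)
trigger.observe(0, 85, 100)    # above the threshold, stability window starts
trigger.observe(60, 70, 100)   # dips below, window resets
trigger.observe(70, 90, 100)   # window starts again at t=70
trigger.observe(170, 90, 100)  # stable for 100 s, triggered at t=170
print(trigger.activation_time())  # 220
```

This does not address how nodes with different views of the weight would converge on the same trigger time, or what a bootstrapping node should assume; it only shows how the threshold, stability window and delay could fit together.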