Ledger pruning

GitHub issue: https://github.com/nanocurrency/nano-node/issues/1094

Ledger pruning is a mechanism to safely remove parts of the ledger.

  • The current proposal is to allow optional pruning of ledger blocks down to the confirmed frontier, the frontier's predecessor, and pending blocks (send blocks to address 0 can be removed, since no receive block can ever be generated from that address).
  • All mechanisms that involve pruning must not break the ledger.
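
As a rough illustration of the retention rule in the first bullet, here is a minimal sketch. It is not the node's implementation: the `Block` type, the `BURN_ACCOUNT` constant, and the `blocks_to_keep` function are hypothetical names, and the ledger is modelled as plain in-memory lists of confirmed blocks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

BURN_ACCOUNT = "0"  # hypothetical stand-in for "address 0"


@dataclass(frozen=True)
class Block:
    hash: str
    account: str
    previous: Optional[str]            # None for the first block of a chain
    destination: Optional[str] = None  # set only for send blocks


def blocks_to_keep(chains: Dict[str, List[Block]], received: Set[str]) -> Set[str]:
    """Return the hashes that survive pruning.

    chains maps each account to its confirmed blocks, oldest first.
    received holds the hashes of send blocks that already have a receive.
    """
    keep: Set[str] = set()
    for chain in chains.values():
        # Keep the confirmed frontier and its predecessor.
        for block in chain[-2:]:
            keep.add(block.hash)
        # Keep sends that are still pending, except sends to the burn
        # address, which can never be received.
        for block in chain:
            is_send = block.destination is not None
            if is_send and block.hash not in received and block.destination != BURN_ACCOUNT:
                keep.add(block.hash)
    return keep
```

With a five-block chain whose second block is an unreceived send and whose third block is a send to address 0, only the second, fourth and fifth blocks would be kept.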

Benefits: Reduce ledger size on disk and lower requirements for nodes joining the network.

I have a proposal that is related to this. I was thinking of creating another GitHub issue, but it may be better to just leave my thoughts here since I don't really know if it's a valid approach.

I have been running a node stored on a RAM disk for a few weeks with good results, for example decreased bootstrapping time during the "checking of unchecked blocks" phase. The problem is that most people don't have enough RAM to do this. What about dividing the future ledger into three database files instead of one? Then let the node owner distribute these files to the best available hardware, preferably via the config file. This could include support for different file-sharing protocols like NFS and SMB/CIFS.

  1. Latest, most accessed blocks: Allow super fast access on an NVMe drive or small RAM disk.
  2. Normal pruned ledger: Store on SSD or HDD for medium speed but cheaper server storage.
  3. Full unpruned ledger history: Cheap cloud storage or NAS. Speed not very important.

Even if not running on a RAM disk, this could probably improve the built-in file caching in Linux.

Note that this would not do much against ledger spam attacks, because the attackers could simply not receive the sends they create. I also did the math, and I believe simply not receiving the sends is the more efficient attack in terms of PoW spent per byte of disk space, compared with opening new accounts for each send, so increasing the open block proof of work would not make sense as part of such a proposal.

Well, sometimes we need to check whether an arbitrary block exists (e.g. confirm_req handling for representatives, or the block processor checking whether a block exists), and that would then require access to all of the databases.
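
To make that lookup cost concrete, here is a minimal sketch of the three-file split suggested earlier, with each file modelled as an in-memory dictionary. The `TieredStore` class and the tier names `hot`, `pruned` and `archive` are hypothetical, not part of the node.

```python
from typing import Dict, Optional, Tuple


class TieredStore:
    """Toy model of a ledger split into three stores, fastest first."""

    def __init__(self) -> None:
        # Insertion order is the probe order: fastest tier first.
        self.tables: Dict[str, Dict[str, bytes]] = {
            "hot": {},      # latest, most accessed blocks
            "pruned": {},   # normal pruned ledger
            "archive": {},  # full unpruned history
        }

    def put(self, tier: str, block_hash: str, data: bytes) -> None:
        self.tables[tier][block_hash] = data

    def find(self, block_hash: str) -> Tuple[Optional[str], int]:
        """Return (tier holding the block or None, number of tiers probed)."""
        probed = 0
        for name, table in self.tables.items():
            probed += 1
            if block_hash in table:
                return name, probed
        return None, probed
```

A hit in the fast tier costs one probe, but an old block, or a block that does not exist at all, forces a probe of every tier, including the slowest one. That is the case an existence check for a random hash would hit.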

Restructuring the chains so that there is a receive chain and a send chain with automatic receives would fix the issue of pending blocks. Automatic receives would also prevent the side issue of a "spam attack" where someone sends an unrealistic number of pending blocks to a known account so that its owner can't properly use the account in a wallet.

Then there is the question of properly ordering receives. Currently the receiver decides the order; with automatic receiving you would need some sort of validator to maintain the order.

Could that be done by ordering pending block hashes?
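
One way to read this suggestion is to sort the pending send hashes lexicographically, so that every node holding the same set derives the same receive order. The sketch below assumes that reading. The `receive_order` function and the short hash strings are purely illustrative.

```python
from typing import Iterable, List


def receive_order(pending_hashes: Iterable[str]) -> List[str]:
    """Derive a receive order from the pending hashes alone."""
    return sorted(pending_hashes)


# Two nodes that hold the same pending set agree on the order.
node_a = receive_order({"9F", "1C", "5A"})
node_b = receive_order({"5A", "9F", "1C"})

# A node that has not yet seen "1C" would process "5A" first instead.
late_node = receive_order({"9F", "5A"})
```

The ordering is only as consistent as the pending sets themselves: nodes agree when they have seen the same sends, and can disagree on which receive comes first when they have not.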

Can epoch blocks be pruned?

It is unlikely epoch blocks could be treated as prunable separately from other blocks, as they are inserted into each account chain rather than kept separate from it. They are also used as specific upgrade markers, so if they were to be removed there would need to be another mechanism in place to establish the transition points on chains (if pre-transition compatibility hasn't been completely deprecated and removed).

I wonder if the epoch number could be added to the block information, so the frontier holds all the necessary information.

The issue with attempting that is that current epochs are signed by a pre-determined private key that is not the same private key used for the rest of the blocks on the chain (and it can't change balance, etc., only do limited upgrades). Since the existing frontier is signed by that other account, it cannot be updated by anyone other than the account owner in order to have epoch information as part of the signed payload. I believe being part of the signed payload is necessary for bootstrapping purposes, to prevent epoch-related details from being forged.

I was thinking that the epoch blocks would still be sent out as normal (signed by the other private key), but then the latest epoch block would be pulled in by the account owner when generating subsequent blocks.

I guess the effect would be to enable trimming of previous epoch blocks to the frontier, but any sent out after the frontier could only be trimmed once the account owner had produced another block.

Ah, I see, that might actually work, although whether the extra complexity of doing it this way is worth it would need to be considered. A possible approach could be:

  1. Epoch blocks are distributed and, if valid, automatically made frontier blocks on the account (receiving cannot be required for these, because then issues with delaying upgrades become very problematic)
  2. Once the account owner attempts to generate another transaction, it uses the same previous hash the epoch block used
  3. When published, this block will only be accepted as valid if it uses the new format the epoch block defined
  4. Once confirmed it would replace the existing epoch block (all votes would have to favor non-epoch blocks in at least certain scenarios to avoid potential fork issues)

I don't believe we have the necessary details in the block itself to indicate which block version it is, but that was being explored alongside changes required for a new PoW format. If that is added, we would know which block version it is, but it could be tricky to determine which block in a chain was the first one of a particular version, which may or may not be required.

Given that a previously confirmed epoch block would be getting replaced in this scenario, there may be some unique considerations with regard to bootstrapping, fork generation and resolution, etc. Additional disk checks may be required as well to determine whether a fork from a root was between a confirmed epoch or not. This may already be done with any fork of an existing block; it would just mean there are more of these checks, because one would be needed for each active account used post-epoch.

These are just some initial thoughts but this could be worth exploring further. Any other thoughts about the possibility of this to help avoid ledger bloat?

Thanks for fleshing it out.

Out of interest, where is the main concern with ledger bloat?

PoW changes should stem the growth of the problem in the grand scheme of things, but is the focus still on things like pending blocks?

Hi, any update on this topic?

Ledger Pruning is still in the Planning phase with a current release target of V23, but this is subject to change depending on other developments, priorities and timelines.

No, because different nodes can see a slightly different order, especially during spam events.

It would be good to implement Ledger Pruning so that the node operator can set in config how many blocks of history the node should keep. Something like:
pruning: 0 (no pruning)
pruning: 1 (only the frontiers)
pruning: 1000000 (frontiers + the newest million)

Yes, I was thinking the same. The default would be set to ‘all’, and I think the minimum set is frontier plus 1, so maybe the configured value is the number of additional blocks on top of the minimum set.

So:
-1 = no pruning
0 = full pruning
1 = frontier + predecessor + 1 additional predecessor
2 = frontier + predecessor + 2 additional predecessors
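
A minimal sketch of how that setting could be interpreted per account chain follows. The `prune_chain` function and `MINIMUM_KEPT` constant are hypothetical, and the sketch deliberately ignores pending send blocks, which the original proposal would also retain.

```python
from typing import List

MINIMUM_KEPT = 2  # frontier + its predecessor


def prune_chain(chain: List[str], depth: int) -> List[str]:
    """Return the block hashes kept for one account chain (oldest first).

    depth == -1 keeps everything, 0 keeps only the minimum set, and
    n > 0 keeps n additional predecessors on top of the minimum set.
    """
    if depth < -1:
        raise ValueError("depth must be -1 or greater")
    if depth == -1:
        return list(chain)
    keep = MINIMUM_KEPT + depth
    # Slicing past the start of a short chain simply returns the whole chain.
    return chain[-keep:]
```

For a six-block chain `b1` to `b6`, a depth of 0 would keep `b5` and `b6`, and a depth of 1 would keep `b4`, `b5` and `b6`.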

Hoping that this and the new PoW or floor difficulty increase get prioritised.

Out of interest, what's stopping everyone from trimming the ledger and losing old block information, such that it's hard to bootstrap in future? Will we possibly need to use node telemetry to determine whether people can all prune?

I was also recently thinking we will likely want to include pruning details in telemetry to watch the overall status of the network in that regard. Depending on how pruning is implemented, there is a risk of bootstrapping reliability dropping if a large portion of the network is heavily pruned. This could potentially shift the need for bootstrapping from scratch (or deeper in the ledger) to off-chain solutions, which require extra trust, or be a force driving some additional centralization of this type of bootstrapping. This may be a natural outcome of taking this type of route.

Nodes can be configured to a certain degree to optimize for different types of activities, perhaps allowing the minimizing of the cost of keeping a full ledger available online for bootstrapping (while it does nothing else?), and other nodes are used to stay up-to-date and feed that bootstrap-only node with a ledger. Of course questions of incentive, complexity, etc. around a setup like this exist, but may be a reasonable tradeoff.

Without full details about how pruning will be implemented though it is hard to draw full conclusions on some of these questions, but they are definitely worth keeping top of mind. Any other ideas you've been thinking of that might be worth bouncing around?