Ledger pruning

Yes, I was thinking the same: a default set to ‘all’. I think the minimum set is the frontier plus 1, so maybe the configured value could be the number of additional blocks kept on top of that minimum set.

So:
-1 = no pruning
0 = full pruning (frontier + predecessor only)
1 = frontier + predecessor + 1 additional predecessor
2 = frontier + predecessor + 2 additional predecessors
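To make the proposed numbering concrete, here is a minimal Python sketch of which blocks a single account chain would retain. It assumes, as suggested above, that 0 means the minimum set (frontier plus its predecessor) and that N counts extra predecessors on top of it; `blocks_to_keep` and `NO_PRUNING` are hypothetical names, not existing node options.

```python
from typing import List

NO_PRUNING = -1  # hypothetical sentinel for the proposed "-1 = no pruning"


def blocks_to_keep(chain: List[str], setting: int) -> List[str]:
    """Return the block hashes one account would retain under the proposed setting.

    `chain` is ordered oldest (open block) -> newest (frontier).
    -1 keeps everything; N >= 0 keeps the minimum set (frontier + predecessor)
    plus N additional predecessors.
    """
    if setting < NO_PRUNING:
        raise ValueError("setting must be -1 or greater")
    if setting == NO_PRUNING:
        return list(chain)
    keep = 2 + setting  # frontier + predecessor + N additional predecessors
    # Slicing with a count larger than the chain simply returns the whole chain.
    return chain[-keep:]


chain = ["open", "b1", "b2", "b3", "frontier"]
print(blocks_to_keep(chain, 0))  # ['b3', 'frontier']
print(blocks_to_keep(chain, 1))  # ['b2', 'b3', 'frontier']
```

Counting only the extra blocks keeps 0 meaningful as "prune as far as safely possible", so the floor never needs to be expressed in the config.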

Hoping that this and the new PoW or a floor difficulty increase get prioritised.

Out of interest, what's stopping everyone from trimming the ledger and losing old block information, so that it becomes hard to bootstrap in future? Will we possibly need node telemetry to determine whether everyone can prune?

I was also recently thinking that we will likely want to include pruning details in telemetry to monitor the overall state of the network in that regard. Depending on how pruning is implemented, bootstrapping reliability could drop if a large portion of the network were heavily pruned. This could shift the need for bootstrapping from scratch (or from deeper in the ledger) to off-chain solutions, which require extra trust, or become a force driving additional centralization of this type of bootstrapping. This may be a natural outcome of taking this type of route.
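A rough sketch of how such telemetry could be aggregated to watch for the bootstrapping risk described here. It assumes a hypothetical `pruning_setting` field using the -1/0/N numbering proposed earlier in the thread; the 25% alert threshold is arbitrary and only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class PeerTelemetry:
    # Hypothetical telemetry payload: pruning_setting is an assumed new field.
    node_id: str
    weight: int           # voting weight delegated to this node (raw units)
    pruning_setting: int  # -1 = full ledger, N >= 0 = pruned (proposed numbering)


def full_ledger_share(peers: Iterable[PeerTelemetry]) -> Tuple[float, float]:
    """Return (share of peers, share of voting weight) that keep the full ledger."""
    peers = list(peers)
    if not peers:
        return 0.0, 0.0
    full = [p for p in peers if p.pruning_setting == -1]
    total_weight = sum(p.weight for p in peers)
    peer_share = len(full) / len(peers)
    weight_share = sum(p.weight for p in full) / total_weight if total_weight else 0.0
    return peer_share, weight_share


def bootstrap_at_risk(peers: Iterable[PeerTelemetry], min_peer_share: float = 0.25) -> bool:
    """Flag when too few observed peers could serve a from-genesis bootstrap."""
    peer_share, _ = full_ledger_share(peers)
    return peer_share < min_peer_share
```

Tracking both shares matters: a few heavily weighted PRs might keep the full ledger while most peers (the ones new nodes actually contact) prune.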

Nodes could be configured, to a certain degree, to optimize for different activities. This might allow minimizing the cost of keeping a full ledger online for bootstrapping (on a node that does nothing else?), while other nodes stay up to date and feed that bootstrap-only node with the ledger. Of course, there are questions of incentives, complexity, etc. around a setup like this, but it may be a reasonable tradeoff.

Without full details about how pruning will be implemented, it is hard to draw firm conclusions on some of these questions, but they are definitely worth keeping top of mind. Any other ideas you've been thinking about that might be worth bouncing around?

A few not-so-useful ideas that might lead somewhere:

1. Database technologies like MongoDB have a primary and secondaries that stay in sync, and you can configure secondaries as read-only for other activities, e.g. BI data extraction. I mention this in case there is similar functionality in the DB technologies we are considering. Similar to what you say, a read-only replica could be used for things like bootstrapping on cheaper hardware, if such syncing were possible.

2. PRs may wish to have more control over how much bootstrapping they do, and maybe even its nature (e.g. block by block or snapshot). If we want them to focus on performance, we should shift the bootstrapping culture away from them and onto non-PR nodes, giving non-PR nodes a more prominent role in the network.

3. We should probably survey how much people feel pruning will be needed now and going forward. Of course, the urgency drops a lot if we can increase the floor difficulty 10x or so.

4. Holding a full historical ledger could be an incentive/marketing point for services; in future they could even charge for it (e.g. for queries against older blocks). Would we need to track in telemetry how the ledger was created, e.g. snapshot vs bootstrap?

Not sure how you would achieve point 4, but it would be very valuable if it couldn't be spoofed.

I still wonder whether a UTREEXO-type hash-based accumulator would be possible with Nano: https://dci.mit.edu/utreexo
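For readers unfamiliar with the idea, below is a minimal Python sketch of the append side of a Utreexo-style forest: the accumulator keeps only one root per perfect Merkle tree (at most about log2(n) hashes), and anyone holding the leaves can prove a leaf against those roots. This is a toy under stated assumptions: it omits deletions (the hard part of Utreexo), uses BLAKE2b for hashing, and leaves open how Nano's account-based state would map to leaves, which is exactly the open question here.

```python
import hashlib
from typing import Dict, List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, sibling is on the right)


def h(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()


def parent(left: bytes, right: bytes) -> bytes:
    return h(left + right)


class Forest:
    """Append-only Utreexo-style forest; verifiers only need `roots`."""

    def __init__(self) -> None:
        self.roots: Dict[int, bytes] = {}  # tree height -> root hash
        self.leaves: List[bytes] = []      # kept only so this toy can build proofs

    def add(self, leaf: bytes) -> None:
        node, height = leaf, 0
        # Binary-counter merge: equal-height trees combine, older tree on the left.
        while height in self.roots:
            node = parent(self.roots.pop(height), node)
            height += 1
        self.roots[height] = node
        self.leaves.append(leaf)

    def prove(self, index: int) -> Proof:
        if not 0 <= index < len(self.leaves):
            raise IndexError(index)
        start = 0
        # Trees cover the leaves left to right, tallest (oldest) first.
        for height in sorted(self.roots, reverse=True):
            size = 1 << height
            if index < start + size:
                level, pos, path = self.leaves[start:start + size], index - start, []
                while len(level) > 1:
                    sibling = pos ^ 1
                    path.append((level[sibling], sibling > pos))
                    level = [parent(level[i], level[i + 1]) for i in range(0, len(level), 2)]
                    pos //= 2
                return path
            start += size
        raise IndexError(index)


def verify(roots: Dict[int, bytes], leaf: bytes, proof: Proof) -> bool:
    """Recompute the root from the leaf and its siblings; proof length = tree height."""
    node = leaf
    for sibling, on_right in proof:
        node = parent(node, sibling) if on_right else parent(sibling, node)
    return roots.get(len(proof)) == node
```

With 5 leaves the forest holds two roots (a 4-leaf tree and a single leaf), which is why the verifier's state stays tiny even as the leaf set grows.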

I would suggest that Principal Representatives should keep the full ledger. In other words, no pruned node should be able to become a Principal Representative, or a node should automatically bootstrap itself fully once it reaches the delegation threshold, whichever is easier to implement.
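The second option could look roughly like the Python sketch below. It assumes the usual definition of a Principal Representative as a representative holding at least 0.1% of online voting weight; `should_fully_bootstrap` is a hypothetical hook, not an existing node setting.

```python
# Assumption: a Principal Representative holds >= 0.1% of online voting weight.
PR_THRESHOLD_PER_MILLE = 1  # 0.1% expressed in parts per thousand


def is_principal_rep(delegated: int, online_weight: int) -> bool:
    # Integer arithmetic avoids float rounding with 128-bit raw amounts.
    return online_weight > 0 and delegated * 1000 >= online_weight * PR_THRESHOLD_PER_MILLE


def should_fully_bootstrap(delegated: int, online_weight: int, pruned: bool) -> bool:
    """Hypothetical policy: a pruned node that crosses the PR threshold backfills its ledger."""
    return pruned and is_principal_rep(delegated, online_weight)
```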

How would you know if a PR is running a full ledger synced from genesis?

Good question. I didn't think of that...

One way would be a cumulative hash as described above.
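One way to read "cumulative hash" is sketched below in Python: fold every stored block into a running BLAKE2b hash, seeded with a nonce chosen by the verifier so the answer cannot be precomputed or reused. Because the block-lattice has no global order, the sketch assumes one (accounts sorted, each chain walked from its open block); the verifier would also need its own full ledger to check the result.

```python
import hashlib
from typing import Dict, List


def _h(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()


def ledger_digest(chains: Dict[str, List[bytes]], nonce: bytes) -> bytes:
    """Cumulative hash over every stored block, seeded with a verifier-chosen nonce.

    Assumed ordering: accounts sorted by name, each chain walked from its
    open block to its frontier.
    """
    running = _h(nonce)
    for account in sorted(chains):
        for block in chains[account]:
            # Fold each block in: running = H(running || H(block)).
            running = _h(running + _h(block))
    return running
```

A pruned node is missing some of the folded blocks, so it cannot produce the same digest for a fresh nonce.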

Is that hash-based accumulator similar to recursive proofs such as https://eprint.iacr.org/2019/1021.pdf?

Why does it matter where you got the ledger from if the current state is valid?

IMO there is no such thing as trustless bootstrapping, so you might as well ask PRs for the valid frontiers (pruned ledger/current state) up front.

This is likely the direction things will need to go, but I think it's worth exploring preservation of blocks in relation to the genesis. If proofs back to genesis can be encapsulated for transactions, this may help with pruning pending blocks in certain cases, with the expectation that they could be restored later, perhaps at extra cost due to the increased computation and validation effort needed to do so.

I am pretty ignorant of the limitations here though, so am just spitballing a bit.

You might get an idea of some of the limitations by listening to Tadge's interview here: https://dci.mit.edu/research/2019/7/9/podcast-dcis-tadge-dryja-was-interviewed-for-whatbitcoindid-on-utreexo

It’s basically bandwidth, which is an issue for Bitcoin but not for Nano, as Nano already assumes higher bandwidth requirements for validators.

Another level of trust minimisation for the distant future.
Even simply exploring whether this approach is valid for an account-based protocol like Nano (UTREEXO is designed only for UTXO-based protocols) would answer some criticisms of Nano’s trust model as it scales.
If I understand it correctly, it may also reduce the risk of bootstrap poisoning.

We could still maintain access to the full ledger with the JSON suggestion, no? It would just be slower and less efficient.

I always thought ledger bloat would not be a big deal since storage space is relatively cheap, so if we could separate the performance-sensitive part of the DB (headers + N blocks) from the rest (the historical part, more than N blocks from the header), we would be good. Of course, this on its own does not prevent someone from attacking with the goal of bloating the performance-sensitive part of the ledger, but we can deal with that some other way if we go in this direction.
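A minimal Python sketch of this split for a single account chain, assuming "headers + N blocks" means the frontier plus the N blocks behind it. The hot and cold containers stand in for fast and cheap storage; `TieredChain` is a hypothetical name.

```python
from collections import deque
from typing import Deque, List


class TieredChain:
    """One account chain split into a hot tier (frontier + N predecessors) and a cold tier."""

    def __init__(self, n: int) -> None:
        if n < 0:
            raise ValueError("n must be >= 0")
        self.n = n
        self.hot: Deque[str] = deque()  # would live on fast storage
        self.cold: List[str] = []       # would live on cheap bulk storage, oldest first

    def append(self, block: str) -> None:
        """Add a new frontier and demote blocks more than N steps behind it."""
        self.hot.append(block)
        while len(self.hot) > self.n + 1:
            self.cold.append(self.hot.popleft())

    def get(self, distance: int) -> str:
        """Return the block `distance` steps behind the frontier (0 = frontier)."""
        if distance < 0 or distance >= len(self.hot) + len(self.cold):
            raise IndexError(distance)
        if distance < len(self.hot):
            return self.hot[len(self.hot) - 1 - distance]  # performance-sensitive tier
        return self.cold[len(self.cold) - 1 - (distance - len(self.hot))]  # historical tier
```

Only the hot tier is touched when validating new blocks, so its size stays bounded by N + 1 per account regardless of how long the history grows.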

This is just an observation, and even if correct it won't have much impact on the long-term pruning discussion, but is there any reason we keep the "work" and "signature" fields for non-frontier blocks? They seem to carry no value at all: if the frontier is confirmed, I see no reason to keep them, for either historical or validation purposes. They feel like leftover temporary variables that could be removed without any loss.

If that is the case, I would not even consider removing them as part of pruning; it would just be DB cleanup.

If someone is bootstrapping from you, they'll need the full block.

A bootstrap starts by requesting a list of frontiers and asking PRs to validate it (using the same method available today: trusting the list that comes with the downloaded software). Once the node knows the frontiers are valid, it can be sure that every block in the hash chain was validated at some point, which in turn guarantees that each block's signature and PoW were valid, even if it never sees them.
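The argument above relies on a block's hash committing to `previous` but not to the signature or work, which is how Nano block hashes are computed. The simplified Python sketch below (a hypothetical `Block` type with only a few fields) shows that stripping signature and work from older blocks leaves the chain verifiable against a confirmed frontier hash.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    # Simplified stand-in for a Nano block; real state blocks carry more fields.
    account: str
    previous: bytes  # hash of the preceding block in this account's chain
    balance: int
    signature: Optional[bytes] = None
    work: Optional[int] = None

    def block_hash(self) -> bytes:
        # Signature and work are deliberately left out of the hashed content.
        payload = self.account.encode() + self.previous + self.balance.to_bytes(16, "big")
        return hashlib.blake2b(payload, digest_size=32).digest()


def verify_chain(chain: List[Block], confirmed_frontier: bytes) -> bool:
    """Check that `chain` (oldest -> newest) hash-links up to a frontier the PRs confirmed."""
    if not chain or chain[-1].block_hash() != confirmed_frontier:
        return False
    return all(cur.previous == prev.block_hash() for prev, cur in zip(chain, chain[1:]))


def strip_auth(block: Block) -> Block:
    """Drop signature and work, as suggested for non-frontier blocks."""
    return replace(block, signature=None, work=None)
```

Tampering with any older block changes its hash and breaks the `previous` link above it, so the confirmed frontier hash still pins the whole chain.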

True, ledger stratification is another option for nodes that want the full ledger but are limited in how much fast storage they have.

Secure fountain (SeF) codes, a class of erasure codes, enable any full node to encode validated blocks into a small number of coded blocks, thereby reducing its storage costs. For Bitcoin, the authors demonstrated full nodes that encode the 191GB Bitcoin blockchain into 195MB (1000x storage savings). SeF codes can achieve a near-optimal trade-off between the storage savings per node and the bootstrap cost, measured as the number of (honest) storage-constrained nodes a new node needs to contact to recover the entire blockchain. A key technical innovation in SeF codes is making fountain codes secure against adversarial nodes that provide maliciously formed coded blocks. Please have a look at the very informative presentation and paper.
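To illustrate the fountain-code idea itself, here is a plain LT-style (Luby Transform) fountain code in Python with a toy degree distribution and a peeling decoder. It does not include the security mechanism that distinguishes SeF codes, so it should be read as background, not as the SeF construction; all names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

Symbol = Tuple[Set[int], bytes]  # (indices of source blocks XORed together, payload)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_symbol(blocks: List[bytes], rng: random.Random) -> Symbol:
    """One coded symbol: the XOR of a random subset of equal-length source blocks."""
    degree = min(rng.choice([1, 2, 3]), len(blocks))  # toy degree distribution
    indices = set(rng.sample(range(len(blocks)), degree))
    payload = bytes(len(blocks[0]))
    for i in indices:
        payload = xor(payload, blocks[i])
    return indices, payload


class PeelingDecoder:
    """Recovers source blocks by repeatedly resolving symbols with one unknown block."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.known: Dict[int, bytes] = {}
        self.pending: List[Symbol] = []

    def add(self, symbol: Symbol) -> None:
        indices, payload = symbol
        self.pending.append((set(indices), payload))
        progress = True
        while progress:
            progress = False
            remaining: List[Symbol] = []
            for idx, data in self.pending:
                for i in [j for j in idx if j in self.known]:
                    data = xor(data, self.known[i])  # cancel blocks already recovered
                    idx.discard(i)
                if len(idx) == 1:
                    self.known[idx.pop()] = data
                    progress = True
                elif idx:
                    remaining.append((idx, data))
            self.pending = remaining

    def done(self) -> bool:
        return len(self.known) == self.k


def recover(blocks: List[bytes], seed: int = 0,
            max_symbols: int = 10_000) -> Tuple[List[Optional[bytes]], int]:
    """Stream coded symbols (rateless) until every block is recovered."""
    rng = random.Random(seed)
    decoder = PeelingDecoder(len(blocks))
    used = 0
    while not decoder.done() and used < max_symbols:
        decoder.add(encode_symbol(blocks, rng))
        used += 1
    return [decoder.known.get(i) for i in range(len(blocks))], used
```

The rateless property is what makes the storage split possible: each node can keep a few coded symbols, and a new node simply collects symbols from many nodes until decoding succeeds. A malicious node could hand out a bad payload here and silently corrupt the result, which is the gap SeF codes address.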

Presentation starting at 4:07:33


Once pruning is implemented, is there any point in keeping the full history on the network itself? If Nano's aim is to do one thing well, maybe we should avoid having multiple types of nodes keeping different amounts of history and just share the pruned history. When pruning is rolled out, nodes that care about history could keep track of it themselves, similar to how nodes that care about timestamps need to track them locally.

Regarding pruning receive blocks: I don't know the exact technical implementation of the network or how key signing works, so I might say something stupid, but here's an idea. Maybe all the pending blocks could be pruned separately from the received blocks, so that each account would have one block holding the current received balance and one block holding the total pending balance. Then, when the user is online, they could use their key to sign the whole pending balance into their account.
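A very rough Python sketch of the data-model side of this idea, under loudly stated assumptions: pending sends are collapsed into one total per destination, and a single hypothetical receive claims that total. Today each Nano receive references one specific send block hash, so this would require a protocol change, and how such a summary would be signed or confirmed is left open, as in the original idea.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PendingSend:
    send_hash: str
    destination: str
    amount: int  # raw units


def summarize_pending(sends: List[PendingSend]) -> Dict[str, int]:
    """Hypothetical pruning step: collapse every pending send into one total per destination."""
    totals: Dict[str, int] = defaultdict(int)
    for s in sends:
        totals[s.destination] += s.amount
    return dict(totals)


def receive_all(balance: int, account: str, pending_totals: Dict[str, int]) -> int:
    """Hypothetical single receive that claims the whole summarized pending balance."""
    return balance + pending_totals.pop(account, 0)
```

Once summarized, the individual send hashes are no longer needed to receive, which is what would let the pending entries be pruned.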