Anti-Bloat Proposal: Age-Based Dust Removal

Hi all. Given the ongoing (presumed) ledger bloat attack, I submit the following for consideration:

Proposal
Discard dust addresses once their frontier block reaches an age threshold.

Background
Based on three principles:
1. The network must function indefinitely, irrespective of usage history.
2. Dust addresses of sufficient age will never be re-used and are thus already lost.
3. No harm is caused by discarding that which is already lost.

Definitions
Age: Measured in elapsed blocks or by timestamps. Asynchronous discarding is acceptable, given that the affected addresses are dormant.
Discard: Irreversibly prune from all copies of the ledger.
Dust Address: An address whose balance is below the dust threshold. Not to be confused with truncating larger balances.

Practical Example
Addresses with less than 100 raw are discarded from the ledger after ~5 years of inactivity.
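
For illustration, here is a minimal sketch of the discard rule this example implies. The threshold constants and field names (balance_raw, frontier_age_years) are purely hypothetical placeholders, not proposed node parameters:

```python
# Hypothetical sketch of the discard rule in the practical example above.
# Thresholds and field names are illustrative, not actual node parameters.
DUST_THRESHOLD_RAW = 100          # balances strictly below this are "dust"
MAX_DORMANT_YEARS = 5             # age of the frontier block before discarding

def is_discardable(balance_raw: int, frontier_age_years: float) -> bool:
    """Return True if an account is dust AND has been dormant long enough."""
    return balance_raw < DUST_THRESHOLD_RAW and frontier_age_years >= MAX_DORMANT_YEARS

assert is_discardable(balance_raw=7, frontier_age_years=6.2)           # old dust: discard
assert not is_discardable(balance_raw=7, frontier_age_years=1.0)       # recent dust: keep
assert not is_discardable(balance_raw=10**30, frontier_age_years=6.2)  # real balance: keep
```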

Advantages
- Existing microtransaction use cases are not affected.
- The rate of discarding can be proportional to the rate of new blocks.
- Negligible effect on circulating supply (the effect is proportional to the cost of the attack).
- Retroactive.

Disadvantages
- Controversial. Discussion required.

Challenges
- Determining how small is 'dust' and how old is 'old'.
- Fiat value changes over time. It is important to ensure that only balances with a negligible probability of recovery are affected.

Strategies to Address Challenges
Discarded balances must be so small and so old that there is a negligible likelihood a user would have 'bothered' to keep the keys. Some ways to arrive at acceptable values include:
- Model the relationship between address age, balance, and frequency of re-use.
- Backtest candidate values on the existing ledger and/or on simulations of higher network use or rapid fiat-value appreciation (a rough sketch follows this list).
- Simulate the effects of relational thresholds (i.e. a function of balance and age, up to set maximum values).
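
As a rough illustration of the backtesting idea, something along these lines could be run against a ledger snapshot. The snapshot format (account, balance in raw, frontier age in years) and all names here are assumptions for the example; a real backtest would read the data from an RPC dump or database export:

```python
# Hypothetical backtest over a ledger snapshot: count the accounts and raw
# that a candidate (balance, age) threshold pair would discard.
from typing import Iterable, Tuple

def backtest(snapshot: Iterable[Tuple[str, int, float]],
             dust_raw: int, min_age_years: float) -> Tuple[int, int]:
    """Return (accounts_discarded, raw_discarded) for one threshold pair."""
    accounts, raw = 0, 0
    for _account, balance_raw, age_years in snapshot:
        if balance_raw < dust_raw and age_years >= min_age_years:
            accounts += 1
            raw += balance_raw
    return accounts, raw

sample = [("nano_1...", 3, 6.0), ("nano_2...", 50, 0.5), ("nano_3...", 10**28, 7.0)]
print(backtest(sample, dust_raw=100, min_age_years=5.0))  # -> (1, 3)
```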

Conclusion
The ledger need not be small, but it must be finite. Thank you for reading; I'm just trying to do my part.

5 Likes

I'm generally sympathetic to the idea. I think the disadvantage is that it could be controversial.

I think working on a design for such a system is worthwhile.

4 Likes

I really like this idea.
However, a spammer could keep the dust addresses "alive" by sending one more raw to the same addresses every few years. It doesn't even have to be the same spammer who does this: just send one raw to the unopened accounts and the "time to hardprune" is reset.
I believe this can be fixed if we only "hardprune" unopened accounts whose pending dust blocks' first block is of a certain age.
I also think this makes it less controversial. If you leave dust pending on an unopened account for years, it will be hardpruned. If you want to keep the dust, you just need to sign a receive block.
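
A minimal sketch of that rule, assuming we can enumerate the pending sends targeting an unopened account together with their ages; all names and thresholds are illustrative:

```python
# Sketch of the "age of the first pending block" rule suggested above.
# A later dust "refresh" send does NOT reset the clock, because the
# decision keys on the OLDEST pending block targeting the account.
DUST_THRESHOLD_RAW = 100
MAX_PENDING_AGE_YEARS = 5

def hardprune_unopened(pending_entries: list[tuple[int, float]]) -> bool:
    """pending_entries: (amount_raw, age_years) of sends to an unopened account."""
    if not pending_entries:
        return False
    oldest_age = max(age for _, age in pending_entries)
    total = sum(amount for amount, _ in pending_entries)
    return total < DUST_THRESHOLD_RAW and oldest_age >= MAX_PENDING_AGE_YEARS

# A fresh 1-raw "refresh" send does not rescue the account:
print(hardprune_unopened([(1, 6.0), (1, 0.1)]))  # True
```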

3 Likes

One thing I'd like to see in the design is no "magic numbers". Similar to how bandwidth_limit limits bandwidth on a per-PR basis, with the network CPS implied by those settings, the goal should be per-PR configuration that isn't hardcoded into the node.

1 Like

Fair point. Post updated.

I could see two types of detractors:
1. Those who accept the concept but reject any specific thresholds.
2. Those who oppose the concept on principle.

Corresponding solutions could include:
1. Open collaboration and peer review of proposed thresholds.
2. Discarding can be optional. Staunch supporters of a complete archive could still choose to maintain one; it just wouldn't be required for the network to function.

I could see a future with "dust archives" maintained on cheaper but slower storage. Such archives could be accessible to active nodes on request. In fact, this separation is how most of the world's data is already stored/archived.

4 Likes

I do like the split-storage idea also, and it might be easier to implement than removing dust accounts.

4 Likes

Local Thresholds
Good point regarding magic numbers. What if PRs choose their own local thresholds? It would be dynamic and remain true to the spirit of ORV. For example, if 51% of PRs (by weight) have set their dust threshold to <100 raw and >3 yrs, then the network has spoken; larger/newer blocks would be saved and the remainder considered disposable dust. (In practice this would occur on a block-by-block basis, since a given block would only be lost once 51% of PRs have discarded it.)
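
To illustrate, with made-up names, weights, and thresholds only, the "network has spoken" test could look something like this: a block is effectively forgotten once the PRs that would discard it under their own local thresholds hold more than 51% of voting weight.

```python
# Illustrative sketch of the weight-quorum test described above.
# All names, weights, and thresholds are hypothetical.
def forgotten_by_network(prs: list[dict], balance_raw: int, age_years: float) -> bool:
    total_weight = sum(pr["weight"] for pr in prs)
    discarding_weight = sum(
        pr["weight"] for pr in prs
        if balance_raw < pr["dust_raw"] and age_years >= pr["min_age_years"]
    )
    return discarding_weight > 0.51 * total_weight

prs = [
    {"weight": 40, "dust_raw": 100, "min_age_years": 3},
    {"weight": 35, "dust_raw": 10,  "min_age_years": 5},
    {"weight": 25, "dust_raw": 1,   "min_age_years": float("inf")},  # archiver-style default
]
print(forgotten_by_network(prs, balance_raw=5, age_years=6))   # True (75% would discard)
print(forgotten_by_network(prs, balance_raw=50, age_years=6))  # False (only 40%)
```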

Effects
1. It would introduce additional metrics by which holders can choose their representative.
2. It would give small PRs suffering from bloat an alternative to dropping out.
3. PRs could scale their thresholds with changing fiat values.
4. It dissuades bloat attacks, since there is no guarantee that any specific dust value would 'stick'.

Implementation
The feature could be released with default values of a minimum 1 raw balance and infinite age, allowing PRs to consciously enable and configure it as needed.

3 Likes

Additional Considerations for Local Thresholds
1. Dust recovery is somewhat possible, since 'discarders' could bootstrap dust back from 'archivers' using the existing bootstrapping framework.
2. Each PR could choose step-like criteria (e.g. 1 raw @ 1 yr, 100 raw @ 5 yrs, 1000 raw @ 10 yrs) as it sees fit (a sketch follows this list).
3. This could be surprisingly straightforward to implement, given that consensus is not required. Since PRs discard data locally, a given dust block would only be affected once 51% of PRs (by weight) have discarded their copy. In other words, the 'network' doesn't have to 'agree' on anything. The effect on a dust account is just the natural outcome of 51% of PRs being unable to verify a new block.
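
A toy sketch of one PR's step-like criteria from point 2, with illustrative values and names only:

```python
# Sketch of one PR's step-like criteria: older blocks may be discarded at
# progressively higher balance cut-offs. Values are illustrative only.
STEPS = [(1.0, 1), (5.0, 100), (10.0, 1000)]   # (min_age_years, max_balance_raw)

def local_discard(balance_raw: int, age_years: float) -> bool:
    """Apply the highest step whose age requirement the block meets."""
    cutoff = 0
    for min_age, max_balance in STEPS:
        if age_years >= min_age:
            cutoff = max_balance
    return balance_raw <= cutoff

print(local_discard(50, 2.0))    # False: only the 1-raw step applies at 2 years
print(local_discard(50, 6.0))    # True: the 100-raw step applies at 6 years
```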

Ironically, this has been under our noses all along, since nodes have always been able to discard data from their copy of the ledger.

Disadvantages
- Best implemented at higher Nakamoto coefficients, to improve the likelihood of secure dust recovery.
- Small accounts (near dust thresholds) would be under-represented in ORV.
- Possibly others; more feedback appreciated.

2 Likes

Maybe I'm missing something....

But if the account will be "removed" after 5 years when it only has < 100 RAW... then in order to keep it alive we only need to "refresh" that account every year?

So, I could send 1 RAW today, then wait ~1 year and publish a new block. :thinking:

But what happens when that account becomes alive again? A new open would be allowed, so what prevents a "fork" between the old "Open" block and the new one, since not all nodes have that information?

Another issue arises when some node holds all the information for the account (including all Open/Receive/Send blocks) and a new block is published after ~10 years. If that block's previous has already been "deleted", only one node can "recover" all the information.

Bootstrapping can also be affected. If a node only tracks the account + frontier of each account, in order to accept only future blocks, how will Confirm_Req work against blocks that it no longer has?

1 Like

Yes

But then the old transactions can be pruned

But if "old transactions can be pruned", how do you know if the transaction isn't "pending" anymore?

Consider the following:
Account A: Open [1 Nano] -> Send 1 RAW to "Account B"
Account B: Open [1 RAW] -> Send 1 RAW to "Account C"
Account C: Open

So, "Account B" and "Account C" are "Dust addresses". Supposing that Account B and Account C get's "Discarded".

What happens if Account B publishes the following (5 years later):
Account B: Open/Receive [1 RAW from Account A] -> Send 1 RAW to "Account D"

The "Account B" was "discarted", so it can re-open the account and send it again, to another account. How we know that the "Send" from Account A is already claimed before?

2 Likes

I support the idea of archival nodes. Archival nodes would keep track of every balance they see as they passively watch the network. They would not function as principal representatives, but simply serve as a resource for other nodes to reach out to.

The implementation of archival nodes would allow principal representatives and other nodes on the network to self-select which transactions to keep in their own storage and which to prune. The parameters for self-pruning would be defined by each node, based on its available storage and the likelihood of needing access to a given record.

This would allow principal representative nodes to be far more lightweight with regard to storage. In the event that they observe a transaction from an account they have no record of, they would be able to reach out to archival nodes and incorporate that data into their vote.

A potential issue is that a large number of transactions could be injected into the network from accounts that have seen no activity for an extended period, requiring multiple PRs to query archival nodes simultaneously. This could delay those transactions relative to normal operation.

Additionally, DDoS attacks against the archival nodes themselves could create challenges, especially if those attacks were concurrent with multiple transactions against pruned accounts.

However, intuitively, my first reaction is that the benefits of allowing nodes to become more lightweight would outweigh any potential attack vectors or their impacts on the network.

2 Likes

It could be nice to calculate the potential MB savings on the current ledger for a 100 raw threshold. If it's something like 1% savings, is it worth it?

1 Like

Refreshing Dust
This would require an ongoing effort by the attacker, approximately equal to creating new dust. Spam-prevention mechanisms are better suited to mitigating persistent attacks.

Archive Nodes
Even if 49% of PRs 'know' that a block/account exists, so long as 51% have no record of it, it effectively never happened. It may not be feasible to trustlessly recover data from a minority of 'archive nodes'. Perhaps we should distinguish between a block/account being discarded (locally, by one node) and being forgotten by the network (discarded by 51% of PRs).

Bootstrapping
Currently, when a node connects to the network after spending time offline, it must bootstrap the blocks that it missed while away. The same mechanism applies to dust recovery, except the missing blocks are old ones rather than new. In practice this would occur only when a node has lowered its dust thresholds. That would be rather unfortunate for the node in question, since it would have to compare its entire ledger against its peers' in order to find the missing data, effectively bootstrapping from scratch.

Pending Blocks
Most spammers do not receive their dust, so the utility of this proposal hinges on the feasibility of discarding unreceived spends. In this example, Account B's second receive would be rejected, since the corresponding spend from Account A has already been forgotten. It is, however, a very interesting thought experiment, since one could try to "weave" through the various balance and age thresholds across PRs to attempt double-spends. Proving or disproving the existence of such a vulnerability would require further analysis and/or testing. Great catch; this is a real puzzle.

I am not a developer. Does anyone know if there are technical roadblocks to discarding/forgetting unreceived spends? For instance, if Account A's unreceived spend is forgotten, would it affect the validity of Account A's subsequent non-dust blocks, including its frontier block? Is this kind of continuity critical for protocol functions, such as cryptographic proofs?

Dust Statistics
100 raw is only an example; however, 93.4% of the ledger is made up of accounts with a balance of <0.01 NANO, and their combined value is just 0.0003% of the circulating supply. Source: https://nano-faucet.org/rich-list/

3 Likes

A block is never forgotten. The "pending" block is a block on the sender's chain and will remain there. If it were removed from the entire network, you couldn't reach the genesis anymore. Think of taking the frontier block of that account and following "previous"/"link" until it reaches the genesis (the first block ever).

For pruned nodes, you must keep some information about the pending block (such as the receiver public key, the amount, and the hash of the block), but you don't need to store the block itself (which includes the signature, representative, ...).
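
To make that concrete, here is a minimal sketch of such a pending record, keeping only what is listed above (receiver public key, amount, block hash); the field names are illustrative and not the node's actual schema:

```python
# Sketch of a minimal per-pending-send record: enough to later validate a
# receive without keeping the full block (signature, representative, work, ...).
from dataclasses import dataclass

@dataclass(frozen=True)
class PendingInfo:
    send_block_hash: str      # hash of the (possibly pruned) send block
    receiver_public_key: str  # destination account
    amount_raw: int           # amount waiting to be received

entry = PendingInfo(send_block_hash="A1B2...", receiver_public_key="nano_3abc...", amount_raw=1)
print(entry)
```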

That creates a new attack layer, which isn't about ledger size but about network usage. The "fetch blocks on the fly" capability is what "light nodes" (light wallets), such as the old Nanollet, rely on. In that case, the light node requests blocks, votes, and so on for a few accounts from other nodes on the Nano network. That is fine for that purpose (since the light node only tracks a few addresses), but it is terrible at a larger scale.

Also consider that ANY invalid block could potentially be a valid block: since you are deleting without keeping any metadata, you can't distinguish an invalid block from a "valid but deleted" one.

2 Likes

There are solutions to the aforementioned recovery problems. My point was more that recovery is inconvenient and unlikely to be exercised, especially given the age and low value of the affected accounts. And yes, there would be security challenges. To be clear, I support a non-recoverable approach for precisely these reasons, but I do sympathize with the desire for backups.

It is conceptually reassuring to maintain an audit trail back to genesis, but since the implementation of state blocks in v11, I'm not sure it is strictly necessary. Once quorum is reached and a state block is confirmed, it is immutable. There are no mechanisms through which the loss, discovery, or forgery of a previous block could affect the balance of a confirmed state block. If pruned nodes store only frontier (state) blocks and pending blocks, then what consequences could arise from selectively discarding the latter?

I was asking about continuity more in the context of signatures/proofs, but I suspect these too could be adapted as required.

Note: While state blocks are not affected by past blocks, a pending block could be affected by a discarded state block. For example, if Account A starts with 1 Nano and sends 0.999 Nano to Account B, Account A's remaining 0.001 Nano 'dust' state block would have to be retained for Account B to receive the non-dust spend. It's not a big deal; the discarding mechanism would just have to check for non-dust unreceived spends before discarding a given state block. I don't think this helps attackers either, since they would have to distribute non-dust spends, thereby increasing the cost of their attack.
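
A sketch of that extra guard, with hypothetical names and thresholds: before discarding a dust frontier, check that the account has no outstanding non-dust sends still waiting to be received.

```python
# Sketch of the extra guard described above. Names and thresholds are illustrative.
DUST_THRESHOLD_RAW = 100

def safe_to_discard(balance_raw: int, age_years: float,
                    unreceived_send_amounts: list[int],
                    min_age_years: float = 5.0) -> bool:
    is_dust = balance_raw < DUST_THRESHOLD_RAW and age_years >= min_age_years
    has_nondust_pending = any(a >= DUST_THRESHOLD_RAW for a in unreceived_send_amounts)
    return is_dust and not has_nondust_pending

print(safe_to_discard(5, 6.0, unreceived_send_amounts=[10**30]))  # False: non-dust send still pending
print(safe_to_discard(5, 6.0, unreceived_send_amounts=[]))        # True
```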

2 Likes

Maybe it could be less controversial to, instead of entirely removing those accounts, compress them to the absolute minimum needed to work (i.e. the account, the balance (which could possibly be stored with far fewer bits than normal, considering it's dust only), and the frontier hash). That way you wouldn't need to keep the entire frontier state, and you could also exclude those accounts from rep calculations, reducing the burden all around.
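
As a rough sketch of what such a compressed record might look like (the layout and field widths here are illustrative only, not a proposed storage or wire format):

```python
# Sketch of a compressed dust-account record: account, a narrow balance
# field (dust fits in a couple of bytes), and the frontier hash.
import struct

def pack_dust_account(account_pubkey: bytes, balance_raw: int, frontier_hash: bytes) -> bytes:
    assert len(account_pubkey) == 32 and len(frontier_hash) == 32
    assert 0 <= balance_raw < 2**16          # dust only: 2 bytes instead of 16
    return account_pubkey + struct.pack(">H", balance_raw) + frontier_hash

record = pack_dust_account(b"\x00" * 32, 99, b"\x11" * 32)
print(len(record))  # 66 bytes, versus ~16 bytes just for a full 128-bit balance field
```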

4 Likes

Personally, I don't think automatically "removing" accounts is a good idea. What for some is dust, for others might be a meal or more.

What I'd suggest instead is that, for example, every year or every five years or so, PRs would issue a warning to all account holders, telling them that all accounts with less than X NANO that haven't been used for more than Y years will be removed from the ledger. This way, people would know they have to check their accounts and move their dust to other accounts if they need to. It would be viewed as a kind of "doing your taxes" or voting, in that people would have to do it every once in a while.

The reason for this suggestion is that if it's automatic, people will lose money. They won't remember to move their dust, and a few years later when they want it they will complain that the network automatically removed their money. By making this a "duty" every once in a while, the burden of keeping their funds safe falls on the users instead. And if they don't move their money, it's because they ignored the warnings.

PS: I imagine a future where Nano is widely used, and when I say PRs would issue a "warning", I mean it would show up in the news everywhere; society would know, like it knows when it's "tax season".