Equihash as a new PoW algorithm

I intentionally concentrate on AES and not SHA acceleration because I believe that hardware manufacturers will have more incentive to optimize it in the future (for example having less and less clock cycles per AES instruction and more instruction parallelism). Encryption is something that is absolutely critical to do at high speed for server and client hardware (and the demand keeps growing at a possibly exponential rate). Hashing is not. Even if SHA is accelerated (like in many newer CPUs), I don't think it would compare with with the amount of engineering given to encryption, especially for server hardware.

I feel like the current Nano approach to provide specialzed services for PoW might actually be a good idea in the long run. But with greater adoption, it is might be eventually inevitable these services would need to start collecting fees. So transactions may still be free but PoW would be "rented" from server farms and payed for using Nanos. Even if using memory-hard PoW might prove to be more effective, it may still turn out to be preferrable to do on the server side, because most users wouldn't care to pay a couple of cents for the PoW service. Also PoW will become more expensive (in terms of CPU) when the network becomes more popular.

I understand using encryption doesn't "feel" right. It's not supposed to feel right! that's exactly why I'm suggesting to consider it. It doesn't feel right because professional cryptograpers are dedicated to proofs and peer-reviewed research, and asking them to use a Cipher as a hash would seem to them like a waste of time and opening up to unexpected "holes". But in practice, Nano only uses PoW to prevent spam, not to secure transactions, so having the possibility of some obscure theoretical attack might not matter at all in practice.

I do feel it would be better to consult with professional cryptographers for this (I'm not one of them).

1 Like

I think I need to clarify a bit where I'm coming from:

My reasoning is:

  1. PoW should ideally run as efficiently on CPUs as on GPUs and ASICs.

  2. I don't think that mobile devices would ever be able to compete with PCs and servers when it comes to PoW. Even if using the most effective methods, it would still be a battery drain and inconvenience to most users.

  3. I accept the fact that data centers provide better energy efficiency and cost-effective processing.

  4. I predict that mobile users would most likely prefer to pay (though very little, with Nanos) to run PoW on datacenters (the payments wouldn't be immediate, per-transaction as that wouldn't make sense, more likely per x transactions).

  5. I prefer that PoW services would not require GPUs on the server side. This would lower the barrier of entry to any commodity server hardware.

  6. Even better: it would be great that end-users could run PoW on their PCs or even browsers (in exchange for Nanos) and sell it.

  7. One promising approach I can think of that may satisfy these requirements is using encryption as a basis for PoW. There might be others though.

So it's mostly about preferring CPUs to GPUs, not GPUs to ASICs.

BTW: I've actually spent several hours trying to find benchmarks for AES on many types of devices and it turned out surprisingly hard to find up-to-date results. Especially on mobile. I've read research about GPU and FPGAs but most of it seemed out-of-date. There are also TLS acceleration cards but I couldn't find hard benchmarks.


Early on in research we were looking for a proof size that was no larger than the current one, so the lightweight proof is the main tradeoff. Thankfully Equihash options aren't significantly larger with options in the 20-30 bytes range. Other requirements seems to be fine, although an actual implementation will be needed to ultimately validate them.

We don't currently have an ETA for this implementation yet.

1 Like

Here's a possible one-way hash function defined using only the encryption operation (no need for key scheduling - this is an intentional design choice because I don't believe key scheduling instructions would be as optimized as encryption and decryption instructions):

let k1 and k2 be nothing-up-my-sleeve keys (arbitrary constants like "foo" and "bar").

The hash is defined as:
H(x) = E_k1(x) xor E_k2(x)

I would be highly grateful if you could forward this to cryptography experts that are patient enough to try to make constructive suggestions (and even suggest alternatives if they find faults).

If the issues with this are mainly theoretical, and for purposes of PoW don't matter (we're not signing or authenticating documents here) then problem might be solved. Just use this in place of a hash function and you've got hardware acceleration on virtually all devices (including mobile), and on some servers supporting TLS acceleration (like with Intel QuickAssist) the hashing may possibly approach ASIC performance.

At the very least, the selection of the hashing function doesn't need to be decided with the move to equihash. We can move to equihash and then change the internal hashing function for performance gains.

I took a look at the instruction groups, it looks like there are sha256 instructions that get between 10-15cycles/byte. https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/sha-256-implementations-paper.pdf. The AES instructions are 3-4 cycles per byte, which is faster than I expected, I don't know why hashing would be a more difficult operation than encryption. https://software.intel.com/content/dam/develop/public/us/en/documents/10tb-24-breakthrough-aes-performance-with-intel-aes-new-instructions-final-secure.pdf. Blake2 already costs 5-6 cycles per byte https://eprint.iacr.org/2013/322.pdf.

We should consider the gains / risk ratio. Using an established hashing function means we don't need to verify a custom method.

The construction I'm trying to produce is called a One-way compression function.

Here's a quote from Wikipedia:

In cryptography, a one-way compression function is a function that transforms two fixed-length inputs into a fixed-length output. The transformation is "one-way", meaning that it is difficult given a particular output to compute inputs which compress to that output.

One-way compression functions are for instance used in the Merkle–Damgård construction inside cryptographic hash functions.

One-way compression functions are often built from block ciphers. Some methods to turn any normal block cipher into a one-way compression function are Davies–Meyer , Matyas–Meyer–Oseas , Miyaguchi–Preneel (single-block-length compression functions) and MDC-2/Meyer–Schilling , MDC-4 , Hirose (double-block-length compression functions).

Most of these constructions (as far as I've checked) make use of key scheduling (they initialize a different key for each hash).

I'm trying to find one which doesn't (only encrypts/decrypts).

The special characteristic here is that hash collisions are not very important for PoW (There's no issue of, say, a document being faked because someone was able to find some other document that hashes to the same value).

It could be that the method I suggested works well but is slightly more prone to hash collisions than the above constructions (or not). That's what I'm trying to find out.

It is likely that someone already published (or at least researched) something of this sort (possibly more than 20 years ago).

An expert who is familiar with the research may be able to identify the suitable papers.

Honestly, most of the risk I'm seeing is just the embarrassment might be flawed in some way. So if you ask in my name ("Some random person on the interwebs sent me this. Is it good enough for PoW?) it may save us both the embarrassment.

As for the measures for HW acceleration. I think a reasonable metric would be simply the ratio of the performance of the best accelerated version to the best unaccelerated version of the hash. The instructions themselves are for one round, this means ciphers or hashes with different number of rounds can't be compared to each other (I'm not a hardware expert though).

The most comprehensive benchmark database I've found so far for various CPUs are the SiSoftware benchmarks which seem up-to-date. Top score for non-engineering sample CPU is 88 GByte/s - 2x AMD EPYC 7542 32-Core = 1.375GByte/s per core (I don't know which cipher is used but I assume it's AES-128).

Edit: AMD EPYC 7F72 24-Core gives 57.07GByte/s. That's 2.375 GByte/s per core. The stock clock speed is 3.70GHz (assuming it's not overclocked). That averages as 1.55 cycles per clock (I'm still not 100% sure which cipher is being used here though).

1 Like

I have opened a question on Cryptography StackExchange:

As well as Reddit r/cryptography:

1 Like

Based on an answer I received on Cryptography StackExchange (from a user who noticed my Reddit post) the construction I describe is called "XOR of independent permutations" (XORP) and is considered very secure.

The earliest cited paper is a 1998 paper called Luby-Rackoff Backwards: Increasing Security by Making Block Ciphers Non-invertible (which cites a 1988 paper called How to Construct Pseudorandom Permutations from Pseudorandom Functions by Luby and Rackoff that describes the inverse problem: PRF -> PRP).

The 2018 paper cited in the answer (containing several proofs that are a bit beyond my abilities) is called Full Indifferentiable Security of the XOR of Two or More Random Permutations Using the χ2
. It builds on several papers providing proofs for various cases where the construction is generalized to n XORs (apparently even and odd values of n seem to matter) and make a distinction between the case where the keys are public and when they are private (kept as a secret).

This construction has also been proven to be secure beyond the birthday bound (from the SE answer):

Notably, XORP is secure beyond the birthday bound. If the permutations are on n-bit strings, then an adversary who makes q oracle queries can distinguish XORP from a random oracle with probability at most q/2^n, instead of q^2/2^n.

I have to say I'm a bit surprised. I did not expect it to be a very strong hash.

I am also surprised this technique hasn't been proposed for PoW (as far as I know). Maybe it just isn't very well-known (there is no Wikipedia page as far as I know). Maybe because a 128 bit hash (when used with AES) is seen as slightly less "marketable" despite the fact it provides sufficient security for preimage attacks. My personal feeling is that this extra sense of security might be justifiable when the PoW is used for securing transactions, but I'm not sure about its value for preventing spam.

I've also made a long comment on the Reddit post that attempts to summarize the reasons I'm fighting so hard to have the option of AES-based hashing considered.


Regarding the underlying hash function, MeowHash, an AES-NI-based hash function, came up with a quick google search; I know nothing about cryptography, but I hope it can be helpful somehow.

1 Like

The Equihash report mentioned in our forum update a few weeks ago is now available for review: http://content.nano.org/ABDK-Nano-Equihash-Report.pdf. We were lucky to be able to work with the authors of the original Equihash paper, Alex Biryukov and Dmitry Khovratovich, to review some low k value configurations for potential use in the Nano protocol.

As previously noted the results of this research pointed at a configuration of k = 3 and n ~= 100 being secure enough and requiring enough memory to be worthwhile candidates to explore. The next steps involve validating the memory requirements we could be targeting based on the n values and getting started on implementing an efficient solver for this configuration. Any community members who are interested in assisting in any way are encouraged to give a shout here or in the #pow channel on our Discord server.


The report states that the current Nano proof size is 16 bytes — I thought the current proof size was only 8 bytes (ie. 16 hex characters), or is this referring to a different PoC algorithm?

You're right it is 8, I think that was an oversight in the document.

The Equihash algorithm can be rolled independent from a new Node version, am I right? If that is the case, is it planned to be rolled out during v22 lifetime?

Primarily we need a solver, I've been eyeing up SyCL as the language to do that in. We also need a patch to widen the field in the block.

No one's stood up to make those patches yet so it's unlikely any changes will be in V22.

1 Like


I am a C++ developer in my day job. Saw a Reddit post that mentioned needing one link

Been wanting to contribute code to a crypto for a long time. :slight_smile: Recently posted in the development discord channel and just read the white paper. Could I please have more information on this problem? Is it related to the PoW alg?


Yes, it's related to the new PoW algorithm. Colin selected Equihash as the best fit. I suggest you speak with @clemahieu for more info.

Yes, the thing I'm looking for right now is a GPU solver for the Equihash algorithm with a configuration of k = 3 and n ~= 100.

I want to know whether this can be implemented in SYCL as it looks to be easier to integrate in to c++17 code than separate OpenCL, CUDA, or C++ source files.

Adding this support can be done in a few steps:

  • Research the SYCL support matrix of hardware/os across SYCL versions. I see v2 was just released this year but I don't know if it's widely supported yet.
  • Integrate a basic kernel into the node and work out all the build plumbing there.
  • Implement a solver that works on the above parameters so we can get performance metrics on both CPUs and GPUs to quantify its solution rate and memory consumption.

Thank you for the information! May be a bit slow with these initial responses. Taking some time to get the dev environment building on my local machine... Do you have any recommendations on files/lines that I should take a look in to first for getting up to speed?


https://chat.nano.org/ People in the discord might also be a help

1 Like

Have you considered using Verifiable Delay Functions as PoW? I'm not sure how it compares to Equihash but apparently VDFs are hard to run in parallel and you don't gain much from trying to do so.

Edit: Seems like the proof is too large to be practical for Nano.

1 Like