- Full title: CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives
- PDF: arXiv (local mirror)
- Authors: Joel Kuepper, Andres Erbsen, Jason Gross, Owen Conoly, Chuyue Sun, Samuel Tian, David Wu, Adam Chlipala, Chitchanok Chuengsatiansup, Daniel Genkin, Markus Wagner, Yuval Yarom
Cryptography is hard, high-performance one even more so: formal proof of assembly implementations is horrible to model, and code generation from formal proofs are hard to lower to high-performance assembly. The core idea of CryptOpt is to treat this as a black box combinatorial optimization problem, and bruteforce possible solutions in a smart way against an oracle.
- start from a known-correct implementation in fiat-crypto (a coq-powered high-level to low-level IR proven translator) low-level IR;
- lower it via a fuzzer-like machinery replacing/reordering operands
applying semantics-and-data-constrains-preserving transformations, which has an acceptable
search space because:
- it's straight-line no-aliasing constant-offset-pointers assembly;
- transformations can be templatised, eg.
add ≍ clc; adcx;
- lift the resulting x64 assembly to fiat-crypto low-level IR;
- use a custom e-graph based equivalence checker implemented as a mix between an SMT solver and a symbolic-execution engine;
- if the new implementation is correct, benchmark it against the current; fastest one, and keep it if it's outperforming it.
This approach has a couple of advantages:
- fuzzers are cheaper than highly specialised engineering time
- porting implementations to new hardware is simply a matter of running CryptOpt on it.
- by lifting the assembly to fiat-crypto low-level IR, there is no need to write complex formal proofs, since fiat-crypto is already taking care of those.
- controlling the mutations allows to ensure that the implementation stays side-channel free.
The main issue though, is that one needs to formally implement whatever algorithm to optimize in fiat-crypto, which is not that easy (and which the authors of the paper didn't do for libsecp256k1).
Implementation-wise, the author ran 200k mutations, with 20 initial candidates, over 18 Fiat IR primitives, taking between 20 and 40 CPU hours. Interestingly, since the equivalence-based verification is slow (between 0.1s and ~300s), it's only done once at the end. They found out that "optimization progress is roughly logarithmic in the number of mutations." CryptOpt generates code around 1.20 to 2.50 times faster than gcc/clang for the same fiat-crypto generated C code. It's not faster then OpenSSL (but offers formally verified correctness), but is faster than libsecp256k1.