Fuzzing python in Python, and doing it fast
Sat 29 February 2020 — download

My previously blogpost about python sparked a lot of fuzzing-related questions and discussions, so here is a blogpost featuring bits of my journey to fuzz python libraries.

I started to fuzz Python's native code parts at work, and was wondering how hard it would be to fuzz at a higher-level, since the coverage information I got was too low-level to be efficient for pure Python code.

Since I'm a grown-up, instead of rushing into writing my own fuzzer, I spent some time browsing the web to see if anyone else had open-sourced something that I could contribute to. And of course, I wasn't the first one who wanted to fuzz python: Yevgeny Pats from fuzzit released pythonfuzz in October 2019:

PythonFuzz is a port of fuzzitdev/jsfuzz

which is in turn heavily based on go-fuzz originally developed by Dmitry Vyukov's. Which is in turn heavily based on Michal Zalewski AFL.

The code looked kinda nice and compact. After reading it, the first thing I did was to profile it using cProfile along with snakeviz, on 100.000 iterations of a simple json fuzzer, and the results weren't pretty:

Original benchmark

Pythonfuzz is using coverage.py, the de facto tool to gather coverage information of Python programs. Unfortunately, it's super-slow. One of the reason for this is that it's repeatedly calling an abs_file function to get the absolute path of files, without any caching, resulting in a lot of syscalls, thus murdering the performances.

As soon as I added caching with @functools.lru_cache above the function, pythonfuzz gained +50% in performances:

I tried to upstream this change, but unfortunately the coverage.py version used by pythonfuzz was a bit old (4.5.4), and the latest one underwent a complete rewrite of its data storage backend, from in-memory to a sqlite database, making my change obsolete.

So I bumped to the latest available version, and the performances dropped significantly:

Benchmark with the latest coverage.py

I sent a pull-request to get +30% performances when connecting to the database, but most of the fuzzing time was still spent moving data in and out of the sqlite.

Benchmark with improved latest coverage.py

So I completely removed coverage.py, and implemented my own tracer. While coverage.py's one is written in C and is thus likely yielding a fuckton of raw performances, it's left dead in the water by a couple of lines of Python by a factor four! Turns out having a lot of features, knobs and a full database as backend can (and does) completely obliterate performances.

An other minor optimization was to use recv_byte and send_byte instead of recv/send where possible, since the later uses pickles to serialize data. This yields a bit less than 10% on my local benchmark on the fuzzer's side, by saving a call to pickle's serialize/unserialize per cycle both on the fuzzee and fuzzer's side.

There is still a lot of room for improvements, but it's now spending around 16% of the time in the fuzzed function, instead of the initial ~3%:

Final benchmark

Raw performances are nice, but what about crashes? Well, I've found a couple of bugs in Mutagen in a matter of minutes:

I have more cool patches for pythonfuzz locally (performance improvements, opcode-based libdislocator equivalent, structure-aware mutators, multi-factor guidance, …), but since the project isn't super-active and I'm still waiting for a couple of my pull-requests to be merged/rejected, I'll keep them private for now. I might fork the project at some point, we'll see.