Around 10 years ago, on the 7th of June 2011, the first commit of MAT, the metadata anonymisation toolkit, was published. This was done as part of my Google Summer of Code, for the Tails project, under the umbrella of Tor and the EFF. My mentor was Mike Perry, but since he was super-duper-busy, intrigeri was my de facto mentor. I fondly remember writing the design proposal in completely broken English.
intrigeri insisted that I use test-driven development, and it was a tremendously good idea: I spent some days cleaning up, by hand, the files that I wanted MAT to support. With those, I wrote a comprehensive testsuite: given this file with metadata, I want to end up with this cleaned file. All that remained was to write the code, with the assurance that it was working.
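To give an idea of what such a "dirty file in, clean file out" test can look like, here is a minimal sketch using only the Python standard library. It is not MAT's actual code or testsuite: `clean_zip`, `make_dirty_zip` and `FIXED_DATE` are hypothetical names made up for the illustration, and the "cleaner" only handles the comments and timestamps of a zip archive.

```python
import io
import unittest
import zipfile

# Hypothetical constant: every cleaned member gets this timestamp.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def clean_zip(data: bytes) -> bytes:
    """Toy cleaner (an assumption, not MAT's code): rewrite the archive
    without comments and with normalised member timestamps."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        # Sort members so that the original insertion order does not leak.
        for info in sorted(src.infolist(), key=lambda i: i.filename):
            cleaned = zipfile.ZipInfo(info.filename, date_time=FIXED_DATE)
            cleaned.compress_type = zipfile.ZIP_DEFLATED
            dst.writestr(cleaned, src.read(info))
    return out.getvalue()


def make_dirty_zip() -> bytes:
    """Build an in-memory fixture that carries some metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("report.txt", date_time=(2011, 6, 7, 12, 30, 0))
        info.comment = b"written by alice"
        zf.writestr(info, "hello")
        zf.comment = b"created on alice-laptop"
    return buf.getvalue()


class TestZipCleaning(unittest.TestCase):
    def test_metadata_is_removed_and_content_kept(self):
        cleaned = clean_zip(make_dirty_zip())
        with zipfile.ZipFile(io.BytesIO(cleaned)) as zf:
            info = zf.getinfo("report.txt")
            # The metadata must be gone...
            self.assertEqual(zf.comment, b"")
            self.assertEqual(info.comment, b"")
            self.assertEqual(info.date_time, FIXED_DATE)
            # ...while the actual content must be untouched.
            self.assertEqual(zf.read("report.txt"), b"hello")
```

Saved in a file, this can be run with `python3 -m unittest`. The point of the approach is that the test exists before the cleaner does: the fixture and the expected result are written first, and the code is done when the test passes.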
During the development, I realised that there is no way to properly remove metadata from PDF files with exiftool, so I spent quite some time implementing a custom proper™ way instead: rendering the PDF on a cairo surface, saving the surface as an image, and finally printing it to a PDF. Unfortunately, this trashes the accessibility, so a "lightweight" cleaning mode was added: it might not remove all the metadata, but it doesn't trash the file as much as the regular cleaning mode.
I also wrote a terrible user interface for it, in PyGTK. Localization was challenging, not only because the tools are obnoxious, but also because it's non-trivial to differentiate between "cleaned", "cleaning", "clean", "maybe clean", "looks clean but can't know for sure", … Anyway, everyone hated the GUI, except for two blind users who found it wonderful and usable. Go figure.
One of the most demanded features was support for Microsoft Office documents. This was challenging not only because it's an intricate format for which I don't have the official™ software to experiment with, but also algorithmically: since I went with inheritance instead of composition, I had to massage some classes a bit to make them look pretty.
A classmate of mine was kind enough to draw a logo:
In October 2016, I took a break from MAT, due to health issues.
intrigeri was kind enough to write some patches to mitigate CVE-2017-9149, to prevent the Nautilus extension from silently failing to clean some files.
Instead of writing a GUI that everyone hates, on the good advice of intrigeri, I went with writing a Nautilus extension, to have mat2 available via right-click in the file explorer. atenart and I (mostly him actually) managed to get it done in a couple of days. Miguel A. Marco-Buzunariz wrote one for Dolphin as well.
Sometime around 2018, boum.org wanted to reduce the number of its services, so the mailing list smoothly moved to autistici.
In 2019, during a nice holiday, intrigeri spent some time implementing a sandbox using bubblewrap for subprocesses run by mat2, like exiftool. Unfortunately, it's a bit brittle, due to the number of moving pieces involved, and their lack of maturity.
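For the curious, the general idea of wrapping a helper program in bubblewrap can be sketched as follows. This is a rough illustration and not mat2's actual sandbox: `build_bwrap_argv` and `SYSTEM_DIRS` are names invented here, and the set of directories to expose is an assumption that will vary between distributions (which is a good part of the brittleness). The `bwrap` options used (`--unshare-all`, `--die-with-parent`, `--new-session`, `--proc`, `--dev`, `--tmpfs`, `--ro-bind`, `--bind`) are regular bubblewrap options.

```python
import os
from typing import List, Optional, Sequence

# Assumption: directories a typical helper needs to find its interpreter
# and libraries. The right list depends on the distribution.
SYSTEM_DIRS = ("/usr", "/lib", "/lib64", "/bin", "/sbin")


def build_bwrap_argv(tool_argv: Sequence[str],
                     input_path: str,
                     output_dir: Optional[str] = None,
                     system_dirs: Sequence[str] = SYSTEM_DIRS) -> List[str]:
    """Return an argv that runs `tool_argv` inside a bubblewrap sandbox.

    Hypothetical helper: the input file is visible read-only, the optional
    output directory is writable, everything else is hidden or read-only.
    """
    input_path = os.path.abspath(input_path)
    argv = [
        "bwrap",
        "--unshare-all",      # fresh namespaces, so no network access
        "--die-with-parent",  # kill the helper if the parent dies
        "--new-session",      # detach from the controlling terminal
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
    ]
    for directory in system_dirs:
        # Only bind what exists on this machine, read-only.
        if os.path.isdir(directory):
            argv += ["--ro-bind", directory, directory]
    # The file to inspect is exposed read-only at the same path.
    argv += ["--ro-bind", input_path, input_path]
    if output_dir is not None:
        output_dir = os.path.abspath(output_dir)
        argv += ["--bind", output_dir, output_dir]
    return argv + list(tool_argv)
```

The resulting list can be handed to `subprocess.run`, for example with `["exiftool", "-json", path]` as `tool_argv` to read a file's metadata. Nothing is executed by the function itself, which keeps it easy to test on machines where bubblewrap isn't installed.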
mat2 uses all the modern Python development shenanigans:
- Static analysis via lgtm.com
- Close to 100% test coverage
- Continuous integration running the testsuite on Debian, Fedora, Arch Linux and Gentoo, on every commit, and once a week.
- Linting via bandit, pyflakes and pylint
- Systematic type-annotation, verified by mypy
I've been happily maintaining it to this day, mostly improving thoroughness when cleaning already supported file formats and polishing rough edges. I'm glad that mat2 is now rock-solid, boring software.
Some friends are taking care of packaging mat2 in their favourite distributions:
- georg has been my downstream for Debian, handling not only the packaging, but also helping a lot with bug reports, backports, the continuous-integration setup, and various small bug fixes. Speaking of Debian, mat2 is now more popular than MAT ever was, with at least ~400 installations.
- atenart, a good friend of mine, was kind enough to package it for Fedora, and to help with minor patches.
- kpcyrd, a #websec fellow, is taking good care of the Arch Linux packaging, with a bit more than 150 installations.
Others have done the same for various distro/platforms:
Unfortunately, nobody has ported it to Windows yet.
Graphical user interfaces
People have been kind enough to write graphical user interfaces for mat2:
jfriedli took over the development of mat2-web and its interface (since they're good at web while I'm clearly not), and neat collectives like immerda and systemli are running instances! Open source design did some design work, but I don't think that it was really used in the end.
Conclusion and thanks
Ten years already. I never thought that it would be such an interesting journey when I saw the GSoC webpage. I met a ton of amazing people along the way, made new friends, and learned a ton of things, not only about software development, but also about social skills, human interaction and project management. Most of the involved/relevant people do like their lofty anonymity, so they won't be named here, but you know who you are, and I'll be happy to buy you a drink next time we meet, to celebrate both the past ten years of MAT/mat2 and its next ten years!