Artificial truth


The more you see, the less you believe.

Ten years of MAT
Mon 01 November 2021

About ten years ago, on the 7th of June 2011, the first commit of MAT, the Metadata Anonymisation Toolkit, was published. This was done as part of my Google Summer of Code, for the Tails project, under the umbrella of Tor and the EFF. My mentor was Mike Perry, but since he was super-duper-busy, intrigeri was my de facto mentor. I fondly remember writing the design proposal in completely broken English.

Written in Python 2.5 to support Debian squeeze, MAT was at the time mostly a wrapper around hachoir, and it steadily supported more and more file formats thanks to projects like exiftool and mutagen.

intrigeri insisted that I use test-driven development, and it was a tremendously good idea: I spent some days cleaning, by hand, files in the formats I wanted MAT to support. With those, I wrote a comprehensive test suite: given this file with metadata, I want to end up with this cleaned file. All that remained was to write the code, with the assurance that it was working.

MAT's website, along with its mailing list, was initially hosted by the kind people of boum.org.

During development, I realised that there is no way to properly remove metadata from PDF files with exiftool, so I spent quite some time implementing a custom proper™ way instead: rendering the PDF onto a cairo surface, saving the surface as an image, and finally printing that image into a new PDF. Unfortunately, this trashes accessibility, since the text ends up as pixels, so a "lightweight" cleaning mode was added: it might not remove all the metadata, but it doesn't mangle the file as much as the regular cleaning mode.

I also wrote a terrible user interface for it, in PyGTK. Localisation was challenging, not only because the tools are obnoxious, but also because it's non-trivial to differentiate between "cleaned", "cleaning", "clean", "maybe clean", "looks clean but can't know for sure", … Anyway, everyone hated the GUI, except two blind users who found it wonderful and usable, go figure.

One of the most requested features was support for Microsoft Office documents. This was challenging not only because it's an intricate format for which I don't have the official™ software to experiment with, but also algorithmically: I went with inheritance instead of composition, so I had to massage some classes a bit to make them look pretty.

A classmate of mine was kind enough to draw a logo:

In October 2016, I took a break from MAT, due to health issues.

intrigeri was kind enough to write some patches to mitigate CVE-2017-9149, to prevent the Nautilus extension from silently failing to clean some files.

Sometime in summer 2018, I felt better and wrote mat2 from scratch: in Python 3, without hachoir, hosted on 0xacab, and with a different capitalisation (mat2 vs MAT).

I drew a new logo, and Marie-Rose vastly improved it:

mat2 logo

Instead of writing a GUI that everyone hates, I followed intrigeri's good advice and wrote a Nautilus extension, making mat2 available via right-click in the file explorer. atenart and I (mostly him, actually) managed to get it done in a couple of days. Miguel A. Marco-Buzunariz wrote one for Dolphin as well.

Sometime around 2018, boum.org wanted to reduce the number of services it offered, so the mailing list smoothly moved to autistici.

In 2019, during some nice holidays, intrigeri spent time implementing a sandbox using bubblewrap for the subprocesses run by mat2, like ffmpeg and exiftool. Unfortunately, it's a bit brittle, due to the number of moving pieces involved and their lack of maturity.
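mat2's actual sandboxing code isn't reproduced here. As a rough illustration of the idea, here is a minimal sketch, assuming a Linux host with the `bwrap` binary installed, of how one might wrap an external tool so it only sees a read-only system, the one file it has to process, and a single writable output directory. The helper name `build_bwrap_argv` and its parameters are hypothetical; the flags themselves (`--ro-bind`, `--ro-bind-try`, `--bind`, `--dev`, `--proc`, `--tmpfs`, `--unshare-all`, `--die-with-parent`, `--new-session`) are regular bubblewrap options.

```python
from pathlib import Path
from typing import List, Sequence


def build_bwrap_argv(command: Sequence[str], input_file: str, output_dir: str) -> List[str]:
    """Return an argv that runs `command` inside a bubblewrap sandbox.

    Hypothetical helper for illustration only; this is not mat2's implementation.
    """
    src = str(Path(input_file).resolve())
    dst = str(Path(output_dir).resolve())
    return [
        "bwrap",
        # Read-only system directories so the tool and its shared libraries can load.
        # The *-try variants are skipped silently if the path doesn't exist (e.g. merged /usr).
        "--ro-bind", "/usr", "/usr",
        "--ro-bind-try", "/lib", "/lib",
        "--ro-bind-try", "/lib64", "/lib64",
        "--ro-bind-try", "/bin", "/bin",
        # Minimal /dev, a fresh /proc, and an empty private /tmp.
        "--dev", "/dev",
        "--proc", "/proc",
        "--tmpfs", "/tmp",
        # The file to clean is visible read-only; only the output directory is writable.
        # These come after --tmpfs so they are mounted on top of it if they live in /tmp.
        "--ro-bind", src, src,
        "--bind", dst, dst,
        # Unshare every namespace (network included) and tie the sandbox to our lifetime.
        "--unshare-all",
        "--die-with-parent",
        "--new-session",
        # Everything after "--" is the sandboxed command and its arguments.
        "--",
        *command,
    ]
```

The resulting list can be handed to `subprocess.run(argv, check=True)`. Building the argv as a plain list keeps it easy to inspect and test; the brittleness mentioned above shows up in the details this sketch glosses over, such as which system paths a given tool really needs on a given distribution.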

mat2 uses all the modern Python development shenanigans:

I've been happily maintaining it to this day, mostly making the cleaning of already supported file formats more thorough and polishing rough edges. I'm glad that mat2 is now rock-solid, boring software.

Downstreams

Some friends are taking care of packaging mat2 for their favourite distributions:

Others have done the same for various distros and platforms:

Unfortunately, nobody has ported it to Windows yet.

Graphical user interfaces

People have been kind enough to write graphical user interfaces for mat2:

Metadata Cleaner

Metadata Cleaner screenshot

rmnvgr wrote a GTK 4 interface to mat2 in Python named Metadata Cleaner, and it's amazingly popular!

mat2-web

mat2-web screenshot

jfriedli took over the development of mat2-web and its interface (since they're good at web stuff while I'm clearly not), and neat collectives like immerda and systemli are running instances! Open Source Design did some design work, but I don't think it was really used in the end.

Conclusion and thanks

Ten years already. When I saw the GSoC webpage, I never thought it would turn into such an interesting journey. I met a ton of amazing people along the way, made new friends, and learned a ton of things, not only about software development but also about social skills, human interaction, and project management. Most of the people involved do like their lofty anonymity, so they won't be named here, but you know who you are, and I'll be happy to buy you a drink next time we meet, to celebrate the past ten years of MAT/mat2 and to toast its next ten!