Title: Ten years of MAT
Date: 2021-11-01 14:30

Around 10 years ago, on the 7<sup>th</sup> of June 2011, the
[first commit](https://gitweb.torproject.org/user/jvoisin/mat.git/commit/?id=f7082a21d6511c5069fbb9ff186ce22f3e22fed7)
of MAT, the
[metadata anonymisation toolkit](https://www.google-melange.com/archive/gsoc/2011/orgs/tor/projects/jvoisin.html),
was published. This was done as part of my [Google Summer of
Code](https://www.google-melange.com/archive/gsoc/2011/orgs/tor/projects/jvoisin.html),
for the [Tails](https://tails.boum.org/) project, under the umbrella of
[Tor](https://torproject.org) and the [EFF](https://eff.org). My mentor was
[Mike Perry](https://gitlab.torproject.org/mikeperry), but since he was
super-duper-busy, [intrigeri](https://gaffer.boum.org/intrigeri/) was my *de
facto* mentor. I fondly remember writing the [design
proposal]({filename}/metadata/mat_design.md) in completely broken English.

Written in Python 2.5, to support [Debian
squeeze](https://www.debian.org/News/2011/20110205a), MAT was mostly a wrapper
around [hachoir](https://hachoir.readthedocs.io/en/latest) at the time, and
steadily supported more and more file formats thanks to projects like
[exiftool](https://exiftool.org) and [mutagen](https://mutagen.readthedocs.io).

intrigeri insisted that I use [test driven development](https://en.wikipedia.org/wiki/Test-driven_development),
and it was a tremendously good idea: I spent some days cleaning up files that I
wanted MAT to support, by hand. With those, I wrote a comprehensive test suite:
given this file with metadata, I want to end up with this cleaned file. All that
remained was to write the code, with the assurance that it was working.
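The fixture-driven approach can be sketched roughly as follows. This is not MAT's actual test suite: the toy format (metadata lines prefixed with `#meta `), the `strip_metadata` function and the `FIXTURES` pairs are all hypothetical, chosen only to keep the example self-contained.

```python
import unittest

META_PREFIX = "#meta "  # hypothetical marker for metadata lines in a toy format


def strip_metadata(document: str) -> str:
    """Drop every metadata line from a toy document, keeping the other lines."""
    kept = [line for line in document.splitlines()
            if not line.startswith(META_PREFIX)]
    return "\n".join(kept)


# Hand-cleaned fixtures: (file with metadata, expected cleaned file).
FIXTURES = [
    ("#meta author: alice\n#meta tool: editor 1.0\nhello\nworld", "hello\nworld"),
    ("no metadata here", "no metadata here"),
]


class TestCleaning(unittest.TestCase):
    def test_dirty_files_end_up_clean(self):
        # Each dirty fixture must come out identical to its hand-cleaned twin.
        for dirty, expected in FIXTURES:
            with self.subTest(dirty=dirty):
                self.assertEqual(strip_metadata(dirty), expected)

    def test_cleaning_is_idempotent(self):
        # Cleaning an already clean file must not change it.
        for _, expected in FIXTURES:
            self.assertEqual(strip_metadata(expected), expected)
```

Saved in a file, this can be run with `python -m unittest`. The point of the design is that the fixtures are written before the cleaner, so supporting a new format starts with adding a new pair of files.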

[MAT's website](https://web.archive.org/web/20120515004809/https://mat.boum.org/), along with
its [mailing list](https://www.autistici.org/mailman/listinfo/mat-dev), was initially
hosted by the kind people of [boum.org](https://web.archive.org/web/20120113135558/https://boum.org/).

During development, I [realised]({filename}/metadata/pdf_exiftool.md)
that there is no way to properly remove metadata from PDF files with exiftool,
so I spent quite some time implementing a custom proper™ way instead: rendering
the PDF on a [cairo surface](https://cairographics.org/manual/cairo-cairo-surface-t.html),
saving the surface as an image, and finally printing it to a PDF.
Unfortunately, this trashes accessibility, since the text ends up as a picture,
so a "lightweight" cleaning mode was added: it might not remove all the
metadata, but it doesn't trash the file as much as the *regular* cleaning mode.

I also wrote a terrible user interface for it, in
[PyGTK](https://developer.gnome.org/pygtk/stable). Localization was
challenging, not only because the tools are obnoxious, but also due to the fact
that it's non-trivial to differentiate between "cleaned", "cleaning", "clean",
"maybe clean", "looks clean but can't know for sure", …
Anyway, everyone hated the GUI, except two blind users who found it wonderful
and usable, go figure.

One of the most requested features was support for
[Microsoft Office](https://en.wikipedia.org/wiki/Office_Open_XML) documents.
This was challenging not only because it's an intricate format for which I
don't have the official™ software to experiment with,
but also design-wise, since I went with inheritance instead of
composition, so I had to massage some classes a bit into looking pretty.
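To give an idea of what is involved: an Office Open XML document is a zip archive, and its most visible metadata (author, dates, application name) sits in `docProps/core.xml` and `docProps/app.xml`. The sketch below is not MAT's code; `strip_docprops` and `EPOCH` are hypothetical names, and the function only copies the archive without the `docProps/` members while resetting member timestamps.

```python
import io
import zipfile

EPOCH = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format can store


def strip_docprops(ooxml: bytes) -> bytes:
    """Copy an OOXML archive, dropping docProps/* and resetting timestamps."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(ooxml)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename.startswith("docProps/"):
                continue  # core.xml / app.xml: author, dates, application name
            # A fresh ZipInfo avoids copying the original modification time.
            clean = zipfile.ZipInfo(item.filename, date_time=EPOCH)
            clean.compress_type = zipfile.ZIP_DEFLATED
            dst.writestr(clean, src.read(item.filename))
    return out.getvalue()
```

This is far from a complete cleaner: the removed parts are still referenced from `[Content_Types].xml` and `_rels/.rels`, and the document body itself can carry comments or revision data, which is a good part of why the format is intricate.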

A classmate of mine was kind enough to draw a logo:

<a href="./images/MAT_logo.png">
	<img class="half_img" src="./images/MAT_logo.png">
</a>

In [October 2016]({filename}/metadata/mat_hold.md),
I took a break from MAT, due to health issues.

intrigeri was kind enough to
write some patches to mitigate [CVE-2017-9149](https://cve.circl.lu/cve/CVE-2017-9149),
to prevent the Nautilus extension from silently failing to clean some files.

Sometime in summer 2018, I felt better, and wrote
[mat2]({filename}/metadata/mat2.md) from scratch, in
Python 3, without hachoir, hosted on [0xacab](https://0xacab.org), with a
different capitalisation: mat2 vs MAT.

I drew a [new logo](https://0xacab.org/jvoisin/mat2/-/commit/6aeffe6823bfcaa8ef900002ae9eb54ef24ae805),
and Marie-Rose [vastly improved it](https://0xacab.org/mat/mat/-/issues/11524):

[![mat2 logo]({static}/images/mat2_logo.png)]({static}/images/mat2_logo.png)

Instead of writing a GUI that everyone hates, on intrigeri's good advice,
I went with writing a [Nautilus
extension](https://0xacab.org/jvoisin/mat2/-/issues/2), to have mat2 available
via right-click in the file explorer. [atenart](https://antoine.tenart.fr/) and
I (mostly him actually) managed to get it done in a couple of days.
[Miguel A. Marco-Buzunariz](https://riemann.unizar.es/~mmarco/) [wrote
one](https://0xacab.org/jvoisin/mat2/-/tree/master/dolphin) for
[Dolphin](https://apps.kde.org/dolphin/) as well.

Sometime around 2018, boum.org wanted to reduce the number of its services,
so the mailing list smoothly moved to [autistici](https://www.autistici.org/).

In 2019, during a nice holiday, intrigeri spent some time implementing a
sandbox using [bubblewrap](https://github.com/containers/bubblewrap) for
subprocesses run by mat2, like [`ffmpeg`](https://ffmpeg.org/)
and [`exiftool`](https://exiftool.org/). Unfortunately, it's a bit brittle, due
to the number of moving pieces involved, and their lack of maturity.
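As a rough illustration of what such a sandbox looks like, here is a minimal sketch that wraps a tool invocation in a `bwrap` command line. It is not mat2's actual configuration: `bwrap_command` and `run_sandboxed` are hypothetical helpers, and the set of bind mounts assumes a merged-`/usr` distribution.

```python
import shutil
import subprocess
from typing import List


def bwrap_command(tool_argv: List[str], input_path: str) -> List[str]:
    """Build a bubblewrap command line running tool_argv with few privileges."""
    return [
        "bwrap",
        "--unshare-all",                # fresh namespaces, so no network
        "--die-with-parent",            # kill the tool if the caller dies
        "--new-session",                # detach from the controlling terminal
        "--ro-bind", "/usr", "/usr",    # system files, read-only
        "--symlink", "usr/bin", "/bin",      # assumes a merged-/usr layout
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--ro-bind", input_path, input_path,  # the file to read, read-only
        *tool_argv,
    ]


def run_sandboxed(tool_argv: List[str], input_path: str) -> subprocess.CompletedProcess:
    """Run the tool inside bubblewrap, failing loudly if bwrap is missing."""
    if shutil.which("bwrap") is None:
        raise RuntimeError("bubblewrap is not installed")
    return subprocess.run(bwrap_command(tool_argv, input_path),
                          capture_output=True, check=True)
```

For instance, `run_sandboxed(["exiftool", "-json", path], path)` would read metadata without giving the tool write access to the file or any network access. The brittleness shows up quickly in practice: every tool needs its own set of paths, and those differ between distributions.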

mat2 uses all the modern Python development shenanigans:

- [Static analysis via lgtm.com](https://lgtm.com/projects/g/jvoisin/mat2/context:python)
- Close to [100% test coverage](https://0xacab.org/jvoisin/mat2)
- [Continuous integration](https://0xacab.org/jvoisin/mat2/-/blob/master/.gitlab-ci.yml)
  running the test suite on Debian, Fedora, Arch Linux and Gentoo,
  on every commit, and once a week
- Linting via [bandit](https://bandit.readthedocs.io/en/latest/),
  [pyflakes](https://github.com/PyCQA/pyflakes) and [pylint](https://pylint.org/)
- Systematic type-annotation, verified by [mypy](http://mypy-lang.org/)

I've been happily maintaining it to this day, mostly improving thoroughness when
cleaning already-supported file formats and polishing rough edges. I'm glad that
mat2 is now rock-solid [boring software](https://tqdev.com/2018-the-boring-software-manifesto).

# Downstreams

Some friends are taking care of packaging mat2 in their favourite distributions:

- [georg](https://0xacab.org/georg) has been my downstream for [Debian](https://tracker.debian.org/pkg/mat2),
  handling not only the packaging, but also helping a lot with bug reports,
  backports, the continuous-integration setup, and various small bug fixes.
  Speaking of Debian, mat2 is now
  [more popular than MAT ever was](https://qa.debian.org/popcon.php?package=mat2),
  with at least ~400 installations.
- [atenart](https://antoine.tenart.fr/), a good friend of mine, was kind enough
  to [package it](https://copr.fedorainfracloud.org/coprs/atenart/mat2/)
  for Fedora, and to help with minor patches.
- [kpcyrd](https://github.com/kpcyrd), a [#websec](https://websec.fr/faq#contact_faq) fellow,
  is taking good care of the
  [Arch Linux packaging](https://archlinux.org/packages/community/any/mat2/),
  with a bit [more than 150 installations](https://pkgstats.archlinux.de/packages/mat2).

Others have done the same for various distributions and platforms:

- on [brew](https://formulae.brew.sh/formula/mat2) for macOS users
- on [NixOS](https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/python-modules/mat2/default.nix)
- on [PyPI](https://pypi.org/project/mat2)
- on [Gentoo's GURU overlay](https://gpo.zugaina.org/Overlays/guru/app-misc/mat2)

Unfortunately, nobody has
[ported it to Windows](https://docs.microsoft.com/en-us/windows/win32/api/wingdi/ns-wingdi-mat2) yet.

# Graphical user interfaces

People have been kind enough to write graphical user interfaces for mat2:

## Metadata Cleaner

[![Metadata Cleaner screenshot]({static}/images/metadata_cleaner.png)]({static}/images/metadata_cleaner.png)

[rmnvgr](https://www.romainvigier.fr/) wrote a GTK4 interface to mat2 in Python named
[Metadata Cleaner](https://metadatacleaner.romainvigier.fr), and it's
[amazingly popular](https://klausenbusk.github.io/flathub-stats/#ref=fr.romainvigier.MetadataCleaner&interval=infinity&downloadType=installs%2Bupdates)!

## mat2-web

[![mat2-web screenshot]({static}/images/mat2-web.png)]({static}/images/mat2-web.png)

[jfriedli](https://0xacab.org/jfriedli) took over the development of
[mat2-web](https://0xacab.org/jvoisin/mat2-web/) and [its interface](https://0xacab.org/jfriedli/mat2-quasar-frontend)
(since they're good at web while I'm clearly not), and neat collectives like
[immerda](https://www.immerda.ch//info/2020/11/01/metadaten-entfernen-als-neuer-dienst.html)
and [systemli](https://www.systemli.org/service/metadata/) are running instances!
[Open source design](https://opensourcedesign.net/jobs/jobs/2019-02-03-improving-the-two-webpages-of-a-simple-web-interface-to-a-metadata-removal-tool)
did some design work, but I don't think that it was really used in the end.

# Conclusion and thanks

Ten years already. I never thought that it would be such an interesting
journey when I saw the GSoC webpage. I met a ton of amazing people along the
way, made new friends, and learned a ton of things, both about software
development and about social skills, human interaction, and project management.
Most of the involved/relevant people do like their lofty anonymity, so they
won't be named here, but you know who you are, and I'll be happy to buy you a
drink next time we meet, to celebrate both the past ten years of MAT/mat2 and
its next ten years!
