I already mentioned that the PDF format is a real mess, which makes it non-trivial to process, and thus non-trivial to strip of all the metadata it can carry.
When exiftool edits a PDF, all metadata edits are reversible. While this would normally be considered an advantage, it is a potential security problem, because old information is never actually deleted from the file.
Metadata removed this way can indeed be restored with:
exiftool -pdf-update:all= file.pdf
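This works because exiftool writes PDF changes as an incremental update: new versions of the modified objects are appended after the existing bytes, which are left untouched. The toy model below is a deliberately simplified sketch (no cross-reference tables, no `/Prev` offsets, and it assumes `%%EOF` never appears inside a stream). `append_update` and `previous_revision` are hypothetical helpers, not exiftool internals.

```python
EOF_MARKER = b"%%EOF"


def append_update(pdf: bytes, update: bytes) -> bytes:
    """Incremental update: existing bytes are never modified, only appended to."""
    return pdf + update


def previous_revision(pdf: bytes) -> bytes:
    """Undo the last incremental update by cutting the file after the previous %%EOF."""
    last = pdf.rfind(EOF_MARKER)
    if last == -1:
        raise ValueError("no %%EOF marker found")
    previous = pdf.rfind(EOF_MARKER, 0, last)
    if previous == -1:
        raise ValueError("the file has only one revision")
    end = previous + len(EOF_MARKER)
    # Keep the end-of-line characters that follow the marker.
    while end < len(pdf) and pdf[end:end + 1] in (b"\r", b"\n"):
        end += 1
    return pdf[:end]


# Toy "PDF" whose Info dictionary (object 1) holds an author name.
original = (
    b"%PDF-1.4\n"
    b"1 0 obj << /Author (Alice) >> endobj\n"
    b"trailer << /Info 1 0 R >>\n"
    b"%%EOF\n"
)
# "Removing" the author appends an empty version of object 1 plus a new trailer.
update = (
    b"1 0 obj << >> endobj\n"
    b"trailer << /Info 1 0 R >>\n"
    b"%%EOF\n"
)
cleaned = append_update(original, update)

print(b"(Alice)" in cleaned)                   # True: the old value is still in the file
print(previous_revision(cleaned) == original)  # True: and it is trivially restorable
```

A reader resolves object 1 to its latest version, so a viewer shows no author, but the bytes of the first revision are still sitting at the start of the file.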
A way around this is a two-step process:
- Append a new version of the metadata with exiftool;
- Remove unreferenced PDF objects (like the old metadata) with qpdf.
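The two steps can be chained with a small wrapper. This is a sketch assuming both `exiftool` and `qpdf` are on the `PATH`. `cleaning_commands` and `clean_pdf` are hypothetical names, and using `qpdf --linearize` is one common way to force a full rewrite, not the only one (my understanding is that any qpdf rewrite only writes objects that are still referenced).

```python
import subprocess
import tempfile
from pathlib import Path


def cleaning_commands(src: str, tmp: str, dst: str) -> list[list[str]]:
    """Build the two command lines of the exiftool + qpdf approach."""
    return [
        # Step 1: write a copy with a new, empty metadata revision appended
        # (-o writes to a new file instead of modifying src in place).
        ["exiftool", "-all:all=", "-o", tmp, src],
        # Step 2: rewrite the file; objects no longer referenced from the
        # trailer, such as the superseded metadata, are not written out.
        ["qpdf", "--linearize", tmp, dst],
    ]


def clean_pdf(src: str, dst: str) -> None:
    """Run both steps, with an intermediate file in a temporary directory."""
    with tempfile.TemporaryDirectory() as workdir:
        tmp = str(Path(workdir) / "stage1.pdf")
        for command in cleaning_commands(src, tmp, dst):
            subprocess.run(command, check=True)
```

For example, `clean_pdf("report.pdf", "report-clean.pdf")` leaves the input untouched and only writes the final file.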
This method has several drawbacks in my opinion:
- Nothing guarantees that your old metadata will actually be removed: if it is still referenced somewhere else in the file, it is kept.
- This approach won't clean the metadata of files embedded within the PDF.
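Both the qpdf step and the first drawback come down to reachability: a rewrite keeps every object that can be reached from the trailer, and only those. Here is a toy model of that idea, with each object reduced to its number and its outgoing references. `reachable` and `drop_unreferenced` are hypothetical names, and this illustrates the principle rather than qpdf's actual code.

```python
from collections import deque


def reachable(objects: dict[int, set[int]], roots: set[int]) -> set[int]:
    """Return the object numbers reachable from the trailer's references."""
    seen: set[int] = set()
    queue = deque(roots)
    while queue:
        number = queue.popleft()
        if number in seen or number not in objects:
            continue
        seen.add(number)
        queue.extend(objects[number])
    return seen


def drop_unreferenced(objects: dict[int, set[int]], roots: set[int]) -> dict[int, set[int]]:
    """Keep only reachable objects, like a rewrite that walks from the trailer."""
    keep = reachable(objects, roots)
    return {number: refs for number, refs in objects.items() if number in keep}


# Hypothetical graph: 1 = Catalog, 2 = Pages, 3 = Page,
# 4 = old Info dictionary (author, dates...), 5 = new, empty Info dictionary.
objects = {1: {2}, 2: {3}, 3: set(), 4: set(), 5: set()}
trailer = {1, 5}  # /Root 1 0 R and /Info 5 0 R

print(sorted(drop_unreferenced(objects, trailer)))  # [1, 2, 3, 5]: old Info dropped

# Same file, but the page still refers to the old metadata object somewhere.
objects[3] = {4}
print(sorted(drop_unreferenced(objects, trailer)))  # [1, 2, 3, 4, 5]: it survives
```

The second run is exactly the first drawback: a single stray reference is enough for the old metadata to be carried over into the "clean" file.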
Re-rendering every page into a brand-new PDF, on the other hand, ensures that:
- Metadata from images is removed, since the images are re-rendered;
- Videos are turned into screenshots (this is actually a feature, since it makes video-based fingerprinting much harder);
- Weird embedded objects are discarded.
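Here is a rough sketch of such a re-rendering. It assumes the Poppler GLib bindings (through PyGObject, with its cairo integration) and pycairo are installed. `rasterize_pdf` and `raster_size` are hypothetical names, the 150 dpi default is an arbitrary choice, and this is not MAT's actual code. Each page is first rendered to a bitmap, then that bitmap is painted onto a page of a fresh PDF, so only pixels make it across.

```python
import math


def raster_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Convert a page size in PDF points (1/72 inch) to pixel dimensions."""
    return math.ceil(width_pt * dpi / 72), math.ceil(height_pt * dpi / 72)


def rasterize_pdf(src_path: str, dst_path: str, dpi: int = 150) -> None:
    """Re-render every page of src_path as a bitmap into a brand-new PDF.

    Text, fonts, image metadata, attachments, videos and the source's
    Info/XMP metadata are not copied; cairo may still add its own Producer entry.
    """
    # Imported here so raster_size() stays usable without PyGObject/pycairo.
    import pathlib

    import cairo
    import gi

    gi.require_version("Poppler", "0.18")
    from gi.repository import Poppler

    uri = pathlib.Path(src_path).resolve().as_uri()
    document = Poppler.Document.new_from_file(uri, None)

    # The initial size is a placeholder; each page sets its own size below.
    pdf_surface = cairo.PDFSurface(dst_path, 10, 10)
    pdf_context = cairo.Context(pdf_surface)

    for index in range(document.get_n_pages()):
        page = document.get_page(index)
        width, height = page.get_size()  # in points
        px_width, px_height = raster_size(width, height, dpi)

        # 1. Render the page into an in-memory bitmap, on a white background.
        image = cairo.ImageSurface(cairo.FORMAT_ARGB32, px_width, px_height)
        image_context = cairo.Context(image)
        image_context.set_source_rgb(1, 1, 1)
        image_context.paint()
        image_context.scale(px_width / width, px_height / height)
        page.render_for_printing(image_context)

        # 2. Paint that bitmap, scaled back to points, onto a fresh PDF page.
        pdf_surface.set_size(width, height)
        pdf_context.save()
        pdf_context.scale(width / px_width, height / px_height)
        pdf_context.set_source_surface(image, 0, 0)
        pdf_context.paint()
        pdf_context.restore()
        pdf_context.show_page()

    pdf_surface.finish()
```

Rendering straight onto the PDF surface would likely keep selectable vector text, but it would also carry over more of the original structure; going through a bitmap trades file size and text selection for a much smaller attack surface.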
To my knowledge, this is currently the
least bad way to clean a PDF file,
but I'd be delighted to be proven wrong ;)
(Oh, and by the way, since several people asked me about it, I set up a GitHub mirror for MAT. Send me pull requests to prove to me that it's worth keeping alive.)