Artificial truth

The more you see, the less you believe.

Friends don't let friends write production software in Python
Mon 06 January 2020

I've been writing Python for at least 8 years now. It used to be all fun and cool: writing scripts and small programs in a couple of minutes, no compilation times, pleasant syntactic sugar everywhere (contrary to go), no terrible idioms (contrary to bash/ksh/sh/…), a hint of functional programming for conciseness, with a zest of meta-programming/meta-objects for ugly clever hacks, a standard library with a lot of handy things, … it was so great! But nowadays it's an endless source of rage and sadness when dealing with a non-trivial amount of code, and this is due to two main pain points: types and exceptions.

Python uses duck typing, meaning that there is no way to determine the type of a variable for non-trivial cases without running the code. It also means that variables can have multiple types, depending on the execution flow of the program. And this is oh-so much fun! Trying to apply a method to an object instance returned by a library? BOOM, stacktrace in your face because you forgot to check if the object could be None! Using a function hastily ported from Python2 to Python3 which is now returning strings or bytes? Too bad, you'll only find out about this at runtime.
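
To make both traps concrete, here is a minimal sketch. The helper names `extract_version` and `greet` are made up for illustration; the only real API involved is `re.match`, which returns None when the pattern doesn't match.

```python
import re


def extract_version(line):
    # re.match returns None when nothing matches, so calling .group()
    # on the result raises AttributeError, but only at runtime.
    return re.match(r"v(\d+)", line).group(1)


def greet(name):
    # Fine with str, TypeError with bytes: nothing complains until
    # a caller actually passes bytes.
    return "Hello, " + name


print(extract_version("v42"))   # the happy path works
for call in (lambda: extract_version("nope"), lambda: greet(b"Bob")):
    try:
        call()
    except (AttributeError, TypeError) as exc:
        print(type(exc).__name__, exc)
```

Neither function is wrong on its happy path, which is exactly why this kind of bug survives until someone feeds it the other type.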

"But Python3 has type annotations you doofus, just use mypy" is usually the go-to answer to my complaints. I do agree that mypy is a step in the right direction, but unfortunately, type annotations are, … well, annotations, with which the Python interpreter doesn't do shit, except exposing them for consumption by external tools like mypy. But mypy doesn't work for non-trivial cases: in mat2, a ~3500 LoC Python library/program, I have 25 # type: ignore annotations, mostly because mypy gets in the way by not understanding what is going on.

It reminds me a bit of this drunk-ass friend who also happens to be super-high, having no clue about what you're currently doing, pointing at everything and asking weird questions about random stuff passing by, while you're focussing on keeping your eyes on the road because it's 3am and you just want to go to your bed, instead of ending up in a random ditch. FOR THE FOURTH TIME, THE NUMBERS ON THE SIDE OF THE ROAD AREN'T THE ONES FROM TOMORROW'S LOTTERY, WHAT MAKES YOU THINK THAT, AND WHY CAN'T YOU INFER THINGS FROM MY PREVIOUS STATEMENTS‽

Anyway, mypy also has a terrible syntax: can you write, without looking at the documentation, an annotation for a generator returning subclasses of a particular class? Or even a dictionary containing an arbitrary number of nested dictionaries?
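
For the record, here is one possible answer, as a sketch rather than the canonical spelling. The classes `Parser`, `PngParser` and `PdfParser` are hypothetical, and I'm reading "returning subclasses" as yielding the class objects themselves; if you meant instances, `Iterator[Parser]` would do. The recursive alias runs fine, since the interpreter ignores it, but as far as I know only recent mypy releases accept it.

```python
from typing import Dict, Iterator, Type, Union


class Parser:
    """Hypothetical base class, for illustration only."""


class PngParser(Parser):
    pass


class PdfParser(Parser):
    pass


def parser_classes() -> Iterator[Type[Parser]]:
    # Yields the classes themselves, not instances of them.
    # The long form would be Generator[Type[Parser], None, None].
    yield PngParser
    yield PdfParser


# A dictionary nesting itself to arbitrary depth needs a recursive alias,
# written with a string forward reference to its own name.
NestedDict = Dict[str, Union[str, "NestedDict"]]


def depth(tree: NestedDict) -> int:
    # Counts how many dictionary levels are nested inside each other.
    children = [depth(v) for v in tree.values() if isinstance(v, dict)]
    return 1 + max(children, default=0)


print([cls.__name__ for cls in parser_classes()])
print(depth({"a": {"b": {"c": "leaf"}}}))
```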

The second major issue is the management of exceptions: in Python's world, contrary to the verbose and civilised Java one, there is no way to declare what exceptions could be raised by a particular function. There are also no tools to validate that you're catching all the relevant ones. The only thing you can do is to add formatted comments to declare what exceptions could be raised. You've seen such comments used in Python's stdlib, and you trust the documentation to be comprehensive? Fool, you used common sense! Python's documentation doesn't document shit when it comes to exceptions, and this is working as indented in Python's world. But surely this isn't an issue, right? Well, can you guess what re.compile can raise? UnicodeDecodeError, OverflowError, RuntimeError, ValueError, re.error, but maybe you don't care, since you're usually not passing arbitrary inputs to this function. So what about tarfile.open, opening an untrusted archive? tarfile.TarError, ValueError, OverflowError, EOFError and zlib.error. This is of course piled on top of the fact that Python's stdlib doesn't check anything to defend against malicious tar archives, resulting in path traversal and the like.
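
In practice, the lists above end up hardcoded as tuples. The sketch below does just that; `TAR_ERRORS`, `RE_ERRORS`, `list_members` and `compile_or_none` are names I made up, and the tuples merely transcribe the exceptions listed above, with no guarantee that they are exhaustive.

```python
import io
import re
import tarfile
import zlib

# Transcribed from the lists above; possibly incomplete.
TAR_ERRORS = (tarfile.TarError, ValueError, OverflowError, EOFError, zlib.error)
RE_ERRORS = (UnicodeDecodeError, OverflowError, RuntimeError, ValueError, re.error)


def list_members(data):
    """Return the member names of an in-memory tar archive, or None."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data)) as archive:
            return archive.getnames()
    except TAR_ERRORS:
        return None


def compile_or_none(pattern):
    """Return a compiled pattern, or None if re.compile chokes on it."""
    try:
        return re.compile(pattern)
    except RE_ERRORS:
        return None


print(list_members(b"definitely not a tar archive"))
print(compile_or_none("("))
```

Nothing checks that those tuples stay in sync with what the stdlib actually raises, which is the whole problem.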

What if you're processing images via Pillow? Something simple, like converting pictures to PNG with Image.open(…).save(io.BytesIO(), "PNG")? This can result in (at least) AttributeError, IOError, OSError, MemoryError, OverflowError, RuntimeError, SyntaxError, TypeError, ValueError, Image.DecompressionBombError, struct.error and subprocess.CalledProcessError.

The only solution is either to wrap every single call to Python's stdlib in a try: … except Exception:, which is awful, or to pray that nothing will explode at runtime, and cry loudly when it happens.
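
The awful option looks roughly like this. It's a sketch under assumptions: `UntrustedInputError` and `contain_errors` are hypothetical names, and `gzip.decompress` merely stands in for whatever stdlib call is handling the untrusted data.

```python
import functools
import gzip


class UntrustedInputError(Exception):
    """Hypothetical single exception type exposed to callers."""


def contain_errors(func):
    """Turn anything func raises into an UntrustedInputError."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # deliberately broad, that's the point
            raise UntrustedInputError(
                "{} failed: {!r}".format(func.__name__, exc)
            ) from exc
    return wrapper


@contain_errors
def decompress(data):
    return gzip.decompress(data)


print(decompress(gzip.compress(b"hello")))
try:
    decompress(b"garbage")
except UntrustedInputError as exc:
    print("caught:", exc)
```

Callers now have exactly one exception to catch, at the price of also swallowing genuine bugs in the wrapped code; the original exception stays reachable through `__cause__`.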

Why do I care so much about unexpected stacktraces? I do because mat2 is dealing with untrusted file formats: users will throw all kinds of random malformed files at it, and I'm expecting meaningful exceptions that I can catch should something go wrong, not eldritch unpredictable monstrosities crawling from the depths of Python's core in a firework of traces, scaring my beloved users away.

But Python was born in 1990, it's old and rock solid, it doesn't yield uncanny stuff and handles everything in an educated, human and civilised way. Except that it doesn't: it's trivial to raise strange exceptions and uncover mysterious behaviours with stupid fuzzers in a couple of minutes if not seconds.
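
By "stupid fuzzer", I mean something as dumb as the sketch below: throw random bytes at a function and count which exception types come back. `dumb_fuzz` and `open_tar` are made-up names, and this is not the tooling used on mat2, just an illustration of how little is needed.

```python
import collections
import io
import random
import tarfile


def dumb_fuzz(target, rounds=200, max_len=64, seed=0):
    """Call target on random byte strings, tallying exception types."""
    rng = random.Random(seed)  # seeded, so that runs are reproducible
    seen = collections.Counter()
    for _ in range(rounds):
        length = rng.randint(0, max_len)
        blob = bytes(rng.getrandbits(8) for _ in range(length))
        try:
            target(blob)
        except Exception as exc:
            seen[type(exc).__name__] += 1
    return seen


def open_tar(blob):
    # Open and immediately close whatever tarfile makes of the blob.
    with tarfile.open(fileobj=io.BytesIO(blob)):
        pass


for name, count in dumb_fuzz(open_tar).most_common():
    print(name, count)
```

Purely random bytes rarely get past the first sanity checks; mutating valid files is what digs up the weirder exceptions.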

Don't let your friends write production code in Python, especially when it's dealing with weird file formats: things will blow up.