anarcat recently published a blog post entitled Theory: average bus factor = 1, speculating that the bus factor of open source software is, on average, one. The average might even be lower, with unmaintained zombie projects still being both packaged and used everywhere.
Dramatically low bus factors are a well-known issue that resurfaces every time a new major vulnerability in open-source software is made public: Heartbleed highlighted that OpenSSL only had two developers to take care of ½ million lines of code; when people had fun with SKS servers, we discovered that the main implementation was a steaming heap of unmaintained and unmaintainable OCaml; GnuPG almost died in 2015 because Werner Koch was underfunded and wanted to make a decent salary; CopperheadOS died when its lead and only developer left; … the list goes on and on.
While this is scary and shit, I haven't seen a lot of material about what can be done to fix it, besides the obvious, like trying to make your project a nice and welcoming place so that maybe a healthy community could form around it.
But I'm convinced that we could do more, as I'm going to suggest based on personal anecdata from my schooling.
Don't force people to reinvent the wheel
I distinctly remember the C course semester project that I was given
almost 10 years ago at school: implement a
and compute determinant/eigenvectors/eigenvalues/… which boiled down to
- Reimplementing (as in copy-pasting code from) the GNU Scientific Library/ALGLIB/Eigen/…
- Getting a good grade
- Throwing the code away
It would have been great to have to fix some bugs in those libraries instead of reinventing the wheel that I didn't care about.
I'm not saying that people shouldn't write their own version of existing projects: it's great and empowering to do so, but if you're going to push your students into writing some code, please direct them towards existing projects. Heck, you could even send an email to some maintainers/lead devs of projects that you like, to ask them if they would be ok with mentoring and helping your students, to help get some bugs fixed or cool new features added: you'll likely get enthusiastic replies.
Teach how to contribute
During my bachelor's degree, there was a mandatory code project, to get done in teams of a dozen students each, over a whole semester, about writing an inventorying system for car parts in PHP, so that a small manager at the local car manufacturer, who was kind enough to perform a sketchy interview about the requirements, could pick the least bad one for free.
Anyway, when I asked my fellow schoolmates what tools we should use for this assessment, the consensus was to use Dropbox and USB keys as revision control, a Word document as a bug tracker, and a single Apache2 deployment where everybody could copy their code and check if it was working.
This was completely insane, and I was genuinely angry both at my schoolmates and at the teachers who thought that teaching things like MERISE and Scrum/Agile/… was a better investment of everybody's time than explaining to their students how to use version control software and bug trackers, how to properly communicate on a mailing list, use static analysis tools, take advantage of continuous integration, perform efficient and useful code reviews, …
I have the strong feeling that instead of wasting 4h a week working on such a terrible project, we could have learned so much more working together to implement whatever cool feature in a large open-source project.
Moreover, no matter how great your code is, it won't get merged, or even reviewed at all if you don't know how to send patches, communicate on a mailing list, and handle reviews and nits.
Interestingly, in 2004, Daniel J. Bernstein ran a university course called MCS 494, UNIX Security Holes, in which the students were required to find real-world security vulnerabilities in open source software and report them to the corresponding maintainers.
During my master's degree, I was able to take a "coding project course": I had to define a worthy project that I wanted to achieve during the semester, and find a teacher who was willing to supervise me. So I spent 3 months improving the search capabilities of radare2 (string constants, patterns, ROP gadgets, …) and wrapped it up with an overly enthusiastic presentation about building architecture-agnostic ROP chains in efficient ways, in front of dumbfounded teachers! I got the best possible grade, and was so proud of being able to contribute to real-world software! I even convinced the head of the department to postpone my exams so that I could go host a radare2 workshop at hack.lu!
As a teacher, you should find ways to reward the nights some students are spending writing code for open-source projects: it's nothing more than a practical course after all.
If everything fails, at least put a label
GitHub is doing interesting things to warn developers about security issues, both in their code and in their dependencies. I think that the next logical step, not only for GitHub but also for MVN Repository, PyPI, npmjs, … is to automatically label dead projects, and warn the projects that depend on them accordingly, for example:
WARNING: your project has a critical dependency on FooLib. This project has had only 2 contributors in the past 17 years and has had no commits in the last 16 years. You might want to consider the risk to your project of depending on this code…
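To make the idea a bit more concrete, here is a minimal sketch of how a registry could generate such a label. Everything in it is an assumption for illustration: the `DependencyStats` record, the `staleness_warning` function and the two-year idle threshold are hypothetical, and how a real registry would collect contributor counts and commit dates is left out entirely.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DependencyStats:
    """Hypothetical activity summary that a package registry could compute."""
    name: str
    contributors: int   # distinct contributors over the observation window
    window_years: int   # length of the observation window, in years
    last_commit: date   # date of the most recent commit


def plural(count: int, word: str) -> str:
    """Return e.g. '1 contributor' or '2 contributors'."""
    return f"{count} {word}" + ("" if count == 1 else "s")


def staleness_warning(dep: DependencyStats, today: date,
                      max_idle_years: int = 2) -> Optional[str]:
    """Return a warning if `dep` looks dead, or None if it still looks alive.

    "Dead" is an arbitrary assumption here: no commit for at least
    `max_idle_years` years (approximated as 365-day years).
    """
    idle_years = (today - dep.last_commit).days // 365
    if idle_years < max_idle_years:
        return None
    return (
        f"WARNING: your project has a critical dependency on {dep.name}. "
        f"This project has had only {plural(dep.contributors, 'contributor')} "
        f"in the past {plural(dep.window_years, 'year')} and has had no "
        f"commits in the last {plural(idle_years, 'year')}. You might want "
        f"to consider the risk to your project of depending on this code…"
    )


# Example with made-up numbers matching the warning above.
foolib = DependencyStats("FooLib", contributors=2, window_years=17,
                         last_commit=date(2008, 1, 1))
print(staleness_warning(foolib, today=date(2024, 1, 1)))
```

A single idle threshold is obviously crude: some small libraries are simply finished, so a real label would probably want to combine several signals (open issues, unanswered security reports, number of recent contributors) before crying wolf.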
If you know some CS students and teachers, tell them about the Google Summer of Code, Outreachy and similar initiatives, which pay students to work on amazing open-source projects during their holidays. Tell them about those crazy people writing code in their free time for everyone to use, and how happy they would be to mentor and help students, for free.
Some of the ideas in this blog post stemmed from a friendly email from Chad Dougherty, many thanks to him!