Title: Trivial anti-crawler with Caddy
Date: 2026-06-23 13:45

With the internet being crawled to death to feed the AI God, it's becoming seriously annoying to expose web content on the internet. While [anubis](https://anubis.techaro.lol/) works, it's yet another layer of complexity. It might be worth deploying and tuning it for high-profile websites, but for my [cgit instance](https://git.dustri.org), it's absolutely overkill.

Instead, I'm taking advantage of the matching capabilities of [Caddy](https://caddyserver.com/) (whose [documentation](https://caddyserver.com/docs/) doesn't have a search feature‽). Access is gated on one of two things: the ability to execute JavaScript to set a cookie, or a user-agent string starting with `git/`, so that repositories are still cloneable.

```caddy
git.dustri.org {
	import tls
	import noindex
	import compress

	# Requests with neither the cookie nor a git user-agent
	@unverified {
		not header Cookie *not_a_crawler=1*
		not header User-Agent git/*
	}

	# Serve them a tiny JavaScript challenge: set the cookie, then reload
	handle @unverified {
		header Content-Type text/html
		respond <<EOF
			<script>
			document.cookie = 'not_a_crawler=1';
			window.location.reload();
			</script>
			EOF 418
	}

	# Everyone else goes straight to cgit
	reverse_proxy cgit_upstream
}
```

A browser hitting the site for the first time gets the small page with a `418` status. The script sets the `not_a_crawler=1` cookie and reloads, and the reloaded request is proxied to cgit. Crawlers that don't run JavaScript never get past the challenge, while `git` clients skip it entirely thanks to their user-agent.

It's not perfect, and it's trivial to bypass, but it strikes the right balance between simplicity/zero maintenance and blocking crawlers. This technique is [now used](https://github.com/canonical/ubuntu-autopkgtest-operators/pull/127/changes) on .
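To spell out what the `@unverified` matcher and the `handle` block do, here is a rough Python sketch of the same gate written as WSGI middleware. It is only an illustration of the logic and not how Caddy implements it. The names `COOKIE_MARKER`, `GIT_UA_PREFIX`, `CHALLENGE`, `is_unverified` and `gate` are all hypothetical. The cookie check is a plain substring test, like `*not_a_crawler=1*`, and the user-agent check is a prefix test, like `git/*`.

```python
# Hypothetical sketch: mirrors the Caddy @unverified matcher as WSGI middleware.
# COOKIE_MARKER, GIT_UA_PREFIX, CHALLENGE, is_unverified and gate are
# illustrative names, not part of Caddy or cgit.

COOKIE_MARKER = "not_a_crawler=1"  # same substring as *not_a_crawler=1*
GIT_UA_PREFIX = "git/"  # same prefix as git/*

# Page served to unverified clients: set the cookie, then reload.
CHALLENGE = (
    b"<script>\n"
    b"document.cookie = 'not_a_crawler=1';\n"
    b"window.location.reload();\n"
    b"</script>\n"
)


def is_unverified(environ):
    """True when the request has neither the cookie nor a git user-agent."""
    cookie = environ.get("HTTP_COOKIE", "")
    user_agent = environ.get("HTTP_USER_AGENT", "")
    has_cookie = COOKIE_MARKER in cookie  # substring match, like the Caddy wildcard
    is_git = user_agent.startswith(GIT_UA_PREFIX)  # prefix match
    return not has_cookie and not is_git


def gate(app):
    """Wrap a WSGI app so unverified requests get the challenge with a 418."""

    def middleware(environ, start_response):
        if is_unverified(environ):
            start_response(
                "418 I'm a teapot",
                [
                    ("Content-Type", "text/html"),
                    ("Content-Length", str(len(CHALLENGE))),
                ],
            )
            return [CHALLENGE]
        # Verified (cookie or git client): pass through to the real app.
        return app(environ, start_response)

    return middleware
```

To use it, wrap an existing WSGI application with `gate(app)` and serve it as usual, for example with `wsgiref.simple_server.make_server`. Because only the presence of the string is checked, anything that sends `Cookie: not_a_crawler=1` by hand gets through. That is exactly why the trick is trivial to bypass, and also why it needs no state or maintenance.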