Trivial anti-crawler with Caddy

With the internet being crawled to death to feed the AI God, it's becoming seriously annoying to expose web content on the internet. While Anubis works, it's yet another layer of complexity. It might be worth deploying and tuning it for high-profile websites, but for my cgit instance, it's absolutely overkill.

Instead, I'm taking advantage of Caddy's matching capabilities (its documentation doesn't have a search feature‽) to gate access on either the ability to execute JavaScript to set a cookie, or having a user-agent string starting with git/, so that repositories are still cloneable.

git.dustri.org {
        import tls
        import noindex
        import compress

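        # Requests with neither the cookie nor a git/ user-agent are unverified
        @unverified {
                not header Cookie *not_a_crawler=1*
                not header User-Agent git/*
        }
        # Serve them a page that sets the cookie via JavaScript, then reloads
        handle @unverified {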
                header Content-Type text/html
                respond <<EOF
                    <script>
                    document.cookie = 'not_a_crawler=1';
                    window.location.reload();
                    </script>
                EOF 418
        }

        reverse_proxy cgit_upstream
}
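
To make the matcher logic explicit, here is a rough Python model of which requests end up in @unverified. It's only a sketch: wildcard_match and is_unverified are names made up for illustration, the wildcard handling is a simplified approximation of Caddy's header matcher and not its actual implementation, and header names are treated as case-sensitive to keep it short.

```python
def wildcard_match(value: str, pattern: str) -> bool:
    """Approximate the header matcher's leading/trailing '*' wildcards."""
    if len(pattern) >= 2 and pattern.startswith("*") and pattern.endswith("*"):
        return pattern[1:-1] in value          # *foo*  -> contains
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])     # *foo   -> suffix
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])  # foo*   -> prefix
    return value == pattern                    # foo    -> exact


def is_unverified(headers: dict) -> bool:
    """Return True when the request would be served the 418 challenge page."""
    has_cookie = wildcard_match(headers.get("Cookie", ""), "*not_a_crawler=1*")
    is_git = wildcard_match(headers.get("User-Agent", ""), "git/*")
    # Matchers inside one named block are ANDed, so both "not" lines must hold.
    return (not has_cookie) and (not is_git)


# A plain browser-like request without the cookie is challenged.
print(is_unverified({"User-Agent": "Mozilla/5.0"}))
# A git client, or any client that already holds the cookie, goes to cgit.
print(is_unverified({"User-Agent": "git/2.43.0"}))
print(is_unverified({"User-Agent": "Mozilla/5.0", "Cookie": "not_a_crawler=1"}))
```

Anything that sends the cookie or claims a git/ user-agent gets through, whether or not it ever ran the script.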

It's not perfect and it's trivial to bypass, but it strikes the right balance between simplicity/zero-maintenance and blocking crawlers. This technique is now used on https://autopkgtest.ubuntu.com.

Artificial truth

Tue 23 June 2026