How to throw your ebook library at ipfs
Wed 01 March 2023

I have a decent ebook library of several thousand books, all of which are assumed to come from Project Gutenberg for the rest of this article. I wanted to put all of them on ipfs to help alleviate the load on Anna's Archive and make it more resilient to untimely shutdowns, as they asked for help with.

The ipfs project's recommended way to interact with the network is kubo. Unfortunately, it's not packaged in Alpine Linux anymore, and the prebuilt binaries depend on glibc, so they can't run on musl. Building from source it is:

$ git clone https://github.com/ipfs/kubo.git
$ cd kubo
$ make build CGO_ENABLED=0
$ /home/ipfs/kubo/cmd/ipfs/ipfs version
ipfs version 0.19.0-dev
$

My collection of ebooks is around 50G and sits on my NAS, so I really don't want it duplicated just to be put into ipfs, which is what kubo does by default. Fortunately, the --nocopy option was added in 2017 to allow exactly this, although it's still marked as experimental and requires the filestore to be enabled. Enabling "Accelerated DHT" is also handy to speed things up, so make sure to enable it as well.
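
Since --nocopy leaves the data where it is and only stores references to it, the files must not move or change once they have been added. A minimal sketch of a sanity check for later, assuming that ipfs filestore verify prints one line per block with the status (ok, changed, no-file, error) as the first field. count_bad_refs is a made-up helper name, not part of kubo:

```shell
#!/bin/sh
# Count filestore references whose status is anything other than "ok".
# Assumption: every non-empty line read on stdin starts with a status word,
# which is how `ipfs filestore verify` is expected to format its output.
count_bad_refs() {
    awk 'NF && $1 != "ok" { bad++ } END { print bad + 0 }'
}

# Usage (not run here, it needs a populated repository):
#   /home/ipfs/kubo/cmd/ipfs/ipfs filestore verify | count_bad_refs
```

A result of 0 would mean every referenced block still matches the file it points to.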

I'm using proxmox on my hypervisor, so I could mount my ebooks as read-only inside of my ipfs container by adding mp0: /mnt/nfs/books/,mp=/home/ipfs/books/,ro=1 to its configuration file.

Because I'm using calibre to manage my virtual library, book covers (jpg, png, …) and metadata files (opf) have to be excluded. Also make sure to use --hash=blake2b-256 --chunker=size-1048576, since this is what Anna's Archive is using. "Interestingly", one can pick amongst hundreds of hashing primitives to generate CIDs on ipfs.
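
To preview what those exclusions leave behind before adding anything, the three patterns can be approximated with find. This is only a sketch: list_candidates is a hypothetical helper, and find's -name matching is assumed to be close enough to how kubo treats --ignore for these simple suffix patterns.

```shell
#!/bin/sh
# Print every regular file under a directory, except covers and calibre
# metadata (*.jpg, *.png, *.opf). This never calls ipfs; it only mimics
# the three --ignore patterns with find(1).
list_candidates() {
    # $1: library root, for example ./books/
    find "$1" -type f \
        ! -name '*.jpg' \
        ! -name '*.png' \
        ! -name '*.opf' \
        | sort
}

# Usage:
#   list_candidates ./books/ | wc -l
```

Note that the match is case-sensitive here, so a cover named cover.JPG would still be listed.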

$ /home/ipfs/kubo/cmd/ipfs/ipfs init --profile server
$ /home/ipfs/kubo/cmd/ipfs/ipfs config --json Experimental.AcceleratedDHTClient true
$ /home/ipfs/kubo/cmd/ipfs/ipfs config --json Experimental.FilestoreEnabled true
$ /home/ipfs/kubo/cmd/ipfs/ipfs add -r --pin=true --hash=blake2b-256 --chunker=size-1048576 --nocopy --ignore='*.jpg' --ignore='*.png' --ignore='*.opf' ./books/
[……]
added bafykbzacedbkjcavohu3tghnva6v3nwgpdtz3umtt3v7s5tgmjyeltrjsftmw books/Pierre-Joseph Proudhon/De la justice dans la Revolution et dans l'Eglise (515)
added bafykbzacebwarcecieaoeeftrex4r5yisicztrzyjdarpw3ijgypsq4fpnzlm books/Pierre-Joseph Proudhon/Qu'est-ce que la propriete _ (197)
[……]
added bafykbzacec7xpt2ojfyfeyqhw76yyzukeym4nvwhukcmyincefuyp5jzrfnhq books/William Shakespeare/Macbeth (389)
[……]
48.36 GiB / 48.36 GiB [===============================================] 100.00%
$ /home/ipfs/kubo/cmd/ipfs/ipfs swarm peers | wc -l
1337
$ /home/ipfs/kubo/cmd/ipfs/ipfs stats bw --poll=true --interval=1s
Total Up    Total Down  Rate Up     Rate Down
  2.9 GB      5.2 GB       27 MB/s   51.3 MB/s      
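
For a sense of what --chunker=size-1048576 does to a file: it is cut into 1 MiB pieces, with a shorter last piece when the size is not a multiple of 1048576. A back-of-the-envelope sketch of that arithmetic follows. chunk_count is a hypothetical helper; it only counts the data chunks and ignores the intermediate nodes that link them together.

```shell
#!/bin/sh
# Number of fixed-size chunks needed to cover a file of a given size,
# i.e. the size divided by the chunk size, rounded up.
chunk_count() {
    # $1: file size in bytes
    # $2: chunk size in bytes (defaults to the 1048576 used in this article)
    cc_size=$1
    cc_chunk=${2:-1048576}
    echo $(( (cc_size + cc_chunk - 1) / cc_chunk ))
}

# Usage:
#   chunk_count "$(wc -c < some-book.epub)"
```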

Now that things are working, it's only a matter of writing a simple openrc service script (saved as /etc/init.d/ipfs), running rc-update add ipfs and rebooting:

#!/sbin/openrc-run

name="ipfs"
command="/home/ipfs/kubo/cmd/ipfs/ipfs"
command_args="daemon"
pidfile="/run/ipfs.pid"
command_user="ipfs"
command_background=true

depend() {
    need net
}

And here we go, my whole library on ipfs for the whole world to enjoy:

$ curl -s -i https://ipfs.io/ipfs/bafykbzacedbkjcavohu3tghnva6v3nwgpdtz3umtt3v7s5tgmjyeltrjsftmw/ | grep epub -m 1
>De la justice dans la Revolution et dans l - Pierre-Joseph Proudhon.epub
$ curl -s -i https://ipfs.io/ipfs/bafykbzacebwarcecieaoeeftrex4r5yisicztrzyjdarpw3ijgypsq4fpnzlm/ | grep epub -m 1
>Qu'est-ce que la propriete _ - Pierre-Joseph Proudhon.epub
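
The same check can be wrapped in a small helper to go through several CIDs at once. A sketch, with gateway_url and check_cid as made-up names; the only gateway assumed is the public ipfs.io one used above, and the first request for a CID can be slow while the gateway looks for a provider.

```shell
#!/bin/sh
# Build the gateway URL of a directory CID.
gateway_url() {
    # $1: CID
    # $2: gateway base URL (defaults to https://ipfs.io)
    printf '%s/ipfs/%s/\n' "${2:-https://ipfs.io}" "$1"
}

# Print the HTTP status code the gateway returns for a CID.
# Gives up after 60 seconds, an arbitrary value chosen for this sketch.
check_cid() {
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 60 "$(gateway_url "$1")"
}

# Usage (needs network access, so it is not run here):
#   check_cid bafykbzacec7xpt2ojfyfeyqhw76yyzukeym4nvwhukcmyincefuyp5jzrfnhq
```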

Don't forget to add a crontab entry to automatically add new books:

# echo "0 0 * * 0 /home/ipfs/kubo/cmd/ipfs/ipfs add -r --pin=true --hash=blake2b-256 --chunker=size-1048576 --nocopy --ignore='*.jpg' --ignore='*.png' --ignore='*.opf' ./books/" >| /etc/crontabs/ipfs
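
Quoting a long command inside a crontab gets fiddly, so one alternative is to move it into a small wrapper script and let cron call that. A sketch under assumptions: the script path /home/ipfs/add-books.sh, the IPFS_BIN and BOOKS_DIR variables and the run argument are all hypothetical, and /home/ipfs/books/ is the mount point from the proxmox configuration above.

```shell
#!/bin/sh
# Hypothetical wrapper, for example saved as /home/ipfs/add-books.sh.
# Both variables can be overridden from the environment.
IPFS_BIN=${IPFS_BIN:-/home/ipfs/kubo/cmd/ipfs/ipfs}
BOOKS_DIR=${BOOKS_DIR:-/home/ipfs/books/}

# Same options as the manual "ipfs add" earlier in this article.
add_books() {
    "$IPFS_BIN" add -r --pin=true \
        --hash=blake2b-256 --chunker=size-1048576 --nocopy \
        --ignore='*.jpg' --ignore='*.png' --ignore='*.opf' \
        "$BOOKS_DIR"
}

# Only do the work when called with "run", so that sourcing the file
# to inspect or test add_books has no side effects.
if [ "${1:-}" = "run" ]; then
    add_books
fi
```

With such a script in place and marked executable, the crontab line would shrink to 0 0 * * 0 /home/ipfs/add-books.sh run.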