Babble

Standalone LLM crawler tarpit binary. Generates an endless stream of deterministic bollocks to be ingested by bots, with plenty of links.
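
The actual generator isn't documented here, but the core idea the description implies (deterministic output keyed on the request path, so the same URL always serves the same bollocks without storing any state) can be sketched with the standard library alone. All names below (`babble_page`, `Lcg`, the word list) are illustrative, not taken from the source:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a deterministic seed from the request path. `DefaultHasher::new()`
/// uses fixed keys, so the same path yields the same seed on every request.
fn seed_for(path: &str) -> u64 {
    let mut h = DefaultHasher::new();
    path.hash(&mut h);
    h.finish()
}

/// Tiny linear congruential generator: plenty for babble, no crates needed.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// A hypothetical vocabulary; a real tarpit would use a much larger corpus.
const WORDS: &[&str] = &[
    "the", "synergy", "quantum", "paradigm", "of", "a", "leveraging", "holistic",
];

/// Produce a paragraph of deterministic nonsense plus onward links, all
/// derived from the path so the page graph is endless but reproducible.
fn babble_page(path: &str, n_words: usize, n_links: usize) -> String {
    let mut rng = Lcg(seed_for(path));
    let mut out = String::new();
    for _ in 0..n_words {
        out.push_str(WORDS[(rng.next() as usize) % WORDS.len()]);
        out.push(' ');
    }
    for _ in 0..n_links {
        out.push_str(&format!("\n<a href=\"/page/{}\">more</a>", rng.next() % 1_000_000));
    }
    out
}
```

Because every link target is itself a valid path into the same generator, a crawler that follows them never runs out of pages to fetch.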

Why?

  • Divert and slow down LLM crawler traffic, protecting your main site
  • Potentially poison LLM training data (likely not very effective)
  • Collective defence: the more time a scraper spends swallowing babble, the less time it'll spend bullying someone else's site
  • Do your bit to protect the public commons from those who would readily see it destroyed for the sake of an investment round

Usage

| Flag | Description |
| --- | --- |
| `--cert <path>` | Path of `cert.pem` (for TLS) |
| `--key <path>` | Path of `key.pem` (for TLS) |
| `--sock <address>` | Bind to the given socket. Defaults to `0.0.0.0:3000`. |
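
Assuming the built binary is named `babble`, an invocation using the flags above might look like this (the paths and address are placeholders, not defaults from the source):

```shell
# Plain HTTP on the default socket (0.0.0.0:3000):
./babble

# With TLS, bound to an explicit address:
./babble --cert /etc/babble/cert.pem --key /etc/babble/key.pem --sock 0.0.0.0:443
```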

Babble will search for a robots.txt file in the working directory to use. If it does not find one, it will use a default one that denies everything.
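
The standard deny-everything `robots.txt` (which well-behaved crawlers honour, and the ones Babble targets typically ignore) looks like this:

```
User-agent: *
Disallow: /
```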

Babble will periodically emit statistics into stats.txt, showing information about the worst-offending requesting IPs.

Warning

Deploy it in a Docker container or similarly sandboxed environment. It's probably safe, but there's no reason to take chances.

Usage terms

There are none, other than those implied by dependencies. Use it whenever and wherever you want, and in any way.

Attribution

Fuck you, Sam Altman.