Junk food for your local LLM https://content.jsbarretto.com/void

Babble

Standalone LLM crawler tarpit binary. Generates an endless stream of deterministic bollocks to be ingested by bots, with plenty of links.
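To make "deterministic" concrete, here is a minimal sketch of one way such a generator could work. It is not taken from the project's source. The word list, the function names (`seed_from_path`, `babble_page`), and the choice of FNV-1a plus xorshift64 are all assumptions made for illustration. The idea is that the request path seeds a small PRNG, so the same URL always yields the same page, and every page links to more generated pages.

```rust
/// Hypothetical word list; a real tarpit would presumably draw on a far larger corpus.
const WORDS: [&str; 8] = [
    "void", "crumpet", "lantern", "wobble", "gravel", "teapot", "murmur", "ferret",
];

/// FNV-1a hash of the request path, used as the PRNG seed.
/// The algorithm is fixed, so the seed is stable across builds and machines.
fn seed_from_path(path: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for byte in path.bytes() {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    hash
}

/// One xorshift64 step: a tiny deterministic PRNG. The state must be non-zero.
fn next(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Pick the next word from the list.
fn pick(state: &mut u64) -> &'static str {
    WORDS[(next(state) % WORDS.len() as u64) as usize]
}

/// Build one page: a paragraph of `words` words followed by `links` links,
/// each pointing at another path that would generate its own page.
fn babble_page(path: &str, words: usize, links: usize) -> String {
    let mut state = seed_from_path(path) | 1; // force a non-zero state
    let mut out = String::from("<p>");
    for i in 0..words {
        if i > 0 {
            out.push(' ');
        }
        out.push_str(pick(&mut state));
    }
    out.push_str("</p>\n");
    for _ in 0..links {
        let a = pick(&mut state);
        let b = pick(&mut state);
        out.push_str(&format!("<a href=\"/{a}/{b}\">{a} {b}</a>\n"));
    }
    out
}
```

Calling `babble_page("/void/crumpet", 12, 3)` twice returns identical text, so a crawler that revisits a URL sees a stable page, and nothing needs to be stored on disk.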

Why?

  • Divert and slow down LLM crawler traffic, protecting your main site
  • Potentially poison LLM training data (likely not very effective)
  • Collective defence; the more time a scraper spends swallowing babble, the less time it'll spend bullying someone else's site
  • Do your bit to protect the public commons from those who would readily see it destroyed for the sake of an investment round

Usage

Option | Description
--cert <path> | Path to `cert.pem` (for TLS)
--key <path> | Path to `key.pem` (for TLS)
--sock <address> | Socket address to bind to. Defaults to 0.0.0.0:3000.

Deploy it in a Docker environment. It's probably safe, but there's no reason to take chances.

If you want to be nice to crawlers that actually abide by robots.txt, perhaps add an entry to warn search engines away from it.
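Such an entry might look something like the fragment below. The `/babble/` prefix is a made-up example, not something this project defines; swap in whatever path (or host) you actually serve the tarpit from.

```
# Hypothetical mount point: assumes the tarpit is served under /babble/.
User-agent: *
Disallow: /babble/
```

Well-behaved crawlers will skip everything under that prefix; the ones that ignore robots.txt walk straight in.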

Usage terms

There are none, other than those implied by dependencies. Use it whenever and wherever you want, and in any way.

Attribution

Fuck you, Sam Altman.