Commit graph

18 commits

Author SHA1 Message Date
3ec4cd8595
Deny all robots
Currently, robots.txt is set up to allow complete access by robots.
This means that well-meaning bots that actually respect a site's wishes
with regard to crawling will be invited into the maze.

I think it makes more sense to tell all robots to go away, and if the
robot just blindly ignores this it will get lost in the babble tarpit.

Given enough babble instances this means that over time bot creators
will write LLM scraping bots that respect robots.txt so that they don't
incur the cost to their compute, bandwidth, and ultimately the quality
of their model.

```
# To exclude all robots from the entire server

User-agent: *
Disallow: /

# To allow all robots complete access

User-agent: *
Disallow:
```

via https://www.robotstxt.org/robotstxt.html
2025-05-21 11:45:46 -04:00
a77eb52c56 Add robots.txt support 2025-04-29 21:10:12 +01:00
6b05f7fd52 Correct worst offenders ordering 2025-04-29 21:00:11 +01:00
4007c07dc5 Fetch client ip from X-Forwarded-For, if possible 2025-04-29 20:57:25 +01:00
e21e5fffa5 Stats collection 2025-04-29 20:37:34 +01:00
c7073c8fbe Added artificial chunked response slowdown 2025-04-29 19:32:02 +01:00
6a3009df26 Remove whitespace 2025-04-28 23:02:34 +01:00
f40478a58d Added persistent counts, faster RNG, stats in page 2025-04-28 22:55:12 +01:00
d602984fbd Fixed incorrect default port 2025-04-22 17:13:00 +01:00
c5c8e5d72a Individual cert and key paths 2025-04-22 14:51:03 +01:00
8b39e1fca6 Better AST generator 2025-04-21 12:01:11 +01:00
41a442ce0a Added README 2025-04-21 11:44:08 +01:00
e9b8272706 Added generator abstraction 2025-04-21 11:26:08 +01:00
40f159df1c Removed dependency on wordlist 2025-04-21 01:04:12 +01:00
1c1f28007c Full sock 2025-04-21 00:25:06 +01:00
e85e181681 CLI args, configurable port 2025-04-21 00:16:35 +01:00
a977ac40b1 TLS support 2025-04-20 23:57:15 +01:00
3b1417036b Initial implementation 2025-04-20 23:43:03 +01:00
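
The "Deny all robots" commit (3ec4cd8595) relies on well-behaved crawlers reading robots.txt before fetching anything. As a rough illustration of what such a crawler might do, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the URL, and the helper `may_crawl` are hypothetical and are not part of this repository.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt variants quoted in the commit message.
DENY_ALL = """\
User-agent: *
Disallow: /
"""

ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; a real crawler would
    download robots.txt from the target host first.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page"  # placeholder URL
    # A polite bot stays out when everything is disallowed...
    print(may_crawl(DENY_ALL, "ExampleBot", url))
    # ...and would have been let in under the previous allow-all file.
    print(may_crawl(ALLOW_ALL, "ExampleBot", url))
```

A crawler that runs a check like this never enters the tarpit under the deny-all file, while one that skips the check gets no such warning.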