Deny all robots.txt respecting robots #1

Merged
zesterer merged 1 commit from edsu/babble:deny-robots into main 2025-05-26 16:02:56 +02:00
Contributor

Currently the robots.txt is set up to allow complete access by robots. This means that well-meaning bots that actually respect a site's wishes with regard to crawling will be invited into the maze.

I think it makes more sense to tell all robots to go away, and if the robot just blindly ignores this it will get lost in the babble tarpit.

This means that over time, given enough babble instances, it will be in the interest of bot creators to ensure their bots respect robots.txt, so that they don't incur costs to their compute, bandwidth, and ultimately the quality of their model/index/whatever.

Let the babble instances multiply!!!

```
To exclude all robots from the entire server

User-agent: *
Disallow: /

To allow all robots complete access

User-agent: *
Disallow:
```

via https://www.robotstxt.org/robotstxt.html
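As a sanity check (not part of this PR), here is a small sketch using Python's stdlib `urllib.robotparser` showing how a compliant crawler interprets the two policies above; the bot name and URL are made up for illustration:

```python
from urllib import robotparser

# "Disallow: /" tells every user agent the whole server is off-limits.
deny_all = robotparser.RobotFileParser()
deny_all.parse([
    "User-agent: *",
    "Disallow: /",
])
print(deny_all.can_fetch("ExampleBot", "https://example.com/some/page"))  # False

# An empty "Disallow:" grants complete access to every user agent.
allow_all = robotparser.RobotFileParser()
allow_all.parse([
    "User-agent: *",
    "Disallow:",
])
print(allow_all.can_fetch("ExampleBot", "https://example.com/some/page"))  # True
```

A crawler that makes this check will simply skip a babble instance; one that doesn't will wander into the tarpit.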

edsu added 1 commit 2025-05-21 17:53:44 +02:00
Currently the robots.txt is set up to allow complete access by robots.
This means that well meaning bots that actually respect a site's wishes
with regards to crawling will be invited into the maze.

I think it makes more sense to tell all robots to go away, and if the
robot just blindly ignores this it will get lost in the babble tarpit.

Given enough babble instances this means that over time bot creators
will write LLM scraping bots that respect robots.txt so that they don't
incur the cost to their compute, bandwidth, and ultimately the quality
of their model.

```
To exclude all robots from the entire server

User-agent: *
Disallow: /

To allow all robots complete access

User-agent: *
Disallow:
```

via https://www.robotstxt.org/robotstxt.html
edsu changed title from Deny all robots to Deny all robots.txt respecting robots 2025-05-21 18:06:47 +02:00
Owner

Thanks so much!

zesterer merged commit a86d7720c7 into main 2025-05-26 16:02:56 +02:00
Reference: zesterer/babble#1