Is forums.debian.net linked to Debian or is it some kind of scam using "debian.net" to confuse the user into thinking it's legitimate?
I just failed their "test for human" when trying to register, and the
only reason I tried to register was to see some replies which are hidden behind an obnoxious "You must be a registered member and logged in to
view the replies in this topic".
That seems rather contrary to Debian's general philosophy.
Is forums.debian.net linked to Debian or is it some kind of scam using
"debian.net" to confuse the user into thinking it's legitimate?
forums.debian.net is administratively controlled by the Debian
project; see <https://wiki.debian.org/DebianNetDomains>.
Thank the LLM craze for that. Nearly every admin of an open site [...]
Having spent more time around it, and recovered from my frustration,
I can confirm that it seems very legitimate.
[ And I finally managed to register, after finding the fault in my
human reasoning (I stupidly hadn't noticed that there were more fields
to fill beyond the bottom of the window. I guess a bot wouldn't
have made that mistake ?). ]
but I suppose you need to compromise with LLM
robots going wild.
but I suppose you need to compromise with LLM
robots going wild.
Are they not required to follow do_not_track http headers or robots.txt? If LLM robots do not obey these
instructions, they should probably be reported to their hosting provider.
They don't always care. Their hosting provider isn't always in a
position to do anything about it.
svetlana@members.fsf.org wrote:
Are they not required to follow do_not_track http headers or robots.txt? If LLM robots do not obey these
instructions, they should probably be reported to their hosting provider.
Hahahaha. No.
The current crop of LLM morons do not care at all about following
accepted rules or norms. They just want to grab all the data, screw
everybody else. They're ignoring robots.txt, so service admins started
blocking netblocks some time ago.
Now we have the LLM morons using botnets to evade those blocks. We
have massive amounts of downloads coming from random residential IPs
all over the world, carefully spread out to make it more difficult to
block them.
These morons are why we can't have nice things.
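For anyone wondering what "blocking netblocks" looks like in practice, it is usually nothing fancier than a geo map feeding a 403 at the front end. A minimal nginx sketch, with placeholder CIDR ranges and server name rather than any real blocklist:

geo $blocked_net {
    default           0;
    192.0.2.0/24      1;   # placeholder (TEST-NET-1), standing in for a scraper's hosting range
    198.51.100.0/24   1;   # placeholder (TEST-NET-2)
}
server {
    listen 80;
    server_name example.org;   # placeholder
    if ($blocked_net) {
        return 403;            # refuse everything from the listed ranges
    }
}

Which is exactly why the shift to residential botnets is so obnoxious: once the traffic comes from random home connections all over the world, there is no sensible set of ranges left to deny.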
On Tue, Jan 20, 2026 at 12:03:38 +0000, Steve McIntyre wrote:
[rest snipped]
I can confirm this. My own wiki was slammed really hard by this,
resulting in my having to take substantial actions to limit the
availability of some "pages".
The issue isn't even that the LLM bots are harvesting every wiki page.
If it were only that, I wouldn't mind. The first problem is that wikis
allow you to request the difference between any two revisions of a
page. So, let's say a page has 100 revisions. You can request the
diff between revision 11 and revision 37. Or the diff between
revision 14 and revision 69. And so on, and so on.
What happens is the bots request *every single combination* of these
diffs, each one from a random IP address, often (but not always) with
a falsified user-agent header.
I've blocked all the requests that give a robotic user-agent, but
there's really nothing I can do about the ones that masquerade as
Firefox or whatever, unless I take it a step further and block all
requests that ask for a diff. I haven't had to do that yet. Maybe the
LLM herders have finally put *some* thought into what they're doing
and reduced the stupidity level...? Dunno.
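For a 100-revision page that is 100*99/2 = 4,950 distinct diffs, every one rendered dynamically. If blocking them outright ever became necessary, it would probably amount to one extra entry in the $badrequest map shown further down. A sketch only, assuming MoinMoin-style ?action=diff query strings, not something actually deployed:

map $request_uri $badrequest {
    default             0;
    # hypothetical extra entry: refuse every page-diff request;
    # $request_uri includes the query string, so this matches
    # PageName?action=diff&rev1=11&rev2=37 and the like
    "~[?&]action=diff"  1;
}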
Compounding that, MoinMoin has some sort of bizarre calendar thing
that I've never used and don't really understand. But apparently
there's a potential page for every single date in a range that spans
multiple centuries. I've deleted all of those pages *multiple* times,
but spam bots got those pages into their "try to edit" caches, so they
kept coming back. Meanwhile, the LLM harvesters got those pages into
their "try to fetch" caches, so they would keep requesting them, even
though the pages didn't exist any longer.
So, another action I had to take was to block every request that tries
to hit one of those calendar pages, at the web server level, before it
could even make it to the wiki engine.
So, I've got this:
# less /etc/nginx/conf.d/badclient.conf
map $http_user_agent $badclient {
    default                  0;
    "~BLEXBot/"              1;
    "~ClaudeBot/"            1;
    "~DotBot/"               1;
    "~facebookexternalhit/"  1;
    "~PetalBot;"             1;
    "~SemrushBot"            1;
    "~Thinkbot/"             1;
    "~Twitterbot/"           1;
    "~BadClient/"            2;
}
map $request_uri $badrequest {
    default              0;
    "~/MonthCalendar/"   1;
    "~/MonthCalendar\?"  1;
    "~/SampleUser/"      1;
    "~/WikiCourse/"      1;
    "~/WikiKurs/"        1;
}
And this:
# less /etc/nginx/sites-enabled/mywiki.wooledge.org
server {
    listen 80;
    listen 443 ssl;
    server_name mywiki.wooledge.org;
    if ($badclient) {
        return 403;
    }
    if ($badrequest) {
        return 403;
    }
    # ... certificate and wiki-proxy directives omitted ...
}
Why not
402 Payment Required
instead, with instructions on how to pay for the privilege of getting on
a whitelist?
On Tue, Jan 20, 2026 at 3:53 PM Greg Wooledge <greg@wooledge.org> wrote:
On Tue, Jan 20, 2026 at 15:35:10 -0500, Davidson wrote:
Why not
402 Payment Required
instead, with instructions on how to pay for the privilege of getting on
a whitelist?
For one thing, I know they're never going to pay me.
For another, I'm not doing this because I hate AI or whatever. I'm
doing it for *survival*. The host that my wiki runs on *cannot*
process those thousands of useless dynamic page requests. If I were
to say "hey, pay me X dollars, and you can send as many page diff
requests as you want", I would be making a promise of services I cannot
deliver.
++.
The Crypto++ project got kicked off of GoDaddy hosting because of
virtual CPU usage. Our CPU usage would also affect co-located sites.
That's when GoDaddy stepped in and closed us down.
We tracked it back to bots hitting our wiki and trying to make
anonymous edits. The bots would try to make edits, and that would
spin-up that useless Wiki Editor from Wikimedia. The edit would
eventually fail (during Save) because the bot was not authenticated.
I seem to recall we were seeing between 3 and 7 edits per second from
bots.
The bots were also causing boatloads of OOM kills on our VM because
the wiki editor was so heavy-weight. We had to constantly repair our
SQL database because Linux was killing the MySQL instance.
Eventually we had to move to Hostinger hosting.
Jeff
AFAICT the only workable avenue is to limit web sites to static
pages. Anything that requires more resources from the server needs to
be firmly protected behind many layers of defenses.
At Hostinger we got a VM with more resources and a swap file for about
the same price. And we got network protections from bots at no
charge.
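On the "many layers of defenses" point above: one cheap layer that does not depend on naming specific bots is a per-client rate limit in front of anything dynamic. A minimal nginx sketch; the zone name, rate, and server name are made up for illustration, not taken from anyone's real setup:

limit_req_zone $binary_remote_addr zone=perclient:10m rate=2r/s;
limit_req_status 429;
server {
    listen 80;
    server_name wiki.example.org;   # placeholder
    location / {
        # allow short bursts, then answer the overflow with 429
        # before it ever reaches the wiki engine
        limit_req zone=perclient burst=10 nodelay;
        # ... proxy_pass/fastcgi_pass to the wiki would go here ...
    }
}

Per-IP limits admittedly do little against the residential botnets described earlier in the thread, but they do keep any single client from eating all the CPU.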