• Venting about forums.debian.net

    From Stefan Monnier@3:633/10 to All on Monday, January 19, 2026 23:50:01
    Is forums.debian.net linked to Debian or is it some kind of scam using "debian.net" to confuse the user into thinking it's legitimate?

    I just failed their "test for human" when trying to register, and the
    only reason I tried to register was to see some replies which are hidden
    behind an obnoxious "You must be a registered member and logged in to
    view the replies in this topic".

    That seems rather contrary to Debian's general philosophy.


    - Stefan

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Jeffrey Walton@3:633/10 to All on Tuesday, January 20, 2026 00:00:01
    On Mon, Jan 19, 2026 at 5:40 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:

    Is forums.debian.net linked to Debian or is it some kind of scam using "debian.net" to confuse the user into thinking it's legitimate?

    I just failed their "test for human" when trying to register, and the
    only reason I tried to register was to see some replies which are hidden behind an obnoxious "You must be a registered member and logged in to
    view the replies in this topic".

    That seems rather contrary to Debian's general philosophy.

    forums.debian.net is administratively controlled by the Debian
    project; see <https://wiki.debian.org/DebianNetDomains>. I'm guessing
    Debian is also the technical contact for the domain and site. I can
    only say "guess" because the registrar redacted both administrative
    and technical contacts for the domain.
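
    Anyone can check what's left for themselves with something like the
    line below; the exact field names vary by registrar, so treat the grep
    pattern as a rough sketch:

    whois debian.net | grep -Ei 'registrant|admin|tech'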

    Jeff

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Stefan Monnier@3:633/10 to All on Tuesday, January 20, 2026 05:50:01
    Is forums.debian.net linked to Debian or is it some kind of scam using
    "debian.net" to confuse the user into thinking it's legitimate?
    forums.debian.net is administratively controlled by the Debian
    project; see <https://wiki.debian.org/DebianNetDomains>.

    Having spent more time around it, and recovered from my frustration,
    I can confirm that it seems very legitimate.

    [ And I finally managed to register, after finding the fault in my
    human reasoning (I stupidly hadn't noticed that there were more fields
    to fill beyond the bottom of the window. I guess a bot wouldn't
    have made that mistake). ]


    - Stefan

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From tomas@3:633/10 to All on Tuesday, January 20, 2026 07:50:01
    On Mon, Jan 19, 2026 at 11:40:06PM -0500, Stefan Monnier wrote:
    Is forums.debian.net linked to Debian or is it some kind of scam using
    "debian.net" to confuse the user into thinking it's legitimate?
    forums.debian.net is administratively controlled by the Debian
    project; see <https://wiki.debian.org/DebianNetDomains>.

    Having spent more time around it, and recovered from my frustration,
    I can confirm that it seems very legitimate.

    Thank the LLM craze for that. Nearly every admin of an open site I
    know has seen their site trampled to the ground by those harvesters.

    Primitive accumulation, Marx's original sin of simple robbery, "had
    eventually to be repeated, lest the motor of capital accumulation
    suddenly die down" (Hannah Arendt, as quoted by Shoshana Zuboff;
    Arendt, in turn, attributes this idea to Rosa Luxemburg).

    This is why we can't have nice things.
    Cheers
    --
    tomás


    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Henrik Ahlgren@3:633/10 to All on Tuesday, January 20, 2026 10:10:01
    Stefan Monnier <monnier@iro.umontreal.ca> writes:

    [ And I finally managed to register, after finding the fault in my
    human reasoning (I stupidly hadn't noticed that there were more fields
    to fill beyond the bottom of the window. I guess a bot wouldn't
    have made that mistake ?). ]

    Hmm, I don't get any captcha in the registration form. Or does it occur
    only after submitting the form (I did not go that far)? But the site is
    behind the very popular Anubis[1], which "weighs the soul of your
    connection" - however, I believe the soul-weighing is only possible if
    you have JavaScript enabled. Perhaps you get worse treatment without JS.

    I fully agree that restricting freely accessible information with a
    registration requirement goes against the spirit of Debian. I also hate
    captchas with a passion, but I suppose you need to compromise with LLM
    robots going wild.

    [1] https://github.com/TecharoHQ/anubis

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Svetlana Tkachenko@3:633/10 to All on Tuesday, January 20, 2026 11:50:01
    but I suppose you need to compromise with LLM
    robots going wild.

    Are they not required to follow Do-Not-Track HTTP headers or robots.txt? If LLM robots do not obey these instructions, they should probably be reported to their hosting provider.
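
    For reference, the instructions I mean are just a plain-text robots.txt
    at the site root, something along these lines (the user-agent names are
    only examples of known AI crawlers):

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CCBot
    Disallow: /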

    S

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From tomas@3:633/10 to All on Tuesday, January 20, 2026 12:10:02
    On Tue, Jan 20, 2026 at 09:47:34PM +1100, Svetlana Tkachenko wrote:
    but I suppose you need to compromise with LLM
    robots going wild.

    Are they not required to follow Do-Not-Track HTTP headers or robots.txt? If LLM robots do not obey these instructions, they should probably be reported to their hosting provider.

    They don't always care. Their hosting provider isn't always in a
    position to care.

    There's enough betting money in this pool to motivate actors to break
    and/or creatively bend the rules.

    Recommended reading:

    https://medium.com/@kolla.gopi/the-cloudflare-perplexity-standoff-why-robots-txt-is-broken-for-the-ai-era-1b9d309bdc2b
    Cheers
    --
    t


    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Steve McIntyre@3:633/10 to All on Tuesday, January 20, 2026 13:10:01
    svetlana@members.fsf.org wrote:
    but I suppose you need to compromise with LLM
    robots going wild.

    Are they not required to follow Do-Not-Track HTTP headers or robots.txt? If LLM robots do not obey these
    instructions, they should probably be reported to their hosting provider.

    Hahahaha. No.

    The current crop of LLM morons do not care at all about following
    accepted rules or norms. They just want to grab all the data, screw
    everybody else. They're ignoring robots.txt, so service admins started
    blocking netblocks some time ago.
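
    At the web server that blocking is about as blunt as it sounds; in
    nginx it is little more than a list like this, shown here with RFC 5737
    documentation addresses standing in for real scraper ranges:

    deny  203.0.113.0/24;   # example netblock only
    deny  198.51.100.0/24;  # example netblock only
    allow all;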

    Now we have the LLM morons using botnets to evade those blocks. We
    have massive amounts of downloads coming from random residential IPs
    all over the world, carefully spread out to make it more difficult to
    block them.

    These morons are why we can't have nice things.

    --
    Steve McIntyre, Cambridge, UK.  steve@einval.com
    Can't keep my eyes from the circling sky,
    Tongue-tied & twisted, Just an earth-bound misfit, I...

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Greg Wooledge@3:633/10 to All on Tuesday, January 20, 2026 14:20:01
    On Tue, Jan 20, 2026 at 12:03:38 +0000, Steve McIntyre wrote:
    svetlana@members.fsf.org wrote:
    Are they not required to follow Do-Not-Track HTTP headers or robots.txt? If LLM robots do not obey these
    instructions, they should probably be reported to their hosting provider.

    Hahahaha. No.

    The current crop of LLM morons do not care at all about following
    accepted rules or norms. They just want to grab all the data, screw
    everybody else. They're ignoring robots.txt, so service admins started blocking netblocks some time ago.

    Now we have the LLM morons using botnets to evade those blocks. We
    have massive amounts of downloads coming from random residential IPs
    all over the world, carefully spread out to make it more difficult to
    block them.

    These morons are why we can't have nice things.

    I can confirm this. My own wiki was slammed really hard by this,
    resulting in my having to take substantial actions to limit the
    availability of some "pages".

    The issue isn't even that the LLM bots are harvesting every wiki page.
    If it were only that, I wouldn't mind. The first problem is that wikis
    allow you to request the difference between any two revisions of a page.
    So, let's say a page has 100 revisions. You can request the diff between
    revision 11 and revision 37. Or the diff between revision 14 and
    revision 69. And so on, and so on.

    What happens is the bots request *every single combination* of these
    diffs (with 100 revisions, that's 100*99/2 = 4,950 of them), each one
    from a random IP address, often (but not always) with a falsified
    user-agent header.

    I've blocked all the requests that give a robotic user-agent, but
    there's really nothing I can do about the ones that masquerade as
    Firefox or whatever, unless I take it a step further and block all
    requests that ask for a diff. I haven't had to do that yet. Maybe the
    LLM herders have finally put *some* thought into what they're doing and
    reduced the stupidity level...? Dunno.
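
    If it does come to that, the plan would be one more entry in the
    $badrequest map shown below (untested; it assumes the diff requests
    keep carrying action=diff in the query string, the way MoinMoin's do
    today):

    "~action=diff"       1;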

    Compounding that, MoinMoin has some sort of bizarre calendar thing
    that I've never used and don't really understand. But apparently
    there's a potential page for every single date in a range that spans
    multiple centuries. I've deleted all of those pages *multiple* times,
    but spam bots got those pages into their "try to edit" caches, so they
    kept coming back. Meanwhile, the LLM harvesters got those pages into
    their "try to fetch" caches, so they would keep requesting them, even
    though the pages didn't exist any longer.

    So, another action I had to take was to block every request that tries
    to hit one of those calendar pages, at the web server level, before it
    could even make it to the wiki engine.

    So, I've got this:

    # less /etc/nginx/conf.d/badclient.conf
    # flag requests from known scraper/bot user-agents
    map $http_user_agent $badclient {
        default 0;

        "~BLEXBot/"              1;
        "~ClaudeBot/"            1;
        "~DotBot/"               1;
        "~facebookexternalhit/"  1;
        "~PetalBot;"             1;
        "~SemrushBot"            1;
        "~Thinkbot/"             1;
        "~Twitterbot/"           1;

        "~BadClient/"            2;
    }

    # flag URIs (calendar and default sample pages) to be rejected
    # before they ever reach the wiki engine
    map $request_uri $badrequest {
        default 0;
        "~/MonthCalendar/"   1;
        "~/MonthCalendar\?"  1;
        "~/SampleUser/"      1;
        "~/WikiCourse/"      1;
        "~/WikiKurs/"        1;
    }

    And this:

    # less /etc/nginx/sites-enabled/mywiki.wooledge.org
    server {
        listen 80;
        listen 443 ssl;
        server_name mywiki.wooledge.org;

        if ($badclient) {
            return 403;
        }
        if ($badrequest) {
            return 403;
        }
        ...
    }

    So far, these changes (combined with the brute force removal of the MonthCalendar et al. pages) have been sufficient.

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Davidson@3:633/10 to All on Tuesday, January 20, 2026 21:40:01
    On Tue, 20 Jan 2026, Greg Wooledge wrote:

    [Greg's post, quoted in full above, snipped]

    if ($badclient) {
        return 403;
    [rest snipped]

    Why not

    402 Payment Required

    instead, with instructions on how to pay for the privilege of getting on
    a whitelist?

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Greg Wooledge@3:633/10 to All on Tuesday, January 20, 2026 22:00:01
    On Tue, Jan 20, 2026 at 15:35:10 -0500, Davidson wrote:
    Why not

    402 Payment Required

    instead, with instructions on how to pay for the privilege of getting on
    a whitelist?

    For one thing, I know they're never going to pay me.

    For another, I'm not doing this because I hate AI or whatever. I'm
    doing it for *survival*. The host that my wiki runs on *cannot*
    process those thousands of useless dynamic page requests. If I were
    to say "hey, pay me X dollars, and you can send as many page diff
    requests as you want", I would be making a promise of services I cannot deliver.

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Stefan Monnier@3:633/10 to All on Wednesday, January 21, 2026 01:10:01
    We tracked it back to bots hitting our wiki and trying to make
    anonymous edits. The bots would try to make edits, and that would
    spin-up that useless Wiki Editor from Wikimedia. The edit would
    eventually fail (during Save) because the bot was not authenticated.
    I seem to recall we were seeing between 3 and 7 edits per second from
    bots.

    AFAICT the only workable avenue is to limit web sites to static pages.
    Anything that requires more resources from the server needs to be firmly
    protected behind many layers of defenses.
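
    [ For the parts that can't be fully static, the kind of front line I'm
    imagining is just an nginx proxy cache in front of the application, so
    that anonymous hits never reach it. A rough sketch with made-up names,
    not anything forums.debian.net actually runs:

    proxy_cache_path /var/cache/nginx/forum keys_zone=forum:10m
                     max_size=1g inactive=60m;

    server {
        listen 443 ssl;
        server_name forum.example.org;          # hypothetical site

        location / {
            proxy_pass http://127.0.0.1:8080;   # the real forum backend
            proxy_cache forum;
            proxy_cache_valid 200 10m;          # anonymous pages come from cache
            proxy_cache_bypass $cookie_session; # logged-in users skip the cache
            proxy_no_cache     $cookie_session;
        }
    }
    ]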


    - Stefan

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Bigsy Bohr@3:633/10 to All on Wednesday, January 21, 2026 15:40:01
    On 2026-01-20, Jeffrey Walton <noloader@gmail.com> wrote:
    On Tue, Jan 20, 2026 at 3:53 PM Greg Wooledge <greg@wooledge.org> wrote:

    On Tue, Jan 20, 2026 at 15:35:10 -0500, Davidson wrote:
    Why not

    402 Payment Required

    instead, with instructions on how to pay for the privilege of getting on
    a whitelist?

    For one thing, I know they're never going to pay me.

    For another, I'm not doing this because I hate AI or whatever. I'm
    doing it for *survival*. The host that my wiki runs on *cannot*
    process those thousands of useless dynamic page requests. If I were
    to say "hey, pay me X dollars, and you can send as many page diff
    requests as you want", I would be making a promise of services I cannot
    deliver.

    ++.

    The Crypto++ project got kicked off of GoDaddy hosting because of
    virtual CPU usage. Our CPU usage would also affect co-located sites.
    That's when GoDaddy stepped in and closed us down.

    We tracked it back to bots hitting our wiki and trying to make
    anonymous edits. The bots would try to make edits, and that would
    spin-up that useless Wiki Editor from Wikimedia. The edit would
    eventually fail (during Save) because the bot was not authenticated.
    I seem to recall we were seeing between 3 and 7 edits per second from
    bots.

    The bots were also causing boatloads of OOM kills on our VM because
    the wiki editor was so heavy-weight. We had to constantly repair our
    SQL database because Linux was killing the MySQL instance.

    Eventually we had to move to Hostinger hosting.

    How did that solve the problem (I'm ignorant)?

    The thing is, with these companies (big, or little but soon to be big
    and overvalued on the Dow Jones or whatever it is), you can't just call
    them on the phone and ask them to be reasonable. I'm not even sure they
    control or understand exactly what these robots are doing. I think
    they're already out of control and on the loose, as it were.

    Jeff



    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Stefan Monnier@3:633/10 to All on Thursday, January 22, 2026 00:50:02
    Stefan Monnier [2026-01-20 19:06:28] wrote:
    AFAICT the only workable avenue is to limit web sites to static pages.
    Anything that requires more resources from the server needs to be firmly
    protected behind many layers of defenses.

    Hmm... and if forums.debian.net follows the above principle and caches
    the first page of every topic for static delivery, I guess the result
    would be that this first page would end with something like "You must be
    a registered member and logged in to view the replies in this topic".


    - Stefan

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Svetlana Tkachenko@3:633/10 to All on Thursday, January 22, 2026 01:50:01
    Hi Jeff

    Jeffrey Walton wrote:
    At Hostinger we got a VM with more resources and a swap file for about
    the same price. And we got network protections from bots at no
    charge.

    Via Cloudflare or some router magic? How does the network protection work?

    Sveta

    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)