Another quote from the February article:
As long as an LLM is accessible to the public, no foolproof
technical barrier prevents a determined actor from doing the same
thing to someone else's model over time (though rate-limiting
helps), which is exactly what Google says happened to Gemini.
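The "rate-limiting" the article says helps is usually something like a token bucket on the provider's side. A minimal sketch, with all names illustrative rather than any real provider's implementation:

```python
# Minimal token-bucket rate limiter -- the kind of server-side throttle
# that slows bulk extraction. All names here are illustrative.

import time

class TokenBucket:
    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s       # tokens refilled per second
        self.capacity = burst        # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate_per_s=1, burst=5)
results = [bucket.allow() for _ in range(8)]
print(results)  # a burst of 5 passes, then requests are refused
```

Of course, as the article notes, this only slows a determined actor down; spread the queries over enough time (or enough accounts) and the bucket keeps refilling.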
It's not enough that the creators of the AI models have scraped
information from millions of websites to build their systems; now they
have to worry about competitors short-circuiting the process by
extracting information wholesale from their models, making hundreds
or thousands of requests and combining the results. This is called a
"distillation attack". Some AI service providers claim this violates
their terms and conditions, but how do you define what is and
isn't allowed, exactly?
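Mechanically, the attack is mundane: query the target ("teacher") model, record prompt/response pairs, and later fine-tune your own "student" model on them. A minimal sketch of the collection side, with a stand-in function where a real attack would make API calls to the target (endpoint and names hypothetical):

```python
# Sketch of the data-collection half of a distillation attack.
# query_teacher() is a stand-in; a real attacker would POST to the
# target model's API here (hypothetical endpoint, not shown).

import json
import time

def query_teacher(prompt):
    # Stand-in for a network call to the target model.
    return f"Teacher answer to: {prompt}"

def collect_distillation_data(prompts, delay_s=0.0):
    """Record (prompt, response) pairs -- the dataset a 'student'
    model would later be fine-tuned on."""
    dataset = []
    for prompt in prompts:
        response = query_teacher(prompt)
        dataset.append({"prompt": prompt, "response": response})
        time.sleep(delay_s)  # pacing to stay under rate limits
    return dataset

pairs = collect_distillation_data(["What is distillation?",
                                   "Explain APIs."])
print(json.dumps(pairs[0]))
```

Repeat this hundreds of thousands of times, as Google says happened to Gemini, and the combined transcript becomes the training set.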
<https://arstechnica.com/ai/2026/02/attackers-prompted-gemini-over-100000-times-while-trying-to-clone-it-google-says/>:
In March 2023, shortly after Meta's LLaMA model weights leaked
online, Stanford University researchers built a model called
Alpaca by fine-tuning LLaMA on 52,000 outputs generated by
OpenAI's GPT-3.5. The total cost was about $600. The result
behaved so much like ChatGPT that it raised immediate questions
about whether any AI model's capabilities could be protected once
it was accessible through an API.
A big worry in the US now is that China is doing this to train its AI
models <https://arstechnica.com/tech-policy/2026/04/us-accuses-china-of-industrial-scale-ai-theft-china-says-its-slander/>.
And of course, as soon as it is seen as a "national security" issue,
the AI companies start lobbying the politicians to pass laws against
it. Imagine that: making too much use of the service, or prompting it
according to certain forbidden patterns, could now become criminal
violations of laws such as the Economic Espionage Act and the Computer
Fraud and Abuse Act.
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
It's not enough that the creators of the AI models have scraped
information from millions of websites to build their systems, now they
have to worry about competitors short-circuiting the process by
extracting information wholesale from their models, by making hundreds
or thousands of requests and combining the results. This is called a "distillation attack".
Great! Let the LLMs scrape each other instead of each one scraping
my websites every minute. That'd be a more sane way for it to work,
except all the investors hoping to buy into a future AI monopoly
would run for the hills.
| Sysop: | Jacob Catayoc |
|---|---|
| Location: | Pasay City, Metro Manila, Philippines |
| Users: | 5 |
| Nodes: | 4 (0 / 4) |
| Uptime: | 493846:36:36 |
| Calls: | 146 |
| Files: | 547 |
| D/L today: | 6 files (97K bytes) |
| Messages: | 76,794 |