Stuart Gray
@sgray.bsky.social
He/Him. AI Wrangler. Web Geek. F1 Fan. All views my own.

🤖 AI, LLMs, GenAI, NLP
🐍 Python Dev
🚀 Indie Hacker
🎮 Game Dev, ProcGen, Unity, C#
🏎️ F1 Fan
🇬🇧 UK Based

🦣 mastodonapp.uk/@StuartGray
✖️ x.com/StuartGray (inactive)
On this specific one, yes, it’s pure idiocy.

Procedural Generation has nothing to do with AI, and has been around for decades with no complaints.

These people are now suddenly against Fractals of all things!?

It’s been used in music and big commercial games since at least the early 90s.
November 13, 2025 at 9:16 AM
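To make the point concrete (a minimal sketch, not from the original post): midpoint displacement, one of the classic fractal techniques in question, is just a seeded, deterministic algorithm. There is no model, no training data, and no AI anywhere in the loop:

import random

def midpoint_displacement(seed, iterations=4, roughness=0.5):
    """Classic fractal terrain line: recursively insert displaced midpoints.

    Purely deterministic given the seed -- rerun it with the same seed
    and you get the same terrain every time.
    """
    rng = random.Random(seed)
    heights = [0.0, 0.0]          # start with a flat line segment
    spread = 1.0                  # maximum displacement at this level
    for _ in range(iterations):
        next_heights = []
        for left, right in zip(heights, heights[1:]):
            mid = (left + right) / 2 + rng.uniform(-spread, spread)
            next_heights += [left, mid]
        next_heights.append(heights[-1])
        heights = next_heights
        spread *= roughness       # finer detail at each level -> fractal self-similarity
    return heights

print(midpoint_displacement(seed=42))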
Reposted by Stuart Gray
I'll give you a US example:

The main age-verification lobbyist — a man who largely believes porn should be outlawed — admitted the state-level bills he pushed for won't work and were really a predicate for federal action.

He wants the DOJ to seize domains.
November 12, 2025 at 12:19 AM
Not sure Elder Scrolls is a fair comparison for a few reasons:

* Skyrim had an unusually long life (and still going!), partly thanks to 3rd party mods
* So much so that the Bethesda head grew to massively resent what he viewed as “lost revenue”!
* There was/is also ESO (The Elder Scrolls Online)
November 11, 2025 at 7:24 PM
Reposted by Stuart Gray
I'm obliged for the nod.

The "Penrose Effect" seems to be a real thing - hypothesised in the 1930s and re-tested in the last decade or so:

Where you reduce inpatient psychiatric provision, you'll see a correlated rise, within roughly 10 years, in the number of seriously mentally ill people in prisons.
November 9, 2025 at 11:37 AM
Reposted by Stuart Gray
... which results in many (closed and open) models showing a similar performance bias towards likely leaked samples
We split MMLU into two parts (leaked/clean) and show that almost all models tend to perform better on the leaked samples
November 7, 2025 at 9:11 PM
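A rough sketch of that comparison (the record layout below is an assumption for illustration, not the authors' code): score the model separately on the leaked and clean subsets, then look at the gap.

# `records` is an assumed schema: one dict per MMLU question, with the
# model's correctness on it and a flag for suspected leakage.

def split_accuracy(records):
    """Return (accuracy on leaked subset, accuracy on clean subset)."""
    leaked = [r for r in records if r["leaked"]]
    clean = [r for r in records if not r["leaked"]]
    acc = lambda rs: sum(r["correct"] for r in rs) / len(rs)
    return acc(leaked), acc(clean)

records = [
    {"leaked": True, "correct": True},
    {"leaked": True, "correct": True},
    {"leaked": False, "correct": False},
    {"leaked": False, "correct": True},
]
leaked_acc, clean_acc = split_accuracy(records)
print(f"leaked: {leaked_acc:.2f}  clean: {clean_acc:.2f}  gap: {leaked_acc - clean_acc:+.2f}")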
Reposted by Stuart Gray
This contamination is not intentional: we identified websites that reframed splits of MMLU as user-friendly quizzes
These websites can then be found in CommonCrawl dumps that are generally used for pretraining data curation...
November 7, 2025 at 9:11 PM
Reposted by Stuart Gray
We used the great Infinigram from Jiacheng Liu and found numerous hints of test set leakage in DCLM, which is used in OLMo-2

For instance, the fraction of MMLU questions leaked into pretraining went from ~1% to 24% between OLMo-1 and OLMo-2 😬
November 7, 2025 at 9:11 PM
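A toy sketch of the underlying leakage test (an in-memory stand-in for illustration, not the actual Infinigram service): flag a benchmark question as likely leaked if a long n-gram from it appears verbatim in the pretraining corpus.

def ngrams(tokens, n):
    """All contiguous n-token spans, joined back into strings."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def looks_leaked(question, corpus_text, n=8):
    """Flag a question if any n-token span appears verbatim in the corpus.

    Long exact matches are a strong hint of contamination; n=8 is an
    arbitrary illustrative threshold, not the paper's setting.
    """
    q_tokens = question.lower().rstrip("?").split()
    corpus_grams = ngrams(corpus_text.lower().split(), n)
    return any(g in corpus_grams for g in ngrams(q_tokens, n))

# e.g. a quiz site that reframed a benchmark question, scraped into a crawl
corpus = "quiz time which planet is known as the red planet answer mars"
print(looks_leaked("Which planet is known as the Red Planet?", corpus))  # True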
Reposted by Stuart Gray
These all sound like edge cases. But as we are seeing in the US right now, edge cases have a horrible habit of becoming reality. It's not about "can you trust today's government" - it's about whether you can trust a completely unknown government in the future.
November 6, 2025 at 12:49 PM