gavin leech
gleech.org
@gleech.org
context maximizer

https://gleech.org/
November 18, 2025 at 5:45 PM
What to do?

1. Scrape the Internet Archive for Google queries and audit the decline in front-page quality.

2. Start longitudinal collection on Google's quality (and on the LLMs' now, before in-chat ads arrive).

3. More generally, do "prospective science": collect data now about things we think will go down the toilet.
November 18, 2025 at 5:34 PM
Google search sure seems to have gotten worse. But we failed as a civilisation to track this and now we can't quantify it.
November 18, 2025 at 5:34 PM
November 18, 2025 at 2:34 PM
AI editing: a test
www.gleech.org
November 8, 2025 at 11:57 AM
Abusing "inference"
www.gleech.org
November 8, 2025 at 11:57 AM
The METR eval is worth reading throughout - they anticipated most of my objections

metr.github.io/autonomy-eva...
Details about METR’s evaluation of OpenAI GPT-5
Resources for testing dangerous autonomous capabilities in frontier models
metr.github.io
August 8, 2025 at 2:06 PM
Refs:

epoch.ai/frontiermath

If you assume GPT-5 fails all 23 excluded SWE-Bench problems, then Claude 4.0 > GPT-5
x.com/gneubig/stat...

other coding
x.com/eli_lifland/...

aider.chat/docs/leaderboa
FrontierMath
FrontierMath is a benchmark of hundreds of unpublished and extremely challenging math problems to help us to understand the limits of artificial intelligence.
epoch.ai
August 8, 2025 at 2:06 PM
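The SWE-Bench point in the refs above can be checked with a quick calculation. This is a rough sketch, not anyone's official method: only the 23 excluded problems come from the post. The 500-problem benchmark size (so 477 attempted) and the 74.9% reported rate are assumptions for illustration, and the function name is made up.

```python
def rate_if_excluded_all_fail(reported_rate: float, attempted: int, excluded: int) -> float:
    """Pass rate over the full benchmark if every excluded problem counts as a failure."""
    solved = reported_rate * attempted      # problems solved among those attempted
    return solved / (attempted + excluded)  # denominator now includes the excluded ones


# 23 excluded problems: from the post.
# 477 attempted and a 74.9% reported rate: ASSUMED figures, not from the post.
adjusted = rate_if_excluded_all_fail(reported_rate=0.749, attempted=477, excluded=23)
print(f"{adjusted:.1%}")
```

Under these assumed figures the adjusted rate comes out around 71.5%, so any model scoring above that on the full set would rank higher, which is the shape of the comparison the post makes.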