Epoch AI
@epochai.bsky.social
We are a research institute investigating the trajectory of AI for the benefit of society.

epoch.ai
AI data center buildouts already rival the Manhattan Project in scale, but there’s little public info about them.

So we spent the last few months reading legal permits, staring at satellite images, and scouring news sources.

Here’s what you need to know. 🧵
November 10, 2025 at 6:03 PM
How fast can you build a gigawatt-scale data center?

Some hyperscalers plan to do it in just 1-2 years from the start of construction.

If they succeed, we’ll see the first GW-scale data centers online in 2026, marking one of the fastest infrastructure build-outs in history. 🧵
November 10, 2025 at 5:40 PM
The Epoch Capabilities Index is a useful way to measure model capabilities, but what does a score of 150 actually mean?

One way to read our new capabilities index: plot the benchmark performance you'd expect to see across a range of ECI scores 🧵
November 7, 2025 at 7:13 PM
Anthropic's recently reported projection of $70B revenue in 2028 may be less than OpenAI's projection for the same year, but it would still represent historically fast growth.

bsky.app/profile/epo...
November 5, 2025 at 3:27 PM
Announcing our Frontier Data Centers Hub!

The world is about to see multiple 1 GW+ AI data centers.

We mapped their construction using satellite imagery, permits & public sources — releasing everything for free, including commissioned satellite images.

Highlights in thread!
November 4, 2025 at 7:16 PM
By stitching benchmarks together, the Epoch Capabilities Index allows us to compare frontier models to models with 100,000x less training compute.
November 3, 2025 at 8:59 PM
We looked at OSWorld, a popular evaluation of AI computer use capabilities.

Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time.

See thread for details!
November 3, 2025 at 8:16 PM
We found a bug in our benchmarking code: calls to GPT-5 with "high" reasoning were silently being set to "medium".

Corrected results: GPT-5 (high) scores slightly higher than GPT-5 (medium) on the benchmarks we run. They are also now tied on the Epoch Capabilities Index (ECI).
October 31, 2025 at 3:22 PM
We used our new capabilities index, the ECI, to measure the gap between open- and closed-weight models.

The result? This gap is smaller than previously estimated.

On average, it takes 3.5 months for an open-weight model to catch up with the closed-weight SOTA.
October 30, 2025 at 7:59 PM
Conventional wisdom in AI is that large-scale pretraining needs to happen in massive contiguous data center campuses. But is this true?

Our research suggests that conducting 10 GW training runs across two dozen sites—linked by a network spanning thousands of kilometers—is feasible.
October 28, 2025 at 6:00 PM
We've launched a new tool to track AI progress!

The tool addresses one of the field's biggest challenges: benchmark saturation.

It's called the Epoch Capabilities Index (ECI) — here's what makes it different:
October 27, 2025 at 7:13 PM
Large language models can imitate reasoning steps and even verify formal proofs.

But mathematical physicist Svetlana Jitomirskaya argues they lack folklore knowledge: the implicit priors mathematicians build from experience.

Link to video in comments!
October 27, 2025 at 3:50 PM
Stanford mathematician Ravi Vakil expects AI’s impact on mathematics to come as a phase change, not a slow climb.

Every major shift in math has caught experts off guard, he says. This one will be no different, except that all our predictions will be even more wrong.

Link to video in comments!
October 23, 2025 at 1:53 PM
We evaluated Claude Haiku 4.5 on several benchmarks.

Even with reasoning disabled, Haiku 4.5 performs on par with or better than early lightweight reasoning models, like o1-mini.
October 17, 2025 at 5:49 PM
If you ran GPT-5 infinitely many times on FrontierMath—our extremely challenging math benchmark—would it eventually solve every problem?

Probably not. From what we can tell, it caps out below 50%.

What about throwing in *every* available model? Infinitely many times? 🧵
October 17, 2025 at 4:56 PM
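The "run it infinitely many times" question above is the k→∞ limit of pass@k. A minimal sketch of the standard unbiased pass@k estimator from the code-generation eval literature (the attempt counts below are hypothetical, not Epoch's data):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given c correct answers observed out of n sampled attempts."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical problem: 4 correct out of 16 attempts.
print(pass_at_k(16, 4, 1))   # pass@1 is the raw solve rate: 0.25
print(pass_at_k(16, 4, 16))  # pass@16 on this problem: 1.0
```

On a single problem with any correct sample, the estimate saturates at 1.0 as k grows; the post's claim is that even this limit, aggregated over all FrontierMath problems, stays below 50%.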
OpenAI is experiencing one of the fastest revenue growth rates in corporate history, with annualized revenue rising 3x a year, from $2 billion at the end of 2023 to $13 billion by August 2025.
October 16, 2025 at 5:34 PM
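The "3x a year" figure can be sanity-checked with a one-line growth calculation (the ~20-month gap between end of 2023 and August 2025 is my approximation from the dates in the post):

```python
# Implied annual growth multiple from ~$2B (end of 2023) to ~$13B (Aug 2025),
# assuming roughly 20 months between the two figures.
years = 20 / 12
annual_multiple = (13 / 2) ** (1 / years)
print(round(annual_multiple, 2))  # ≈ 3.07, consistent with "3x a year"
```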
One way bubbles pop: a technology doesn’t deliver value as quickly as investors bet it will.

In light of that, it’s notable that OpenAI is projecting historically unprecedented revenue growth — from $10B to $100B — over the next three years. 🧵
October 15, 2025 at 4:23 PM
A proof only 15 experts understand is less valuable than one any undergraduate can verify using a computer.

Mathematician Jesús De Loera on AI’s potential to democratize mathematical proof and the risks when systems hallucinate with perfect confidence.

Link to video in comments!
October 13, 2025 at 8:50 PM
New data insight: How does OpenAI allocate its compute?

OpenAI spent ~$7 billion on compute last year. Most of this went to R&D, meaning all research, experiments, and training.

Only a minority of this R&D compute went to the final training runs of released models.
October 10, 2025 at 6:19 PM
We manually evaluated three compute-intensive model settings on our extremely hard math benchmark. FrontierMath Tier 4: Battle Royale!

GPT-5 Pro set a new record (13%), edging out Gemini 2.5 Deep Think by a single problem (not statistically significant). Grok 4 Heavy lags. 🧵
October 10, 2025 at 4:26 PM
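Why a one-problem edge isn't statistically significant: on a paired eval, only problems where the two models disagree carry signal, and with few discordant problems an exact sign test cannot reject chance. A minimal sketch with hypothetical counts (not the actual FrontierMath results):

```python
from math import comb

def sign_test_p(b: int, c: int) -> float:
    """Two-sided exact sign test. b = problems only model A solved,
    c = problems only model B solved. Under the null hypothesis,
    each discordant problem is a fair coin flip between the models."""
    n = b + c
    k = min(b, c)
    # Probability of an outcome at least this lopsided, one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical: A solved 4 problems B missed, B solved 3 that A missed,
# so A leads by one problem overall — yet the test can't reject chance.
print(sign_test_p(4, 3))  # 1.0: no evidence of a real difference
```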
A healthy conversation about AI should be grounded in facts. Epoch’s datasets can help you track and understand the trajectory of AI.
As a nonprofit, our work is freely accessible for anyone to read, replicate, and build upon.
Our datasets:
October 10, 2025 at 3:19 PM
We recently wrote that GPT-5 is likely the first mainline GPT release to be trained on less compute than its predecessor.

How did we reach this conclusion, and what do we actually know about how GPT-5 was trained?
🧵
October 9, 2025 at 8:11 PM
AI capabilities have been steadily improving across a wide range of skills, and show no sign of slowing down in the near term. 🧵
October 9, 2025 at 6:48 PM
We evaluated Gemini 2.5 Deep Think on FrontierMath. There is no API, so we ran it manually. The results: a new record!

We also conducted a more holistic evaluation of its math capabilities. 🧵
October 9, 2025 at 5:32 PM
USC mathematician Greta Panova wrote a math problem so difficult that today’s most advanced AI models don’t know where to begin.

She thinks that when AI finally can, it will have crossed a threshold in general human-level reasoning.

Link to video in comments!
October 9, 2025 at 1:25 PM