Epoch AI
@epochai.bsky.social
We are a research institute investigating the trajectory of AI for the benefit of society.

epoch.ai
This power runs IT equipment on “server racks”, each occupying a footprint of just 0.5 m^2. But each rack uses enough power for 100 homes!

This means a huge amount of heat in a small space. So you can’t cool these chips with fans — you need liquid coolants to efficiently soak up the heat.
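
A quick back-of-envelope check of the power density implied here (the ~1.2 kW average household draw is an assumed figure, not from this post):

# Illustrative arithmetic only: assumed household draw, rack figures from the post
avg_home_power_kw = 1.2        # assumed average household draw, kW
homes_per_rack = 100           # from the post
rack_area_m2 = 0.5             # from the post

rack_power_kw = avg_home_power_kw * homes_per_rack         # ~120 kW per rack
power_density_kw_per_m2 = rack_power_kw / rack_area_m2     # ~240 kW per m^2
print(rack_power_kw, power_density_kw_per_m2)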
November 10, 2025 at 6:03 PM
AI data center buildouts already rival the Manhattan Project in scale, but there’s little public info about them.

So we spent the last few months reading legal permits, staring at satellite images, and scouring news sources.

Here’s what you need to know. 🧵
November 10, 2025 at 6:03 PM
How fast can you build a gigawatt-scale data center?

Some hyperscalers plan to do it in just 1-2 years from the start of construction.

If they succeed, we’ll see the first GW-scale data centers online in 2026, marking one of the fastest infrastructure build-outs in history. 🧵
November 10, 2025 at 5:40 PM
The Epoch Capabilities Index is a useful way to measure model capabilities, but what does a score of 150 actually mean?

One way to read our new capabilities index is to plot the benchmark performance you would expect to see across a range of ECI scores 🧵
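
A minimal sketch of that reading, assuming a logistic mapping from ECI score to expected benchmark accuracy; the difficulty and slope parameters below are hypothetical, not Epoch's published calibration:

import math

def expected_accuracy(eci_score, benchmark_difficulty, slope=0.05):
    # Logistic curve: expected accuracy rises smoothly as a model's ECI score
    # exceeds the benchmark's difficulty (hypothetical parameters)
    return 1.0 / (1.0 + math.exp(-slope * (eci_score - benchmark_difficulty)))

for eci in (120, 135, 150):
    print(eci, round(expected_accuracy(eci, benchmark_difficulty=140), 2))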
November 7, 2025 at 7:13 PM
Anthropic's recently reported projection of $70B in revenue for 2028 may be lower than OpenAI's projection for the same year, but it would still represent historically fast growth.

bsky.app/profile/epo...
November 5, 2025 at 3:27 PM
AI data centers require massive capital investment, typically $29B per gigawatt of total facility power.

The most expensive data center we know about is Microsoft Fairwater, projected to exceed $100 billion in total capital cost upon completion.
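
Dividing the two figures in this post gives a rough sense of scale (this is just arithmetic on the quoted numbers; per-site costs vary, so it is not an estimate of Fairwater's actual power):

capex_per_gw_usd = 29e9        # typical capital cost per GW of facility power (from the post)
fairwater_capex_usd = 100e9    # projected total cost at completion, lower bound (from the post)

implied_gw = fairwater_capex_usd / capex_per_gw_usd
print(round(implied_gw, 1))    # ~3.4 GW at the typical per-GW cost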
November 4, 2025 at 7:16 PM
The largest 2026 facility (xAI Colossus 2) will have the compute equivalent of 1.4M H100 GPUs.

Even larger data centers are coming: Meta Hyperion and Microsoft Fairwater will each have 5M H100e when they reach full capacity in late 2027 to early 2028.
November 4, 2025 at 7:16 PM
Several data centers will soon demand 1 GW of power, starting early next year:

- Anthropic–Amazon New Carlisle (January)
- xAI Colossus 2 (February)
- Microsoft Fayetteville (March, borderline 1 GW)
- Meta Prometheus (May)
- OpenAI Stargate Abilene (July)
November 4, 2025 at 7:16 PM
Announcing our Frontier Data Centers Hub!

The world is about to see multiple 1 GW+ AI data centers.

We mapped their construction using satellite imagery, permits & public sources — releasing everything for free, including commissioned satellite images.

Highlights in thread!
November 4, 2025 at 7:16 PM
By stitching benchmarks together, the Epoch Capabilities Index allows us to compare frontier models to models with 100,000x less training compute.
November 3, 2025 at 8:59 PM
OSWorld is about computer use, but many of its tasks barely require a graphical user interface.

About 15% can be solved from the terminal alone, and a further 30% can lean heavily on Python scripts.

We even found cases of models downloading packages to manipulate spreadsheets.
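
A hypothetical example of the kind of shortcut described above: rather than clicking through a spreadsheet GUI, an agent installs a library and edits the file directly (the file name and cell edit are made up):

import subprocess
import sys

# Install a spreadsheet library inside the VM, bypassing the GUI entirely
subprocess.run([sys.executable, "-m", "pip", "install", "openpyxl"], check=True)

import openpyxl

wb = openpyxl.load_workbook("budget.xlsx")   # hypothetical task file
ws = wb.active
ws["B2"] = 1250                              # hypothetical edit requested by the task
wb.save("budget.xlsx")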
November 3, 2025 at 8:16 PM
Most tasks are realistic but relatively simple, requiring fewer than ten steps (clicks, text inputs, etc.).

These tasks take humans only a few minutes to complete.
November 3, 2025 at 8:16 PM
OSWorld consists of 361 tasks sourced from forums and tutorials.

Models get an Ubuntu VM and task instructions, and write code to interact with the mouse and keyboard.

Here is one task's starting state. Instructions: "Make a duplicate of the last two slides for me, please."
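
A sketch of the kind of action code an agent might emit for this task, using pyautogui to drive the mouse and keyboard (the coordinates are made up; real actions depend on the VM's screen state):

import time
import pyautogui

pyautogui.click(260, 640)        # select the second-to-last slide thumbnail
pyautogui.keyDown("shift")
pyautogui.click(260, 720)        # shift-click the last slide to extend the selection
pyautogui.keyUp("shift")
pyautogui.hotkey("ctrl", "c")    # copy both slides
pyautogui.hotkey("ctrl", "v")    # paste the duplicates
time.sleep(0.5)                  # give the UI time to update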
November 3, 2025 at 8:16 PM
We looked at OSWorld, a popular evaluation of AI computer use capabilities.

Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time.

See thread for details!
November 3, 2025 at 8:16 PM
We found a bug in our benchmarking code: calls to GPT-5 with "high" reasoning were silently being set to "medium".

Corrected results: GPT-5 (high) scores slightly higher than GPT-5 (medium) on the benchmarks we run. They are also now tied on the Epoch Capabilities Index (ECI).
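
For illustration, here is the general failure pattern (hypothetical names, not our actual harness code): an allow-list that silently falls back to a default hides configuration mistakes, whereas failing loudly surfaces them.

SUPPORTED_EFFORTS = {"low", "medium"}   # "high" accidentally missing

def resolve_effort(requested: str) -> str:
    # Buggy pattern: silently downgrade unknown values
    #   return requested if requested in SUPPORTED_EFFORTS else "medium"
    # Safer pattern: raise, so benchmark runs are never silently mislabeled
    if requested not in SUPPORTED_EFFORTS:
        raise ValueError(f"unsupported reasoning effort: {requested!r}")
    return requested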
October 31, 2025 at 3:22 PM
We used our new capabilities index, the ECI, to measure the gap between open- and closed-weight models.

The result? This gap is smaller than previously estimated.

On average, it takes 3.5 months for an open-weight model to catch up with closed-source SOTA.
October 30, 2025 at 7:59 PM
Evidence for this is Microsoft’s Fairwater datacenter in Wisconsin, a planned multi-GW site that will allegedly become “part of a global network of Azure AI datacenters”, meant to “enable large-scale distributed training across multiple, geographically diverse Azure regions”.
October 28, 2025 at 6:00 PM
However, distributed clusters have many downsides:
- more complex permitting processes
- additional engineering to manage long-range network connectivity and reliability
- constraints on communication-heavy paradigms
October 28, 2025 at 6:00 PM
However, the main cost of fiber deployment is installation, so increasing bandwidth is cheap compared to the overall cost of datacenter construction.
October 28, 2025 at 6:00 PM
For synchronization, we will consider a bidirectional ring all-reduce algorithm. This allows us to complete a synchronization with a single round trip around the network. The synchronization time is then determined by the point-to-point network bandwidth and bounded below by the network latency.
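
A simplified model of that synchronization time (all numbers below are illustrative assumptions, not figures from our analysis):

def ring_allreduce_time(grad_bytes, n_sites, link_bandwidth_bps, hop_latency_s):
    # Standard ring all-reduce: each site moves ~2*(N-1)/N of the gradient
    # volume over 2*(N-1) steps; a bidirectional ring roughly halves the
    # bandwidth term. This is a simplified textbook model.
    bandwidth_term = 2 * (n_sites - 1) / n_sites * (grad_bytes * 8) / link_bandwidth_bps
    latency_term = 2 * (n_sites - 1) * hop_latency_s
    return bandwidth_term + latency_term

# Assumed example: 10 TB of gradients, 24 sites, 10 Tbps links, 5 ms per hop
print(ring_allreduce_time(10e12, 24, 10e12, 5e-3))   # roughly 15-16 seconds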
October 28, 2025 at 6:00 PM
Such a setup will only be practical if the time spent synchronizing is sufficiently small compared to the time it takes to process each batch. The batch processing time is in turn determined by the critical batch size, which has been shown to scale with the dataset size.
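
Continuing the assumed numbers above, the condition can be written as a simple ratio (illustrative only):

def sync_overhead_fraction(sync_time_s, batch_time_s):
    # Fraction of wall-clock time lost to synchronization, assuming no
    # overlap of communication with compute (a pessimistic simplification)
    return sync_time_s / (sync_time_s + batch_time_s)

# ~16 s to synchronize vs. an assumed ~5 minutes to process one large batch
print(sync_overhead_fraction(16, 300))   # ~0.05, i.e. about 5% overhead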
October 28, 2025 at 6:00 PM
NVIDIA has already tested this strategy, training a Nemotron-4 340B model across two datacenters 1,000 km apart.
October 28, 2025 at 6:00 PM
Conventional wisdom in AI is that large-scale pretraining needs to happen in massive contiguous datacenter campuses. But is this true?

Our research suggests that conducting 10 GW training runs across two dozen sites—linked by a network spanning thousands of kilometers—is feasible.
October 28, 2025 at 6:00 PM
We think ECI is a better indicator of holistic AI capability than any single benchmark.

It currently covers models from 2023 on, and it allows us to track trends in capabilities as they emerge.
October 27, 2025 at 7:13 PM
Individual AI benchmarks saturate quickly—sometimes within months. This makes it hard to track long-term trends.

However, by combining scores from different benchmarks, we created a single scale that captures the full range of model performance over time.
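
One generic way to stitch benchmarks onto a shared scale is an item-response-theory-style model: each model gets a latent capability, each benchmark a difficulty, and observed scores follow a logistic curve. The sketch below is purely illustrative, not ECI's actual methodology, and all numbers are made up.

import math

def predicted_score(capability, difficulty, slope=0.08):
    # Logistic link between a model's latent capability and its benchmark score
    return 1.0 / (1.0 + math.exp(-slope * (capability - difficulty)))

capabilities = {"model_2023": 100.0, "model_2025": 150.0}   # hypothetical capabilities
benchmarks = {"easy_bench": 90.0, "hard_bench": 160.0}      # hypothetical difficulties

for model, cap in capabilities.items():
    for bench, diff in benchmarks.items():
        print(model, bench, round(predicted_score(cap, diff), 2))
# The easy benchmark saturates for the stronger model, while the hard one still
# separates the two -- combining both covers the full capability range.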
October 27, 2025 at 7:13 PM