Author of The Precipice: Existential Risk and the Future of Humanity.
tobyord.com
Overall, inference scaling produced 82%, 63%, and 92% of the total performance gains on the respective benchmarks.
12/
The same is true for the other benchmarks I examined. Here are the raw scatterplots:
11/
• the RL boost taking the base model to the trend line
• the inference-scaling boost taking it to the top of the trend
10/
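A minimal sketch of the arithmetic behind this split, assuming the total gain is simply the RL boost plus the inference-scaling boost, both measured in benchmark points. The function name `inference_share` and the scores are hypothetical, chosen only for illustration; they are not taken from the thread or its charts.

```python
def inference_share(base: float, after_rl: float, after_inference: float) -> float:
    """Fraction of the total gain that comes from inference scaling.

    Assumes (hypothetically) that the gain splits additively into:
      - an RL boost: base score -> score on the trend line
      - an inference-scaling boost: trend line -> top of the trend
    """
    rl_boost = after_rl - base
    inference_boost = after_inference - after_rl
    total_gain = rl_boost + inference_boost
    if total_gain == 0:
        raise ValueError("no overall gain to decompose")
    return inference_boost / total_gain


# Made-up scores for illustration only (not data from the thread).
share = inference_share(base=20.0, after_rl=30.0, after_inference=70.0)
print(f"{share:.0%}")  # prints "80%"
```

With these made-up numbers the RL boost is 10 points and the inference-scaling boost is 40 points, so inference scaling accounts for 40 of the 50 points gained.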
🧵 @givingwhatwecan.bsky.social