Lightnews — Scholar-powered news

Florian Dorner

@flodorner.bsky.social

67 followers 280 following 34 posts

PhD student in CS @ ETHZ / MPI-IS

Theory of ML evaluation https://flodorner.github.io/

Florian Dorner

@flodorner.bsky.social

Does anyone have background on this plot, compared to the 32% performance for o3-mini-high with tool use claimed by OpenAI in January? #GPT5 #GPT-5

openai.com/index/introd...
openai.com/index/openai...

August 8, 2025 at 9:28 AM

Florian Dorner

@flodorner.bsky.social

In two hours, Ricardo is giving a talk about our paper on training on the test task, and its confounding impacts on LLM benchmarking 📉📈. (Session 1B) arxiv.org/abs/2407.07890

April 24, 2025 at 1:36 AM

Florian Dorner

@flodorner.bsky.social

Starting to believe @natolambert.bsky.social's take that the o1 plots are misleading [1] (in the sense that OpenAI cannot fully control test compute at inference time). In particular, it seems like scaling up test compute might require extensive retraining.

[1] www.interconnects.ai/p/openais-o1...

January 21, 2025 at 10:57 AM

Florian Dorner

@flodorner.bsky.social

I meant that Figure 2 in the R1 report looks like the left o1 plot if you squint hard enough (and consider that the x-axis is linear rather than logarithmic).

January 20, 2025 at 3:55 PM
