@martheballon.bsky.social
Fig. 4: Accuracy declines as reasoning chains grow, but the drop is significantly smaller in more proficient models. o3-mini (m) reasons more effectively than o1-mini. o3-mini (h) gains accuracy over o3-mini (m), but uses more reasoning tokens across 𝗮𝗹𝗹 problems.
Fig. 3: o1-mini and o3-mini (m) have similar token distributions. Higher-performing models have a better ratio of correct to incorrect answers, even in high-token regions.
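A minimal sketch of the analysis behind Figs. 3–4: bin answers by reasoning-token count and compute per-bin accuracy for each model (per-bin accuracy is a monotone transform of the correct-to-incorrect ratio). The record schema here (`model`, `reasoning_tokens`, `correct`) is hypothetical, not the paper's actual data format.

```python
from collections import defaultdict

def accuracy_by_token_bin(records, bin_width=1000):
    """Map each model to {bin_start: accuracy} over reasoning-token bins."""
    hits = defaultdict(lambda: defaultdict(int))    # correct answers per bin
    totals = defaultdict(lambda: defaultdict(int))  # all answers per bin
    for r in records:
        b = (r["reasoning_tokens"] // bin_width) * bin_width
        totals[r["model"]][b] += 1
        hits[r["model"]][b] += int(r["correct"])
    return {
        model: {b: hits[model][b] / n for b, n in sorted(bins.items())}
        for model, bins in totals.items()
    }

# Toy usage: a flatter curve across bins means a smaller accuracy drop (Fig. 4);
# a higher value in the top bins means a better correct/incorrect ratio (Fig. 3).
records = [
    {"model": "o1-mini", "reasoning_tokens": 800, "correct": True},
    {"model": "o1-mini", "reasoning_tokens": 5200, "correct": False},
    {"model": "o3-mini (m)", "reasoning_tokens": 5100, "correct": True},
]
print(accuracy_by_token_bin(records))
```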
Fig. 2: Reasoning models allocate more reasoning tokens to disciplines that involve complex combinatorial reasoning. On average, token usage scales with problem complexity.
Fig. 1: gpt-4o lags behind the reasoning models o1-mini and o3-mini on the Omni-MATH benchmark. o3-mini (m) and o3-mini (h) surpass 50% accuracy in all math disciplines.
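A similar sketch for the per-discipline breakdowns in Figs. 1–2: group graded answers by math discipline and report accuracy alongside mean reasoning-token usage. The `discipline` field is assumed to come from Omni-MATH problem metadata; the grading step itself is omitted.

```python
from collections import defaultdict
from statistics import mean

def by_discipline(records):
    """Map discipline -> (accuracy, mean reasoning tokens) for one model."""
    groups = defaultdict(list)
    for r in records:
        groups[r["discipline"]].append(r)
    return {
        d: (mean(int(r["correct"]) for r in rs),      # Fig. 1: accuracy
            mean(r["reasoning_tokens"] for r in rs))  # Fig. 2: token usage
        for d, rs in groups.items()
    }
```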
LLMs are getting really good at reasoning, but the mechanisms behind it are poorly understood. In our recent paper, we investigated SOTA models and found that 'Thinking harder ≠ thinking longer'!

Joint work with @andresalgaba.bsky.social @vincentginis.bsky.social

Insights from our research (a thread):
February 24, 2025 at 4:02 PM
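For anyone who wants to collect this kind of data themselves, a hedged sketch using the OpenAI Python SDK's reasoning-token accounting (the `completion_tokens_details.reasoning_tokens` usage field). Field availability varies by SDK version and model, and this is not necessarily the paper's pipeline.

```python
# Hypothetical data-collection step: ask a reasoning model one benchmark problem
# and record how many reasoning tokens it spent. Not the paper's actual code.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def solve_and_count(problem: str, model: str = "o3-mini"):
    """Return (answer_text, reasoning_token_count) for one problem."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": problem}],
    )
    details = resp.usage.completion_tokens_details
    return resp.choices[0].message.content, details.reasoning_tokens
```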