Our findings give new insights into the relationship between model capability and reasoning length, with implications for the efficiency, scaling, and evaluation methodology of future models.
February 24, 2025 at 4:02 PM
Fig. 4: Accuracy declines as reasoning chains grow, but the drop is significantly smaller in more proficient models. o3-mini (m) reasons more effectively than o1-mini. o3-mini (h) achieves an accuracy gain over o3-mini (m), but uses more reasoning tokens across 𝗮𝗹𝗹 problems.
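One way to quantify the trend in Fig. 4 is to bucket answers by reasoning-token count and compute accuracy per bucket. This is a minimal Python sketch, not the analysis code behind the figure: it assumes each answer is a hypothetical (reasoning_tokens, is_correct) pair and uses a fixed bin width, and the toy records are invented for illustration rather than taken from the study.

```python
from collections import defaultdict


def accuracy_by_token_bin(records, bin_width):
    """Return {bin_start: accuracy} for fixed-width reasoning-token bins.

    `records` is an iterable of (reasoning_tokens, is_correct) pairs; this
    record format is a hypothetical stand-in, not the study's data schema.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for tokens, is_correct in records:
        start = (tokens // bin_width) * bin_width  # left edge of the bin
        totals[start] += 1
        correct[start] += int(is_correct)
    # Return bins in ascending token order so the trend is easy to read.
    return {start: correct[start] / totals[start] for start in sorted(totals)}


# Toy records invented for illustration only (not results from the study).
toy = [(800, True), (1500, True), (2500, True), (3000, False), (5200, False), (5900, True)]
print(accuracy_by_token_bin(toy, bin_width=2000))  # {0: 1.0, 2000: 0.5, 4000: 0.5}
```

Running this per model and comparing accuracy in the low-token and high-token bins gives a rough measure of how steeply accuracy falls as reasoning chains grow.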
Fig. 3: o1-mini and o3-mini (m) have similar token distributions. Higher-performing models have a better ratio of correct to incorrect answers, even in high-token regions.
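The correct-to-incorrect ratio described for Fig. 3 can be sketched by bucketing answers by reasoning-token count, then dividing the correct count by the incorrect count in each bucket. As a minimal sketch under stated assumptions: the (reasoning_tokens, is_correct) record format, the bin width, and the toy records are all hypothetical, and a bucket with no incorrect answers is reported as infinity.

```python
from collections import Counter


def correct_incorrect_ratio(records, bin_width):
    """Return {bin_start: correct / incorrect} for fixed-width token bins.

    `records` holds hypothetical (reasoning_tokens, is_correct) pairs.
    A bin with no incorrect answers maps to float("inf").
    """
    correct = Counter()
    incorrect = Counter()
    for tokens, is_correct in records:
        start = (tokens // bin_width) * bin_width  # left edge of the bin
        if is_correct:
            correct[start] += 1
        else:
            incorrect[start] += 1
    bins = sorted(set(correct) | set(incorrect))
    # Counter returns 0 for missing keys, so empty sides are handled safely.
    return {
        b: (correct[b] / incorrect[b] if incorrect[b] else float("inf"))
        for b in bins
    }


# Toy records invented for illustration only (not results from the study).
toy = [(1000, True), (1200, True), (1800, False), (9000, True), (9500, False), (9900, False)]
print(correct_incorrect_ratio(toy, bin_width=4000))  # {0: 2.0, 8000: 0.5}
```

A higher ratio in the high-token bins is what "a better ratio of correct to incorrect answers, even in high-token regions" would look like under this framing.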
Fig. 2: Reasoning models allocate more reasoning tokens to disciplines that involve complex combinatorial reasoning. On average, token usage scales with problem complexity.
Fig. 1: gpt-4o lags behind the reasoning models o1-mini and o3-mini on the Omni-MATH benchmark. o3-mini (m) and o3-mini (h) surpass 50% in all math disciplines.