Andreas Opedal
@andreasopedal.bsky.social
PhD student at ETH Zurich & MPI-IS in NLP & ML
Language, Reasoning, and Cognition
https://opedal.github.io
See the paper for more details and experiments: arxiv.org/pdf/2410.13502
Or check out the codebase to generate your own problems: github.com/eth-lre/math...
March 14, 2025 at 4:14 PM
All models are sensitive to a simple change in sentence ordering: we take one sentence and move it to the beginning. We also find that the problem is easiest for LLMs if the moved sentence comes from near the beginning or end, rather than from the middle!
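The perturbation above can be sketched in a few lines. This is a hypothetical illustration (the names `move_to_front` and the example premises are ours, not from the paper's codebase): the premise sentences are kept as a list, one is moved to position 0, and the question stays at the end.

```python
def move_to_front(premises, idx):
    """Return a copy of the premise list with premises[idx] moved to the front."""
    return [premises[idx]] + premises[:idx] + premises[idx + 1:]

premises = [
    "Alice has 3 apples.",
    "Bob has 2 more apples than Alice.",
    "Carol has twice as many apples as Bob.",
]
# Move the third premise to the beginning, then re-attach the question.
reordered = move_to_front(premises, 2)
print(" ".join(reordered + ["How many apples does Carol have?"]))
```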
OpenAI’s o1 and DeepSeek-R1 are certainly impressive. However, when we permuted the ordering of the sentences, their performance dropped to 5% and 11%, respectively (with the token limit set to 25,000, as recommended by OpenAI).
Here are the results for what we call “nonlinear” problems. Solving them requires holding intermediate results in memory before they can be used in subsequent deduction steps. The most complex problems are pretty hard for all models, but they are still able to solve some of them!
We apply MathGAP to perform a systematic analysis on whether LLMs can use simple examples in context to solve more complex ones at inference. Generalization to proof width turns out to be harder than to proof depth, but we see a steady decrease in performance as proofs get both deeper and wider 💡
With our proof system we can generate new MWPs that adhere to the structure of proof trees, as well as ground-truth CoT traces! From the proof trees we then characterize the complexity of reasoning in several ways, e.g., depth, width, shape, and ordering of nodes (i.e., sentences).
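To make depth and width concrete, here is a minimal sketch of a proof tree and the two metrics, under our own simplified representation (the `ProofNode` class and example statements are illustrative, not the paper's actual data structures). Depth is the longest chain of deductions; width here counts the leaf premises the proof draws on.

```python
from dataclasses import dataclass, field

@dataclass
class ProofNode:
    """One deduced statement; leaves are premise sentences of the word problem."""
    statement: str
    children: list = field(default_factory=list)

def depth(node):
    """Length of the longest root-to-leaf chain of deductions."""
    return 1 + max((depth(c) for c in node.children), default=0)

def width(node):
    """Number of leaf premises used by the proof."""
    if not node.children:
        return 1
    return sum(width(c) for c in node.children)

tree = ProofNode("Carol has 10 apples", [
    ProofNode("Bob has 5 apples", [
        ProofNode("Alice has 3 apples"),
        ProofNode("Bob has 2 more apples than Alice"),
    ]),
    ProofNode("Carol has twice as many apples as Bob"),
])
print(depth(tree), width(tree))  # 3 3
```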
Our work builds on a simple observation: Math word problems (MWPs) are deductive reasoning problems, so solving them can be thought of as applying inference rules. We can thus view solution/reasoning traces as proof trees, the structure of which tells us how hard/complex the problem is to solve.
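The observation that solving an MWP is applying inference rules can be sketched as follows. This is a toy illustration in our own notation (the rule name `comparison_rule` and the fact dictionary are assumptions for the example, not the paper's formalism): each rule maps already-derived facts to a new fact, and chaining such applications traces out a proof tree.

```python
def comparison_rule(facts, known, unknown, diff):
    """If `known`'s quantity is derived and `unknown` has `diff` more,
    deduce `unknown`'s quantity; otherwise derive nothing."""
    if known in facts:
        return {unknown: facts[known] + diff}
    return {}

# "Alice has 3 apples. Bob has 2 more apples than Alice."
facts = {"Alice": 3}
facts.update(comparison_rule(facts, "Alice", "Bob", 2))
print(facts)  # {'Alice': 3, 'Bob': 5}
```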