Julio Gonzalo
juliogonzalo.bsky.social
Researcher in Natural Language Processing, Artificial Intelligence, Information Retrieval. PI of nlp.uned.es.
Reposted by Julio Gonzalo
Such a simple and ingenious method to isolate reasoning from memorization in LLMs.

The performance of reasoning models drops significantly when they are evaluated on multiple-choice questions in which the correct answer has been replaced with 'None of the others'.
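A minimal sketch of the transformation described above, assuming a simple question/options/answer-index record. The `MCQItem` type and the `none_of_the_others` helper are hypothetical names for illustration, not code from the paper. The correct option's text is swapped for 'None of the others' while the distractors stay untouched, so that option is still the right answer. A model that had only memorized the original answer string can no longer match it.

```python
from dataclasses import dataclass, replace

# Wording taken from the post; the paper's exact option text may differ.
NONE_OPTION = "None of the others"


@dataclass(frozen=True)
class MCQItem:  # hypothetical record type for one benchmark question
    question: str
    options: tuple[str, ...]
    answer_index: int  # index of the correct option


def none_of_the_others(item: MCQItem) -> MCQItem:
    """Return a variant where the correct option's text becomes NONE_OPTION.

    The other options are unchanged, so NONE_OPTION is now the only
    correct choice, and it keeps the original answer position.
    """
    options = list(item.options)
    options[item.answer_index] = NONE_OPTION
    return replace(item, options=tuple(options))


item = MCQItem("What is the capital of France?",
               ("Berlin", "Paris", "Madrid", "Rome"), answer_index=1)
variant = none_of_the_others(item)
print(variant.options)  # ('Berlin', 'None of the others', 'Madrid', 'Rome')
```

Comparing a model's accuracy on the original items with its accuracy on the transformed ones gives the kind of drop the post refers to.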

arxiv.org/abs/2502.12896
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks
In LLM evaluations, reasoning is often distinguished from recall/memorization by performing numerical variations to math-oriented questions. Here we introduce a general variation method for multiple-c...
February 21, 2025 at 8:11 PM