Lightnews — Scholar-powered news

Mario Sanz

@msanz.bsky.social

450 followers 200 following 4 posts

PhD student in #NLProc


Mario Sanz

@msanz.bsky.social

What looks like a trivial formatting choice can actually alter research conclusions, so mind the gap!

Big thanks to my co-authors @minhducbui.bsky.social & Katharina von der Wense!

📄 Read the full paper here: arxiv.org/abs/2509.15020

Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs

When evaluating large language models (LLMs) with multiple-choice question answering (MCQA), it is common to end the prompt with the string "Answer:" to facilitate automated answer extraction via next...

arxiv.org

September 26, 2025 at 9:18 AM

Mario Sanz

@msanz.bsky.social

Surprisingly, this small detail:
✅ Shifts model accuracy by up to 11%
✅ Changes which model tops the leaderboard – raising serious concerns about comparability of LLM leaderboards in prior work
✅ Affects calibration (reliability of confidence estimates)
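For readers unfamiliar with calibration, here is a minimal sketch of one common way to quantify it, the expected calibration error (ECE). This is an illustration only: the post does not say which calibration metric the paper uses, and the function name, the binning scheme, and the default of 10 bins are assumptions made here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: weighted average gap between confidence and accuracy.

    confidences: predicted probabilities in [0, 1] for the chosen answers.
    correct: booleans, True where the chosen answer was right.
    Assumption: equal-width bins; the paper's metric may differ.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's gap by the share of predictions it holds.
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that answers with 90% confidence but is right only 75% of the time is overconfident, and this gap is what a score like the one above would pick up.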

September 26, 2025 at 9:18 AM

Mario Sanz

@msanz.bsky.social

In our #EMNLP2025 paper, we study how the space before the answer letter is tokenized: as part of the prompt (leaving "A") or together with the letter ("␣A").

Practice is currently split: no community-wide standard exists, and even popular evaluation frameworks differ.
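To make the two conventions concrete, here is a minimal sketch that builds both prompt variants for one question. The helper name `build_variants`, the option layout, and the four-letter limit are hypothetical choices for illustration; they are not taken from the paper or from any evaluation framework.

```python
LETTERS = ["A", "B", "C", "D"]


def build_variants(question, options):
    """Return both (prompt, candidate continuations) pairs for one MCQA item.

    Hypothetical helper: the prompt layout is an assumption for illustration.
    """
    letters = LETTERS[: len(options)]
    lines = [question] + [f"{l}. {o}" for l, o in zip(letters, options)]
    body = "\n".join(lines)

    # Convention 1: the prompt ends with "Answer:" and the space is
    # tokenized together with the letter, so candidates look like " A".
    space_with_letter = (body + "\nAnswer:", [" " + l for l in letters])

    # Convention 2: the prompt ends with "Answer: " and the letter is
    # scored on its own, so candidates look like "A".
    space_with_prompt = (body + "\nAnswer: ", list(letters))

    return space_with_letter, space_with_prompt


v1, v2 = build_variants("What is 2 + 2?", ["3", "4", "5", "6"])
# Both conventions produce the same final text; only the boundary
# between prompt and scored continuation differs.
same_text = (v1[0] + v1[1][1]) == (v2[0] + v2[1][1])
```

The full string is identical under both conventions, which is why the choice looks like a formatting detail. What changes is which token the model is asked to score.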

September 26, 2025 at 9:18 AM
