We evaluated the Wicked variants of MMLU, MMLU-Pro, and MMLU-Redux using 0-shot CoT. The performance drop exceeds 5%, showing that reasoning helps, but Wicked remains challenging even with CoT.
Our analysis identified questions with multiple correct candidates, of which only one is the most suitable. Our method includes a model that automatically detects these questions and excludes them from the Wicked process.
Presenting Wicked: a simple automated method to make MCQA benchmarks more challenging. Wicked shook up 18 open-weight LLMs on 6 benchmarks, with up to 19.7% performance drop with direct prompting 🤯
Paper: shorturl.at/1CGq0
Code: shorturl.at/n2nCU