Ahmed S. M. Elhady
ahmedelhady.bsky.social
PhD Student @UPV/EHU, working on multilingual and multimodal GenAI. Ex-Microsoft and Agolo.
🤔 Recognizing NOTA ("None of the above") requires stronger reasoning. Can chain of thought (CoT) help close the gap?
We evaluated the Wicked variants of MMLU, MMLU-Pro, and MMLU-Redux using zero-shot CoT. The performance drop remains above 5%: reasoning helps, but Wicked is still challenging even with CoT.
February 26, 2025 at 11:51 AM
⚠️ Be careful not to break the coherence of the questions!
Our analysis found questions with multiple correct candidates, only one of which is the most suitable. Our method uses a model to detect such questions automatically and exclude them from the Wicked process.
February 26, 2025 at 11:51 AM
Method: We randomly replace one of the choices with "None of the above" (NOTA). NOTA is the correct answer only when it replaces the originally correct choice. This technique is common in educational exams for assessing examinees' understanding, because it pushes them to consider every option carefully before answering.
February 26, 2025 at 11:51 AM
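A minimal Python sketch of the replacement step described in the Method post, plus the exclusion step from the coherence post. Several details here are assumptions, not taken from the thread: NOTA takes the replaced option's position, the option to replace is picked uniformly at random, and questions flagged as ambiguous are kept unchanged rather than dropped. `MCQ`, `make_wicked`, `wicked_dataset`, and `is_ambiguous` are hypothetical names; `is_ambiguous` only stands in for the detector model, which the thread does not specify.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable

NOTA = "None of the above"


@dataclass(frozen=True)
class MCQ:
    question: str
    choices: tuple[str, ...]
    answer: int  # index of the correct choice in `choices`


def make_wicked(q: MCQ, rng: random.Random) -> MCQ:
    """Replace one uniformly chosen option with NOTA at the same position.

    If the replaced option was the correct one, NOTA (now at that index)
    becomes the correct answer; otherwise the original correct option is
    untouched. Either way, the answer index does not change.
    """
    idx = rng.randrange(len(q.choices))
    choices = list(q.choices)
    choices[idx] = NOTA
    return MCQ(q.question, tuple(choices), q.answer)


def wicked_dataset(
    questions: Iterable[MCQ],
    is_ambiguous: Callable[[MCQ], bool],
    seed: int = 0,
) -> list[MCQ]:
    """Apply make_wicked to every question not flagged by `is_ambiguous`.

    `is_ambiguous` is a hypothetical placeholder for the detector model that
    catches questions with several plausible answers; flagged questions are
    assumed to be left as-is.
    """
    rng = random.Random(seed)
    return [q if is_ambiguous(q) else make_wicked(q, rng) for q in questions]
```

Replacing in place has a convenient side effect: the gold index never moves. Only the label at that index may change to NOTA, so the answer key needs no rewriting.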