Large reasoning models (LRMs) are strong in English — but how well do they reason in your language?
Our latest work uncovers their limitations and a clear trade-off:
Controlling Thinking Trace Language Comes at the Cost of Accuracy
📄Link: arxiv.org/abs/2505.22888
- MT error prediction techniques & their reception by professional translators (@gsarti.com)
- thinking language in Large Reasoning Models (@jiruiqi.bsky.social)
- effect of stereotypes on LLM’s implicit personalization (@veraneplenbroek.bsky.social)
....
We show SOTA LMs struggle with reasoning in non-English languages; prompt hacks & post-training improve language alignment but trade off accuracy.
📄 arxiv.org/abs/2505.22888
See you in Suzhou! #EMNLP
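A minimal sketch of what such a prompt hack for controlling the thinking-trace language might look like. The instruction wordings and language codes below are illustrative assumptions, not the paper's exact prompts:

```python
# Sketch: prepend a language-control instruction so the model's
# reasoning trace stays in a target language. The wording of each
# instruction is a hypothetical example, not the paper's prompt.
THINK_IN = {
    "en": "Think step by step in English.",
    "de": "Denke Schritt für Schritt auf Deutsch nach.",
    "zh": "请用中文一步一步地思考。",
}

def build_prompt(question: str, lang: str = "en") -> str:
    """Prepend a thinking-language instruction to the user question."""
    instruction = THINK_IN.get(lang, THINK_IN["en"])
    return f"{instruction}\n\nQuestion: {question}"

prompt = build_prompt("What is 17 * 24?", lang="de")
```

The trade-off the thread describes is that forcing non-English traces this way tends to cost answer accuracy.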
We show SOTA LMs struggle with reasoning in non-English languages; prompt-hack & post-training improve alignment but trade off accuracy.
📄 arxiv.org/abs/2505.22888
See you in Suzhou! #EMNLP
We compare uncertainty- and interp-based WQE metrics across 12 directions, with some surprising findings!
🧵 1/
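One common uncertainty-based baseline for word-level quality estimation (WQE) flags translation tokens whose model probability falls below a threshold as likely errors. A minimal sketch under that assumption (the threshold and toy probabilities are illustrative, not the paper's setup):

```python
def flag_errors(token_probs: list[float], threshold: float = 0.5) -> list[bool]:
    """Uncertainty-based word-level QE: mark a token as a likely error
    when the model's probability for it is below the threshold."""
    return [p < threshold for p in token_probs]

# Toy per-token probabilities for a translated sentence:
# low-probability tokens are flagged as suspected errors.
flags = flag_errors([0.95, 0.40, 0.88, 0.12])  # [False, True, False, True]
```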
I’m happy to share that the preprint of my first PhD project is now online!
🎊 Paper: arxiv.org/abs/2505.23689
[1/] Retrieving passages from many languages can boost retrieval augmented generation (RAG) performance, but how good are LLMs at dealing with multilingual contexts in the prompt?
📄 Check it out: arxiv.org/abs/2504.00597
(w/ @arianna-bis.bsky.social @Raquel_Fernández)
#NLProc
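A minimal sketch of how retrieved passages in several languages might be placed in a single prompt. The template and language tags are illustrative assumptions, not the paper's actual setup:

```python
def build_rag_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Assemble a RAG prompt from (language_code, passage) pairs.
    The [lang] tagging and template are hypothetical choices."""
    context = "\n".join(f"[{lang}] {text}" for lang, text in passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_rag_prompt(
    "Who wrote Don Quixote?",
    [("es", "Don Quijote fue escrito por Miguel de Cervantes."),
     ("en", "Don Quixote is a Spanish novel published in two parts.")],
)
```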
Key point: LLMs tend to generate better responses when the likelihood of the question segment is higher.
I.e., p(question) ∝ performance.
Paper available at: arxiv.org/abs/2411.07773
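To make the relation concrete, a length-normalized log-likelihood of the question's tokens can serve as a proxy for p(question). The scoring function below is an illustrative sketch, not the paper's exact metric:

```python
import math

def question_likelihood(token_logprobs: list[float]) -> float:
    """Length-normalized likelihood of the question segment:
    exp(mean per-token log-probability). Under the paper's claim,
    higher values would correlate with better responses."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Toy per-token log-probs for two phrasings of the same question.
fluent = question_likelihood([-0.2, -0.1, -0.3])    # ≈ 0.82
awkward = question_likelihood([-1.5, -2.0, -1.0])   # ≈ 0.22
```

The sketch predicts the fluent phrasing, with higher p(question), should elicit the better response.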