Jirui Qi
@jiruiqi.bsky.social
Ph.D. Candidate @GroNLP, University of Groningen #NLProc
https://betswish.github.io
Pinned
[1/]💡New Paper
Large reasoning models (LRMs) are strong in English — but how well do they reason in your language?

Our latest work uncovers their limitations and a clear trade-off:
Controlling Thinking Trace Language Comes at the Cost of Accuracy

📄Link: arxiv.org/abs/2505.22888
Reposted by Jirui Qi
InCLow topics #EMNLP2025:

- MT error prediction techniques & their reception by professional translators (@gsarti.com)
- thinking language in Large Reasoning Models (@jiruiqi.bsky.social)
- effect of stereotypes on LLM’s implicit personalization (@veraneplenbroek.bsky.social)

....
October 31, 2025 at 10:50 PM
Our paper on multilingual reasoning has been accepted to Findings of #EMNLP2025! 🎉 (OA: 3/3/3.5/4)

We show that SOTA LMs struggle to reason in non-English languages; prompt hacks & post-training improve alignment but trade off accuracy.

📄 arxiv.org/abs/2505.22888
See you in Suzhou! #EMNLP
August 20, 2025 at 8:02 PM
Reposted by Jirui Qi
📢 New paper: Can unsupervised metrics extracted from MT models detect their translation errors reliably? Do annotators even *agree* on what constitutes an error? 🧐

We compare uncertainty- and interpretability-based WQE metrics across 12 translation directions, with some surprising findings!

🧵 1/
May 30, 2025 at 2:28 PM
Reposted by Jirui Qi
“Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models”

I’m happy to share that the preprint of my first PhD project is now online!

🎊 Paper: arxiv.org/abs/2505.23689
Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models
Seminal work by Huebner et al. (2021) showed that language models (LMs) trained on English Child-Directed Language (CDL) can reach similar syntactic abilities as LMs trained on much larger amounts of ...
arxiv.org
May 30, 2025 at 7:40 AM
[1/]💡New Paper
Large reasoning models (LRMs) are strong in English — but how well do they reason in your language?

Our latest work uncovers their limitations and a clear trade-off:
Controlling Thinking Trace Language Comes at the Cost of Accuracy

📄Link: arxiv.org/abs/2505.22888
May 30, 2025 at 1:09 PM
✨ New Paper ✨
[1/] Retrieving passages in many languages can boost retrieval-augmented generation (RAG) performance, but how well do LLMs deal with multilingual contexts in the prompt? (A toy prompt sketch follows below.)

📄 Check it out: arxiv.org/abs/2504.00597
(w/ @arianna-bis.bsky.social @Raquel_Fernández)

#NLProc
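A minimal sketch of the setting this question points at, assuming nothing about the paper's actual data or prompts: the passages, language tags, and template below are purely illustrative.

```python
# Toy example (not the paper's code): build a RAG prompt whose retrieved passages
# are in several languages, to probe how an LLM handles mixed-language context.
passages = [
    ("en", "The Great Wall of China is over 21,000 km long."),
    ("nl", "De Chinese Muur is meer dan 21.000 km lang."),
    ("zh", "中国长城全长超过两万一千公里。"),
]
question = "How long is the Great Wall of China?"

context = "\n".join(f"[{lang}] {text}" for lang, text in passages)
prompt = (
    "Answer the question using the passages below, which may be in different languages.\n\n"
    f"{context}\n\nQuestion: {question}\nAnswer:"
)
print(prompt)
```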
April 11, 2025 at 4:04 PM
🎉 First post on Bluesky: our paper on **efficient prompt engineering** has been accepted to the NAACL 2025 Main Conference! 🎉

Key point: LLMs tend to generate better responses when the likelihood of the question segment is higher, i.e. p(question) ∝ performance (a rough scoring sketch follows below).

Paper available at: arxiv.org/abs/2411.07773
Likelihood as a Performance Gauge for Retrieval-Augmented Generation
Recent work finds that retrieval-augmented generation with large language models is prone to be influenced by the order of retrieved documents in the context. However, the lack of in-depth analysis li...
arxiv.org
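A minimal sketch of how the question-segment likelihood could be scored with an off-the-shelf causal LM; the model name, prompt strings, and per-token averaging are assumptions for illustration, not the paper's implementation.

```python
# Sketch: average log-probability of the question tokens given the preceding context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical choice; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def question_log_likelihood(context: str, question: str) -> float:
    """Mean log p(question token | preceding tokens) under the LM."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    q_ids = tokenizer(question, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, q_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Positions :-1 predict tokens 1:, so align log-probs with shifted targets.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    q_len = q_ids.shape[1]
    # Keep only the scores of the question tokens (the last q_len positions).
    return token_lp[:, -q_len:].mean().item()

# Per the key point above, higher scores should correlate with better answers.
score = question_log_likelihood(
    "Retrieved passage: The Eiffel Tower is in Paris.\n",
    "Question: Where is the Eiffel Tower located?",
)
print(score)
```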
January 24, 2025 at 9:56 AM