Sara Vera Marjanovic
@saravera.bsky.social
PhD fellow in XAI, IR & NLP
✈️ Mila - Quebec AI Institute | University of Copenhagen 🏰
#NLProc #ML #XAI
Recreational sufferer
Pinned
Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1’s reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour.
🔗: mcgill-nlp.github.io/thoughtology/
Thanks again to the many collaborators and contributors, especially @arkil_patel @sivareddyg and @mcgill_nlp 💜
January 15, 2026 at 2:37 PM
🚨Thoughtology is now accepted to #TMLR! We've added some new analyses, most notably:
🌟 We quantify rumination: repetitive thoughts are associated with incorrect responses (one way to measure this is sketched after this post)
🌟 We add 2 LRMs: gpt-oss and Qwen3. Both show a reasoning 'sweet spot'
See 📃 : openreview.net/forum?id=BZw...
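For a sense of what quantifying rumination can look like: a minimal sketch, assuming reasoning chains as plain strings and using a repeated-n-gram rate as the proxy. The function name and the proxy are illustrative assumptions, not the metric from the paper.

```python
from collections import Counter

def rumination_score(chain: str, n: int = 8) -> float:
    """Fraction of token n-grams in a reasoning chain that are repeats.

    Hypothetical proxy for rumination: higher values mean the chain
    revisits the same token sequences more often. Not the paper's
    exact metric.
    """
    tokens = chain.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    repeats = sum(count - 1 for count in Counter(ngrams).values())
    return repeats / len(ngrams)

# A chain that keeps re-verifying the same step scores higher:
print(rumination_score("wait, let me check the sum again. " * 10))  # ~0.89
print(rumination_score("define the goal, decompose, solve, verify"))  # 0.0
```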
January 15, 2026 at 2:35 PM
Reposted by Sara Vera Marjanovic
Our new paper in #PNAS (bit.ly/4fcWfma) presents a surprising finding: when words change meaning, older speakers rapidly adopt the new usage, and inter-generational differences are often minor.

w/ Michelle Yang, @sivareddyg.bsky.social, @msonderegger.bsky.social and @dallascard.bsky.social 👇 (1/12)
July 29, 2025 at 12:06 PM
And Thoughtology is now on arXiv! Read more about R1 reasoning 🐋💭 across visual, cultural and psycholinguistic tasks at the link below:

🔗 arxiv.org/abs/2504.07128
April 11, 2025 at 4:31 PM
This paper was a large group effort from @mcgill-nlp.bsky.social and @mila-quebec.bsky.social.
We encourage you to read the full paper for a more detailed discussion of our findings, and we hope our insights inspire future work studying the reasoning behaviour of LLMs.
April 1, 2025 at 8:07 PM
Our paper also contains additional analyses on faithfulness to user input, language-specific reasoning behaviour, similarity to human language processing, and iterative world modeling via ASCII generation.
April 1, 2025 at 8:07 PM
DeepSeek-R1 also exhibits greater safety vulnerabilities than its non-reasoning counterpart DeepSeek-V3, and the model’s reasoning capabilities can be used to generate jailbreak attacks that successfully elicit harmful responses from other safety-aligned LLMs.
April 1, 2025 at 8:07 PM
Notably, we show DeepSeek-R1 has a reasoning ‘sweet spot’: extra inference time beyond it can impair model performance, and continually scaling the length of thoughts does not necessarily improve it.
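As a rough illustration of how one could probe for such a sweet spot, here is a minimal sketch that sweeps thought-token budgets and records accuracy. `generate_with_budget` and the toy dev set are hypothetical stand-ins, not the paper’s evaluation setup.

```python
def generate_with_budget(question: str, max_thought_tokens: int) -> str:
    # Stand-in: in practice, call the LRM and cap its thought segment
    # at `max_thought_tokens` before forcing a final answer.
    return ""

def accuracy_at_budget(dataset, budget: int) -> float:
    correct = sum(
        generate_with_budget(question, budget).strip() == gold
        for question, gold in dataset
    )
    return correct / len(dataset)

dev_set = [("What is 17 * 24?", "408"), ("Is 97 prime?", "yes")]  # toy data
for budget in (256, 512, 1024, 2048, 4096, 8192):
    print(budget, accuracy_at_budget(dev_set, budget))
# If the finding holds, accuracy rises with budget, peaks, then degrades
# as the model is allowed to over-think.
```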
April 1, 2025 at 8:07 PM
DeepSeek-R1’s thoughts follow a consistent structure. After determining the problem goal, it decomposes the problem towards an interim solution. It then re-explores or re-verifies the solution multiple times before completion, though these re-verifications can lack diversity.
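To make that structure concrete, a toy sketch of tagging chain sentences with coarse phases; the phase names and cue phrases below are illustrative assumptions, not the paper’s taxonomy:

```python
from enum import Enum, auto

class ThoughtPhase(Enum):
    # Coarse phases loosely mirroring the structure described above.
    PROBLEM_DEFINITION = auto()  # restating the goal
    DECOMPOSITION = auto()       # breaking the problem into steps
    RECONSTRUCTION = auto()      # re-exploring / re-verifying a solution

# Illustrative surface cues; a real annotation scheme would be far richer.
CUES = [
    ("wait", ThoughtPhase.RECONSTRUCTION),
    ("double-check", ThoughtPhase.RECONSTRUCTION),
    ("alternatively", ThoughtPhase.RECONSTRUCTION),
    ("first", ThoughtPhase.DECOMPOSITION),
    ("step", ThoughtPhase.DECOMPOSITION),
]

def tag_sentence(sentence: str) -> ThoughtPhase:
    lowered = sentence.lower()
    for cue, phase in CUES:
        if cue in lowered:
            return phase
    return ThoughtPhase.PROBLEM_DEFINITION  # default: goal restatement

chain = [
    "We need to find the smallest such integer.",
    "First, factor the expression.",
    "Wait, let me double-check that factorisation.",
]
print([tag_sentence(s).name for s in chain])
# ['PROBLEM_DEFINITION', 'DECOMPOSITION', 'RECONSTRUCTION']
```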
April 1, 2025 at 8:07 PM
The availability of R1’s reasoning chains allows us to systematically study its reasoning process, an endeavour we term Thoughtology 💭. Starting from a taxonomy of R1’s reasoning chains, we study the complex reasoning behaviour of LRMs and provide some of our main findings below 👇.
April 1, 2025 at 8:07 PM
Models like DeepSeek-R1 🐋 mark a fundamental shift in how LLMs approach complex problems. In our preprint on R1 Thoughtology, we study R1’s reasoning chains across a variety of tasks, investigating its capabilities, limitations, and behaviour.
🔗: mcgill-nlp.github.io/thoughtology/
April 1, 2025 at 8:07 PM
Reposted by Sara Vera Marjanovic
📚 How good are language models at utilising context in RAG scenarios?
We release 🧙🏽‍♀️DRUID to facilitate studies of context usage in real-world settings.
arxiv.org/abs/2412.17031

w/ @saravera.bsky.social, H. Yu, @rnv.bsky.social, C. Lioma, M. Maistro, @apepa.bsky.social and @iaugenstein.bsky.social ⭐️
A Reality Check on Context Utilisation for Retrieval-Augmented Generation
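Not DRUID’s protocol, but a minimal sketch of the kind of probe such studies run: compare a model’s closed-book answer with its answer given the retrieved context, and check whether the context moved it. The `toy_lm` stand-in is purely illustrative.

```python
import re

def context_shift(question: str, context: str, lm) -> bool:
    """True if adding the retrieved context changes the model's answer.

    `lm` is any callable mapping a prompt string to an answer string.
    """
    closed_book = lm(question)
    with_context = lm(f"Context: {context}\n\nQuestion: {question}")
    return closed_book.strip() != with_context.strip()

# Toy stand-in LM: parrots a year if one appears in the prompt.
def toy_lm(prompt: str) -> str:
    match = re.search(r"\b(19|20)\d{2}\b", prompt)
    return match.group(0) if match else "unknown"

print(context_shift("When was the DRUID dataset released?",
                    "DRUID was released in December 2024.", toy_lm))  # True
```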
January 2, 2025 at 7:15 AM