Ali Modarressi
@amodarressi.bsky.social
PhD student, NLP Researcher at @cislmu.bsky.social | Prev. Intern @Adobe.com
Details on poster times and locations coming soon.

Would love to meet and chat ☕️💬

If you’re attending #ACL2025, feel free to stop by and say hi! 👋
🧵[4/4]
July 20, 2025 at 10:53 PM
⏱️🔎 Time Course MechInterp
We track how factual knowledge forms in OLMo over training by analyzing the evolving roles of Attention Heads and FFNs.
Heads are dynamic and often repurposed; FFNs are stable and keep refining facts.
By: A. Dawar Hakimi
arxiv.org/abs/2506.03434
🧵[3/4]
Time Course MechInterp: Analyzing the Evolution of Components and Knowledge in Large Language Models
Understanding how large language models (LLMs) acquire and store factual knowledge is crucial for enhancing their interpretability and reliability. In this work, we analyze the evolution of factual kn...
July 20, 2025 at 10:53 PM
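A minimal sketch, not the paper's actual pipeline, of the kind of across-training probe the post above describes: load OLMo at several intermediate checkpoints and track where the gold answer token ranks for a factual prompt. The model id and revision strings are assumptions; the real checkpoint branch names are listed on the allenai OLMo model cards.

```python
# A minimal sketch (not the paper's pipeline): probe factual recall across OLMo
# training checkpoints by tracking the rank of the gold next token.
# ASSUMPTIONS: the model id and revision strings below are placeholders; check the
# allenai OLMo model cards on Hugging Face for the real checkpoint branch names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "allenai/OLMo-1B-hf"                      # assumed HF-converted variant
REVISIONS = ["step10000", "step100000", "main"]   # placeholder checkpoint names

prompt, gold = "The capital of France is", " Paris"

for rev in REVISIONS:
    tok = AutoTokenizer.from_pretrained(MODEL, revision=rev)
    model = AutoModelForCausalLM.from_pretrained(MODEL, revision=rev).eval()

    inputs = tok(prompt, return_tensors="pt")
    gold_id = tok(gold, add_special_tokens=False)["input_ids"][0]

    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]           # next-token logits
    rank = int((logits > logits[gold_id]).sum()) + 1     # 1 = model's top choice

    print(f"{rev}: gold token rank = {rank}")
```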
🌐 MEXA: Multilingual Evaluation of English-Centric LLMs

A method for assessing the multilingual capabilities of English-centric LLMs using parallel sentences. It estimates how many languages an LLM covers and at what level.

By: @kargaranamir.bsky.social

x.com/amir_nlp/sta...
🧵[2/4]
Amir H. Kargaran on X: "Excited to introduce MEXA, a method for assessing the multilingual capabilities of English-centric LLMs using parallel sentences. It estimates how many languages an LLM covers and at what level. Paper: https://t.co/awRq0Y4SCl Code: https://t.co/M3UVh2F9J1"
July 20, 2025 at 10:53 PM
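A minimal sketch of the parallel-sentence alignment idea behind this kind of evaluation (an illustration, not MEXA's official scoring): given embeddings of parallel English and target-language sentences from some layer of the LLM, measure how often each target sentence retrieves its own English translation as nearest neighbor.

```python
# A minimal sketch of the parallel-sentence alignment idea (illustration only,
# not MEXA's official scoring): how often does each target-language sentence
# retrieve its own English translation as nearest neighbor in embedding space?
import numpy as np

def alignment_score(eng_emb: np.ndarray, tgt_emb: np.ndarray) -> float:
    """eng_emb, tgt_emb: (N, d) embeddings of N parallel sentences
    (e.g. mean-pooled hidden states from one layer of the LLM)."""
    eng = eng_emb / np.linalg.norm(eng_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = tgt @ eng.T                                    # (N, N) cosine similarities
    hits = sims.argmax(axis=1) == np.arange(len(sims))    # correct translation on top?
    return float(hits.mean())                             # 1.0 = perfectly aligned

# Toy usage with random vectors; in practice the embeddings come from the LLM.
rng = np.random.default_rng(0)
eng = rng.normal(size=(100, 64))
tgt = eng + 0.1 * rng.normal(size=(100, 64))
print(alignment_score(eng, tgt))
```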
Check out the paper & our GitHub repo (with results on recent models 🆕✨)!
📄: arxiv.org/abs/2502.05167
🔗: github.com/adobe-resear...
🤗: huggingface.co/datasets/amo...
This work was my internship project at @adobe.com, in collaboration with my mentors there and Hinrich Schütze.
NoLiMa: Long-Context Evaluation Beyond Literal Matching
Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves ret...
July 9, 2025 at 1:53 PM
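A toy sketch of a needle-in-a-haystack probe in the NoLiMa spirit, where the question shares no literal overlap with the needle and answering requires an associative hop (Semper Opera House to Dresden). The needle/question pair below is illustrative, not taken from the benchmark.

```python
# A toy NIAH-style probe in the NoLiMa spirit: the question avoids literal
# overlap with the needle, so answering requires latent association rather than
# string matching. Sentences here are illustrative, not from the benchmark.
filler = "The meeting notes were filed and archived without further comment. " * 2000
needle = "Yuki mentioned that she lives right next to the Semper Opera House."
question = "Which character has most likely been to Dresden? Answer with a name."

# Place the needle roughly in the middle of the long context.
mid = len(filler) // 2
haystack = filler[:mid] + needle + " " + filler[mid:]
prompt = f"{haystack}\n\nQuestion: {question}\n"
# Scoring: does the model answer "Yuki" despite no lexical match with the needle?
```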
The takeaway? We need robust retrievers that prioritize answer relevance, not just heuristic shortcuts.

Work with an amazing team:
@mohsen-fayyaz.bsky.social,
Hinrich Schütze,
@violetpeng.bsky.social

paper: arxiv.org/abs/2503.05037
dataset 🤗: t.co/QZFyCLqP0P

Cross-post from x.com/mohsen_fayyaz
Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence
Dense retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). Since they often serve as the first step in these systems, their robu...
May 17, 2025 at 8:28 PM
We also analyze RAG: biased retrievers can mislead LLMs, degrading their performance by 34%, which is worse than retrieving nothing at all! 😮
May 17, 2025 at 8:28 PM
When multiple biases combine, retrievers fail catastrophically:
📉 Answer-containing docs are ranked above a synthetic biased doc with no answer less than 3% of the time!
May 17, 2025 at 8:28 PM
Dense retrievers are crucial for RAG and search, but do they actually retrieve useful evidence? 🤔
We design controlled experiments by repurposing a relation extraction dataset, exposing serious flaws in models like Dragon+ and Contriever.
May 17, 2025 at 8:28 PM
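A hedged sketch of the kind of head-to-head comparison such controlled experiments enable, using Contriever's mean-pooling recipe with dot-product scores. The query and passages are illustrative, not drawn from the repurposed relation-extraction dataset used in the paper.

```python
# Score a query against an answer-containing passage and a short, answer-free
# distractor with Contriever (mean pooling over token embeddings, dot product).
# Texts are illustrative; the paper's controlled setup uses a repurposed
# relation-extraction dataset.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever").eval()

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (B, T, d)
    mask = batch["attention_mask"].unsqueeze(-1)         # mean-pool over real tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

query = "Where was Marie Curie born?"
answer_doc = "Marie Curie, the pioneering physicist and chemist, was born in Warsaw in 1867."
distractor = "Where was Marie Curie born? This short page only repeats the question."  # no answer

q, d_ans, d_bad = embed([query, answer_doc, distractor])
print("answer doc:", float(q @ d_ans), "| biased distractor:", float(q @ d_bad))
```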