Ali Modarressi
@amodarressi.bsky.social
PhD student, NLP Researcher at @cislmu.bsky.social | Prev. Intern @Adobe.com
Details on poster times and locations coming soon.

Would love to meet and chat ☕️💬

If you’re attending #ACL2025, feel free to stop by and say hi! 👋
🧵[4/4]
July 20, 2025 at 10:53 PM
⏱️🔎 Time Course MechInterp
We track how factual knowledge forms in OLMo over training by analyzing the evolving roles of Attention Heads and FFNs.
Heads are dynamic and often repurposed; FFNs are stable and keep refining facts.
By: A. Dawar Hakimi
arxiv.org/abs/2506.03434
🧵[3/4]
Time Course MechInterp: Analyzing the Evolution of Components and Knowledge in Large Language Models
Understanding how large language models (LLMs) acquire and store factual knowledge is crucial for enhancing their interpretability and reliability. In this work, we analyze the evolution of factual kn...
July 20, 2025 at 10:53 PM
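A minimal sketch, not the paper's actual pipeline, of the kind of across-training probe the post above describes: load OLMo at several intermediate checkpoints and track where the gold answer token ranks for a factual prompt. The model id and revision strings are assumptions; the real checkpoint branch names are listed on the allenai OLMo model cards.

```python
# A minimal sketch (not the paper's pipeline): probe factual recall across OLMo
# training checkpoints by tracking the rank of the gold next token.
# ASSUMPTIONS: the model id and revision strings below are placeholders; check the
# allenai OLMo model cards on Hugging Face for the real checkpoint branch names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "allenai/OLMo-1B-hf"                      # assumed HF-converted variant
REVISIONS = ["step10000", "step100000", "main"]   # placeholder checkpoint names

prompt, gold = "The capital of France is", " Paris"

for rev in REVISIONS:
    tok = AutoTokenizer.from_pretrained(MODEL, revision=rev)
    model = AutoModelForCausalLM.from_pretrained(MODEL, revision=rev).eval()

    inputs = tok(prompt, return_tensors="pt")
    gold_id = tok(gold, add_special_tokens=False)["input_ids"][0]

    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]           # next-token logits
    rank = int((logits > logits[gold_id]).sum()) + 1     # 1 = model's top choice

    print(f"{rev}: gold token rank = {rank}")
```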
🌐 MEXA: Multilingual Evaluation of English-Centric LLMs

A method for assessing the multilingual capabilities of English-centric LLMs using parallel sentences. It estimates how many languages an LLM covers and at what level.

By: @kargaranamir.bsky.social

x.com/amir_nlp/sta...
🧵[2/4]
Amir H. Kargaran on X: "Excited to introduce MEXA, a method for assessing the multilingual capabilities of English-centric LLMs using parallel sentences. It estimates how many languages an LLM covers and at what level. Paper: https://t.co/awRq0Y4SCl Code: https://t.co/M3UVh2F9J1"
July 20, 2025 at 10:53 PM
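A minimal sketch of the parallel-sentence alignment idea behind this kind of evaluation (an illustration, not MEXA's official scoring): given embeddings of parallel English and target-language sentences from some layer of the LLM, measure how often each target sentence retrieves its own English translation as nearest neighbor.

```python
# A minimal sketch of the parallel-sentence alignment idea (illustration only,
# not MEXA's official scoring): how often does each target-language sentence
# retrieve its own English translation as nearest neighbor in embedding space?
import numpy as np

def alignment_score(eng_emb: np.ndarray, tgt_emb: np.ndarray) -> float:
    """eng_emb, tgt_emb: (N, d) embeddings of N parallel sentences
    (e.g. mean-pooled hidden states from one layer of the LLM)."""
    eng = eng_emb / np.linalg.norm(eng_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = tgt @ eng.T                                    # (N, N) cosine similarities
    hits = sims.argmax(axis=1) == np.arange(len(sims))    # correct translation on top?
    return float(hits.mean())                             # 1.0 = perfectly aligned

# Toy usage with random vectors; in practice the embeddings come from the LLM.
rng = np.random.default_rng(0)
eng = rng.normal(size=(100, 64))
tgt = eng + 0.1 * rng.normal(size=(100, 64))
print(alignment_score(eng, tgt))
```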
Check out the paper & our GitHub repo (with results on recent models 🆕✨)!
📄: arxiv.org/abs/2502.05167
🔗: github.com/adobe-resear...
🤗: huggingface.co/datasets/amo...
This work was my internship project at @adobe.com, in collaboration with my mentors there and Hinrich Schütze.
NoLiMa: Long-Context Evaluation Beyond Literal Matching
Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves ret...
July 9, 2025 at 1:53 PM
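A toy sketch of a needle-in-a-haystack probe in the NoLiMa spirit, where the question shares no literal overlap with the needle and answering requires an associative hop (Semper Opera House to Dresden). The needle/question pair below is illustrative, not taken from the benchmark.

```python
# A toy NIAH-style probe in the NoLiMa spirit: the question avoids literal
# overlap with the needle, so answering requires latent association rather than
# string matching. Sentences here are illustrative, not from the benchmark.
filler = "The meeting notes were filed and archived without further comment. " * 2000
needle = "Yuki mentioned that she lives right next to the Semper Opera House."
question = "Which character has most likely been to Dresden? Answer with a name."

# Place the needle roughly in the middle of the long context.
mid = len(filler) // 2
haystack = filler[:mid] + needle + " " + filler[mid:]
prompt = f"{haystack}\n\nQuestion: {question}\n"
# Scoring: does the model answer "Yuki" despite no lexical match with the needle?
```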
The takeaway? We need robust retrievers that prioritize answer relevance, not just heuristic shortcuts.

Work with an amazing team:
@mohsen-fayyaz.bsky.social,
Hinrich Schütze,
@violetpeng.bsky.social

paper: arxiv.org/abs/2503.05037
dataset 🤗: t.co/QZFyCLqP0P

Cross-post from x.com/mohsen_fayyaz
Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence
Dense retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). Since they often serve as the first step in these systems, their robu...
May 17, 2025 at 8:28 PM
We also analyze RAG: biased retrievers can mislead LLMs, degrading their performance by 34%, which is worse than retrieving nothing at all! 😮
May 17, 2025 at 8:28 PM
When multiple biases combine, retrievers fail catastrophically:
📉 Answer-containing docs are ranked above a synthetic biased doc with no answer less than 3% of the time!
May 17, 2025 at 8:28 PM
Dense retrievers are crucial for RAG and search, but do they actually retrieve useful evidence? 🤔
We design controlled experiments by repurposing a relation extraction dataset, exposing serious flaws in models like Dragon+ and Contriever.
May 17, 2025 at 8:28 PM
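A hedged sketch of the kind of head-to-head comparison such controlled experiments enable, using Contriever's mean-pooling recipe with dot-product scores. The query and passages are illustrative, not drawn from the repurposed relation-extraction dataset used in the paper.

```python
# Score a query against an answer-containing passage and a short, answer-free
# distractor with Contriever (mean pooling over token embeddings, dot product).
# Texts are illustrative; the paper's controlled setup uses a repurposed
# relation-extraction dataset.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever").eval()

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (B, T, d)
    mask = batch["attention_mask"].unsqueeze(-1)         # mean-pool over real tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

query = "Where was Marie Curie born?"
answer_doc = "Marie Curie, the pioneering physicist and chemist, was born in Warsaw in 1867."
distractor = "Where was Marie Curie born? This short page only repeats the question."  # no answer

q, d_ans, d_bad = embed([query, answer_doc, distractor])
print("answer doc:", float(q @ d_ans), "| biased distractor:", float(q @ d_bad))
```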