Ruizhe Li

@ruizheli.bsky.social

150 followers 260 following 8 posts

Assistant Professor at University of Aberdeen | Postdoc at UCL | PhD at University of Sheffield | mechanistic interpretability & multimodal LLMs | https://www.ruizhe.space

This work is a collaboration with Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang and Prof. Emine Yilmaz.
Our paper: huggingface.co/papers/2505....
Our code: github.com/ruizheliUOA/...

Paper page - Attributing Response to Context: A Jensen-Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation

June 3, 2025 at 5:27 PM

This finding confirms the contribution of the MLP layers located by ARC-JSD earlier in this thread. It is also expected, since Chinese is one of the main languages in Qwen2's pre- and post-training data.

June 3, 2025 at 5:25 PM

In our case study of the located MLP layers in Qwen2 models, we find that several correctly decoded tokens gradually shift from their Chinese form to their English form, such as 一只 (A), 拥有 (has) and 翅膀 (wings) in the figure.

June 3, 2025 at 5:25 PM
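A minimal sketch of how an intermediate MLP output can be decoded into a token, assuming a logit-lens style projection: the hidden vector is multiplied by the unembedding matrix and the highest-scoring vocabulary entry is read off. The vocabulary, unembedding matrix, layer indices and hidden states below are made-up toy values, and `decode_top_token` is a hypothetical helper; a real Qwen2 model would also apply a final normalization layer, which this sketch omits.

```python
from typing import Sequence


def decode_top_token(hidden: Sequence[float],
                     unembed: Sequence[Sequence[float]],
                     vocab: Sequence[str]) -> str:
    """Logit-lens style decoding: score each vocabulary row against the hidden state, return the argmax token."""
    # logits[v] = dot(unembed[v], hidden); no final norm is applied in this toy version
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembed]
    best = max(range(len(vocab)), key=logits.__getitem__)
    return vocab[best]


# Toy 2-d "model": one unembedding row per vocabulary token (hypothetical values).
vocab = ["翅膀", "wings", "fur"]
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

# Hypothetical MLP outputs at three increasingly deep layers.
mlp_outputs = {18: [2.0, 0.5], 22: [1.0, 1.5], 26: [0.2, 2.0]}

decoded = {layer: decode_top_token(h, unembed, vocab) for layer, h in mlp_outputs.items()}
# decoded == {18: "翅膀", 22: "wings", 26: "wings"}
```

Tracking the argmax token layer by layer is what makes a Chinese-to-English shift, like the one described for 翅膀 (wings), visible in such a sketch.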

In addition, we go a step further and use JSD to locate the relevant attention heads and MLP layers from a mechanistic interpretability (mechinterp) perspective. We find that JSD-based mechinterp can identify the attention heads and MLPs related to context attribution, and that they are mainly located in the intermediate and higher layers.

June 3, 2025 at 5:24 PM
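To make JSD-based localization concrete, here is a minimal sketch that ranks model components by how much their decoded output distribution changes between two runs. It assumes, as our reading of the posts rather than the released code, that each attention head or MLP is scored by the JSD between its output distribution with the full context and with the attributed context sentence removed. The component names (`L2.head3` and so on), `rank_components`, and all logits are hypothetical toy values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; zero-probability terms of p contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_components(logits: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                    top_k: int) -> List[str]:
    """Rank components by JSD between their (full-context, ablated-context) output distributions."""
    scores = {name: jsd(softmax(full), softmax(ablated))
              for name, (full, ablated) in logits.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical per-component logits over a 3-token toy vocabulary:
# (with full context, with the relevant sentence removed)
component_logits = {
    "L2.head3": ([2.0, 0.0, 0.0], [2.0, 0.0, 0.0]),   # unchanged, so JSD = 0
    "L18.head5": ([1.0, 0.0, 0.0], [0.5, 0.0, 0.0]),  # small shift
    "L20.mlp": ([3.0, 0.0, 0.0], [0.0, 3.0, 0.0]),    # large shift
}
top = rank_components(component_logits, top_k=2)
# top == ["L20.mlp", "L18.head5"]
```

Under this assumption, the per-component logits in a real model would come from projecting each head or MLP output through the unembedding matrix, and there would be hundreds of components rather than three.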

We evaluate ARC-JSD on the TyDi QA, HotpotQA and MuSiQue datasets using Qwen2-1.5B/7B-IT and Gemma2-2B/9B-IT, where it achieves higher attribution accuracy than the baseline.

June 3, 2025 at 5:23 PM
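A minimal sketch of JSD-driven context attribution and the attribution-accuracy metric. It assumes, based on the paper title rather than the released code, that each context sentence is scored by the JSD between the model's response-token distributions with the full context and with that sentence removed (summed over response tokens), and that the highest-scoring sentence is the prediction. `toy_model`, `attribute` and `accuracy` are hypothetical names, `toy_model` stands in for a real Qwen2 or Gemma2 forward pass, and the examples are made up.

```python
import math
from typing import Callable, List, Sequence, Tuple

Dist = List[float]  # probability distribution over a toy vocabulary
Model = Callable[[Sequence[str]], List[Dist]]  # context -> one distribution per response token


def kl(p: Dist, q: Dist) -> float:
    """KL(p || q) in bits; zero-probability terms of p contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def attribute(sentences: Sequence[str], model: Model) -> int:
    """Index of the context sentence whose removal most changes the response distributions."""
    full = model(sentences)
    scores = []
    for i in range(len(sentences)):
        ablated = model([s for j, s in enumerate(sentences) if j != i])
        scores.append(sum(jsd(p, q) for p, q in zip(full, ablated)))
    return max(range(len(sentences)), key=scores.__getitem__)


def accuracy(dataset: Sequence[Tuple[List[str], int]], model: Model) -> float:
    """Fraction of examples where the predicted sentence matches the gold sentence index."""
    hits = sum(attribute(sents, model) == gold for sents, gold in dataset)
    return hits / len(dataset)


def toy_model(context: Sequence[str]) -> List[Dist]:
    """Hypothetical one-token response over the vocabulary ["wings", "fur", "other"]."""
    p_wings = 0.8 if any("wings" in s for s in context) else 0.1
    rest = (1 - p_wings) / 2
    return [[p_wings, rest, rest]]


dataset = [
    (["The sky is blue.", "A bird has wings.", "Cats have fur."], 1),
    (["A bat has wings.", "Bats sleep upside down."], 0),
]
print(accuracy(dataset, toy_model))  # 1.0
```

Using JSD rather than plain KL keeps the score symmetric and finite even when one run assigns zero probability to a token that the other run predicts.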