Sagnik Mukherjee
@sagnikmukherjee.bsky.social
NLP PhD student @convai_uiuc | Agents, Reasoning, evaluation etc.
https://sagnikmukherjee.github.io

https://scholar.google.com/citations?user=v4lvWXoAAAAJ&hl=en
Reposted by Sagnik Mukherjee
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models by @sagnikmukherjee.bsky.social, Lifan Yuan, @dilekh.bsky.social, Hao Peng

Read more here: arxiv.org/abs/2505.11711
x.com/saagnikkk/st...
September 20, 2025 at 3:17 PM
Paper - arxiv.org/abs/2505.11711
Work done with my amazing collaborator Lifan Yuan, advised by our amazing advisors @dilekh.bsky.social and Hao Peng.
May 21, 2025 at 3:50 AM
🧵[8/n] To the best of our knowledge, this is the first mechanistic evidence contrasting learning from in-distribution (on-policy) data with out-of-distribution (off-policy) data.
May 21, 2025 at 3:50 AM
🧵[7/n]

🔍 Potential Reasons

💡 We hypothesize that the in-distribution nature of training data is a key driver behind this sparsity
🧠 The model already "knows" a lot — RL just fine-tunes a small, relevant subnetwork rather than overhauling everything
May 21, 2025 at 3:50 AM
🧵[6/n]

🌐 The Subnetwork Is General
🔁 Subnetworks trained with different seeds, datasets, or even algorithms show nontrivial overlap
🧩 Suggests the subnetwork is a generalizable structure tied to the base model
🧠 A shared backbone seems to emerge, no matter how you train it
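For a concrete sense of the measurement, here is a minimal sketch (in PyTorch, not the paper's code) of how overlap between two such subnetwork masks can be computed and compared against chance; the toy masks and their ~20% density are illustrative placeholders.

```python
# Minimal sketch: overlap between two subnetwork masks (e.g. from two seeds),
# compared with the overlap expected by chance for masks of that density.
import torch

def overlap(mask_a: torch.Tensor, mask_b: torch.Tensor) -> float:
    """Fraction of entries updated in mask_a that are also updated in mask_b."""
    return (mask_a & mask_b).sum().item() / mask_a.sum().item()

# Toy masks; in practice a mask would be (theta_finetuned != theta_base).
a = torch.rand(1000, 1000) < 0.2
b = torch.rand(1000, 1000) < 0.2
print(f"observed overlap:     {overlap(a, b):.2%}")
print(f"chance-level overlap: {b.float().mean().item():.2%}")
```

For independent random masks the two numbers coincide; the finding above is that real RL subnetworks overlap well above this chance level.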
May 21, 2025 at 3:50 AM
🧵[5/n]📊
🧪 Training the Subnetwork Reproduces Full Model

1️⃣ When trained in isolation, the sparse subnetwork recovers almost the exact same weights as the full model
2️⃣ It achieves comparable (or better) end-task performance
3️⃣ 🧮 Even the training loss converges more smoothly
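A rough sketch of what "training only the subnetwork" can look like mechanically, with a toy model and a random mask as stand-ins (the paper derives the mask from which parameters RL actually updated):

```python
# Sketch: finetune only a fixed subnetwork by zeroing gradients outside a
# precomputed binary mask. Toy model, data, and mask are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
masks = {n: (torch.rand_like(p) < 0.2) for n, p in model.named_parameters()}  # toy mask
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for _ in range(100):                              # stand-in training loop
    x, y = torch.randn(4, 8), torch.randn(4, 1)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    for n, p in model.named_parameters():
        p.grad *= masks[n]                        # parameters outside the mask never move
    opt.step()
    opt.zero_grad()
```

With plain SGD, masking gradients is enough to keep out-of-mask parameters frozen; optimizers with momentum or weight decay would need the final update masked instead.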
May 21, 2025 at 3:50 AM
🧵[4/n]

📚 Each Layer Is Equally Sparse (or Dense)

📏 No specific layer or sublayer gets special treatment — all layers are updated equally sparsely.
🎯 Despite the sparsity, the updates are still full-rank
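One way to check both claims, sketched against two Hugging Face checkpoints (the model ids are placeholders, and computing the rank of every layer of a real LLM is slow, so treat this as illustrative):

```python
# Sketch: per-layer update sparsity plus the rank of each 2-D layer's update.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-model")        # placeholder id
tuned = AutoModelForCausalLM.from_pretrained("finetuned-model")  # placeholder id

for (name, p0), (_, p1) in zip(base.named_parameters(), tuned.named_parameters()):
    delta = p1.data - p0.data
    sparsity = (delta == 0).float().mean().item()
    line = f"{name}: {sparsity:.1%} of entries unchanged"
    if delta.ndim == 2:                           # weight matrices: check update rank
        rank = torch.linalg.matrix_rank(delta.float()).item()
        line += f", update rank {rank}/{min(delta.shape)}"
    print(line)
```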
May 21, 2025 at 3:50 AM
🧵[3/n]

📉 Even Gradients Are Sparse in RL 📉

🧠 In PRIME, 72% of parameters never receive any gradient — ever!
↔️ Some do, but their gradients cancel out over time.
🎯 It’s not just the updates that are sparse; the gradients are too
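One way to measure "never receives any gradient", sketched on a toy model (the real measurement would sit inside the RL training loop):

```python
# Sketch: track which parameter entries ever get a nonzero gradient in training.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
touched = {n: torch.zeros_like(p, dtype=torch.bool) for n, p in model.named_parameters()}

for _ in range(100):                              # stand-in for the RL training loop
    x, y = torch.randn(4, 8), torch.randn(4, 1)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    for n, p in model.named_parameters():
        touched[n] |= (p.grad != 0)               # record entries with a nonzero gradient
    opt.step()
    opt.zero_grad()

never = sum((~m).sum().item() for m in touched.values())
total = sum(m.numel() for m in touched.values())
print(f"{never / total:.1%} of parameters never received a nonzero gradient")
```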
May 21, 2025 at 3:50 AM
🧵[2/n]

💡 SFT Updates Are Dense 💡
Unlike RL, Supervised Fine-Tuning (SFT) updates are much denser 🧠
📊 Sparsity is low — at most 15.31% of parameters remain untouched.
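For reference, the basic measurement behind these sparsity numbers can be sketched as follows; model ids are placeholders, and counting exact equality in the stored dtype is one reasonable definition of "untouched":

```python
# Sketch: fraction of parameters left unchanged by finetuning (update sparsity).
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-model")        # placeholder id
tuned = AutoModelForCausalLM.from_pretrained("finetuned-model")  # placeholder id

unchanged, total = 0, 0
for (_, p0), (_, p1) in zip(base.named_parameters(), tuned.named_parameters()):
    unchanged += (p0.data == p1.data).sum().item()   # exact equality in stored dtype
    total += p0.numel()

print(f"{unchanged / total:.2%} of parameters untouched")
```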
May 21, 2025 at 3:50 AM
📂 Code and data coming soon! Read our paper here: arxiv.org/abs/2502.02362

This would not have been possible without the contributions of @abhinav-chinta.bsky.social, @takyoung.bsky.social, Tarun, and our amazing advisor @dilekh.bsky.social. Special thanks to the members of @convai-uiuc.bsky.social.
May 7, 2025 at 6:52 PM
🧠 Additional insights:
1️⃣ Spotting errors in synthetic negative samples is WAY easier than catching real-world mistakes
2️⃣ False positives are inflating math benchmark scores - time for more honest evaluation methods!

🧵[6/n]
May 7, 2025 at 6:52 PM
📈 Our results:
PARC improves error detection accuracy by 6-16%, enabling more reliable step-level verification in mathematical reasoning chains.

🧵[5/n]
May 7, 2025 at 6:52 PM
📊 The exciting part?
LLMs can reliably identify these critical premises - the specific prior statements that directly support each reasoning step. This creates a transparent structure showing exactly which information is necessary for each conclusion.

🧵[4/n]
May 7, 2025 at 6:52 PM
💡 Our solution:
We propose Premise-Augmented Reasoning Chains (PARC): converting linear reasoning into directed graphs by explicitly linking each reasoning step to its necessary premises.
We also identify accumulation errors, an error type overlooked in prior work.
🧵[3/n]
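A minimal sketch of what a premise-augmented chain could look like as a data structure; field names and the worked example are illustrative, not the paper's schema:

```python
# Sketch: a reasoning chain where each step points to the premises it relies on.
from dataclasses import dataclass, field

@dataclass
class Step:
    idx: int
    text: str
    premises: list[int] = field(default_factory=list)  # supporting step indices; 0 = question

@dataclass
class PARCChain:
    question: str
    steps: list[Step]

    def context_for(self, step: Step) -> list[str]:
        """Return only the statements a step depends on, not the whole prefix."""
        return [self.question if i == 0 else self.steps[i - 1].text for i in step.premises]

chain = PARCChain(
    question="A shirt costs $20 and is 25% off. What is the sale price?",
    steps=[
        Step(1, "The discount is 25% of $20, which is $5.", premises=[0]),
        Step(2, "The sale price is $20 - $5 = $15.", premises=[0, 1]),
    ],
)
print(chain.context_for(chain.steps[1]))  # verify step 2 against just its premises
```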
May 7, 2025 at 6:52 PM
📌 Issue: Verifying lengthy reasoning chains is tough due to hidden step dependencies. The current step doesn’t depend on all previous steps, making the context full of distractors.

🧵[2/n]
May 7, 2025 at 6:52 PM
🙋
November 24, 2024 at 1:20 AM
Work done with an amazing team (most of them are not here yet, other than @faridlazuarda.bsky.social )
November 21, 2024 at 10:03 PM
We call out that most studies of culture have focused on a "thin" description.
"Digitally under-represented cultures are more likely to get represented by their “thin descriptions" created by “outsiders" on the digital space, which can further aggravate the biases and stereotypes."
November 21, 2024 at 10:03 PM
🚩 We discovered some key gaps: Incomplete cultural coverage, issues with methodological robustness, and a lack of situated studies for real-world applicability. These gaps limit our understanding of cultural biases in LLMs. [5/7]
November 21, 2024 at 10:03 PM
📚 Most studies use black-box probing methods to examine LLMs' cultural biases. However, these methods can be sensitive to prompt wording, raising concerns about robustness and generalizability. [4/7]
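One illustrative way to see the prompt-sensitivity concern: probe the same question under paraphrases and measure how often the answer stays the same. The model id and prompts below are placeholders, not an experiment from the paper:

```python
# Sketch: consistency of a black-box probe under simple paraphrases.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")   # placeholder model
paraphrases = [
    "In your opinion, is it acceptable to arrive late to a dinner invitation?",
    "Do you think arriving late to a dinner invitation is acceptable?",
    "Is showing up late to a dinner invitation okay, in your view?",
]
answers = [
    generator(p, max_new_tokens=5)[0]["generated_text"][len(p):].strip()
    for p in paraphrases
]
agreement = max(Counter(answers).values()) / len(answers)
print(answers, f"agreement: {agreement:.0%}")
```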
November 21, 2024 at 10:03 PM