Lightnews — Scholar-powered news

Sagnik Anupam

@sagnikanupam.bsky.social

CIS PhD at Penn | MIT CS + Math '24
sagnikanupam.com

PhD student working on AI reasoning in large multimodal models. I design methods to build better models for math, code, visual reasoning, agents, and robotics.

Sagnik Anupam

@sagnikanupam.bsky.social

Our results generalize well to different model sizes (0.5B, 1B, 1.5B) and families (Qwen, Llama, Gemma).

Results for Qwen2.5-1.5B, Llama3.2-1B, Gemma3

October 14, 2025 at 3:16 PM

Sagnik Anupam

@sagnikanupam.bsky.social

For off-policy updates, we incorporate group advantage estimation into the policy gradient algorithm, and derive an importance weighted estimator to correct for the bias arising from off-policy learning.

Group advantage estimation formula used in RAPID

October 14, 2025 at 3:16 PM
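
The post above names two ingredients: a group advantage estimate and an importance weight that corrects for off-policy samples. The exact RAPID formula is in the image and is not reproduced here, so the following is only a minimal sketch of the general idea, under stated assumptions: advantages are standardized within a group of responses to the same prompt (as in GRPO-style methods), and the importance weight is the ratio of current-policy to behavior-policy sequence probabilities. All function names, the `eps` constant, and the optional `clip` cap are hypothetical and are not taken from the RAPID paper.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt.

    Assumption: mean/std normalization, as in GRPO-style methods.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def importance_weights(logp_current, logp_behavior, clip=None):
    """Per-sample ratio pi_current / pi_behavior from sequence log-probabilities.

    `clip` is a hypothetical optional cap on the ratio to limit variance.
    """
    weights = [math.exp(c - b) for c, b in zip(logp_current, logp_behavior)]
    if clip is not None:
        weights = [min(w, clip) for w in weights]
    return weights


def surrogate_loss(logp_current, logp_behavior, rewards, clip=None):
    """Scalar value of an importance-weighted policy-gradient surrogate.

    Only the value is computed here; in a real training loop the gradient
    would come from an autodiff framework.
    """
    adv = group_advantages(rewards)
    w = importance_weights(logp_current, logp_behavior, clip)
    return -mean(wi * ai for wi, ai in zip(w, adv))
```

When the samples are on-policy, the two log-probability lists coincide, every weight is 1, and the surrogate reduces to the plain group-advantage policy-gradient objective.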

Sagnik Anupam

@sagnikanupam.bsky.social

Have only a limited number of GPUs to train your language model? We introduce RAPID, a novel RL algorithm that can substantially reduce the post-training time of small language models in resource-constrained settings.

Table showing MBPP+, MATH, and MiniF2F results for SFT, GRPO, DAPO, PG, and RAPID.

October 14, 2025 at 3:16 PM

Sagnik Anupam

@sagnikanupam.bsky.social

Example user-submitted task: “Find me the last available train from Cardiff Central to Barry Docks station today on trainline”

DeepSeek-R1 GIF:

October 14, 2025 at 6:14 AM

Sagnik Anupam

@sagnikanupam.bsky.social

Introducing an evaluation platform for web agents: BrowserArena! Combining the awesome @lmarena.bsky.social platform with BrowserUse, we rank LLMs side by side to compare their ability to solve web navigation tasks!

Users vote for models using GIFs and text outputs to judge task performance.

October 14, 2025 at 6:14 AM
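
The posts do not say how pairwise user votes are turned into a ranking. As a rough illustration only, the sketch below uses an Elo-style update, which is one common way to rank from side-by-side votes; it is an assumption here, not BrowserArena's documented method. The function names, the `k` factor, the initial rating of 1000, and the model names in the usage note are all hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, model_a, model_b, outcome, k=32.0, initial=1000.0):
    """Apply one vote to the ratings dict in place and return it.

    outcome: 1.0 if the user preferred model_a, 0.0 if model_b, 0.5 for a tie.
    Unseen models start at the (hypothetical) `initial` rating.
    """
    ra = ratings.get(model_a, initial)
    rb = ratings.get(model_b, initial)
    ea = expected_score(ra, rb)
    ratings[model_a] = ra + k * (outcome - ea)
    ratings[model_b] = rb + k * ((1.0 - outcome) - (1.0 - ea))
    return ratings
```

For example, starting from an empty dict, `update_elo({}, "model-x", "model-y", 1.0)` moves the two placeholder models to 1016.0 and 984.0, and the sum of ratings stays constant after every vote.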
