Sagnik Anupam
sagnikanupam.bsky.social
CIS PhD at Penn | MIT CS + Math '24
sagnikanupam.com

PhD student working on AI reasoning in large multimodal models. I design methods to build better models for math, code, visual reasoning, agents, and robotics.
Our results generalize well to different model sizes (0.5B, 1B, 1.5B) and families (Qwen, Llama, Gemma).
October 14, 2025 at 3:16 PM
For off-policy updates, we incorporate group advantage estimation into the policy gradient algorithm, and derive an importance weighted estimator to correct for the bias arising from off-policy learning.
October 14, 2025 at 3:16 PM
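The off-policy update described in the post above can be illustrated with a minimal sketch. It is not RAPID's actual estimator, which the posts do not spell out. It assumes rewards are normalized within a group of responses to the same prompt, and that each sample is reweighted by the ratio of current-policy to behavior-policy probability. All function names here are hypothetical.

```python
import math

def group_advantages(rewards):
    """Normalize rewards within one group of responses to the same prompt."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # Small epsilon avoids division by zero when all rewards are equal.
    return [(r - mean) / (std + 1e-8) for r in rewards]

def importance_weights(logp_current, logp_behavior):
    """Per-sample ratio pi_current / pi_behavior from sequence log-probabilities."""
    return [math.exp(lc - lb) for lc, lb in zip(logp_current, logp_behavior)]

def off_policy_surrogate(logp_current, logp_behavior, rewards):
    """Importance-weighted surrogate objective (assumed form, for illustration).

    Differentiating with respect to the current policy's parameters gives
    mean(w * A * grad log pi_current), an importance-weighted policy gradient.
    """
    adv = group_advantages(rewards)
    w = importance_weights(logp_current, logp_behavior)
    return sum(wi * ai for wi, ai in zip(w, adv)) / len(adv)
```

When the samples come from the current policy itself, every weight is 1 and the objective reduces to the usual on-policy group-advantage form.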
Have only a limited number of GPUs to train your language model? We introduce RAPID, a novel RL algorithm that can substantially reduce the post-training time of small language models in resource-constrained settings.
October 14, 2025 at 3:16 PM
Example user-submitted task: “Find me the last available train from Cardiff Central to Barry Docks station today on trainline”

Deepseek-R1 GIF:
October 14, 2025 at 6:14 AM
Introducing BrowserArena, an evaluation platform for web agents! Combining the awesome @lmarena.bsky.social platform with BrowserUse, we compare LLMs side by side and rank them by their ability to solve web navigation tasks!

Users vote for models using GIFs and text outputs to judge task performance.
October 14, 2025 at 6:14 AM
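The posts do not say how BrowserArena turns votes into a ranking. One common way to rank models from side-by-side votes is an Elo-style rating update, sketched below purely as an assumption. The function names, the starting rating of 1000, and the K-factor of 32 are all hypothetical choices for illustration.

```python
def expected_score(r_a, r_b):
    """Elo-model probability that a model rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_ratings(ratings, votes, k=32.0, initial=1000.0):
    """Apply Elo updates for a sequence of (winner, loser) votes.

    Returns a new dict; the input ratings are not modified.
    """
    ratings = dict(ratings)
    for winner, loser in votes:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings

def leaderboard(ratings):
    """Model names sorted from highest to lowest rating."""
    return sorted(ratings, key=ratings.get, reverse=True)

# Hypothetical votes: each tuple is (preferred model, other model).
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
final = update_ratings({}, votes)
ranking = leaderboard(final)
```

This sketch ignores ties and skipped votes, which a real arena would also need to handle.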