Tianwei Ni
twni2016.bsky.social
https://twni2016.github.io/ Reinforcement Learning PhD student @Mila
Work completed during my internship at Amazon Science. Thank you to my co-authors @allenanie.bsky.social, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, @rasoolfa.bsky.social!
April 23, 2025 at 10:05 PM
Results on challenging math games, Countdown & Game-of-24:
⚡ 180× faster inference than the search-based baseline
📈 Beats CoT and inference-time search (ToT, RAP)

📄 Paper: arxiv.org/abs/2504.11364
💻 Code & data: github.com/twni2016/llm...
Teaching Large Language Models to Reason through Learning and Forgetting
Leveraging inference-time search in large language models has proven effective in further enhancing a trained model's capability to solve complex mathematical and reasoning problems. However, this app...
2️⃣ Learn from successful reasoning paths ✅ while simultaneously forgetting failed reasoning paths ❌; we call this Unlikelihood Fine-Tuning (UFT)
3️⃣ A small learning rate is crucial to preserve inference-time search capabilities
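
To make the learn/forget idea concrete, here is a rough sketch of a combined objective. It assumes the standard token-level unlikelihood term, -log(1 - p), for failed paths and plain negative log-likelihood for successful ones; the function name `uft_loss`, the weight `alpha`, and the averaging are illustrative assumptions, and the paper's exact formulation may differ.

```python
import math


def uft_loss(success_token_probs, failed_token_probs, alpha=1.0, eps=1e-8):
    """Toy learn-and-forget loss over per-token model probabilities.

    success_token_probs: probabilities the model assigns to tokens of
        successful reasoning paths (pushed toward 1).
    failed_token_probs: probabilities the model assigns to tokens of
        failed reasoning paths (pushed toward 0).
    alpha: hypothetical weight on the forgetting term.
    """
    # Learning term: negative log-likelihood of successful-path tokens.
    learn = -sum(math.log(max(p, eps)) for p in success_token_probs)
    # Forgetting term: unlikelihood penalty on failed-path tokens.
    forget = -sum(math.log(max(1.0 - p, eps)) for p in failed_token_probs)
    n_tokens = len(success_token_probs) + len(failed_token_probs)
    return (learn + alpha * forget) / max(n_tokens, 1)


# The loss shrinks as failed-path tokens become less likely.
high = uft_loss([0.9, 0.8], [0.9, 0.7])
low = uft_loss([0.9, 0.8], [0.1, 0.2])
print(high > low)
```

In a real fine-tuning run the probabilities would come from the model's softmax outputs, and this loss would be minimized with the small learning rate noted above.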
1️⃣ Aggregate reasoning paths from diverse sources: Chain-of-Thought, inference-time search (Tree-of-Thought, Reasoning-via-Planning), classic algorithms (BFS, DFS)
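
As an illustration of the "classic algorithms" source, here is a minimal sketch of a depth-first search that produces a step-by-step reasoning path for Game-of-24. The function name `solve24` and the step format are assumptions for this example, not taken from the released code.

```python
from fractions import Fraction


def solve24(numbers, target=24):
    """Depth-first search for Game-of-24.

    Returns a list of step strings that reach the target, or None if no
    solution exists. Fractions keep division exact.
    """

    def dfs(items, steps):
        if len(items) == 1:
            return steps if items[0] == target else None
        # Pick an ordered pair, combine it, and recurse on the smaller set.
        for i in range(len(items)):
            for j in range(len(items)):
                if i == j:
                    continue
                a, b = items[i], items[j]
                rest = [items[k] for k in range(len(items)) if k not in (i, j)]
                candidates = [(a + b, "+"), (a - b, "-"), (a * b, "*")]
                if b != 0:
                    candidates.append((a / b, "/"))
                for value, op in candidates:
                    step = f"{a} {op} {b} = {value}"
                    found = dfs(rest + [value], steps + [step])
                    if found is not None:
                        return found
        return None

    return dfs([Fraction(n) for n in numbers], [])


path = solve24([4, 7, 8, 8])
if path is not None:
    print("\n".join(path))
```

Each returned step list is one successful reasoning path; a variant that also records dead-end branches could supply failed paths, though that extension is not shown here.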