Oleksii Kuchaiev
kuchaev.bsky.social
AI model alignment @ NVIDIA
New paper from our team: an inference-time scaling approach that can boost the scores of existing models on non-math benchmarks such as Arena-Hard. We get an Arena-Hard score of 92.7 with a 70B model, which, as of 5 Mar 2025, surpasses o1-preview-2024-09-12 (90.4) and DS-R1 (92.3). arxiv.org/pdf/2503.04378
March 7, 2025 at 6:42 PM
Our team put together a unified mathematical framework to analyze popular model alignment algorithms: “Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment”. arxiv.org/pdf/2502.00203
February 4, 2025 at 5:25 PM