https://martinagvilas.github.io/
Learned so much from this amazing team! Huge thanks to my coauthors: @vidhishab.bsky.social, Safoora Yousefi, @besmiranushi.bsky.social, @erichorvitz.bsky.social
This enables early path selection during parallel generation and ~60% token savings with +2.1% accuracy gains 🚀
⚡ 48% average token reduction (up to 70%!)
📈 +2.6% accuracy improvement over majority voting
🎯 Works by identifying correct paths even when the majority is wrong
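A minimal sketch of how picking a path by a per-path signal can differ from majority voting. It assumes each sampled path already carries a scalar correctness score derived from its latent trajectory. The function names, the scores, and the "take the highest-scoring path" rule are illustrative assumptions, not the method described in the thread.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the sampled paths."""
    return Counter(answers).most_common(1)[0][0]


def select_by_signal(paths):
    """Return the answer of the highest-scoring path.

    `paths` is a list of (answer, score) pairs. The score is a hypothetical
    per-path correctness estimate (an assumption made for this sketch).
    """
    best_answer, _ = max(paths, key=lambda p: p[1])
    return best_answer


# Hypothetical data: two low-scoring paths agree on "12",
# one high-scoring path answers "15".
paths = [("12", 0.21), ("12", 0.34), ("15", 0.88)]
voted = majority_vote([answer for answer, _ in paths])
picked = select_by_signal(paths)
```

With this made-up data, voting follows the two agreeing paths while the score-based rule follows the single high-scoring one, which is the situation the last bullet describes.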
✴️ Larger overall representational change (Net ↑)
✴️ Less wandering in latent space (Cumulative ↓)
✴️ More direct progress toward final state (Aligned ↑)
✅ Significantly predict correctness
✅ Outperform output-based confidence measures and cross-layer signals
📊 Net Change: Overall shift (start → end)
🔄 Cumulative Change: Total movement
🎯 Aligned Change: Progress toward final state
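One way to read the three signals above, as a rough sketch: treat a reasoning trace as a sequence of hidden-state vectors and measure how they move. The thread gives no formulas, so everything here is an assumption: Net Change as the distance from first to last state, Cumulative Change as the summed step lengths, and Aligned Change as the mean cosine similarity between each step and the overall start-to-end direction. `trajectory_signals` is a hypothetical helper name.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def trajectory_signals(states):
    """Compute (net, cumulative, aligned) for a list of hidden-state vectors.

    All three definitions are assumptions made for illustration.
    """
    if len(states) < 2:
        raise ValueError("need at least two states")

    # Step-to-step updates and the overall start -> end displacement.
    steps = [_sub(states[i + 1], states[i]) for i in range(len(states) - 1)]
    drift = _sub(states[-1], states[0])

    net = _norm(drift)                          # overall shift
    cumulative = sum(_norm(s) for s in steps)   # total movement

    # Mean cosine between each non-zero step and the overall direction.
    cosines = [
        _dot(s, drift) / (_norm(s) * net)
        for s in steps
        if net > 0 and _norm(s) > 0
    ]
    aligned = sum(cosines) / len(cosines) if cosines else 0.0
    return net, cumulative, aligned


# A direct trajectory versus one that backtracks before arriving.
direct = trajectory_signals([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
wandering = trajectory_signals([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
```

In this toy case the direct trajectory has equal net and cumulative change and full alignment, while the backtracking one accumulates more movement for a smaller net shift and lower alignment.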
Our solution: Look inside the temporal evolution of the model's latent space! 🔍
📈 Scaling this inference-time compute (longer traces, multiple samples) significantly improves performance across reasoning tasks.