Lightnews — Scholar-powered news

Martina Vilas

@martinagvilas.bsky.social

2.2K followers 450 following 18 posts

Computer Science PhD student | AI interpretability | Vision + Language | Cognitive Science. Prev. intern @MicrosoftResearch.

https://martinagvilas.github.io/

Martina Vilas

@martinagvilas.bsky.social

Working on this project was a great experience during my internship at @msftresearch.bsky.social 💙

Learned so much from this amazing team! Huge thanks to my coauthors: @vidhishab.bsky.social, Safoora Yousefi, @besmiranushi.bsky.social, @erichorvitz.bsky.social

October 22, 2025 at 3:38 PM

Martina Vilas

@martinagvilas.bsky.social

We also found that these signals emerge EARLY in reasoning! At just 4k tokens, we can predict solution quality with ROC-AUC > 0.6.

This enables early path selection during parallel generation and ~60% token savings with +2.1% accuracy gains 🚀

October 22, 2025 at 3:38 PM
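For readers unfamiliar with the metric in the post above, here is a minimal sketch of how ROC-AUC can be computed for a set of reasoning traces. It uses the standard pairwise definition: the probability that a randomly chosen correct trace gets a higher score than a randomly chosen incorrect one. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the thread.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: labels are 1 (correct trace) or 0 (incorrect trace)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect trace")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correct trace ranked above incorrect one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores computed from a trace prefix)
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to chance-level ranking, so a ROC-AUC above 0.6 means the early signal carries some, but far from perfect, information about final correctness.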

Martina Vilas

@martinagvilas.bsky.social

Using LT signals for answer selection in multi-sample inference leads to:

⚡ 48% average token reduction (up to 70%!)
📈 +2.6% accuracy improvement over majority voting
🎯 Works by identifying correct paths even when the majority is wrong

October 22, 2025 at 3:38 PM
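The contrast with majority voting in the post above can be illustrated with a small sketch. It assumes each sampled reasoning path comes with a single scalar quality score; how the LT signals are turned into such a score is not described in the thread, so the scores, answers, and function names here are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the sampled paths."""
    return Counter(answers).most_common(1)[0][0]


def select_by_score(answers, scores):
    """Return the answer of the path with the highest quality score."""
    if len(answers) != len(scores) or not answers:
        raise ValueError("answers and scores must be non-empty and aligned")
    best = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best]


# Toy example: two paths agree on a wrong answer, one path is correct.
answers = ["12", "12", "7"]
scores = [0.2, 0.3, 0.9]   # hypothetical per-path quality scores
voted = majority_vote(answers)
picked = select_by_score(answers, scores)
```

In this toy case majority voting returns "12" while score-based selection returns "7", which is the situation the post describes: a reliable per-path score can recover the correct answer even when most samples are wrong.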

Martina Vilas

@martinagvilas.bsky.social

Hidden states have distinctive temporal patterns for correct paths. They show:

✴️ Larger overall representational change (Net ↑)
✴️ Less wandering in latent space (Cumulative ↓)
✴️ More direct progress toward final state (Aligned ↑)

October 22, 2025 at 3:38 PM

Martina Vilas

@martinagvilas.bsky.social

Across 3 reasoning models (DeepSeek-R1, Phi-4-Reasoning-Plus, Qwen3) and diverse domains (GPQA, AIME, TSP), LT signals:

✅ Significantly predict correctness
✅ Outperform output-based confidence measures and cross-layer signals

October 22, 2025 at 3:38 PM

Martina Vilas

@martinagvilas.bsky.social

We track how representations evolve through the trace and extract 3 complementary signals:

📊 Net Change: Overall shift (start → end)
🔄 Cumulative Change: Total movement
🎯 Aligned Change: Progress toward final state

October 22, 2025 at 3:38 PM
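The three signals in the post above can be sketched for a sequence of hidden-state vectors. The exact definitions are not given in the thread, so this sketch assumes one plausible reading: Net Change as the Euclidean distance between the first and last state, Cumulative Change as the summed distance between consecutive states, and Aligned Change as the mean cosine similarity between each step and the overall start-to-end direction. All function names are illustrative.

```python
import math


def _diff(a, b):
    return [x - y for x, y in zip(a, b)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def latent_trajectory_signals(states):
    """states: list of hidden-state vectors, ordered along the reasoning trace."""
    if len(states) < 2:
        raise ValueError("need at least two hidden states")
    drift = _diff(states[-1], states[0])           # overall start -> end shift
    net = _norm(drift)                             # assumed Net Change
    steps = [_diff(b, a) for a, b in zip(states, states[1:])]
    cumulative = sum(_norm(s) for s in steps)      # assumed Cumulative Change
    cosines = []
    for s in steps:
        denom = _norm(s) * net
        cosines.append(_dot(s, drift) / denom if denom > 0 else 0.0)
    aligned = sum(cosines) / len(cosines)          # assumed Aligned Change
    return {"net": net, "cumulative": cumulative, "aligned": aligned}


# Toy 2-D trajectories with the same start and end points
direct = latent_trajectory_signals([[0, 0], [1, 0], [2, 0]])
wandering = latent_trajectory_signals([[0, 0], [1, 1], [2, 0]])
```

Under these assumed definitions the two toy trajectories have the same net change, but the wandering one has a larger cumulative change and a lower aligned change, since each of its steps points partly away from the final state.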

Martina Vilas

@martinagvilas.bsky.social

Identifying trace quality is critical: it enables more reliable predictions, improves efficiency by avoiding wasted compute, and can be used to guide models toward productive reasoning strategies.

Our solution: Look inside the temporal evolution of the model's latent space! 🔍

October 22, 2025 at 3:38 PM

Martina Vilas

@martinagvilas.bsky.social

But not all reasoning traces are equal ⚖️ → some contain productive steps that lead to correct solutions ✅, while others deviate into overthinking, fail to converge, or exhibit inconsistent reasoning patterns ❌

October 22, 2025 at 3:38 PM

Martina Vilas

@martinagvilas.bsky.social

Modern LLMs use chain-of-thought reasoning to solve complex problems, generating step-by-step solutions that can span thousands of tokens.

📈 Scaling this inference-time compute (longer traces, multiple samples) significantly improves performance across reasoning tasks.

October 22, 2025 at 3:38 PM

Martina Vilas

@martinagvilas.bsky.social

👋 I also work in this field (examples on my profile). Would love to be added!

November 19, 2024 at 9:42 AM
