Zorik Gekhman
zorikgekhman.bsky.social
Yet the fact that models fail to generate answers they know places a practical limit on scaling test-time compute via repeated sampling in closed-book QA: significant gains remain inaccessible because we fail to sample the answers that the probe would otherwise rank first.
15/🧵
March 31, 2025 at 6:34 PM
We also leverage our setup to enhance performance in a challenging closed-book QA setting, achieving a 12% average relative improvement over greedy decoding by increasing test-time compute: sampling a large set of answers and selecting the top one using our probe.
14/🧵
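A minimal sketch of this best-of-N selection, where `sample_answer` and `probe_score` are hypothetical stand-ins for the model's sampler and the trained probe:

```python
import itertools

def best_of_n(question, sample_answer, probe_score, n=1000):
    """Scale test-time compute: sample n candidate answers, dedupe,
    and return the one the probe scores highest."""
    candidates = {sample_answer(question) for _ in range(n)}
    return max(candidates, key=probe_score)

# Toy demo: a fixed rotation of candidates and answer length as the
# "probe" score, just to show the selection mechanics.
samples = itertools.cycle(["Volvo", "Scania", "Volvo Buses"])
print(best_of_n("q", lambda q: next(samples), probe_score=len, n=3))
# → Volvo Buses
```

Greedy decoding corresponds to a single sample; the gain comes from letting the probe pick over a much larger candidate pool.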
For example, here the correct answer “Volvo Buses” gets a very low P(a | q) score, meaning it is unlikely to be generated. Accordingly, it wasn’t sampled after 1,000 attempts in our study. Yet the probe scores it higher than all other alternatives.
13/🧵
We also discover an extreme case of hidden knowledge. When the ground-truth answer isn’t sampled after 1,000 attempts, manually adding it to the set of candidate answers leads to a substantial increase in knowledge scores.
11/🧵
Our results indicate that LLMs consistently exhibit hidden knowledge, with an average relative gap of 40%.

This highlights the need to understand these differences and build models that better use their knowledge, for which our framework serves as a foundation.

10/🧵
In our study, we estimate the set of (correct, incorrect) answer pairs per question using 1,000 model-generated answers, labeled for correctness by an LLM judge that compares each answer to the ground truth.
8/🧵
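The estimation step above can be sketched as follows; `sample_answer` and `judge` are hypothetical callables standing in for model sampling and the LLM judge that compares each answer to the ground truth:

```python
import itertools

def build_answer_pools(question, sample_answer, judge, n_samples=1000):
    """Sample candidate answers for a question and split the distinct
    ones into correct / incorrect pools using the judge's labels."""
    correct, incorrect = set(), set()
    for _ in range(n_samples):
        answer = sample_answer(question)
        (correct if judge(question, answer) else incorrect).add(answer)
    return correct, incorrect

# Toy demo with a deterministic "sampler" and an exact-match "judge".
samples = itertools.cycle(["Volvo", "Scania", "Volvo Buses"])
correct, incorrect = build_answer_pools(
    "Who manufactured the bus?",
    sample_answer=lambda q: next(samples),
    judge=lambda q, a: a == "Volvo Buses",
    n_samples=9,
)
# correct == {"Volvo Buses"}, incorrect == {"Volvo", "Scania"}
```

The resulting pools supply the (correct, incorrect) pairs that the knowledge score is computed over.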
We define hidden knowledge as the condition in which internal knowledge exceeds external knowledge.
7/🧵
We propose measuring knowledge relative to a function that scores answer candidates using signals from the model, and we formalize a model's knowledge of a question as the fraction of (correct, incorrect) answer pairs in which the correct answer is scored higher.
5/🧵
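A minimal sketch of this pairwise knowledge score. The scoring function is an assumption here, standing in for any model-derived signal (e.g. P(a | q) or a probe):

```python
def knowledge_score(correct_answers, incorrect_answers, score):
    """Fraction of (correct, incorrect) answer pairs in which the
    scoring function ranks the correct answer higher."""
    pairs = [(c, i) for c in correct_answers for i in incorrect_answers]
    if not pairs:
        return 0.0
    wins = sum(score(c) > score(i) for c, i in pairs)
    return wins / len(pairs)

# Toy scorer for illustration only: longer answers score higher.
print(knowledge_score(["Volvo Buses"], ["Volvo", "Scania"], len))  # → 1.0
```

Swapping in different scoring functions (output probabilities vs. internal probes) is what lets the framework compare external and internal knowledge on equal footing.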
🚨 It's often claimed that LLMs know more facts than they show in their outputs, but what does this actually mean, and how can we measure this “hidden knowledge”?

In our new paper, we clearly define this concept and design controlled experiments to test it.
1/🧵