Zorik Gekhman
@zorikgekhman.bsky.social
This work is the result of a collaboration with a great team. Thanks to my co-authors:
Eyal Ben David, Hadas Orgad, Eran Ofek, Yonatan Belinkov, Idan Szpektor, Jonathan Herzig, and Roi Reichart.

Paper: arxiv.org/abs/2503.15299

17/🧵 (end)
Inside-Out: Hidden Factual Knowledge in LLMs
This work presents a framework for assessing whether large language models (LLMs) encode more factual knowledge in their parameters than what they express in their outputs. While a few studies hint at...
arxiv.org
March 31, 2025 at 6:34 PM
We hope our framework facilitates future research on hidden knowledge, ultimately leading to more transparent and reliable LLMs.
16/🧵
March 31, 2025 at 6:34 PM
Yet the fact that models fail to generate known answers puts a practical constraint on scaling test-time compute via repeated sampling in closed-book QA. Significant gains remain inaccessible because we fail to sample answers that the probe would otherwise rank first.
15/🧵
March 31, 2025 at 6:34 PM
We also leverage our setup to enhance performance in a challenging closed-book QA setting, achieving a 12% average relative improvement over greedy decoding by increasing test-time compute: sampling a large set of answers and selecting the top one using our probe.
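A minimal sketch of this sample-then-select procedure, where `sample_fn` and `probe_score` are hypothetical stand-ins for the sampling and probing components (not the paper's exact implementation):

```python
def answer_with_probe_reranking(question, sample_fn, probe_score, n_samples=1000):
    """Closed-book QA with extra test-time compute: sample many candidate
    answers, then return the one the internal probe scores highest."""
    candidates = {sample_fn(question) for _ in range(n_samples)}  # deduplicate samples
    return max(candidates, key=lambda a: probe_score(question, a))
```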
14/🧵
March 31, 2025 at 6:34 PM
For example, here the correct answer “Volvo Buses” gets a very low P(a | q) score, meaning it is unlikely to be generated. Accordingly, it wasn’t sampled after 1,000 attempts in our study. Yet the probe scores it higher than all other alternatives.
13/🧵
March 31, 2025 at 6:34 PM
This shows that LLMs can know the answer but have practically zero chance of generating it even once, despite large-scale repeated sampling.

This highlights limitations in the generation process and opens interesting directions for future research on decoding mechanisms.

12/🧵
March 31, 2025 at 6:34 PM
We also discover an extreme case of hidden knowledge. When the ground-truth answer isn’t sampled after 1,000 attempts, manually adding it to the set of candidate answers leads to a substantial increase in knowledge scores.
11/🧵
March 31, 2025 at 6:34 PM
Our results indicate that LLMs consistently exhibit hidden knowledge, with an average relative gap of 40%.

This highlights the need to understand these differences and build models that better use their knowledge, for which our framework serves as a foundation.

10/🧵
March 31, 2025 at 6:34 PM
We then compare internal and external knowledge.

Internal knowledge is measured using a linear probing classifier to score candidate answers, while external knowledge is measured using standard methods that rely on the model’s observable token-level probabilities.
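As a rough illustration of the two signal types (a sketch, not the paper's exact setup): below, the external score is the log-probability of the answer tokens given the question, and the internal score applies a pre-trained linear probe to a hidden representation. The prompt format, choice of layer, and pooling are assumptions, and `probe` stands for a generic `torch.nn.Linear(hidden_dim, 1)` classifier.

```python
import torch

def external_score(model, tokenizer, question, answer):
    """External signal: log P(answer | question) from observable token probabilities.
    Assumes the question tokens form a prefix of the concatenated sequence."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    answer_positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    targets = full_ids[0, prompt_ids.shape[1]:]
    return sum(log_probs[0, pos, tok] for pos, tok in zip(answer_positions, targets)).item()

def internal_score(model, tokenizer, probe, question, answer):
    """Internal signal: a linear probe applied to a hidden state of the
    (question, answer) pair; layer and pooling choices here are illustrative."""
    ids = tokenizer(question + " " + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids, output_hidden_states=True).hidden_states[-1]
    return probe(hidden[0, -1]).item()  # last-token representation of the final layer
```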

9/🧵
March 31, 2025 at 6:34 PM
In our study, we estimate the set of (correct, incorrect) answer pairs per question using 1,000 model-generated answers, labeled for correctness by an LLM judge that compares each answer to the ground truth.
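A minimal sketch of this pair-construction step, where `sample_fn` and `judge_fn` are hypothetical helpers standing in for the answer sampling and the LLM-judge comparison against the ground truth:

```python
def build_answer_pairs(question, gold_answer, sample_fn, judge_fn, n_samples=1000):
    """Sample candidate answers, label each for correctness with an LLM judge
    that compares it to the gold answer, and return (correct, incorrect) lists."""
    candidates = {sample_fn(question) for _ in range(n_samples)}  # unique sampled answers
    correct, incorrect = [], []
    for answer in candidates:
        (correct if judge_fn(question, gold_answer, answer) else incorrect).append(answer)
    return correct, incorrect
```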
8/🧵
March 31, 2025 at 6:34 PM
We define hidden knowledge as the condition in which internal knowledge exceeds external knowledge.
7/🧵
March 31, 2025 at 6:34 PM
This allows us to measure internal and external knowledge using the same definition. We just use different scoring methods: external ones, which rely only on the model's observable token-level probabilities, and internal ones, which can also use intermediate computations.
6/🧵
March 31, 2025 at 6:34 PM
We propose to measure knowledge relative to a function that scores answer candidates using signals from the model, and we formalize knowledge of a question as the fraction of correct-incorrect answer pairs where the correct one is scored higher.
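A minimal sketch of this pairwise definition, assuming a generic `scorer(question, answer)` function that returns the model-derived score of an answer candidate:

```python
from itertools import product

def knowledge_score(scorer, question, correct, incorrect):
    """Knowledge of a question w.r.t. a scoring function: the fraction of
    (correct, incorrect) answer pairs in which the correct answer is
    scored higher than the incorrect one."""
    pairs = list(product(correct, incorrect))
    wins = sum(scorer(question, c) > scorer(question, i) for c, i in pairs)
    return wins / len(pairs) if pairs else 0.0
```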
5/🧵
March 31, 2025 at 6:33 PM
To define hidden knowledge, we first need a definition of “knowledge”, which is also not well defined for LLMs, as shown by @constanzafierro.bsky.social.
4/🧵
March 31, 2025 at 6:33 PM
Yet, despite its importance from both practical and interpretability perspectives, hidden knowledge hasn't been clearly defined or measured.

We propose such a definition, laying foundations for studying this concept, and use it in a study to demonstrate hidden knowledge.
3/🧵
March 31, 2025 at 6:33 PM
📜 arxiv.org/abs/2503.15299

Diverse evidence from prior work suggests the existence of hidden knowledge.
2/🧵
March 31, 2025 at 6:33 PM