• Hidden knowledge: Sometimes, models internally know the right answer… but still generate the wrong one externally.
• Truth is token-specific: Truthfulness signals are concentrated in certain tokens — probing those can significantly boost error detection.
• Generalization is tough: These probing techniques don’t transfer across datasets, suggesting LLMs encode multiple, fragmented notions of truth rather than a single universal one.
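The token-specific probing idea above can be sketched with a toy example. A common recipe is to take the model's hidden activation at the exact-answer token and fit a small linear classifier ("probe") that predicts whether the answer was correct. The snippet below is a minimal, self-contained illustration: the activations are simulated (a real setup would extract them from an LLM layer), and the dimensions, signal strength, and mean-difference probe are illustrative assumptions, not a specific paper's method.

```python
import numpy as np

# Hedged sketch of a truthfulness probe on a single token's hidden state.
# Real usage: collect the LLM's activation at the exact-answer token for many
# (question, answer) pairs, labeled by whether the answer was correct.
# Here we simulate those activations so the example runs without a model.
rng = np.random.default_rng(0)
d, n = 64, 2000                      # hidden size and sample count (assumed)

truth_dir = rng.normal(size=d)       # hypothetical latent "truth" direction
truth_dir /= np.linalg.norm(truth_dir)
labels = rng.integers(0, 2, size=n)  # 1 = the model's answer was correct
acts = rng.normal(size=(n, d)) + 2.0 * np.outer(labels, truth_dir)

train, test = slice(0, 1500), slice(1500, n)

# Mean-difference linear probe: project onto the difference of class means
# and threshold at the midpoint between them.
mu1 = acts[train][labels[train] == 1].mean(axis=0)
mu0 = acts[train][labels[train] == 0].mean(axis=0)
w = mu1 - mu0
b = -w @ (mu1 + mu0) / 2
preds = (acts[test] @ w + b > 0).astype(int)
acc = (preds == labels[test]).mean()
print(f"probe accuracy: {acc:.2f}")  # well above the 0.5 chance baseline
```

The "truth is token-specific" finding corresponds to the choice of *which* token's activation feeds the probe: probing the exact-answer token typically works much better than probing, say, the final token of the response.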