Sanjeev Arora
profsanjeevarora.bsky.social
Director, Princeton Language and Intelligence. Professor of CS.
Congratulations! Great result.
February 27, 2025 at 5:25 PM
Interesting thread from Geoffrey Irving about the fragility of interpreting LLMs' latent reasoning (whether self-reported or recovered by some mechanistic interpretability idea). I have been pessimistic about trusting latent reasoning.
November 25, 2024 at 2:50 PM