interpretability & training & reasoning
iglee.me
(n.b. since FOL is verifiable, we define correct as any generation that's logically equivalent to the original expression.)
7/9
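Concretely, "equivalent to the original expression" can be checked mechanically. Below is a minimal sketch, assuming a Z3-style solver as the checker (our assumption; the thread doesn't name its tooling): two formulas are equivalent iff the negation of their biconditional is unsatisfiable.

```python
# Minimal sketch (assumes Z3 as the equivalence checker; not specified in the thread).
from z3 import (Solver, ForAll, Exists, Implies, And, Not,
                Const, Function, DeclareSort, BoolSort, unsat)

Obj = DeclareSort("Obj")                    # domain of discourse
P = Function("P", Obj, BoolSort())          # uninterpreted unary predicates
Q = Function("Q", Obj, BoolSort())
x = Const("x", Obj)

original  = ForAll(x, Implies(P(x), Q(x)))          # forall x. P(x) -> Q(x)
candidate = Not(Exists(x, And(P(x), Not(Q(x)))))    # not exists x. P(x) & not Q(x)

def equivalent(f, g):
    """Return True if the solver proves f and g are logically equivalent."""
    s = Solver()
    s.add(Not(f == g))   # on Booleans, f == g is the biconditional
    return s.check() == unsat

print(equivalent(original, candidate))  # True: the candidate is a correct generation
```

Note that FOL equivalence is only semi-decidable in general, so a solver may return unknown on hard instances; a sketch like this treats anything short of a proof as not verified.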
6/9
1. programmatically generate a bunch of random FOL expressions
2. progressively simplify them, verifying their equivalence at each step (sketch below)
3. chain the simplification steps together into traces
4. instantiate them in natural language (NL) w/ LLMs
4/9
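A hedged sketch of steps 1–2, under assumed details (a tiny grammar over three unary predicates, a single double-negation rewrite, Z3 as the verifier); the thread doesn't specify the actual generator or simplifier.

```python
# Sketch of steps 1-2: random FOL generation plus one verified simplification step.
# Grammar, rewrite rule, and solver choice are assumptions, not the authors' pipeline.
import random
from z3 import (Solver, ForAll, Exists, Implies, And, Or, Not, is_not,
                Const, Function, DeclareSort, BoolSort, unsat)

Obj = DeclareSort("Obj")
x = Const("x", Obj)
PREDS = [Function(name, Obj, BoolSort()) for name in ("P", "Q", "R")]

def random_formula(depth=3):
    """Step 1: sample a random FOL formula over unary predicates of x."""
    if depth == 0:
        return random.choice(PREDS)(x)
    op = random.choice(["and", "or", "implies", "not", "forall", "exists"])
    if op == "not":
        return Not(random_formula(depth - 1))
    if op == "forall":
        return ForAll(x, random_formula(depth - 1))
    if op == "exists":
        return Exists(x, random_formula(depth - 1))
    connective = {"and": And, "or": Or, "implies": Implies}[op]
    return connective(random_formula(depth - 1), random_formula(depth - 1))

def simplify_once(f):
    """Step 2 (one rewrite): drop a double negation at the root, if present."""
    if is_not(f) and is_not(f.arg(0)):
        return f.arg(0).arg(0)
    return f

def equivalent(f, g):
    """Equivalence check: the negated biconditional must be unsatisfiable."""
    s = Solver()
    s.add(Not(f == g))
    return s.check() == unsat

f = random_formula()
g = simplify_once(f)
print(f, "\n  ~>", g, "\n  verified:", equivalent(f, g))
```

Repeating the rewrite-and-verify step yields a chain of provably equivalent, progressively simpler expressions (step 3), which can then be instantiated in natural language (step 4).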
What is (correct) reasoning in LLMs? How do you rigorously define/measure process fidelity? How might we study its acquisition in large-scale training? We built a gigantic dataset of verifiably correct reasoning traces over first-order logic (FOL) expressions!
1/9
@nsaphra.bsky.social! We aim to predict potential AI model failures before they have impact, i.e. before deployment, using interpretability.