interpretability & training & reasoning
iglee.me
dataset: huggingface.co/datasets/fo...
work w/ @sarahliaw.bsky.social and Dani Yogatama
If you want to chat about interpretability & training dynamics & reasoning and munch on mezzes, come hang out with me in Rabat 🇲🇦🙃
9/9
8/9
(n.b. since FOL is verifiable, we count a generation as correct if it's logically equivalent to the expression.)
7/9
6/9
Let's consider an example w/ De Morgan's law: ¬(¬Sunny(x) ∧ Breezy(x)) ↔ (Sunny(x) ∨ ¬Breezy(x))
5/9
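A quick sanity check on the De Morgan example, as a minimal sketch: for a fixed x, Sunny(x) and Breezy(x) can be treated as two plain booleans, so a 4-row truth table is enough to verify the equivalence. The function names here are made up for illustration and are not from the dataset code.

```python
from itertools import product


def lhs(sunny: bool, breezy: bool) -> bool:
    # ¬(¬Sunny(x) ∧ Breezy(x))
    return not ((not sunny) and breezy)


def rhs(sunny: bool, breezy: bool) -> bool:
    # Sunny(x) ∨ ¬Breezy(x)
    return sunny or (not breezy)


def rhs_negation_dropped(sunny: bool, breezy: bool) -> bool:
    # Sunny(x) ∨ Breezy(x): the negation on Breezy is left out
    return sunny or breezy


def equivalent(f, g, n_atoms: int = 2) -> bool:
    # Two formulas are equivalent iff they agree on every truth assignment.
    return all(
        f(*values) == g(*values)
        for values in product([False, True], repeat=n_atoms)
    )


DE_MORGAN_HOLDS = equivalent(lhs, rhs)
NEGATION_DROPPED_HOLDS = equivalent(lhs, rhs_negation_dropped)
```

The first check passes. The second fails on the assignment where both atoms are false, which shows the negation on Breezy(x) has to survive the rewrite.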
1. programmatically, randomly generate a bunch of FOL expressions
2. progressively simplify them, verifying their equivalence
3. chain them together
4. instantiate them in natural language (NL) w/ LLMs
4/9
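Steps 1-3 above could look roughly like the sketch below. It is an illustration under heavy assumptions, not the actual pipeline: formulas are quantifier-free with three hypothetical atoms (P, Q, R, each standing in for a predicate like P(x)), "simplify" is stood in for by two rewrites (double-negation elimination and De Morgan), and equivalence is verified by brute-force truth table. Step 4 (the LLM instantiation) is left out.

```python
import random
from itertools import product

ATOMS = ["P", "Q", "R"]  # hypothetical atoms; P stands in for a predicate like P(x)


def random_expr(rng, depth):
    """Step 1: build a random formula as a nested-tuple AST."""
    if depth == 0 or rng.random() < 0.2:
        return ("atom", rng.choice(ATOMS))
    op = rng.choice(["not", "and", "or"])
    if op == "not":
        return ("not", random_expr(rng, depth - 1))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def evaluate(expr, env):
    """Evaluate a formula under a truth assignment (dict atom -> bool)."""
    tag = expr[0]
    if tag == "atom":
        return env[expr[1]]
    if tag == "not":
        return not evaluate(expr[1], env)
    left, right = evaluate(expr[1], env), evaluate(expr[2], env)
    return (left and right) if tag == "and" else (left or right)


def equivalent(a, b):
    """Verify equivalence by checking every assignment of ATOMS."""
    for values in product([False, True], repeat=len(ATOMS)):
        env = dict(zip(ATOMS, values))
        if evaluate(a, env) != evaluate(b, env):
            return False
    return True


def rewrite_once(expr):
    """Apply one rewrite at the outermost match; return (new_expr, changed)."""
    tag = expr[0]
    if tag == "atom":
        return expr, False
    if tag == "not":
        inner = expr[1]
        if inner[0] == "not":  # double negation: ¬¬A -> A
            return inner[1], True
        if inner[0] in ("and", "or"):  # De Morgan: ¬(A ∧ B) -> ¬A ∨ ¬B, and dually
            dual = "or" if inner[0] == "and" else "and"
            return (dual, ("not", inner[1]), ("not", inner[2])), True
        return expr, False  # negated atom: nothing to rewrite
    new_left, changed = rewrite_once(expr[1])
    if changed:
        return (tag, new_left, expr[2]), True
    new_right, changed = rewrite_once(expr[2])
    return (tag, expr[1], new_right), changed


def build_chain(expr):
    """Steps 2-3: rewrite until nothing applies, verify each step, chain them."""
    chain = [expr]
    while True:
        nxt, changed = rewrite_once(chain[-1])
        if not changed:
            return chain
        assert equivalent(chain[-1], nxt), "rewrite broke equivalence"
        chain.append(nxt)


# A fixed example: ¬(¬P ∧ Q), rewritten step by step.
fixed_chain = build_chain(("not", ("and", ("not", ("atom", "P")), ("atom", "Q"))))

# A random example with a seeded generator for reproducibility.
random_chain = build_chain(random_expr(random.Random(0), depth=3))
```

Each element of a chain is one verified rewrite away from the previous one, so every intermediate formula is equivalent to the starting expression. A real simplifier would need more rules (and quantifier handling for full FOL), which this sketch doesn't attempt.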
3/9
2/9