We introduce a large-scale dataset of programmatically verified FOL reasoning traces for studying structured logical inference + process fidelity.
Happy to hear thoughts from others working on reasoning in LLMs!
Check it out here 👇
What is (correct) reasoning in LLMs? How do you rigorously define and measure process fidelity? How might we study its acquisition in large-scale training? We built a gigantic dataset of verifiably correct reasoning traces over first-order logic expressions!
1/9