Part of UT Computational Linguistics https://sites.utexas.edu/compling/ and UT NLP https://www.nlp.utexas.edu/
PLSemanticsBench - where formal meets informal!
arxiv.org/abs/2510.03415
Team: Aditya Thimmaiah, Jiyang Zhang, Jayanth Srinivasa, Milos Gligoric
LLMs aren't interpreting rules -- they're recalling patterns.
Their "understanding" is promising... but shallow.
💡It's time to test semantics, not just syntax.💡
To move from surface-level memorization → true symbolic reasoning.
Models that were "near-perfect" drop to single digits. 😬
A new benchmark developed by researchers at the NSF-Simons AI Institute for Cosmic Origins is testing how well LLMs implement scientific workflows in astronomy and visualize results.