Zizhao Chen
ch272h.bsky.social
chenzizhao.github.io
unlearning natural stupidity
🧩Natural language isn’t all you need.

We’re great at evaluating text-based reasoning (MATH, AIME…) but what about long-horizon visual reasoning?

Enter 𝗞𝗻𝗼𝘁𝗚𝘆𝗺: a minimalistic testbed for evaluating agents on spatial reasoning along a difficulty ladder
December 5, 2025 at 5:13 PM