SoTA LLMs have a perfect grasp of the physical concepts' definitions and a solid low-level understanding of grid inputs. Yet, the ~40% gap highlights a fundamental difference in abstract pattern understanding between humans and LLMs.
(5/5)
1️⃣ Human vs. LLM Performance: Humans achieve ~90% accuracy, while top LLMs, including reasoning models (e.g., o1) and vision-language models (e.g., GPT-4o), lag by ~40%.
(3/5)