ttchungc.bsky.social
Reposted
🚀 Introducing PhysiCo: A New Benchmark for Evaluating Abstract Understanding in LLMs! 🚀

📚 Link: physico-benchmark.github.io

While models like o3 have made impressive strides on ARC-AGI, how well do LLMs truly grasp the abstract patterns in ARC-style tasks?

(1/5)
February 15, 2025 at 4:09 AM