Fri, Dec 5, 2025
11:00 AM – 2:00 PM PST
Exhibit Hall C,D,E #4505
Pic: (fancy) knots at the USS Midway Museum near the SD convention center
Knots are simple to see but deep to reason about.
✔ Verifiable outcomes
✔ Structured complexity (crossing number # X)
✔ A ladder of difficulty for generalization
Perfect for studying long-horizon visual reasoning and test-time scaling in visual space.
We’re great at evaluating text-based reasoning (MATH, AIME…) but what about long-horizon visual reasoning?
Enter 𝗞𝗻𝗼𝘁𝗚𝘆𝗺: a minimalistic testbed for evaluating agents on spatial reasoning along a difficulty ladder
Do you think AIs today are intelligent? Answer with yes or no.
Here is the breakdown:
Yes: 57
No: 62
Total: 119
Pretty close!
7/7
4/7
→ Prompt the same LLM that does the task (really bad early on) with a task-independent prompt
→ LLM bootstraps itself
3/7
@yoavartzi.com: how about the paper’s fig1? 🙅
me: lesson learned. no memes 😭
A paper on continually learning from naturally occurring interaction signals, such as in the hypothetical conversation above
arxiv.org/abs/2410.13852
1/7