In a new paper, we provide initial evidence that it does! GPT-4.1 and Claude 3.5 describe three synthetic datasets more precisely and accurately when raw data is accompanied by a scatter plot. Read more in the 🧵!
www.theatlantic.com/technology/a...
📝 Blog post: www.anthropic.com/research/tra...
🧪 "Biology" paper: transformer-circuits.pub/2025/attribu...
⚙️ Methods paper: transformer-circuits.pub/2025/attribu...
Featuring basic multi-step reasoning, planning, introspection and more!
See the ARBOR discussion board for a thread for each project underway.
github.com/ArborProjec...
github.com/ARBORproject...
(ARBOR = Analysis of Reasoning Behavior through Open Research)
In: rectangular maps of butterfly wings!
It wasn't certain whether it would survive its closest approach to the sun on January 13th, but it did, and it delivered us a spectacular show!
#comet #C2024G3 🔭
We already use every symbol on the keyboard, musical sharps and flats, and even weird made-up fonts (what is that Weierstrass P??). A smiley is easy to draw with chalk and put into LaTeX, so why not?
Interested in inference-time scaling? In-context Learning? Mech Interp?
LMs can solve novel in-context tasks, with sufficient examples (longer contexts). Why? Bc they dynamically form *in-context representations*!
1/N
scholar.google.com/scholar_case...
www.penguinrandomhouse.com/books/602064...
1. OopsBench: given a faulty proof with numbered steps, which step contains an unfixable logical flaw?
2. DunnoMath: half the problems are taken from FrontierMath, half are almost certainly unsolvable. Major points off for guessing an answer to an unsolvable problem.