Martin Wattenberg
@wattenberg.bsky.social

Human/AI interaction. ML interpretability. Visualization as design, science, art. Professor at Harvard, and part-time at Google DeepMind.


Reposted by Martin Wattenberg

Charts and graphs help people analyze data, but can they also help AI?

In a new paper, we provide initial evidence that they can! GPT-4.1 and Claude 3.5 describe three synthetic datasets more precisely and accurately when the raw data is accompanied by a scatter plot. Read more in the 🧵!
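A minimal sketch of the kind of comparison the paper describes, assuming the OpenAI Python client: the same synthetic dataset is sent once as raw numbers and once with a rendered scatter plot attached. The dataset, prompt wording, and plot styling here are illustrative, not the paper's exact protocol.

```python
# Sketch of the comparison described above (illustrative, not the paper's
# exact protocol): ask a vision model to describe a synthetic dataset from
# raw numbers alone, then again with a scatter plot of the same data.
import base64
import io

import matplotlib.pyplot as plt
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + rng.normal(0, 3.0, 50)            # noisy linear relationship
raw = "\n".join(f"{a:.2f}, {b:.2f}" for a, b in zip(x, y))
prompt = "Describe the relationship between x and y in this dataset:\n" + raw

# Render the same data as a PNG scatter plot, base64-encoded for the API.
fig, ax = plt.subplots()
ax.scatter(x, y)
buf = io.BytesIO()
fig.savefig(buf, format="png")
plot_b64 = base64.b64encode(buf.getvalue()).decode()

for label, content in [
    ("raw data only", [{"type": "text", "text": prompt}]),
    ("raw data + scatter plot", [
        {"type": "text", "text": prompt},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{plot_b64}"}},
    ]),
]:
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": content}],
    )
    print(f"--- {label} ---")
    print(resp.choices[0].message.content)
```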

Reposted by Martin Wattenberg

The 2024 Name of the Year sounds goofy, until it doesn't.
The 2024 Name of the Year is Shaboozey : Namerology
The name behind this year's biggest song isn't as simple as it seems.
bit.ly

Reposted by Martin Wattenberg

i'm building an experimental tool for exploring 25 years of my old sketchbooks, with image and text recognition powered by gemini
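The post doesn't include code, but the core of such a tool might look like this sketch, assuming the google-generativeai Python client; the model name, prompt, and file layout are guesses, not details from the post.

```python
# Hypothetical core of a sketchbook-indexing tool: send each scanned page
# to Gemini and store what it sees, building a searchable text index.
from pathlib import Path

import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

index = {}
for page in sorted(Path("sketchbooks").glob("**/*.jpg")):
    response = model.generate_content([
        Image.open(page),
        "Transcribe any handwritten text on this sketchbook page and "
        "briefly describe the drawings.",
    ])
    index[str(page)] = response.text  # searchable description per page

for path, description in index.items():
    print(path, "->", description[:80])
```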

Reposted by Martin Wattenberg

i asked Claude to write a Barthelme-esque short story with the aesthetic sensibilities of "The School", and it gave me this. i mean. i mean.

Reposted by Martin Wattenberg

For the 4th year in a row, my all-sky camera has been taking an image of the sky above the Netherlands every 15 seconds. Combining these images reveals the length of the night changing throughout the year, the passage of clouds and the motion of the Moon and the Sun through the sky. #astrophotography
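The post doesn't say how the frames are combined, but a common way to build this kind of year-long view is a keogram-style grid: reduce each frame to a brightness value and place it at (day of year, time of day), so the changing length of night appears as a dark band. A sketch under an assumed file-naming scheme:

```python
# Keogram-style sketch: reduce each all-sky frame to its mean brightness,
# then arrange values on a (day-of-year x minute-of-day) grid. File layout
# is an assumption: frames named YYYYMMDD_HHMMSS.jpg in ./allsky/.
from datetime import datetime
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# 15-second cadence is 5760 frames/day; bin to minutes to keep the grid small.
grid = np.full((366, 24 * 60), np.nan)

for frame in Path("allsky").glob("*.jpg"):
    t = datetime.strptime(frame.stem, "%Y%m%d_%H%M%S")
    brightness = np.asarray(Image.open(frame).convert("L")).mean()
    grid[t.timetuple().tm_yday - 1, t.hour * 60 + t.minute] = brightness

plt.imshow(grid, aspect="auto", origin="lower",
           extent=[0, 24, 1, 366], cmap="gray")
plt.xlabel("hour of day")
plt.ylabel("day of year")
plt.title("Sky brightness over a year")
plt.show()
```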

Reposted by Martin Wattenberg

In 1897, Alfred G. Mayer created his butterfly wing projections, an attempt to gain new insights into natural patterns and laws. Vertical blocks denote individual wings, distorted and stretched mathematically to fill a tidy rectangular space. More here: publicdomainreview.org/collection/m...

Reposted by Martin Wattenberg

ARBOR aims to accelerate investigation of the internals of the new class of AI "reasoning" models.

See the ARBOR discussion board for a thread for each project underway.

github.com/ArborProjec...

Reposted by Martin Wattenberg

Can we understand the mechanisms of a frontier AI model?

📝 Blog post: www.anthropic.com/research/tra...
🧪 "Biology" paper: transformer-circuits.pub/2025/attribu...
⚙️ Methods paper: transformer-circuits.pub/2025/attribu...

Featuring basic multi-step reasoning, planning, introspection and more!
On the Biology of a Large Language Model
transformer-circuits.pub

Reposted by Martin Wattenberg

This map shows the hour of sunrise globally through the year. It reveals time zones following national and, sometimes, regional boundaries, and slicing through the oceans.
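The time-zone slicing falls out of a little arithmetic: solar sunrise depends on latitude and date, but the clock time of sunrise also shifts by the gap between a location's longitude and its zone's reference meridian. A rough sketch of the calculation behind such a map (my own, not the map author's code), using the standard sunrise equation and ignoring refraction and the equation of time:

```python
# Rough sunrise-hour calculation: solar sunrise from the standard sunrise
# equation, shifted into clock time by the gap between the location's
# longitude and its time zone's reference meridian (15 degrees per hour).
import math

def sunrise_clock_hour(lat_deg, lon_deg, tz_offset_hours, day_of_year):
    # Approximate solar declination for the given day (degrees).
    decl = -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))
    # Hour angle of sunrise; handle polar day/night.
    c = -math.tan(math.radians(lat_deg)) * math.tan(math.radians(decl))
    if c <= -1.0:
        return None  # midnight sun: no sunrise
    if c >= 1.0:
        return None  # polar night: no sunrise
    omega = math.degrees(math.acos(c))      # hour angle in degrees
    solar_sunrise = 12.0 - omega / 15.0     # local *solar* time
    # Clock time differs from solar time by the zone offset minus the
    # longitude's own contribution: this step draws the zone boundaries.
    return solar_sunrise + tz_offset_hours - lon_deg / 15.0

# Two places at similar longitudes but in different zones can have clock
# sunrises about an hour apart. Example values are illustrative.
print(sunrise_clock_hour(52.4, 5.0, tz_offset_hours=1, day_of_year=172))
print(sunrise_clock_hour(51.5, -0.1, tz_offset_hours=0, day_of_year=172))
```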

Reposted by Martin Wattenberg

AI is often thought of as a black box -- no way to know what's going on inside. That's changing in eye-opening ways. Researchers are finding the "beliefs" models form as they converse, and how those beliefs correlate with what the models say and how they say it.

www.theatlantic.com/technology/a...
What AI Thinks It Knows About You
What happens when people can see what assumptions a large language model is making about them?
www.theatlantic.com

Reposted by Martin Wattenberg

The interactive NameGrapher is updated with 2024 baby name popularity stats! Come explore, and marvel that Oliver and Olivia have converged: namerology.com/baby-name-gr...

A wonderful visualization for those of us obsessed by sunlight and geography!
This map shows the hour of sunrise globally through the year. It reveals time zones following national and, sometimes, regional boundaries, and slicing through the oceans.

An incredibly rich, detailed view of neural net internals! There are so many insights in these papers. And the visualizations of "addition circuit" features are just plain cool!
Can we understand the mechanisms of a frontier AI model?

📝 Blog post: www.anthropic.com/research/tra...
🧪 "Biology" paper: transformer-circuits.pub/2025/attribu...
⚙️ Methods paper: transformer-circuits.pub/2025/attribu...

Featuring basic multi-step reasoning, planning, introspection and more!
On the Biology of a Large Language Model
transformer-circuits.pub

Great news, congrats! And glad you’ll still be in the neighborhood!

I'd be curious about advice on teaching non-coders how to test programs they've written with AI. I'm not thinking of unit tests so much as things like making sure you can drill down for verifiable details in a visualization—basic practices that are good on their own, but also help catch errors.

Now that we have vibe coding, we need vibe testing!
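One concrete version of that "drill down for verifiable details" habit, sketched in Python: recompute a single bar of an AI-generated chart directly from the raw file, independent of the chart code, and compare the numbers by eye. The filename and column names here are hypothetical.

```python
# "Vibe test" sketch: independently recompute one bar of an AI-generated
# "sales by month" chart straight from the raw data, bypassing the chart
# code entirely. File and column names are assumptions for illustration.
import csv
from collections import defaultdict

totals = defaultdict(float)
with open("sales.csv") as f:
    for row in csv.DictReader(f):
        totals[row["month"]] += float(row["amount"])

# Drill-down check: does this number match the height of the March bar
# in the chart the AI produced? If not, something is wrong somewhere.
print("March total:", totals["2024-03"])
```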

Oh, that looks super relevant and fascinating, reading through it now...

Ha! I think (!) that for me, the word "calculate" connotes narrow precision and correctness, whereas "think" is more expansive but also implies more fuzziness and the possibility of being wrong. That said, your observation does give me pause!

We're following the terminology of the DeepSeek-R1 paper that introduced this model: arxiv.org/abs/2501.12948. Whether it's really the best metaphor is certainly worth asking! I can see pros and cons for both "thinking" and "calculating".
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT)...
arxiv.org

These are great questions! I believe there's at least one graph of p(correct answer) on the main Arbor discussion page, and generally there are a lot more details: github.com/ARBORproject...
Reasoning or Performing: locating "breakthrough" in the model's reasoning · ARBORproject arborproject.github.io · Discussion #11
Research Question: When we asked the DeepSeek models a challenging abstract algebra question, they often generated hundreds of tokens of reasoning before providing the final answer. Yet, on some questi...
github.com

Interesting question! I haven't calculated this, but @yidachen.bsky.social might know

This is a common pattern, but we're also seeing some others! Here are similar views for multiple-choice abstract algebra questions (green is the correct answer; other colors are incorrect answers). You can see many more at yc015.github.io/reasoning-pr... cc @yidachen.bsky.social

Very cool! You're definitely not alone in finding this fascinating. If you're looking for other people interested in this kind of thing, drop by the Arbor Project page, if you haven't already. github.com/ArborProject...
GitHub - ARBORproject/arborproject.github.io
Contribute to ARBORproject/arborproject.github.io development by creating an account on GitHub.
github.com

The wind map at hint.fm/wind/ has been running since 2012, relying on weather data from NOAA. We added a notice like this today. Thanks to @cambecc.bsky.social for the inspiration.

It's based on a data set of multiple-choice questions that have a known right answer, so this visualization only works when you have labeled ground truth. Definitely wouldn't shock me if those answers were labeled by grad students, though!

Great questions! Maybe it would be faster... or maybe it's doing something important under the hood that we can't see? I genuinely have no idea.

We also see cases where it starts out with the right answer, but eventually "convinces itself" of the wrong answer! I would love to understand the dynamics better.

You can see the model go down the wrong path, "realize" it's not right, then find the correct answer! To see more visualizations, or if you have related ideas, join the discussion here!
github.com/ARBORproject... (vis by @yidachen.bsky.social in conversation with @diatkinson.bsky.social )
Reasoning or Performing · ARBORproject arborproject.github.io · Discussion #11
Research Question: When we asked the DeepSeek Distilled R1 models a challenging abstract algebra question, they often generated hundreds of tokens of CoT before providing the final answer. Yet, on some...
github.com

Neat visualization that came up in the ARBOR project: this shows DeepSeek "thinking" about a question, and color is the probability that, if it exited thinking, it would give the right answer. (Here yellow means correct.)
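The post doesn't spell out the measurement, but one plausible way to produce such a trajectory (a sketch, not necessarily the ARBOR code) is to truncate the chain of thought after each reasoning step, close the thinking block early, and read off the probability the model assigns to the correct answer token. The truncation template and model choice below are guesses.

```python
# Sketch of the measurement behind the visualization above (a guess at the
# method, not the authors' exact code): truncate the chain of thought after
# each step, force the model out of "thinking" mode, and record the
# probability it assigns to the correct multiple-choice letter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

question = "..."     # a multiple-choice question with options A-D
cot_steps = ["..."]  # the model's reasoning, split into steps
correct = "B"        # ground-truth label (hence the need for labeled data)
correct_id = tok.encode(correct, add_special_tokens=False)[0]

probs = []
for i in range(1, len(cot_steps) + 1):
    # Close the thinking block early and demand an immediate answer.
    prompt = (question + "<think>\n" + "\n".join(cot_steps[:i])
              + "\n</think>\nThe answer is (")
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    probs.append(torch.softmax(logits, dim=-1)[correct_id].item())

print(probs)  # plotted over steps, this traces the colored trajectory
```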

Thank you! That's a great write-up, and this is definitely an interesting experiment. The distinction between how the model might do parsing vs. solving is very much worth thinking about. I added a few thoughts on the wiki page. github.com/ARBORproject...
Chain of Thought for Tsumego (Go Life or Death) Problems
Contribute to ARBORproject/arborproject.github.io development by creating an account on GitHub.
github.com