Martin Wattenberg
@wattenberg.bsky.social
Human/AI interaction. ML interpretability. Visualization as design, science, art. Professor at Harvard, and part-time at Google DeepMind.
Great news, congrats! And glad you’ll still be in the neighborhood!
March 27, 2025 at 4:13 PM
I'd be curious about advice on teaching non-coders how to test programs they've written with AI. I'm not thinking unit tests so much as things like making sure you can drill down for verifiable details in a visualization—basic practices that are good on their own, but also help catch errors.
March 24, 2025 at 7:45 PM
Oh, that looks super relevant and fascinating, reading through it now...
March 21, 2025 at 8:18 PM
Ha! I think (!) that for me, the word "calculate" connotes narrow precision and correctness, whereas "think" is more expansive but also implies more fuzziness and the possibility of being wrong. That said, your observation does give me pause!
March 21, 2025 at 8:09 PM
We're following the terminology of the DeepSeek-R1 paper that introduced this model (arxiv.org/abs/2501.12948). Whether it's really the best metaphor is certainly worth asking! I can see pros and cons for both "thinking" and "calculating."
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT)...
arxiv.org
March 21, 2025 at 8:00 PM
These are great questions! I believe there's at least one graph of p(correct answer) on the main Arbor discussion page, and generally there are a lot more details: github.com/ARBORproject...
Reasoning or Performing: locating "breakthrough" in the model's reasoning · ARBORproject arborproject.github.io · Discussion #11
Research Question When asked the DeepSeek models a challenging abstract algebra question, they often generated hundreds of tokens of reasoning before providing the final answer. Yet, on some questi...
github.com
March 21, 2025 at 7:38 PM
Interesting question! I haven't calculated this, but @yidachen.bsky.social might know
March 21, 2025 at 7:35 PM
This is a common pattern, but we're also seeing some others! Here are similar views for multiple-choice abstract algebra questions (green is the correct answer; other colors are incorrect answers). You can see many more at yc015.github.io/reasoning-pr... cc @yidachen.bsky.social
March 21, 2025 at 7:17 PM
Very cool! You're definitely not alone in finding this fascinating. If you're looking for other people interested in this kind of thing, drop by the Arbor Project page, if you haven't already. github.com/ArborProject...
GitHub - ARBORproject/arborproject.github.io
Contribute to ARBORproject/arborproject.github.io development by creating an account on GitHub.
github.com
March 13, 2025 at 5:44 PM
It's based on a data set of multiple-choice questions that have a known right answer, so this visualization only works when you have labeled ground truth. Definitely wouldn't shock me if those answers were labeled by grad students, though!
February 26, 2025 at 1:02 AM
Great questions! Maybe it would be faster... or maybe it's doing something important under the hood that we can't see? I genuinely have no idea.
February 25, 2025 at 9:36 PM
We also see cases where it starts out with the right answer, but eventually "convinces itself" of the wrong answer! I would love to understand the dynamics better.
February 25, 2025 at 9:34 PM
You can see the model go down the wrong path, "realize" it's not right, then find the correct answer! To see more visualizations, or if you have related ideas, join the discussion here!
github.com/ARBORproject... (vis by @yidachen.bsky.social in conversation with @diatkinson.bsky.social )
Reasoning or Performing · ARBORproject arborproject.github.io · Discussion #11
Research Question When asked the DeepSeek Distilled R1 models a challenging abstract algebra question, they often generated hundreds of tokens of CoT before providing the final answer. Yet, on some...
github.com
February 25, 2025 at 6:44 PM
Thank you! That's a great write-up, and this is definitely an interesting experiment. The distinction between how the model might do parsing vs. solving is very much worth thinking about. I added a few thoughts on the wiki page. github.com/ARBORproject...
Chain of Thought for Tsumego (Go Life or Death) Problems
Contribute to ARBORproject/arborproject.github.io development by creating an account on GitHub.
github.com
February 22, 2025 at 3:51 PM
Excellent idea! I just added an "observations" index page on the wiki for things like that.
github.com/ARBORproject...
Observations
Contribute to ARBORproject/arborproject.github.io development by creating an account on GitHub.
github.com
February 20, 2025 at 9:49 PM
Take a look at some initial research projects, and see if there's one you'd like to work on:
github.com/ARBORproject...
Or propose your own idea! There are many ways to contribute, and we welcome all of them.
github.com/ARBORproject...
ARBORproject arborproject.github.io · Discussions
Explore the GitHub Discussions forum for ARBORproject arborproject.github.io. Discuss code, ask questions & collaborate with the developer community.
github.com
February 20, 2025 at 7:55 PM