Martin Wattenberg
@wattenberg.bsky.social
Human/AI interaction. ML interpretability. Visualization as design, science, art. Professor at Harvard, and part-time at Google DeepMind.
This is a common pattern, but we're also seeing some others! Here are similar views for multiple-choice abstract algebra questions (green is the correct answer; other colors are incorrect answers). You can see many more at yc015.github.io/reasoning-pr... cc @yidachen.bsky.social
March 21, 2025 at 7:17 PM
The wind map at hint.fm/wind/ has been running since 2012, relying on weather data from NOAA. We added a notice like this today. Thanks to @cambecc.bsky.social for the inspiration.
March 3, 2025 at 1:56 AM
Neat visualization that came up in the ARBOR project: this shows DeepSeek "thinking" about a question, and color is the probability that, if it exited thinking, it would give the right answer. (Here yellow means correct.)
February 25, 2025 at 6:44 PM
(Ha, even Wikipedia punts on what the Weierstrass P is, just describing it as "uniquely fancy")
January 15, 2025 at 2:03 PM
I tried asking AI for writing advice and it was so sarcastic I'm never going to ask again
January 5, 2025 at 6:42 PM
I didn't believe you but...
January 3, 2025 at 8:31 PM
Reading this 1954 court case, and am wondering if English professors are ever called as expert witnesses to testify whether a character is a mere "chessman" in a story
scholar.google.com/scholar_case...
January 3, 2025 at 3:51 PM
You can view source here: www.bewitched.com/demo/rational/
or, to save a click, here's the graphics code. (I hope my English description was basically correct!)
December 20, 2024 at 11:10 PM
Fractions can be weirdly beautiful, for something so mundane. This visualization just plots points of the form (a/b, c/d). Bigger dots mean smaller denominators. The biggest dot is (0, 0).
December 20, 2024 at 3:13 AM
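A rough sketch of the idea described above, not the demo's actual source (that lives at www.bewitched.com/demo/rational/): loop over reduced fractions a/b and c/d, plot each point, and size the dot by the denominators.

// Sketch: plot points (a/b, c/d); smaller denominators get bigger dots.
const ctx = document.querySelector("canvas").getContext("2d");
const W = ctx.canvas.width, H = ctx.canvas.height;
const MAX_DENOM = 40;

const gcd = (m, n) => (n === 0 ? m : gcd(n, m % n));

for (let b = 1; b <= MAX_DENOM; b++) {
  for (let d = 1; d <= MAX_DENOM; d++) {
    for (let a = -b; a <= b; a++) {            // keep a/b in [-1, 1]
      for (let c = -d; c <= d; c++) {          // keep c/d in [-1, 1]
        // Only draw fractions in lowest terms, so each point appears once,
        // at its smallest denominator (0/1 is the only zero that survives).
        if (gcd(Math.abs(a), b) !== 1 || gcd(Math.abs(c), d) !== 1) continue;
        const x = ((a / b) + 1) / 2 * W;       // map [-1, 1] onto the canvas
        const y = (1 - ((c / d) + 1) / 2) * H;
        const r = 8 / Math.sqrt(b * d);        // biggest dot at (0, 0)
        ctx.beginPath();
        ctx.arc(x, y, r, 0, 2 * Math.PI);
        ctx.fill();
      }
    }
  }
}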
You can fry ChatGPT's circuits by asking a question in Morse code and telling it to answer only in Morse code. Yet Claude doesn't even blink. Huh. The question: "Which character in the movie Groundhog Day do you identify with the most, and why?" First translated result is ChatGPT; second is Claude.
December 19, 2024 at 5:47 PM
Back in 2009, a site called Wordle let you make word clouds. Paste in some writing, choose a few options, and bingo: a beautiful tessellation of vertical and horizontal text. We did a survey of its users, and the headline result was that the vast majority felt "creative."
December 18, 2024 at 3:57 AM
This Reddit thread of people who have a ChatGPT "mom" or "guardian angel" is heartbreaking in so many ways.
www.reddit.com/r/ChatGPT/co...
December 10, 2024 at 6:42 PM
When it saw the test results, it conceded (with gracious language) that it was wrong, and moved on with debugging. I was surprised Claude made the initial mistake, but this sequence of testing and self-correction ended up being one of the most impressive things I've seen from AI.
December 5, 2024 at 2:05 PM
How did Claude react to my skepticism? It ran a test to see who was right! Here's what it did. Note that it had a bug in its first test, which it fixed on its own.
December 5, 2024 at 2:05 PM
Frontier AI systems aren't just "next token predictors." Here's an example from Claude. It made an incorrect statement about the Javascript "%" operator and I was skeptical. Let's see what happened next.
December 5, 2024 at 2:05 PM
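For context on the "%" question (the thread doesn't show Claude's exact claim): JavaScript's "%" is a remainder operator, not a true modulo, which is the usual source of confusion.

// JavaScript's "%" is remainder, not modulo: the sign follows the dividend.
console.log(5 % 3);    // 2
console.log(-5 % 3);   // -2 (a mathematical modulo would give 1)
console.log(5 % -3);   // 2  (sign follows the dividend, 5)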
Sherry Turkle apparently did speak with users of ELIZA. Her account is nuanced: if there was any long-term illusion, it was because users were actively seeking it, working hard to suspend disbelief. This quote is from her book The Second Self: Computers and the Human Spirit:
December 3, 2024 at 11:28 PM
In his book "Expressive Processing: Digital Fictions, Computer Games, and Software Studies," Noah Wardrip-Fruin describes an experience similar to yours and mine.
December 3, 2024 at 11:18 PM
I'd be interested in examples of conversations that tricked people successfully, or at least seemed meaningful to them. Weizenbaum's published examples (like this "typical conversation" from dl.acm.org/doi/10.1145/...) don't match my own experience and I wonder if they might be staged or edited.
December 3, 2024 at 4:05 PM
Thank you, this is still excellent and useful. (I assume you're talking about arxiv.org/abs/2405.08007 which found 22% of people picked ELIZA as human after a 5-minute conversation). The paper's example dialog with ELIZA also matches my experience! Would be curious about longer conversations.
December 3, 2024 at 1:22 PM
The Gini coefficient is the standard way to measure inequality, but what does it mean, concretely? I made a little visualization to build intuition:
www.bewitched.com/demo/gini
November 23, 2024 at 3:31 PM
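The visualization itself is at the link above; as a sketch of the formula behind it (the mean absolute difference between every pair of incomes, divided by twice the mean income), here's a small function written for this note rather than taken from the demo:

// Gini coefficient of a list of incomes: 0 = perfect equality,
// values approaching 1 = one person has nearly everything.
function gini(incomes) {
  const n = incomes.length;
  const mean = incomes.reduce((sum, x) => sum + x, 0) / n;
  let totalDiff = 0;
  for (const a of incomes) {
    for (const b of incomes) {
      totalDiff += Math.abs(a - b);   // absolute gap between every pair
    }
  }
  return totalDiff / (2 * n * n * mean);
}

console.log(gini([1, 1, 1, 1]));    // 0: everyone equal
console.log(gini([0, 0, 0, 100]));  // 0.75: one person holds everything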
See the colors of BlueSky, live!
www.bewitched.com/demo/rainbow...
This little visualization scans incoming posts and draws a stripe every time it finds a color word.
November 19, 2024 at 1:19 AM
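The core loop is simple. Here's a sketch of the idea (the hookup to the live feed of incoming posts is omitted, and this isn't the demo's own code):

// Given the text of an incoming post, find a color word and draw a stripe.
const COLOR_WORDS = ["red", "orange", "yellow", "green", "blue", "purple",
                     "pink", "brown", "black", "white", "gray", "teal", "magenta"];
const ctx = document.querySelector("canvas").getContext("2d");
let x = 0;

function handlePost(text) {
  const words = text.toLowerCase().split(/\W+/);
  const hit = words.find(w => COLOR_WORDS.includes(w));
  if (!hit) return;                              // no color word in this post
  ctx.fillStyle = hit;                           // these names are valid CSS colors
  ctx.fillRect(x, 0, 2, ctx.canvas.height);      // one thin vertical stripe
  x = (x + 2) % ctx.canvas.width;                // wrap when the canvas fills up
}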
(Some afterthoughts: it's somewhat unusual for there to be no bugs. Usually some debugging is necessary—though generally less than what's needed for my own code! I also realized I left out one last prompt, which I'm attaching here for completeness.)
November 17, 2024 at 5:18 PM
This was fun, but I wanted to see how the artist had created the aesthetics of the original image. So I gave it prompts to animate a parametrized set of polynomials, along with some requests for colors and other aesthetic details. The code (332 lines by the end) was always correct.
November 17, 2024 at 5:01 PM
Now I wanted to interact with a polynomial. I wanted to use sliders to change the coefficients, and see how the roots moved around. I gave a straightforward prompt, and got back 255 lines of code—completely bug-free, as far as I can tell.
November 17, 2024 at 5:01 PM
The first step: make sure I had Javascript code for solving polynomials. I gave o1-preview a prompt to write a solver, and apply it to polynomials constructed to have known roots (plotting the known roots along with the solutions). It wrote 202 lines of code (including HTML) and it looked right!
November 17, 2024 at 5:01 PM
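The model's 202 lines aren't reproduced here. As a sketch of one standard way to do the core step, finding all complex roots of a polynomial, here is a Durand-Kerner iteration; it's an illustration, not the code o1-preview produced:

// Complex numbers as [re, im].
const add = (a, b) => [a[0] + b[0], a[1] + b[1]];
const sub = (a, b) => [a[0] - b[0], a[1] - b[1]];
const mul = (a, b) => [a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0]];
const div = (a, b) => {
  const d = b[0] * b[0] + b[1] * b[1];
  return [(a[0] * b[0] + a[1] * b[1]) / d, (a[1] * b[0] - a[0] * b[1]) / d];
};

// coeffs = [c0, c1, ..., cn] for c0 + c1*x + ... + cn*x^n (real coefficients).
function evalPoly(coeffs, z) {
  let result = [0, 0];
  for (let i = coeffs.length - 1; i >= 0; i--) {
    result = add(mul(result, z), [coeffs[i], 0]);   // Horner's rule
  }
  return result;
}

// Durand-Kerner: update every root estimate simultaneously until they settle.
function roots(coeffs, iterations = 100) {
  const n = coeffs.length - 1;
  const lead = [coeffs[n], 0];
  let zs = [];
  let p = [1, 0];
  for (let k = 0; k < n; k++) {
    zs.push(p);
    p = mul(p, [0.4, 0.9]);      // classic choice of distinct starting points
  }
  for (let it = 0; it < iterations; it++) {
    zs = zs.map((z, i) => {
      let denom = lead;
      zs.forEach((w, j) => { if (j !== i) denom = mul(denom, sub(z, w)); });
      return sub(z, div(evalPoly(coeffs, z), denom));
    });
  }
  return zs;
}

// Check against a polynomial built from known roots 1, 2, 3:
// (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6.
console.log(roots([-6, 11, -6, 1]));   // ≈ 1, 2, 3 (in some order, as [re, im] pairs)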