Melanie Mitchell
melaniemitchell.bsky.social
Professor, Santa Fe Institute. Research on AI, cognitive science, and complex systems.

Website: https://melaniemitchell.me

Substack: https://aiguide.substack.com/
I appreciate your overall point, but just to push back on "this is such a new field" -- AI as a field has been around for at least 70 years, and universities have been training AI researchers for most of that time.
November 7, 2025 at 10:13 PM
This is not to say that such "role-playing" can't be dangerous in and of itself. In fact, role-playing is a key method for AI "jailbreaking". But that's not the same thing as a "survival drive".
October 25, 2025 at 5:15 PM
It's a difficult and uncertain time for science in the U.S. and worldwide, so communicating the ideas and results of science to the general public has never been more essential.

More about these awards: www.nationalacademies.org/news/2025/10...
October 23, 2025 at 3:36 PM
Half the authors might be hallucinations.
October 18, 2025 at 9:17 PM
Lol. Who among us hasn't hallucinated in the course of a Google Docs ➡️ LaTeX migration?
October 18, 2025 at 9:09 PM
This one too? The URL links to a paper with a similar-sounding title, some different authors, and a different journal. The title in this reference does not seem to exist.
October 18, 2025 at 7:07 PM
Thanks, I will take a look.
October 17, 2025 at 1:06 PM
Megan, that's wonderful -- congratulations!!
October 13, 2025 at 5:11 PM
Evaluation of reasoning, and reasoning about evaluation -- both understudied, imo
October 7, 2025 at 7:40 PM
Excellent statement.
October 7, 2025 at 12:33 AM
On the other hand, accuracy alone may be *underestimating* this ability in visual settings

It is essential to go beyond accuracy in evaluating such capabilities!

Paper: arxiv.org/abs/2510.02125

Blog post: aiguide.substack.com/p/do-ai-reas...

🧵 10/10
Do AI Reasoning Models Abstract and Reason Like Humans?
Going beyond simple accuracy for evaluating abstraction abilities
October 6, 2025 at 9:27 PM
Conclusions: Evaluations like those of the ARC Prize, which use accuracy alone, may be *overestimating* the abstract reasoning abilities of these models in textual settings.

🧵 9/10
October 6, 2025 at 9:27 PM
With visual inputs, these models all do quite poorly at generating accurate grids. But they do identify the correct intended rule considerably more often than they generate the correct grid.

🧵 8/10
October 6, 2025 at 9:27 PM