Madison May
pragmaticml.bsky.social
professional novice
Reposted by Madison May
Anthropic shipped a new "web search" feature for their Claude consumer apps today; here are my notes. It's frustrating that they don't share details on whether the underlying index is their own or run by a partner simonwillison.net/2025/Mar/20/...
Claude can now search the web
Claude 3.7 Sonnet on the paid plan now has a web search tool that can be turned on as a global setting. This was sorely needed. ChatGPT, Gemini and Grok …
simonwillison.net
March 20, 2025 at 7:41 PM
Reposted by Madison May
Some people frequently (rightly) point out that AIs make mistakes and are not fully reliable. Indeed, hallucinations may never be completely solved.

But I am not sure that matters much. Larger models already make far fewer errors, and many real-world processes are built with error-prone humans in mind.
March 20, 2025 at 1:17 AM
Reposted by Madison May
Want to check out the source for the "AlexNet" paper? Google has made the code from Krizhevsky, Sutskever and Hinton's seminal "ImageNet Classification with Deep Convolutional Neural Networks" paper open source, in partnership with the Computer History Museum.

computerhistory.org/press-releas...
March 20, 2025 at 9:02 PM
If the latest and greatest LLMs aren't effective on your codebase, it may not be the LLMs that are the problem
February 27, 2025 at 1:39 AM
Reposted by Madison May
If you regularly work with math, I can't recommend trying out Corca enough. Corca is a beautiful collaborative math editor, dubbed 'Figma for math,' built by a team that deeply cares about math, science, and their product.
corca.io
January 30, 2025 at 4:09 PM
Reposted by Madison May
We'll be hosting weekly office hours on our Discord server! Our developer relations engineer Cameron will be there to answer questions, talk about AI engineering, and generally chat about what you're building.

Come see us on Tuesday mornings at 8am PST!
Join the .txt Discord Server!
Check out the .txt community on Discord - hang out with 1407 other members and enjoy free voice and text chat.
buff.ly
January 30, 2025 at 10:25 PM
Reposted by Madison May
Whoa... Sonnet was *not* distilled

"3.5 Sonnet was not trained in any way that involved a larger or more expensive model (contrary to some rumors)."

—Dario Amodei

darioamodei.com/on-deepseek-...
January 29, 2025 at 4:54 PM
Reposted by Madison May
These four points on DeepSeek, from the CEO of Anthropic, seem very likely correct and important for understanding the economics of building AI models and what DeepSeek actually did. darioamodei.com/on-deepseek-...
January 29, 2025 at 4:56 PM
Reposted by Madison May
About to submit some of the most bonkers papers I've ever been involved in to ICML. It has taken years to get here but I'm so excited...
January 29, 2025 at 4:08 PM
Reposted by Madison May
Published some notes on Dario Amodei's new essay on DeepSeek, mainly to highlight some new-to-me details he included about Claude 3.5 Sonnet

simonwillison.net/2025/Jan/29/...
January 29, 2025 at 9:41 PM
Great read from Dario Amodei on what aspects of DeepSeek's R1 release are most significant:

darioamodei.com/on-deepseek-...
Dario Amodei — On DeepSeek and Export Controls
darioamodei.com
January 29, 2025 at 11:49 PM
Reposted by Madison May
reminder: claude has been thinking for a while. we may never see an explicit reasoning model from anthropic, their CEO has been open about this (2024-09-05) www.interconnects.ai/p/openai-str...
OpenAI’s Strawberry and inference scaling laws
OpenAI’s Strawberry, LM self-talk, inference scaling laws, and spending more on inference. Coming waves in LLMs.
www.interconnects.ai
January 28, 2025 at 10:23 PM
One downside of R1 / o1: they take just enough time that I'm likely to context switch and come back later. At 30s or less I might as well wait around for the result, but ~2 mins is an awkward amount of time.
January 29, 2025 at 1:48 AM
Reposted by Madison May
DeepSeek R1 appears to be a VERY strong model for coding - examples for both C and Python here: simonwillison.net/2025/Jan/27/...
ggml : x2 speed for WASM by optimizing SIMD
PR by Xuan-Son Nguyen for `llama.cpp`: > This PR provides a big jump in speed for WASM by leveraging SIMD instructions for `qX_K_q8_K` and `qX_0_q8_0` dot product functions. > > …
simonwillison.net
January 27, 2025 at 6:33 PM
Reposted by Madison May
Why reasoning models will generalize
DeepSeek R1 is just the tip of the iceberg of rapid progress.
People underestimate the long-term potential of “reasoning.”
buff.ly
January 28, 2025 at 9:04 PM
Reposted by Madison May
OpenAI's Canvas feature got a big upgrade today, turning it into a direct competitor for Anthropic's excellent Claude Artifacts feature - my notes here: simonwillison.net/2025/Jan/25/...
OpenAI Canvas gets a huge upgrade
[Canvas](https://openai.com/index/introducing-canvas/) is the ChatGPT feature where ChatGPT can open up a shared editing environment and collaborate with the user on creating a document or piece of co...
simonwillison.net
January 25, 2025 at 1:26 AM
If you don't notice the difference between GPT-4o and o1-pro, you're probably not asking specific enough questions
January 25, 2025 at 7:41 PM
Reposted by Madison May
I am deeply worried by the withdrawal of the US from the World Health Organization. I worked for ~2 years at WHO's Global Programme on AIDS, a worldwide response to the HIV pandemic where international cooperation was critical. The US should not withdraw from WHO's global health cooperation.
January 21, 2025 at 3:55 AM
Reposted by Madison May
I’m thrilled to share that I’ve finished my Ph.D. at Mila and Polytechnique Montreal. For the last 4.5 years, I have worked on creating new faithfulness-centric paradigms for NLP Interpretability. Read my vision for the future of interpretability in our new position paper: arxiv.org/abs/2405.05386
Interpretability Needs a New Paradigm
Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only model...
arxiv.org
November 28, 2024 at 1:39 PM
Reposted by Madison May
This would’ve been useful when I wrote that rock climbing post github.com/tpvasconcelo...
November 28, 2024 at 4:09 PM