Madison May
pragmaticml.bsky.social
professional novice
Reposted by Madison May
Anthropic shipped a new "web search" feature for their Claude consumer apps today; here are my notes. It's frustrating that they don't share details on whether the underlying index is their own or run by a partner simonwillison.net/2025/Mar/20/...
Claude can now search the web
Claude 3.7 Sonnet on the paid plan now has a web search tool that can be turned on as a global setting. This was sorely needed. ChatGPT, Gemini and Grok …
simonwillison.net
March 20, 2025 at 7:41 PM
Reposted by Madison May
Some people frequently (rightly) point out that AIs make mistakes and are not fully reliable. Indeed, hallucinations may never be completely solved.

But I am not sure that matters much. Larger models already make far fewer errors, and many real-world processes are built with error-prone humans in mind.
March 20, 2025 at 1:17 AM
Reposted by Madison May
Want to check out the source for the "AlexNet" paper? Google has made the code from Krizhevsky, Sutskever and Hinton's seminal "ImageNet Classification with Deep Convolutional Neural Networks" paper open source, in partnership with the Computer History Museum.

computerhistory.org/press-releas...
March 20, 2025 at 9:02 PM
If the latest and greatest LLMs aren't effective on your codebase, it may not be the LLMs that are the problem
February 27, 2025 at 1:39 AM
Reposted by Madison May
If you regularly work with math, I can't recommend trying out Corca enough. Corca is a beautiful collaborative math editor, dubbed 'Figma for math,' built by a team that deeply cares about math, science, and their product.
corca.io
January 30, 2025 at 4:09 PM
Reposted by Madison May
We'll be hosting weekly office hours on our Discord server! Our developer relations engineer Cameron will be there to answer questions, talk about AI engineering, and generally chat about what you're building.

Come see us on Tuesday mornings at 8am PST!
Join the .txt Discord Server!
Check out the .txt community on Discord - hang out with 1407 other members and enjoy free voice and text chat.
buff.ly
January 30, 2025 at 10:25 PM
Reposted by Madison May
Whoa... Sonnet was *not* distilled

"3.5 Sonnet was not trained in any way that involved a larger or more expensive model (contrary to some rumors)."

—Dario Amodei

darioamodei.com/on-deepseek-...
January 29, 2025 at 4:54 PM
Reposted by Madison May
These four points on DeepSeek, from the CEO of Anthropic, seem very likely correct and important for understanding the economics of building AI models and what DeepSeek actually did. darioamodei.com/on-deepseek-...
January 29, 2025 at 4:56 PM
Reposted by Madison May
About to submit some of the most bonkers papers I've ever been involved in to ICML. It has taken years to get here but I'm so excited...
January 29, 2025 at 4:08 PM
Reposted by Madison May
Published some notes on Dario Amodei's new essay on DeepSeek, mainly to highlight some new-to-me details he included about Claude 3.5 Sonnet

simonwillison.net/2025/Jan/29/...
January 29, 2025 at 9:41 PM
Great read from Dario Amodei on what aspects of DeepSeek's R1 release are most significant:

darioamodei.com/on-deepseek-...
Dario Amodei — On DeepSeek and Export Controls
darioamodei.com
January 29, 2025 at 11:49 PM
Reposted by Madison May
reminder: claude has been thinking for a while. we may never see an explicit reasoning model from anthropic, their CEO has been open about this (2024-09-05) www.interconnects.ai/p/openai-str...
OpenAI’s Strawberry and inference scaling laws
OpenAI’s Strawberry, LM self-talk, inference scaling laws, and spending more on inference. Coming waves in LLMs.
www.interconnects.ai
January 28, 2025 at 10:23 PM
One downside of R1 / o1: they take just enough time that I'm likely to context switch and come back later. At 30s or less I might as well wait around for the result, but ~2 mins is an awkward amount of time.
January 29, 2025 at 1:48 AM
Reposted by Madison May
DeepSeek R1 appears to be a VERY strong model for coding - examples for both C and Python here: simonwillison.net/2025/Jan/27/...
ggml : x2 speed for WASM by optimizing SIMD
PR by Xuan-Son Nguyen for `llama.cpp`: > This PR provides a big jump in speed for WASM by leveraging SIMD instructions for `qX_K_q8_K` and `qX_0_q8_0` dot product functions. > > …
simonwillison.net
January 27, 2025 at 6:33 PM
Reposted by Madison May
Why reasoning models will generalize
DeepSeek R1 is just the tip of the iceberg of rapid progress.
People underestimate the long-term potential of “reasoning.”
buff.ly
January 28, 2025 at 9:04 PM
Reposted by Madison May
OpenAI's Canvas feature got a big upgrade today, turning it into a direct competitor for Anthropic's excellent Claude Artifacts feature - my notes here: simonwillison.net/2025/Jan/25/...
OpenAI Canvas gets a huge upgrade
[Canvas](https://openai.com/index/introducing-canvas/) is the ChatGPT feature where ChatGPT can open up a shared editing environment and collaborate with the user on creating a document or piece of co...
simonwillison.net
January 25, 2025 at 1:26 AM
If you don't notice the difference between GPT-4o and o1-pro, you're probably not asking specific enough questions
January 25, 2025 at 7:41 PM
Reposted by Madison May
I am deeply worried by the withdrawal of the US from the World Health Organization. I worked for ~2 years at WHO's Global Programme on AIDS, a worldwide response to the HIV pandemic where international cooperation was critical. The US should not withdraw from WHO's global health cooperation.
January 21, 2025 at 3:55 AM
Reposted by Madison May
I’m thrilled to share that I’ve finished my Ph.D. at Mila and Polytechnique Montreal. For the last 4.5 years, I have worked on creating new faithfulness-centric paradigms for NLP Interpretability. Read my vision for the future of interpretability in our new position paper: arxiv.org/abs/2405.05386
Interpretability Needs a New Paradigm
Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only model...
arxiv.org
November 28, 2024 at 1:39 PM
Reposted by Madison May
This would’ve been useful when I wrote that rock climbing post github.com/tpvasconcelo...
November 28, 2024 at 4:09 PM