This should be seen as Anthropic doubling down on Claude Code.
They recently launched the native installer for CC through a tight partnership with Bun. You should expect to see more
www.anthropic.com/news/anthrop...
As far as I can tell it's between 5.1 and 10.2 seconds, depending on which end of the 2019 IEA Netflix energy usage estimate you use
simonwillison.net/2025/Nov/29/...
Modern LLMs (GPT-5.1, Claude 4.5, Gemini 3) produce excellent code and can be a significant productivity boost to software engineers who take the time to learn how to effectively apply them - especially if used with coding agent tools
no, the advantage of closed weights is you can explore prices completely detached from cost. You’re free to set prices based purely on what people will pay and the value they get from it
Gemini: "Here is F.L.O.O.R. (First-person Lino Observation & Ornamental Review)."
Pretty good!
www.dbreunig.com/2025/07/31/h...
A riff on the lethal trifecta for addressing prompt injection, this is a simple heuristic to ensure security at runtime
red = untrusted content
blue = potentially critical actions
An agent can't be allowed to do both
timkellogg.me/blog/2025/11...
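The red/blue rule above can be sketched in a few lines. This is my own minimal illustration, not code from the linked post; the class and method names are hypothetical, and it assumes a simple agent loop where content is ingested before actions are taken.

```python
class AgentSession:
    """Toy enforcement of the red/blue rule: once untrusted (red) content
    enters the context, potentially critical (blue) actions are blocked."""

    def __init__(self):
        # Becomes True as soon as any untrusted content is ingested.
        self.saw_untrusted = False

    def ingest(self, content: str, untrusted: bool) -> None:
        # Red content: web pages, emails, tool output the agent didn't author.
        if untrusted:
            self.saw_untrusted = True

    def perform_critical_action(self, action: str) -> str:
        # Blue actions: sending mail, writing files, spending money.
        # An agent can't be allowed to do both in the same session.
        if self.saw_untrusted:
            raise PermissionError(
                f"blocked critical action {action!r}: session saw untrusted content"
            )
        return f"performed {action}"
```

In practice the interesting design work is deciding what counts as red and blue and whether a tainted session can ever be "un-tainted" (e.g. by starting a fresh context), but the check itself stays this simple.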
we’re still early. people aren’t spending much money on AI so it’s not a lucrative target yet
it’s also inconsistent, which is annoying to design attacks for, especially if the rewards are sparse
I wish more LLM tools would implement the same pattern! simonwillison.net/2025/Oct/24/...
A lot has changed since I last wrote a guide like this in the spring, and AI has gotten much more useful as a result. open.substack.com/pub/oneusefu...
i’ve been saying this for a couple months. RL is driving towards specialization
my hunch is it’s temporary and something will shift again back towards generalization, but for now... buckle up!
Even when Google & OpenAI include watermarks, those can be easily removed, and open weights AI video models without guardrails are coming. www.404media.co/sora-2-water...
In this case, AI note-taking significantly reduces burnout among doctors & increases their ability to focus on their patients.
not just good content, there’s more and more original work, people from labs, and people with genuinely interesting perspectives
when i joined, it was so painful trying to find even traces
1) When you get an instant AI answer, it comes from a small model, and small models are weak, especially at math.
2) Non-reasoning models, like the one powering AI overviews, only “think” as they write: they make mistakes & then back-justify them as they write more
a study shows that a lot of the real-world performance gains people see actually come from people learning how to use the model better
arxiv.org/abs/2407.14333
There are two goals in AI: minimize cost (which also roughly tracks the environmental impact of use) & maximize ability. It is clear you can win on one goal by losing on the other; GPT-5 seems to be a gain on both.
I asked it for an SVG of a pelican riding a bicycle and it wrote me a delightful little poem instead
simonwillison.net/2025/Aug/14/...
it doesn’t make sense if you don’t have a strong UI, plus it’s obviously hard