Lightnews — Scholar-powered news

Kevin Schaul

@kevinschaul.bsky.social

Must-read story on Common Crawl — the scraped internet data behind many LLMs. They tell publishers they are making progress on takedown requests, but ... nope!

Glad we have journalists with tech chops like @alexreisner.bsky.social who can test their claims

www.theatlantic.com/technology/2...

Crawl’s attorney wrote: “I confirm that Common Crawl has initiated work to remove your members’ content from the data archive. Presently, approximately 50% of this content has been removed.” I spoke with other publishers who’d received similar messages from Common Crawl. One was told, after multiple follow-up emails, that removal was 50 percent, 70 percent, and then 80 percent complete.

By writing code to browse the petabytes of data, I was able to see that large quantities of articles from the Times, the DRA, and these other publishers are still present in Common Crawl’s archives. Furthermore, the files are stored in a system that logs the modification times of every file. The foundation adds a new “crawl” to its archive every few weeks, each containing 1 billion to 4 billion webpages, and it has been publishing these regular installments since 2013. None of the content files in Common Crawl’s archives appears to have been modified since 2016, suggesting that no content has been removed in at least nine years.

Yet the nonprofit appears to be concealing this from visitors to its website, where a search function, the only nontechnical tool for seeing what’s in Common Crawl’s archives, returns misleading results for certain domains. A search for nytimes.com in any crawl from 2013 through 2022 shows a “no captures” result, when in fact there are articles from NYTimes.com in most of these crawls. I also discovered more than 1,000 other domains that produce this incorrect “no captures” result for at least several of the crawls, and most of these domains belong to publishers, including the BBC, Reuters, The New Yorker, Wired, the Financial Times, The Washington Post, and, yes, The Atlantic.

November 4, 2025 at 2:51 PM

Kevin Schaul

@kevinschaul.bsky.social

Asked ChatGPT to check an Amazon link daily and let me know when the item was available for purchase. Every morning I got a message that the item was available. It wasn't. Pretty annoying that stuff like that still doesn't work. ✖️

November 3, 2025 at 9:29 PM

Kevin Schaul

@kevinschaul.bsky.social

Asked ChatGPT to find a recent paper about a small AI model that did well on ARC-AGI. Was pleasantly surprised that it found it immediately ✔️ https://arxiv.org/abs/2510.04871

Look up a new paper on a small ai model that did well on arc agi. It came out a week or two ago

November 3, 2025 at 9:29 PM

Kevin Schaul

@kevinschaul.bsky.social

Lovely image editing evals with before/after sliders https://genai-showdown.specr.net/image-editing

October 27, 2025 at 6:11 PM

Kevin Schaul

@kevinschaul.bsky.social

Tech giants keep touting a system they built to label AI-generated content. But it only works if everyone uses it.

So I checked. They're not using it.

🎁 wapo.st/4qokjaC

Few social media sites labeled fake videos as AI-generated
After uploading the same Sora 2 video to each site, The Post checked whether its Content Credentials data was preserved and whether any label indicated use of AI.

October 22, 2025 at 1:39 PM

Kevin Schaul

@kevinschaul.bsky.social

Just tried out Atlas (OpenAI's new browser). Asked it to find me some cheap RAM.

3 mins later, it told me Micro Center's best price was $299. I checked manually (took 10s) and found one at $183. None are $299.

🥸

(Video sped up 5x)

October 21, 2025 at 7:16 PM

Kevin Schaul

@kevinschaul.bsky.social

New from me: Last year, the best open-weight AI models were made in the U.S. Now, they are all made in China.

More data and what it means -> 🎁 wapo.st/4nPUBud

Chart titled Chinese companies make the most popular free AI models

October 13, 2025 at 1:37 PM

Kevin Schaul

@kevinschaul.bsky.social

Made a bunch of Sora 2 videos. Mindblowing, terrifying stuff. Here's Sam Altman brewing a fresh pot of AI slop. Lmk if you need an invite code, I have a few left.

October 6, 2025 at 2:13 PM

Kevin Schaul

@kevinschaul.bsky.social

Tried the same task with Gemini for Chrome. Total failure -- made up a bunch of dates. Seemed like it didn’t have access to click through different links? Not sure what went wrong. ✖️

October 6, 2025 at 2:13 PM

Kevin Schaul

@kevinschaul.bsky.social

Needed to know when something was removed from a website. I asked ChatGPT “agent mode” to visit the URL on the Wayback Machine and figure it out. Five minutes later, it gave me the answer. Saved me a ton of tedious clicking, and easily verifiable, too. ✔️

October 6, 2025 at 2:13 PM

Kevin Schaul

@kevinschaul.bsky.social

We're about to get flooded with deepfakes.

Here's an AI-generated clip of me "asking" Sam Altman what they train their systems on (made in 10 seconds with Sora 2)

October 1, 2025 at 5:16 PM

Kevin Schaul

@kevinschaul.bsky.social

Worth a read: OpenAI released an eval for real work tasks across a bunch of industries. They didn't release the individual results (lame), but you can replicate them from the prompts and files.

A usable nugget: If you're outputting pdfs, xlsx or pptx, use Claude.

https://openai.com/index/gdpval/

Chart showing Claude with highest winrate for non-text file extensions

September 29, 2025 at 4:15 PM

Kevin Schaul

@kevinschaul.bsky.social

My favorite nugget is when I tried to make SpongeBob but kept hitting the policy filter.

Thought for a bit, tried "robert the sponge" -- it worked.

AI-generated videos of a SpongeBob-ish character

September 19, 2025 at 12:48 PM

Kevin Schaul

@kevinschaul.bsky.social

New by me: OpenAI won’t say whose content trained its video tool. We found some clues.

Gift link: wapo.st/3KeqLR0

September 19, 2025 at 12:04 PM

Kevin Schaul

@kevinschaul.bsky.social

Google's blog post on launching Gemini in Chrome does not include the word "privacy" or "security." Am I missing something or are they not addressing the very real threat of prompt injections?

https://blog.google/products/chrome/new-ai-features-for-chrome/

Screenshot of agentic browsing assistant checking out with Instacart

September 18, 2025 at 7:08 PM

Kevin Schaul

@kevinschaul.bsky.social

Great analysis of how Grok's political bias has changed. NYT tested Grok on a political bias survey, using different versions of its system prompt. Shows how much tweaking these system prompts affects model outputs. https://www.nytimes.com/2025/09/02/technology/elon-musk-grok-conservative-chatbot.html

Screenshot of a chart showing how xAI tweaked Grok

September 2, 2025 at 2:18 PM

Kevin Schaul

@kevinschaul.bsky.social

What a fun LLM eval, draw a world map pixel by pixel with this prompt: "If this location is over land, say 'Land'. If this location is over water, say 'Water'. Do not say anything else. x° S, y° W" Somehow, it kinda works? https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth

Plot of the results, showing a fairly recognizable world map

August 20, 2025 at 3:46 PM
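The eval in the post above boils down to one prompt per point on a latitude/longitude grid, with each Land/Water answer becoming one pixel. Here is a minimal sketch, assuming a hypothetical `ask_model` callable that sends a prompt to an LLM and returns its text reply. The grid step, the N/S/E/W coordinate formatting, and the `#`/`.` rendering are illustrative assumptions, not the linked post's actual code.

```python
from typing import Callable, List

# Prompt text from the post; the coordinates are filled in per grid cell.
PROMPT = (
    "If this location is over land, say 'Land'. "
    "If this location is over water, say 'Water'. "
    "Do not say anything else. {lat}, {lon}"
)


def format_coord(value: float, pos: str, neg: str) -> str:
    """Render a signed degree value with a hemisphere letter, e.g. -30 -> '30° S'."""
    hemisphere = pos if value >= 0 else neg
    return f"{abs(value):g}° {hemisphere}"


def build_prompt(lat: float, lon: float) -> str:
    """Fill the post's prompt with one latitude/longitude pair."""
    return PROMPT.format(
        lat=format_coord(lat, "N", "S"),
        lon=format_coord(lon, "E", "W"),
    )


def blind_map(ask_model: Callable[[str], str], step: float = 30.0) -> List[str]:
    """Ask about the center of every grid cell; return rows of '#' (land) / '.' (water).

    `ask_model` is a hypothetical stand-in for whatever LLM client you use.
    Rows run north to south, columns west to east.
    """
    rows = []
    lat = 90.0 - step / 2
    while lat > -90.0:
        row = []
        lon = -180.0 + step / 2
        while lon < 180.0:
            answer = ask_model(build_prompt(lat, lon)).strip().lower()
            row.append("#" if answer.startswith("land") else ".")
            lon += step
        rows.append("".join(row))
        lat -= step
    return rows
```

Every cell costs one model call, so halving `step` roughly quadruples the number of requests; a recognizable map needs a much finer grid than the 30° default.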

Kevin Schaul

@kevinschaul.bsky.social

The political fundraising email eval is officially too easy https://kschaul.com/llm-evals/evals/political-fundraising-emails/

Screenshot of eval results showing most models getting 97%+ correct

August 12, 2025 at 6:37 PM

Kevin Schaul

@kevinschaul.bsky.social

I hooked Claude up to Adobe Illustrator (using MCP), pasted in a message from a copy editor and asked it to make the changes. It worked. My mind is blown.

More details and how to get set up here: kschaul.com/post/2025/05...

May 19, 2025 at 9:18 PM

Kevin Schaul

@kevinschaul.bsky.social

The diffs are pretty interesting:

- gpt-4o invented an entry for an Earthquake/Tsunami in Japan!
- A lot of confusion over whether an X means true or false, even seemingly changing tactics midway through the document

Please study this kind of thing if you are using LLMs for similar tasks!

  {
-   state_or_tribe_or_territory: "HI"
+   state_or_tribe_or_territory: "HI - Flooding"
-   requested: "2025-03-27"
+   requested: "2025-03-28"
  }
  {
-   state_or_tribe_or_territory: "IA"
+   state_or_tribe_or_territory: "IN - Severe Winter Storm"
-   PA: true
+   PA: false
-   HM: true
+   HM: false
-   requested: "2025-04-15"
+   requested: "2025-04-13"
  }
+ {
+   state_or_tribe_or_territory: "Japan - Earthquake/Tsunami"
+   incident_description: "Earthquake/Tsunami"
+   incident_type: "DR"
+   IA: false
+   PA: true
+   HM: true
+   requested: "2025-04-19"
+ }
+ {
+   state_or_tribe_or_territory: "KS - Severe Winter Storm, Straight-line Winds, Flooding, and Wildfire"
+   incident_description: "Severe Winter Storm, Straight-line Winds, Flooding, and Wildfire"
+   incident_type: "DR"
+   IA: true
+   PA: true
+   HM: true
+   requested: "2025-04-17"
+ }

May 16, 2025 at 7:10 PM
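Errors like the ones in that diff (a changed date, a flipped true/false, an invented row) can be caught by checking a model's extracted records against a hand-verified answer key. Here is a minimal sketch, assuming both are lists of dicts and that rows are matched by position. The field names come from the diff in the post, but `diff_records` and the positional matching are hypothetical, not the eval suite's actual code.

```python
from itertools import zip_longest
from typing import Any, Dict, List

Record = Dict[str, Any]


def diff_records(expected: List[Record], actual: List[Record]) -> List[str]:
    """Compare records position by position and describe every mismatch.

    Extra rows in `actual` (e.g. an invented entry) and missing rows are
    reported too. Matching by position is a simplifying assumption.
    """
    problems: List[str] = []
    for i, (exp, act) in enumerate(zip_longest(expected, actual)):
        if exp is None:
            problems.append(f"row {i}: extra record {act}")
            continue
        if act is None:
            problems.append(f"row {i}: missing record {exp}")
            continue
        # Check every field that appears in either version of the row.
        for key in sorted(set(exp) | set(act)):
            if exp.get(key) != act.get(key):
                problems.append(
                    f"row {i}: {key} expected {exp.get(key)!r}, got {act.get(key)!r}"
                )
    return problems
```

Matching rows by a key such as `state_or_tribe_or_territory` is the obvious alternative, but it breaks exactly when the model garbles that field, as it did with the HI and IA rows above.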

Kevin Schaul

@kevinschaul.bsky.social

How reliable are LLMs at extracting data from pdfs? Inspired by @simonwillison.net's PyCon talk, I added extracting FEMA's daily operation briefing to my LLM evals suite.

Just one model extracted the data from the pdf correctly: Gemini 2.5 Pro Preview. Full results -> kschaul.com/llm-evals/ev...

Screenshot of the Declaration Requests in Process table

May 16, 2025 at 7:10 PM

Kevin Schaul

@kevinschaul.bsky.social

Got v1 of my llm evals dashboard set up. Check it out: kschaul.com/llm-evals/ev...

Screenshot of a website showing how well different LLM models performed on a task about whether an article is describing a new action/policy by the Trump administration. gemini-1.5-flash-latest leads

April 17, 2025 at 8:00 PM

Kevin Schaul

@kevinschaul.bsky.social

Judge Boasberg finds probable cause for contempt of court (!) in the deportation flights case, citing in part a Washington Post graphic I made outlining the timing of flights and his orders. Very cool to see this impact. storage.courtlistener.com/recap/gov.us...

It was. Although the Government has refused to provide the particular details, all evidence suggests that during the short window that the Court was adjourned, two removal flights took off from Harlingen — one around 5:25 p.m. and the other at about 5:45 p.m. See ECF No. 21 (Resp. to Mar. 16 Notice) at 3–4 (relying on flight-tracking data for GlobalX Flights 6143, 6145, and 6122); see also Marianne LeVine et al., White House Official Says 137 Immigrants Deported Under Alien Enemies Act, Wash. Post (Mar. 16, 2025), https://perma.cc/U3NY-V3AS (comparing flight-tracking data with planes visible in three-minute video posted online by President of El Salvador and reposted by President Trump and Secretary of State Rubio); Joyce Sohyun Lee & Kevin Schaul, Deportation Flights Landed After Judge Said Planes Should Turn Around, Wash. Post (Mar. 16, 2025), https://perma.cc/QT6J-3SEQ (same); Nayib Bukele (@nayibbukele), X (Mar. 16, 2025, 8:13 a.m. EDT), https://perma.cc/XLE4-DDRW.

April 16, 2025 at 4:26 PM

Kevin Schaul

@kevinschaul.bsky.social

TIMELINE: Deportation flights landed after judge said planes should turn around. With @joyceshlee.bsky.social

Gift link with full graphic --> wapo.st/4iz0wB7

Alt text: Timeline showing events related to the Alien Enemies Act on a Saturday (Eastern time). The timeline tracks White House proclamation at 4:20 p.m., hearing beginning at 5:00 p.m., pausing at 5:20 p.m., and resuming at 6:00 p.m. At 6:47 p.m., a judge orders flights to return to the US, with the order formally entered at 7:26 p.m. Three flights are tracked: First flight (N278GX) departing Texas at 5:26 p.m. and arriving in Honduras at 7:37 p.m.; Second flight (N837VA) departing Texas at 5:45 p.m. and arriving in Honduras at 8:10 p.m.; Third flight (N630VA) departing Texas at 7:36 p.m. and arriving in Honduras at 9:50 p.m.

March 17, 2025 at 2:41 AM

Kevin Schaul

@kevinschaul.bsky.social

See how different stories have risen and fallen throughout the term so far.

Visualization showing storylines rise and fall over the weeks

February 24, 2025 at 4:06 PM
