These Tiny Recursive Models (TRMs), at only 7M parameters, hit 45% on ARC-AGI-1, beating DeepSeek R1, o3-mini, and Gemini 2.5 Pro with <0.01% of their size.
But the current system also keeps journalists, science communicators, policy researchers, and fact checkers from reading into a topic.
We built SPOT, a dataset of STEM manuscripts across 10 fields annotated with real errors to find out.
(tl;dr not even close to usable) #NLProc
arxiv.org/abs/2505.11855
Internal deployments of frontier AI models are an underexplored source of risk. My program at @csetgeorgetown.bsky.social just opened a call for research ideas—EOIs due Jun 30.
Full details ➡️ cset.georgetown.edu/wp-content/u...
Summary ⬇️
At a time when information is being rewritten or erased online, a $700 million lawsuit from major record labels threatens to destroy the Wayback Machine.
Tell the labels to drop the 78s lawsuit.
👉 Sign our open letter: www.change.org/p/defend-the...
🧵⬇️
simonwillison.net/2025/Apr/30/...
andymasley.substack.com/p/a-cheat-sh...
Great thread from Sarah, and I have additional thoughts. 🧵
You can explore how much countries spend relative to the size of their economies, how this has changed over time, and how spending is split across governments' priorities like health, education, and more:
➡️ ourworldindata.org/government-s...
Hopefully useful for anyone trying to understand what "AI agents" are, what they could be, and how they're being hyped. And for anyone who could use a bit of levity in all the AAAAAAH!
www.buzzsprout.com/2126417/epis...
Thx to @whatulysses.bsky.social for production!
So: What the heck is Meta measuring? What are they seeing? 9/
This is a portion of the Deep Field Fornax, a focused look at the region of space in the Fornax constellation.
Based on the press release, I estimate that there are more than TWO MILLION galaxies in this image.