whirlwind013.bsky.social
Reposted
If you have used a frontier model (GPT-5.2 Thinking Extended, Claude 4.6 Opus, Gemini 3 Pro) to try to do real work for an hour or more & still felt that AI is just overhyped nonsense, please share, I want to understand why.

That doesn't mean these systems do everything well, but they do a lot well
February 11, 2026 at 11:02 PM
TBF, the linked article says none of this.
February 11, 2026 at 7:38 AM
Reposted
From the GPT 5.3 model card. "GPT-5.3-Codex is the most capable model we’ve ever deployed in the Cybersecurity domain. As discussed in more detail below, this is the first launch we are treating as High capability in Cybersecurity." cdn.openai.com/pdf/23eca107...
cdn.openai.com
February 6, 2026 at 12:55 PM
Reposted
Fun Bentham's Bulldog review of the very silly-sounding book The AI Con benthams.substack.com/p/the-ai-con... it really amazes me there are still people saying "AI is just a stochastic parrot and also it's racist" in 2026, as though they can't just log on and use one themselves right now
"The AI Con" Con
The shockingly terrible arguments of the AI naysayers
benthams.substack.com
February 4, 2026 at 4:18 PM
Reposted
New report: a serious unintended consequence of the Government’s tenancy reforms.

Hundreds of thousands of ordinary tenants will be dragged into an annual stamp duty calculation and filing regime.
January 30, 2026 at 9:12 AM
Reposted
They’re already complaining about the platform. AGI
January 30, 2026 at 2:03 AM
Reposted
Canaries in the coal mine. Worth paying attention to.

(And yes, they are both obviously interested in seeing their own products used, but I am hearing enough from other, independent coders to make me believe them. I wrote more about the shift here: www.oneusefulthing.org/p/management...)
January 28, 2026 at 8:56 PM
Reposted
Several people have said this by @joxley.jmoxley.co.uk is good. They're right.
www.joxleywrites.jmoxley.co.uk/p/airport-bo...
Airport Book Brain
How faddish ideas keep seducing.
www.joxleywrites.jmoxley.co.uk
January 23, 2026 at 10:38 AM
A really interesting look at the current state of play in regard to prediction markets, with some notes on AI prediction too www.astralcodexten.com/p/mantic-mon...
Mantic Monday: The Monkey's Paw Curls
...
www.astralcodexten.com
January 13, 2026 at 10:00 AM
Reposted
Latest on our Substack - the most promising LLM tutor so far!

I've written a summary of a new paper by Google & Eedi. Their LLM tutor dramatically reduced hallucinations.

In total it made 5 errors out of 3,617 messages. Would a human teacher make fewer?

substack.nomoremarking.com/p/maybe-llm-...
Maybe LLM tutors might be able to work...
The best study I have seen so far
substack.nomoremarking.com
January 11, 2026 at 9:32 PM
Reposted
This is huge—Cassi came 2nd overall and 1st on "Dataset" questions on @Research_FRI Forecastbench! 🥈📈

What this means: 🧵
January 8, 2026 at 10:21 PM
Reposted
Bloom's famous 1984 paper on the 2 sigma tutoring effect has justified huge spending on human tutors & massively influenced modern ed tech.

But the underlying data cannot bear the weight of these conclusions.

Latest on our Substack.

substack.nomoremarking.com/p/blooms-fam...
Bloom's famous 2 sigma tutoring paper is incredibly misleading
One-to-one tuition is not what it's cracked up to be
open.substack.com
January 3, 2026 at 10:45 AM
Reposted
As a Christmas present to renters, we’ve built a new public zoomable map of rent per square metre in England and Wales! Have fun browsing and please share:

yimbyalliance.org/2025/12/18/h...
How much space can you afford to rent? - YIMBY Alliance
A map of rent per square metre for renters in England and Wales If you rent, you probably don’t think in square metres. You think about how many bedrooms and space you can actually get for your monthl...
yimbyalliance.org
December 18, 2025 at 10:52 AM
Reposted
How the UK government spends £100 of its budget—

What does the British government spend its budget on? The chart shows spending broken down by category, scaled to £100. It combines both central and local government spending.
December 18, 2025 at 5:54 PM
Reposted
📊 Data update: We’ve updated our charts with the latest data on natural disasters.

Tracking the occurrence of natural disasters can save lives by helping countries prepare for future ones.

In our work on natural disasters, we visualize data from EM-DAT, the most comprehensive disaster database.
December 16, 2025 at 6:44 PM
Reposted
Whoa. This new GDPval score for GPT-5.2 is something.

GDPval is probably the most economically relevant measure of AI ability, suggesting that in head-to-head competition with human experts on tasks that require 4-8 hours for a human to do, GPT-5.2 wins 71% of the time as judged by other humans.
December 11, 2025 at 6:52 PM
Reposted
Cassi is your guide in an age of radical uncertainty.

Everything is Prediction.

cassi-ai.com/news/cassi-c...
Cassi closes strategic pre-seed round led by Twin Track Ventures | Cassi
Cassi, the “Superstrategy” engine that blends AI with collective intelligence to improve high stakes decision-making, has closed a £500,000 pre-seed funding round.
cassi-ai.com
December 2, 2025 at 7:33 AM
Reposted
Interesting experiment found that an AI agent built around the obsolete GPT-3.5 and GPT-4 models beat experienced human venture capital analysts in predicting which early-stage startups would survive based on early screening (at much lower costs as well). www.sciencedirect.com/science/arti...
December 1, 2025 at 7:20 PM
Reposted
We keep being told "there are no silver bullets". Nonsense. There are plenty of silver bullets. If they're not working as well as you hoped, perhaps you're not shooting at a werewolf.
on.ft.com/42JHpyB
The silver bullet fallacy
[FREE TO READ] The idea that a lot of problems are difficult to fully solve doesn’t mean we should stop trying
on.ft.com
October 16, 2025 at 2:06 PM
Reposted
Artificial Intelligence is neither the huge threat nor great hope many commentators seem to think.

policyskeptic.blogspot.com/2025/10/ai-i...
AI is nether the danger or the solution many think
The biggest danger of AI is not that it will become Skynet and destroy us all. It is that it will make us too lazy to exercise critical thin...
policyskeptic.blogspot.com
October 10, 2025 at 12:38 PM
Reposted
Very soon, the blocker to using AI to accelerate science is not going to be the ability of AI (expect to see this soon), but rather the systems of science, as creaky as they are.

The scientific process is already breaking under a flood of human-created knowledge. How do we incorporate AI usefully?
October 6, 2025 at 12:46 AM
Reposted
This seems like a pretty big finding on AI generalization: If you train an AI model on enough video, it seems to gain the ability to reason about images in ways it was never trained to do, including solving mazes & puzzles.

The bigger the model, the better it does at these out-of-distribution tasks
October 3, 2025 at 12:59 PM
Reposted
In our new paper, we discovered "The AI Double Standard": People judge all AIs for the harm done by one AI, more strongly than they judge humans.

First impressions will shape the future of human-AI interaction—for better or worse. Accepted at #CSCW2025. See you in Norway! dl.acm.org/doi/10.1145/...
September 29, 2025 at 3:29 PM
Reposted
After reading it, this does seem like a big deal

Industry experts outlined important, real-world, hard tasks for AI to do. Other experts were asked to do the tasks themselves (avg time: 7 hours) & yet others graded human & AI output

Models approached parity with humans & AI is getting better fast.
September 26, 2025 at 12:47 AM
Reposted
British AI startup beats humans in international forecasting competition

ManticAI ranked eighth in the Metaculus Cup, leaving some believing bots’ prediction skills could soon overtake experts
#ai #forecasting

www.theguardian.com/technology/2...
www.theguardian.com
September 20, 2025 at 2:04 PM