That doesn't mean these systems do everything well, but they do a lot well.
Hundreds of thousands of ordinary tenants will be dragged into an annual stamp duty calculation and filing regime.
(And yes, they are both obviously interested in seeing their own products used, but I'm hearing enough from other, independent coders to make me believe them. I wrote more about the shift here: www.oneusefulthing.org/p/management...)
www.joxleywrites.jmoxley.co.uk/p/airport-bo...
I've written a summary of a new paper by Google & Eedi. Their LLM tutor dramatically reduced hallucinations.
In total it made 5 errors out of 3,617 messages. Would a human teacher make fewer?
substack.nomoremarking.com/p/maybe-llm-...
What this means: 🧵
But the underlying data cannot bear the weight of these conclusions.
Latest on our Substack.
substack.nomoremarking.com/p/blooms-fam...
yimbyalliance.org/2025/12/18/h...
What does the British government spend its budget on? The chart shows spending broken down by category, scaled to £100. It combines both central and local government spending.
Tracking the occurrence of natural disasters can save lives by helping countries prepare for future ones.
In our work on natural disasters, we visualize data from EM-DAT, the most comprehensive disaster database.
GDPval is probably the most economically relevant measure of AI ability. It suggests that in head-to-head competition with human experts on tasks that take a human 4-8 hours, GPT-5.2 wins 71% of the time, as judged by other humans.
Everything is Prediction.
cassi-ai.com/news/cassi-c...
on.ft.com/42JHpyB
policyskeptic.blogspot.com/2025/10/ai-i...
The scientific process is already breaking under a flood of human-created knowledge. How do we incorporate AI usefully?
The bigger the model, the better it does at these out-of-distribution tasks.
First impressions will shape the future of human-AI interaction—for better or worse. Accepted at #CSCW2025. See you in Norway! dl.acm.org/doi/10.1145/...
Industry experts outlined important, real-world, hard tasks for AI to do. Other experts were asked to do the tasks themselves (avg time: 7 hours) & yet others graded the human & AI output.
Models approached parity with humans & AI is getting better fast.
ManticAI ranked eighth in the Metaculus Cup, leading some to believe bots’ prediction skills could soon overtake experts
#ai #forecasting
www.theguardian.com/technology/2...