m-baldwin.bsky.social
@m-baldwin.bsky.social
Public defender.
Reposted
many benchmarks used to measure AI capabilities are, I think, contrived and lenient. here's a good real-life study, on whether AI can do your (US) tax returns; a domain with plentiful training data and documentation. the result: the best model only got 33% of returns correct arxiv.org/pdf/2507.16126
August 19, 2025 at 4:02 PM
Reposted
"Frankenstein’s monster copy-paste jobs from a bunch of different places? That’s not a summary. That’s word salad." buttondown.com/surekhadavie...
Basement adventures showed me why ChatGPT can only ever be garbage.
In The British Library. Photo by Surekha Davies. Hallo readers, First, a news flash: Join me for a virtual book launch for HUMANS: A MONSTROUS HISTORY...
August 8, 2025 at 5:41 PM
Reposted
"We do original, fact-based, non-partisan journalism at a time when those three things do not translate to eyeballs. What does bring views is lazy and parasitic quick takes that leech off the time and expenses of real reporting."

www.courtwatch.news/p/shaming-th...
Shaming the Internet
We’re not mad, just disappointed. But also, mad.
June 24, 2025 at 5:34 PM