James Moore
infoonthego.bsky.social
CTO Freyda, #ai #ml #aws
Reposted by James Moore
many benchmarks used to measure AI capabilities are, I think, contrived and lenient. here's a good real-life study on whether AI can do your (US) tax returns, a domain with plentiful training data and documentation. the result: the best model got only 33% of returns correct. arxiv.org/pdf/2507.16126
August 19, 2025 at 4:02 PM