Currently at Imbue. Previously Ambient.ai, Stripe, RethinkDB, Max Planck Institute.
This is huge.
Only caveat is that they evaluated on the public set - it might have been used in post-training of Gemini 3? Looking forward to seeing private eval results! poetiq.ai/posts/arcagi...
...until I watched this video by AlphaPhoenix - it's such an amazing visualization of what's going on in a cable! youtu.be/RkAF3X6cJa4?...
However, in my actual use, I've found it to be a bit underwhelming compared to Sonnet 4 (not to mention Sonnet 4.5).
What has been your experience with Haiku?
I think almost nobody is talking about symbolic GOFAI these days, so I'm not concerned about that.
But all of ML being re-branded to AI lately, while AI has simultaneously been made synonymous with generative AI in many places, has led to so much confusion.
Policy addresses the wrong harms, companies have been confused about who should lead efforts, hiring is misguided, academic discussion is often muddled.
Often to convert some data or create visualizations, but also one-off new features to add to some open-source application that I only want to use once.
link in reply!
It also comes with built-in verifiers to automatically check the quality of AI-written code. More to come! imbue.com/sculptor-ann...
It was a very quick and fun mini project that taught me a thing or two about Lisp. Use it at your own risk.
I discussed what that means. www.oneusefulthing.org/p/on-working...
This is not where the training data for AI comes from; it is a study done by an SEO firm that claims to show how often sites come up at least once in THE WEB SEARCH FUNCTION of certain AI agents when they do a web search for more info.
This is not all bad: we can start focusing on how to work around those limitations and how to put current LLMs to work.
Excited to share our new paper, “Competition and Attraction Improve Model Fusion” presented at GECCO 2025 (runner-up for best paper)!
arxiv.org/abs/2508.16204
The trick? Measuring confidence in the reasoning traces as they are being generated, then ending low-confidence traces early.
jiaweizzhao.github.io/deepconf/
2.4 million are injured & 40k killed in US accidents a year