tadamcz.com
📍London
Completely correct answer I hadn't considered at all, and that might have taken me hours (days?) to find.
Wrong hypotheses in my prompt didn't sidetrack AI
Today's AI is smart enough to find the bug in the React slop it wrote 7 months ago.
Eyeballing the plot, the SOTA improvement seems to be slowing down, compared to the progress we saw between Sonnet 3.5 and Opus 4.
pass@the-kitchen-sink
On a benchmark, count all problems that _any_ LLM/scaffold/system has ever solved at least once.
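A minimal sketch of how that count could be computed, assuming you already have, per system, the set of problem IDs it has solved at least once. The function name, the input shape, and the example IDs are hypothetical, not taken from any real benchmark harness.

```python
from typing import Dict, Iterable, Set


def pass_at_kitchen_sink(
    solved_by_system: Dict[str, Iterable[str]],
    all_problems: Iterable[str],
) -> float:
    """Fraction of benchmark problems solved at least once by any system.

    solved_by_system maps a (hypothetical) system name to the problem IDs
    it has ever solved; all_problems lists every problem ID in the benchmark.
    """
    problems: Set[str] = set(all_problems)
    ever_solved: Set[str] = set()
    for solved in solved_by_system.values():
        # Union across systems, ignoring IDs that are not in the benchmark.
        ever_solved |= set(solved) & problems
    return len(ever_solved) / len(problems) if problems else 0.0


# Made-up data: two systems, four problems, three solved by at least one.
example = {"system_a": ["p1", "p2"], "system_b": ["p2", "p3"]}
score = pass_at_kitchen_sink(example, ["p1", "p2", "p3", "p4"])
```

Because the result is a union over systems, it can only stay flat or rise as more systems and attempts are added, so it reads as an upper bound on what current methods can reach rather than a score for any single model.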
Automate the unionized fuckers away. It Just Works.
Turns out, they were already doing it.
looked out the window and saw the Google Street View car roll by; I ran out and caught it! waved to the driver, seemed like a chill guy.
(no pics, literally ran out the door without my phone)
Everyone uses psychologymaxxed pricing to squeeze max surplus from you. Pub just says: half as much pint, half price
I hope this is a Zurich-based DeepMind researcher artfully trolling American colleagues
wtf, Anthropic? Not cool.
This feels like a 2005 Adobe Flash Player update trying to sneak the Yahoo! Toolbar past your grandma
What makes this possible is a registry of optimized Docker images for each issue in SWE-bench.
We are open-sourcing these Docker images: you can `docker pull` them
epoch.ai/blog/sweben...
> Affordable therapies
OK, nice
> Thanks to funding from local grants and partner charities, we offer discounted appointments — the older you are, the less you pay
But of course! I briefly forgot I live in Gerontocracy Britain
intentions.page makes you start each day by asking “what matters now?”
It's the only tool that's ever stuck for me long term; turns out I had to build my own!