Aaron Meurer
asmeurer.com
@asmeurer.com
Posting about AI, Python, SymPy, and other nonsense.

Now can we have subproblem breakouts like this for literally every other eval?
September 4, 2025 at 12:38 AM
Grok W
August 24, 2025 at 1:21 AM
(I just want to add, for Bluesky, that these tools have been around for almost 2 years now and it's honestly starting to get embarrassing if you haven't tried them enough to actually figure this out by now)
August 8, 2025 at 7:06 PM
So while it's easy to "own" an AI by getting it to give a stupid answer to a simple question, you shouldn't let this fool you about its capabilities for the things that you'd actually use it for.
August 8, 2025 at 7:06 PM
But LLMs are a very different kind of intelligence. They can be very smart at one thing and very dumb at another. (LLMs actually do also have their own intelligence correlations, but these are not really obvious even to people who use them a lot.)
August 8, 2025 at 7:06 PM
That's why we ask stupid questions in interviews like whiteboard coding puzzles or "what's your biggest weakness?". Those things don't actually directly matter for the job, but they correlate enough that we can infer things from them.
August 8, 2025 at 7:06 PM
The reason for this is a bit unintuitive. We're used to being able to use proxy questions to judge intelligence because for humans, certain tasks correlate with each other and we have a good intuition for this.
August 8, 2025 at 7:06 PM
Why do you need to know how many b's there are in "blueberry"?
August 8, 2025 at 6:48 PM
LLMs do not use Tesseract. They read the text off the image directly.
August 6, 2025 at 6:39 PM
Context poisoning from looping to fix errors is something that needs to be fixed in the agent runners. When using chat apps I fix this by starting new chats or editing previous queries to keep the context down. Agents need to do something similar where they delete the "fixup" loop from the context.
August 4, 2025 at 6:44 PM
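The context pruning described in the post above can be sketched roughly as follows. This is only an illustration, not how any particular agent runner works: the `Message` class, its `failed` flag, and `prune_fixup_loop` are all hypothetical names, and the sketch assumes the runner can already tell which messages belong to a failed attempt.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str             # "user", "assistant", or "tool"
    content: str
    failed: bool = False  # hypothetical flag: a failed attempt or its error output


def prune_fixup_loop(history: list[Message]) -> list[Message]:
    """Drop failed attempts from the context once the loop has ended in success."""
    # While the latest message is still a failure, keep everything so the
    # agent can see the error it is currently trying to fix.
    if not history or history[-1].failed:
        return list(history)
    # Otherwise keep only the messages that were not part of a failed attempt.
    return [m for m in history if not m.failed]


history = [
    Message("user", "Write f"),
    Message("assistant", "attempt 1", failed=True),
    Message("tool", "TypeError", failed=True),
    Message("assistant", "attempt 2"),
    Message("tool", "tests passed"),
]
pruned = prune_fixup_loop(history)
print([m.content for m in pruned])
```

This mirrors what starting a new chat or editing an earlier query does by hand: the working result stays, and the back-and-forth that led to it no longer takes up context.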
I'm not sure that it actually helps that Bluesky hides the posts you've blocked. Makes posts like these seem like you're being hyperbolic unless you actually go and look up blocked posts on skythread.
July 26, 2025 at 5:26 AM
Regarding your last paragraph, what do you think of Meta's AI app? x.com/venturetwins...
July 23, 2025 at 11:35 PM
Still can't disable reposts from specific accounts though, so I'll still not be following a lot of you unfortunately.
July 8, 2025 at 4:35 PM
The one at home is the clone. The one still at the teleporter is the original.
June 30, 2025 at 6:55 AM
Whatever happened to the $100B in profits definition of AGI?
June 27, 2025 at 8:54 PM
This post doesn't even answer the question though
June 23, 2025 at 3:12 AM
Most people who fly on planes survive. Let's hear from one of the people who died.
June 14, 2025 at 2:53 AM
It can understand English and follow arbitrary instructions.
June 14, 2025 at 2:46 AM
It didn't work for me
June 9, 2025 at 10:56 PM