Lightnews — Scholar-powered news

Aaron Meurer

@asmeurer.com

210 followers 190 following 370 posts

Posting about AI, Python, SymPy, and other nonsense.

asmeurer.com


Aaron Meurer

@asmeurer.com

Now can we have subproblem breakouts like this for literally every other eval?

September 4, 2025 at 12:38 AM

Aaron Meurer

@asmeurer.com

Grok W

August 24, 2025 at 1:21 AM

Aaron Meurer

@asmeurer.com

(I just want to add, for Bluesky, that this stuff has been around for almost 2 years now and it's honestly starting to get embarrassing if you haven't tried them enough to actually figure this out by now)

August 8, 2025 at 7:06 PM

Aaron Meurer

@asmeurer.com

So while it's easy to "own" an AI by getting it to give a stupid answer to a simple question, you shouldn't let this fool you about its capabilities on the things you'd actually use it for.

August 8, 2025 at 7:06 PM

Aaron Meurer

@asmeurer.com

But LLMs are a very different kind of intelligence. They can be very smart at one thing and very dumb at another. (LLMs do have their own intelligence correlations too, but these aren't obvious even to people who use them a lot.)

August 8, 2025 at 7:06 PM

Aaron Meurer

@asmeurer.com

That's why we ask stupid interview questions, like whiteboard coding puzzles or "what's your biggest weakness?". Those things don't directly matter for the job, but they correlate with job performance enough that we can infer things from them.

August 8, 2025 at 7:06 PM

Aaron Meurer

@asmeurer.com

The reason for this is a bit unintuitive. We're used to being able to use proxy questions to judge intelligence because for humans, certain tasks correlate with each other and we have a good intuition for this.

August 8, 2025 at 7:06 PM

Aaron Meurer

@asmeurer.com

Why do you need to know how many b's there are in "blueberry"?

August 8, 2025 at 6:48 PM

Aaron Meurer

@asmeurer.com

LLMs do not use Tesseract. They read the text off the image directly.

August 6, 2025 at 6:39 PM

Aaron Meurer

@asmeurer.com

Context poisoning from looping to fix errors is something that needs to be fixed in the agent runners. When using chat apps I fix this by starting new chats or editing previous queries to keep the context down. Agents need to do something similar where they delete the "fixup" loop from the context.

August 4, 2025 at 6:44 PM
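Here is a minimal sketch of the context pruning described above. It assumes the agent runner already knows which messages belong to a fix-up loop. `Message`, the `fixup` flag, and `prune_fixup_loops` are hypothetical names for illustration and do not come from any real agent framework. The sketch collapses each contiguous run of fix-up messages into one short summary, so later turns don't see the failed attempts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    role: str            # "user", "assistant", or "tool"
    content: str
    fixup: bool = False  # hypothetical flag: True if part of an error-fix loop


def prune_fixup_loops(history: list[Message], summary: str) -> list[Message]:
    """Replace each contiguous run of fix-up messages with a single summary message."""
    pruned: list[Message] = []
    in_loop = False
    for msg in history:
        if msg.fixup:
            # Emit the summary once at the start of a run, then skip the rest of it.
            if not in_loop:
                pruned.append(Message("assistant", summary))
                in_loop = True
        else:
            pruned.append(msg)
            in_loop = False
    return pruned


# Usage: a made-up history where one edit needed a fix-up loop.
history = [
    Message("user", "Add a --verbose flag"),
    Message("assistant", "Edited cli.py"),
    Message("tool", "SyntaxError on line 12", fixup=True),
    Message("assistant", "Fixed the typo", fixup=True),
    Message("tool", "Tests passed", fixup=True),
    Message("user", "Now update the docs"),
]
pruned = prune_fixup_loops(history, "(Fixed a syntax error; tests now pass.)")
```

In this sketch the summary is a fixed string. A real runner might have the model write the summary itself before discarding the loop.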

Aaron Meurer

@asmeurer.com

I'm not sure that it actually helps that Bluesky hides the posts you've blocked. Makes posts like these seem like you're being hyperbolic unless you actually go and look up blocked posts on skythread.

July 26, 2025 at 5:26 AM

Aaron Meurer

@asmeurer.com

Regarding your last paragraph, what do you think of Meta's AI app? x.com/venturetwins...

July 23, 2025 at 11:35 PM

Aaron Meurer

@asmeurer.com

Still can't disable reposts from specific accounts though, so I still won't be following a lot of you, unfortunately.

July 8, 2025 at 4:35 PM

Aaron Meurer

@asmeurer.com

The one at home is the clone. The one still at the teleporter is the original.

June 30, 2025 at 6:55 AM

Aaron Meurer

@asmeurer.com

Whatever happened to the $100B in profits definition of AGI?

June 27, 2025 at 8:54 PM

Aaron Meurer

@asmeurer.com

This post doesn't even answer the question though

June 23, 2025 at 3:12 AM

Aaron Meurer

@asmeurer.com

Most people who fly on planes survive. Let's hear from one of the people who died.

June 14, 2025 at 2:53 AM

Aaron Meurer

@asmeurer.com

It can understand English and follow arbitrary instructions.

June 14, 2025 at 2:46 AM

Aaron Meurer

@asmeurer.com

It didn't work for me

June 9, 2025 at 10:56 PM
