Lightnews — Scholar-powered news

Alex Makelov

@amakelov.bsky.social

Mechanistic interpretability
Creator of https://github.com/amakelov/mandala
prev. Harvard/MIT
machine learning, theoretical computer science, competition math.

Alex Makelov

@amakelov.bsky.social

Finally, it notices the off-by-a-factor-of-6 issue and tries to fix it. It writes a little essay giving what mathematicians would call a "moral" argument for why everything is OK. Pretty close!

December 5, 2024 at 9:20 PM

Alex Makelov

@amakelov.bsky.social

Then, it counts these triples. Unfortunately, it counts the number of ordered triples, which overestimates the number of unordered triples (what we care about) by about a factor of 6. Then it proceeds to the key step - lower-bounding the average number of representations:

December 5, 2024 at 9:19 PM
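
The counting in the post above can be checked with a few lines of Python. This is only a sketch under assumptions: the precise problem statement is in a picture that is not reproduced here, so the code assumes the squares are of positive integers and that two representations differing only in order count as the same. The names `N`, `ordered`, `unordered` and `possible_sums` are illustrative, not taken from o1's output.

```python
from math import comb

N = 18_000  # cutoff on each number in the triple, as in the posts

# Ordered triples (a, b, c) with 1 <= a, b, c <= N.
ordered = N ** 3

# Unordered triples (multisets of size 3 drawn from 1..N).
unordered = comb(N + 2, 3)

# Every sum a^2 + b^2 + c^2 lies between 3 and 3 * N^2,
# so there are at most this many distinct values it can take.
possible_sums = 3 * N * N - 2

# Pigeonhole: some value is hit at least as often as the average.
average = unordered / possible_sums

print(f"ordered / unordered = {ordered / unordered:.4f}")
print(f"average representations per possible sum >= {average:.3f}")
```

Under these assumptions the ratio of ordered to unordered triples is just under 6, and the average number of unordered representations per possible sum still comes out slightly above 1000, which is why the factor of 6 matters so much here.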

Alex Makelov

@amakelov.bsky.social

So how does o1 do? Well, still not perfect, but it gets the overall steps correct! It goes for a direct pigeonhole argument. It eventually figures out that if we look at triples of numbers at most 18,000 each, the sum of their squares is always less than 1,000,000,000:

December 5, 2024 at 9:18 PM
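
The bound in the post above is easy to sanity-check. A minimal Python sketch, assuming "less than 1 billion" means strictly below 1,000,000,000; the variable names are made up for illustration:

```python
from math import isqrt

LIMIT = 1_000_000_000  # "1 billion" from the problem
N = 18_000             # cutoff mentioned in the post

# Largest possible sum of three squares when each number is at most N.
max_sum = 3 * N * N
print(max_sum, max_sum < LIMIT)

# For comparison: the largest cutoff m with 3 * m^2 still below LIMIT.
largest_cutoff = isqrt((LIMIT - 1) // 3)
print(largest_cutoff)
```

So 18,000 is a convenient round cutoff rather than the tightest possible one.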

Alex Makelov

@amakelov.bsky.social

Similarly, o1-mini (and o1-preview, from what I remember - it's not available in chat anymore) recalls the asymptotic statement, and spends more time talking about it, but also proves nothing about the constant.

December 5, 2024 at 9:18 PM

Alex Makelov

@amakelov.bsky.social

So how do LLMs do on this problem? 4o spits out a bunch of related facts and confidently asserts the (correct) answer without justification. Importantly, it states that the number of representations grows as sqrt(n) asymptotically - which is true, but the constant is decisive.

December 5, 2024 at 9:17 PM

Alex Makelov

@amakelov.bsky.social

The problem superficially pattern-matches to some heavy-ish tools, like Pythagorean triples or Legendre's three-square theorem; however, the only solution I'm aware of is actually quite simple and uses no "theory".

December 5, 2024 at 9:16 PM

Alex Makelov

@amakelov.bsky.social

Some fun with o1 from OpenAI: there's a math problem I often give to "reasoning" AIs to try them out. It's basically to prove that there's a number less than 1 billion that you can write in 1000 different ways as a sum of 3 squares (precise statement in the pic).

December 5, 2024 at 9:15 PM

Alex Makelov

@amakelov.bsky.social

yes, this is what mechanistic interpretability research looks like

Cat sitting on a chair in front of a parked black car with its rear wheel removed and a hydraulic jack supporting it

November 24, 2024 at 7:51 PM
