It's impressive, but rushed.
I ran it against other SOTA models on 6 competitive programming problems of varying difficulty.
Here are the results!
Almost o1 level actually.
Today I sat down and ran a handful of competitive programming problems of varying difficulty on leading LLMs, including o1, 4o, Sonnet 3.6, and DeepSeek R1.
These are the preliminary results on 6 problems!