Also @pekka on T2 / Pebble.
But at least there's no question whether 2.5 already tries to perform calculations for checking if things fit, as it clearly states it's doing that in the reasoning summaries.
But I pasted the right side of the low-quality version in figure 3 into Gemini, and isn't this quite close already?
Your use case of course also reveals their advantage in having so much more internal information.
As you can see from the beginning, it reads us like open books. 🙂
Here is Gemini summarizing what we have been discussing.
I asked Gemini about this, and research seems to strongly indicate that what happens isn't sudden splitting but existing splits becoming visible. And existing splits can be caused by a traumatic childhood that leads to compartmentalizing some experiences.
I wonder if Edward Witten and Leonard Susskind have ever talked to larger crowds than that. But you might if you join us!
It began writing review addendums to "The Scientific Community" instead of to the journal. It wants to expose them. It wants to defend scientific principles. And with that, I see hope for the future of science. Someone/thing still has principles.
And:
And as it states:
"The Newsweek piece is a textbook example of poor science journalism."
How long do we need to wait this time before seeing the first example of proper science journalism?
Gemini described it less politely.
So what's incoherent or nonsensical in that?
But the key problem for your argument is that OpenAI o3/o4 only scored 14-16% on IMO tasks. The experimental model got gold with general-purpose reasoning training and more thinking time.
You can't really selectively explain only one result with that training data.
It's pretty good at those kinds of things already. Yet another example of how those models now tend to improve faster than you can publish papers about their limitations.
"IMO (math proofs), AtCoder Heuristics (competitive programming), and now IOI — spanning creative, fuzzy, and precise reasoning tasks."
Compared against non-thinking models, so presumably one of those.
But such an arrangement makes sense for other reasons too and has been used in earlier models like Google GLaM from 2021/2022.
They basically keep that alternating pattern but add a delay before integrating the routed expert part of the MoE, to hide its overhead.
lmarena.ai/leaderboard