That being said, it does feel like there isn't much subject-specific academic discussion going on here, unlike old Twitter.
Maybe Titans and LLaDA are on the horizon!
Gemini might just be the best publicly available general-purpose model right now!
With style control, this lead shrinks to +19 points, but the leap over the previous Gemini remains at +46 points. Clearly, Google cooked something up with this release! (4)
It looks like an incremental upgrade in the endless optimization war between Google and OpenAI over the last few months, until you start filtering. (2)
Also, it will only be rolled out next year.
- Approaching benchmark limits: what comes next?
- Rankings and style control at LMArena: the 'hidden details' of the leaderboard
- o1's (not so) groundbreaking performance...
- Gemini's new release is probably cooler than you think!
But I will also cover many other benchmarks to show where each one helps, and where it falls short, in furthering our understanding of an LLM's performance.