Lightnews — Scholar-powered news

LMAnalysis

@mlanalysis.bsky.social

52 followers 810 following 15 posts

WIP: a project dedicated to helping demystify the benchmarking of LLMs.


LMAnalysis

@mlanalysis.bsky.social

The new Gemini release from Google has mostly flown under the radar, perhaps understandably so.
🔮
Regaining the #1 spot on the lmarena.ai overall leaderboard feels like Google just fine-tuned their model for human preference again, but a closer look reveals truly remarkable performance... 🧵

LMArena categories overview. Gemini-exp-1206 is first or tied for first across all categories.

December 7, 2024 at 8:39 PM

LMAnalysis

@mlanalysis.bsky.social

Hello BlueSky🦋! This page will be all about benchmarks of large language models.

I've decided to create it for two key reasons:
Firstly, benchmarking LLMs is becoming more difficult.
And secondly, interpreting benchmarks can be difficult.

December 6, 2024 at 10:48 PM
