LMAnalysis
mlanalysis.bsky.social
WIP: a project dedicated to demystifying the benchmarking of LLMs.
The new Gemini release from Google has mostly flown under the radar, perhaps understandably so.
🔮
Regaining the #1 spot on the lmarena.ai overall leaderboard feels like Google just finetuned their model for human preference again, but a closer look reveals truly remarkable performance... 🧵
December 7, 2024 at 8:39 PM
Hello BlueSky🦋! This page will be all about benchmarks of large language models.

I've decided to create it for two key reasons:
First, benchmarking LLMs is becoming more difficult.
Second, interpreting benchmarks can be difficult.
December 6, 2024 at 10:48 PM