🔮
Regaining the #1 spot on the lmarena.ai overall leaderboard feels like Google just finetuned their model for human preference again- but taking a closer look reveals truly remarkable performance... 🧵
🔮
Regaining the #1 spot on the lmarena.ai overall leaderboard feels like Google just finetuned their model for human preference again- but taking a closer look reveals truly remarkable performance... 🧵
I've decided to create it for two key reasons:
Firstly, benchmarking LLMs is becoming more difficult.
And secondly, interpreting benchmarks can be difficult.
I've decided to create it for two key reasons:
Firstly, benchmarking LLMs is becoming more difficult.
And secondly, interpreting benchmarks can be difficult.