Wayne et al. have been running Copilot Arena, a live eval platform: a VSCode extension serving code completions from AI systems to real developers. See 🧵 for findings and preprint.
Excited to be evaluating human-AI *workflows* holistically!
In October, we launched Copilot Arena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint.
Here's what we have learned /🧵
Mercury Coder Mini is blazing fast and overtakes Codestral as the fastest coding model out there (0.24s end-to-end latency) while boasting similar performance (joint #2).
Congrats to InceptionAILabs! 📸
We've collected over 15k votes on 11 models (2 new models since our last blog post release). Congrats @deepseek.bsky.social 🥇 and @anthropic.com 🥇!
@naveenraman.bsky.social will be presenting in my stead.
In this work, we found that UI element ordering significantly affected GUI agent performance. Come check out the poster (and quiz Naveen) at the Workshop on Open-World Agents (OWA-2024)!
Check out how we handle code completions and share your ideas for new system prompts!
GitHub: github.com/lmarena/copi...
Technical details in the blog: blog.lmarena.ai/blog/2024/co...
Download Copilot Arena now at: lmarena.ai/copilot