Wayne
waynechi.bsky.social
CS Ph.D. at CMU. Building Copilot Arena. Editor at http://blog.ml.cmu.edu
Reposted by Wayne
Inaugurating new acct to share work from my PhD student!

Wayne et al. have been running a live eval platform, Copilot Arena: a VSCode extension that serves code completions from AI systems to real developers. See 🧵 for findings and preprint.

Excited to be evaluating human-AI *workflows* holistically!
March 5, 2025 at 5:01 PM
What do developers 𝘳𝘦𝘢𝘭𝘭𝘺 think of AI coding assistants?

In October, we launched Copilot Arena to collect user preferences on real dev workflows. After months of live service, we’re here to share our findings in our recent preprint.

Here's what we have learned /🧵
March 5, 2025 at 4:49 PM
Got to test out InceptionAILabs' newest model, Mercury Coder Mini, on Copilot Arena!

Mercury Coder Mini is blazing fast and overtakes Codestral as the fastest coding model out there (0.24s end-to-end latency) while boasting similar performance (joint #2).

Congrats to InceptionAILabs! 📸
February 26, 2025 at 11:51 PM
I had the same problem. I only use Cursor for newer, small projects. I use Copilot Arena's edit feature for projects in VSCode (but obviously I'm biased).
tried switching to cursor and having extreme difficulty getting all my vscode extensions to work properly ☹️ doesn’t seem worth
January 5, 2025 at 8:11 AM
DeepSeek V3 (FiM) is now available in Copilot Arena for free!

Download at lmarena.ai/copilot
December 31, 2024 at 9:12 PM
These lists are better than most "2024's best games" lists
This week's Famitsu had a lot of Japanese gaming industry folks give their personal Game of the Year lists. I'll update this thread periodically since there's a lot of them.
December 27, 2024 at 4:16 AM
Copilot Arena's leaderboard is now live on lmarena.ai/leaderboard!

We've collected over 15k votes on 11 models (2 new models since our last blog post release). Congrats @deepseek.bsky.social 🥇 and @anthropic.com 🥇!
December 23, 2024 at 9:41 PM
I'm not physically at NeurIPS, but my good friend @naveenraman.bsky.social will be presenting in my stead.

In this work, we found that UI element ordering significantly affected GUI agent performance. Come check out the poster (and quiz Naveen) at the Workshop on Open-World Agents (OWA-2024)!
December 13, 2024 at 7:32 AM
Bruh what... 💀
December 10, 2024 at 5:51 PM
We've open-sourced Copilot Arena’s server code!

Check out how we handle code completions and share your ideas for new system prompts!

GitHub: github.com/lmarena/copi...
Technical details in the blog: blog.lmarena.ai/blog/2024/co...

Download Copilot Arena now at: lmarena.ai/copilot
December 5, 2024 at 7:44 PM
Trying out Bluesky. Will mostly be posting about Copilot Arena!
November 20, 2024 at 6:59 AM