Sure, it's only Python, SWE life is more than bugfixing, not all code can be tested...
But.
A benchmark is really a model of your target use case. And, as we know, all models are wrong but some are useful 😉
I am thrilled about ✨ Gemini 2.0 Flash, as it allowed us to build the next generation of Code Agent experiences: developers.googleblog.com/en/the-next-...
For the NeurIPS week, they should've replaced this ⭐ with a ✨ lol.
✨ Stoked to chat about Gemini, code/SWE agents, and whether our industry is doomed to obsolete ourselves.
Well, as good a time for an intro as any 😅
Hello world! I'm Alex. In no particular order:
• research scientist at Google DeepMind
• Gemini SWE Agents co-lead
• Ukrainian
• New Yorker
• movie nerd
Happy to try again on a new forum. Maybe it'll feel like 2019 again 😊