We explore this shift in Programming with Pixels: an agent environment where agents interact with VS Code to perform a wide variety of software engineering tasks.
Code/agent environment: github.com/Programmingw...
Homepage: programmingwithpixels.com
Introducing Programming with Pixels: an SWE environment where agents control VS Code via screen perception, typing & clicking to tackle diverse tasks.
programmingwithpixels.com
🧵
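Here is a minimal Python sketch of an observe-act cycle in which the agent's only tools are clicking and typing. Everything in it is hypothetical: `Click`, `TypeText`, `ToyEditorScreen`, and `scripted_policy` are illustrative names, not the Programming with Pixels API, and the "screen" is a text buffer rather than pixels.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical action types for illustration; not the actual Programming with Pixels API.
@dataclass
class Click:
    x: int  # column to click on
    y: int  # line to click on


@dataclass
class TypeText:
    text: str


Action = Union[Click, TypeText]


class ToyEditorScreen:
    """Stand-in for a VS Code window: a list of text lines plus a cursor position."""

    def __init__(self, lines: List[str]):
        self.lines = list(lines)
        self.cursor: Tuple[int, int] = (0, 0)  # (line, column)

    def observe(self) -> str:
        # A real agent would receive a screenshot; here the "screen" is rendered as text.
        return "\n".join(self.lines)

    def step(self, action: Action) -> str:
        if isinstance(action, Click):
            # Clamp the click to existing text, then move the cursor there.
            line = min(action.y, len(self.lines) - 1)
            col = min(action.x, len(self.lines[line]))
            self.cursor = (line, col)
        elif isinstance(action, TypeText):
            # Insert text at the cursor and advance the cursor past it.
            line, col = self.cursor
            current = self.lines[line]
            self.lines[line] = current[:col] + action.text + current[col:]
            self.cursor = (line, col + len(action.text))
        return self.observe()


def scripted_policy(observation: str) -> List[Action]:
    """Hypothetical hard-coded policy: click at the end of the last line and finish the expression."""
    lines = observation.split("\n")
    last = len(lines) - 1
    return [Click(x=len(lines[last]), y=last), TypeText("b")]


screen = ToyEditorScreen(["def add(a, b):", "    return a + "])
observation = screen.observe()
for action in scripted_policy(observation):
    observation = screen.step(action)
print(observation)
```

A real agent would replace `scripted_policy` with a model that reads screenshots and re-plans after each step. Keeping the action space this small is presumably what lets a single interface cover many different tasks.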
paper: arxiv.org/abs/2408.00724
code: github.com/thu-wyz/infe...
We study compute-optimal inference, develop a tree search method guided by process reward models (REBASE), and find that, at the same compute budget, smaller models paired with search often outperform larger ones.
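The central step of reward-balanced search, as we understand REBASE, is how it splits the remaining sampling budget across the current frontier: each node receives a number of children proportional to a softmax over its process-reward score. The sketch below shows only that allocation step. `rebase_allocate` and its `temperature` parameter are our own illustrative names, and the process reward model is assumed to have already scored each node.

```python
import math
from typing import List


def rebase_allocate(rewards: List[float], budget: int, temperature: float = 1.0) -> List[int]:
    """Split an expansion budget across frontier nodes by softmax(reward / temperature).

    rewards: process-reward score for each partial solution on the frontier (assumed given).
    budget: number of children to sample at this depth (remaining unfinished solutions).
    """
    if budget <= 0 or not rewards:
        return [0] * len(rewards)
    top = max(rewards)
    # Subtracting the max before exponentiating avoids overflow without changing the softmax.
    weights = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(weights)
    # Each node's share is rounded independently, so the counts may not sum exactly to budget.
    return [round(budget * w / total) for w in weights]


# Three partial solutions scored by a (hypothetical) process reward model.
scores = [0.9, 0.5, 0.1]
print(rebase_allocate(scores, budget=8, temperature=0.2))
print(rebase_allocate(scores, budget=8, temperature=1.0))
```

Lower temperatures concentrate the budget on the highest-scoring partial solutions; higher temperatures spread it more evenly, so weaker branches still get explored.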
AlphaVerus generates provably correct Rust code through a new combination of tree search and refinement, together with a self-improvement loop that strengthens its capabilities over time.
Instead, what if they could prove their code correct?
Presenting AlphaVerus: A self-reinforcing method that automatically learns to generate correct code using inference-time search and verifier feedback.
🌐 : alphaverus.github.io
🧵
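The generate, verify, refine pattern described in this thread can be illustrated with a deliberately tiny stand-in. This sketch is not AlphaVerus: instead of Rust programs checked by a formal verifier, the "candidate" is an integer upper bound, the "verifier" checks it exhaustively over a small finite domain and returns a counterexample, and the "refinement" step uses that counterexample to repair the candidate. All function names are hypothetical.

```python
from typing import Callable, List, Optional


def find_counterexample(bound: int, f: Callable[[int], int], domain: range) -> Optional[int]:
    """Toy verifier: return an input where f(x) > bound, or None if the bound holds on the domain."""
    for x in domain:
        if f(x) > bound:
            return x
    return None


def refine(bound: int, f: Callable[[int], int], counterexample: int) -> int:
    """Toy repair step: raise the bound just enough to cover the counterexample."""
    return max(bound, f(counterexample))


def verify_and_refine(initial: int, f: Callable[[int], int], domain: range,
                      max_rounds: int = 10) -> Optional[int]:
    """Alternate verification and repair until the candidate verifies or the budget runs out."""
    bound = initial
    for _ in range(max_rounds):
        cex = find_counterexample(bound, f, domain)
        if cex is None:
            return bound  # verified on the whole domain
        bound = refine(bound, f, cex)  # verifier feedback drives the next attempt
    return None  # budget exhausted without a verified answer


verified_pool: List[int] = []
result = verify_and_refine(0, lambda x: x * x, range(-3, 4))
if result is not None:
    # Keep only verified outputs; these could seed later attempts (a simplified self-improvement step).
    verified_pool.append(result)
print(result)
```

In a real system the repair step would be a model prompted with the verifier's error message, and the collected verified outputs would feed later rounds. Treating `verified_pool` as that collection is our simplified reading of the self-improvement loop.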
@uwnlp.bsky.social & Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai
simons.berkeley.edu/talks/sean-w...
It was a sneak preview of part of our NeurIPS tutorial:
cmu-l3.github.io/neurips2024-...