wellecks.bsky.social
Will future SWE agents be computer-use agents?

We explore this shift in Programming with Pixels: an environment where agents interact with VS Code to perform a wide variety of software engineering tasks.

Code/agent environment: github.com/Programmingw...

Homepage: programmingwithpixels.com
February 27, 2025 at 1:10 AM
Reposted
What if AI agents did software engineering like humans—seeing the screen & using any developer tool?

Introducing Programming with Pixels: an SWE environment where agents control VSCode via screen perception, typing & clicking to tackle diverse tasks.

programmingwithpixels.com

🧵
February 26, 2025 at 5:17 PM
Big fan of this effort! Also check out our work on Inference Scaling Laws:

paper: arxiv.org/abs/2408.00724
code: github.com/thu-wyz/infe...

We study compute-optimal inference, develop a tree search with process reward models (REBASE), and find that smaller models often outperform larger ones under a fixed inference compute budget.
December 19, 2024 at 7:27 PM
Check out our new work on grounding code generation with formal verification!

AlphaVerus generates Rust code that is provably correct via a new combination of tree search and refinement, along with a self-improvement loop that improves its capabilities over time.
LLMs often generate incorrect code.

Instead, what if they could prove code correctness?

Presenting AlphaVerus: A self-reinforcing method that automatically learns to generate correct code using inference-time search and verifier feedback.

🌐 : alphaverus.github.io

🧵
December 19, 2024 at 7:12 PM
Reposted
I’m proud of this TikZ drawing I made today for our upcoming NeurIPS tutorial on decoding (our paper: arxiv.org/abs/2406.16838)
November 14, 2024 at 5:02 AM
Reposted
1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
@uwnlp.bsky.social & Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai
November 19, 2024 at 4:30 PM
I was honored to give a talk at Simons Institute on inference-time algorithms and meta-generation!

simons.berkeley.edu/talks/sean-w...

It was a sneak-preview subset of our NeurIPS tutorial:
cmu-l3.github.io/neurips2024-...
November 21, 2024 at 9:44 PM