Refact.ai
banner
refact-ai.bsky.social
Refact.ai
@refact-ai.bsky.social
Open-source AI Agent that solves engineering tasks end-to-end, fully autonomously.

Get for VS and JetBrains: https://linktr.ee/refactai
🖇Explore the technical details of our setup: refact.ai/blog/2025/op...

Try refact.ai Agent — SOTA on SWE-bench Verified — in your IDE, today:

• VS Code: marketplace.visualstudio.com/items?itemNa...
• JetBrains: plugins.jetbrains.com/plugin/20647...
Refact.ai is the #1 open-source AI Agent on SWE-bench Verified with a 69.8% score
Refact.ai is the #1 open-source AI Agent on SWE-bench Verified with a 69.8% score
refact.ai
May 22, 2025 at 8:18 PM
What makes Refact.ai special isn’t just the score — it’s our end-to-end approach.

We build for real-world results, not just leaderboards.

Delegate your everyday programming tasks to our AI Agent, preview every step, and guide the process whenever you like ❤️
May 22, 2025 at 8:18 PM
Before SWE-bench Verified, we applied lessons from our SOTA SWE-bench Lite run:

• Made tools more tolerant of the model’s uncertainty
• Renamed them for clarity: definition()→search_symbol_definition(), etc.
• Reduced chat compression
❌Dropped multi-step planning
• & more
May 22, 2025 at 8:18 PM
🧠The strategic_planning() tool (powered by o3) steps in when deeper reasoning is needed.

It analyzes the debug_script() report, brainstorms the solution, and applies fixes directly — no patches or diffs.

One mandatory call per task, lean and focused.
May 22, 2025 at 8:18 PM
AI Agent needed to be more reliable to solve SWE-bench in pass@1.

🛡️We added automatic guardrails:
A script runs static checks on model outputs. If it detects Agent going off track, it injects mid-run helper messages (as from a “user”) to nudge it back in the right direction.
May 22, 2025 at 8:18 PM
We introduce a new sub-agent — debug_script().

It uses pdb to debug, modify, and generate scripts, gathering:
1. Which files are affected
2. What caused the failure
3. How it might be fixed.

We forced at least 1 and up to 3 calls per task.
May 22, 2025 at 8:18 PM
Let's break down what powered our result.

Models:
• Orchestration: Claude 3.7
• debug_script(): Claude 3.7 + o4-mini
• strategic_planning(): o3
• Temp: 0 for Claude

For each benchmark task, our AI Agent made one multi-step run to produce a single, correct final solution.
May 22, 2025 at 8:18 PM
As Refact.ai Agent is open-source, we made our full SWE-bench Verified pipeline live on GitHub.

You can run it end-to-end and reproduce our Agent’s approach and 69.8% score.

➡️ github.com/smallcloudai...
GitHub - smallcloudai/refact-bench: A benchmarking tool for evaluating AI coding assistants on real-world software engineering tasks from the SWE-Bench dataset.
A benchmarking tool for evaluating AI coding assistants on real-world software engineering tasks from the SWE-Bench dataset. - smallcloudai/refact-bench
github.com
May 22, 2025 at 8:18 PM
🙌Try Refact.ai Agent for programming in your IDE today:

• VS Code: marketplace.visualstudio.com/items?itemNa...

• JetBrains: plugins.jetbrains.com/plugin/20647...
May 7, 2025 at 9:56 AM
🧑‍💻Refact.ai's score of 59.7% is more than just a number — it demonstrates what our AI Agent brings to your everyday programming:

• Automates repetitive tasks
• Understands large codebases
• Integrates with GitHub, Docker, PostgreSQL, & more +
1000+ dev tools via MCP
• Learns from every interaction
May 7, 2025 at 9:56 AM
Like skilled developers, our Agent knows when to dive deeper and call tools.

🧠It uses deep_analysis() for reasoning in complex tasks: Solution generation→Critique→Refinement.

Refact.ai decides when tools are needed, creating custom strategies instead of following scripts.
May 7, 2025 at 9:56 AM
🪄How it approaches SWE-bench tasks [prompt]:

1. Understand the problem
2. Investigate the repo
3. Create & run the problem reproduction script
4. Plan & implement changes (applying reasoning)
5. Test & evaluate changes (incl. optional reasoning)
6. Repeat steps 4 and 5 until the problem is solved.
May 7, 2025 at 9:56 AM
Autonomy = our core strength.

AI Agent completes the entire dev workflow on its own: plans, executes, tests, self-corrects, and delivers a production-ready result.

For each task, it makes 1️⃣ multi-step run to generate a single correct solution through thoughtful iteration.
May 7, 2025 at 9:56 AM
Our mission:
Empower every dev with an Autonomous AI Agent that amplifies their capabilities & helps achieve 10x more.

✨Refact.ai is open-source: we believe coding tools should be transparent, customizable, and community-driven — building the future of programming together:
github.com/smallcloudai
May 7, 2025 at 9:56 AM