Lightnews — Scholar-powered news

Refact.ai

@refact-ai.bsky.social

🖇Explore the technical details of our setup: refact.ai/blog/2025/op...

Try refact.ai Agent — SOTA on SWE-bench Verified — in your IDE, today:

• VS Code: marketplace.visualstudio.com/items?itemNa...
• JetBrains: plugins.jetbrains.com/plugin/20647...

Refact.ai is the #1 open-source AI Agent on SWE-bench Verified with a 69.8% score

refact.ai

May 22, 2025 at 8:18 PM

Refact.ai

@refact-ai.bsky.social

What makes Refact.ai special isn’t just the score — it’s our end-to-end approach.

We build for real-world results, not just leaderboards.

Delegate your everyday programming tasks to our AI Agent, preview every step, and guide the process whenever you like ❤️

May 22, 2025 at 8:18 PM

Refact.ai

@refact-ai.bsky.social

Before SWE-bench Verified, we applied lessons from our SOTA SWE-bench Lite run:

• Made tools more tolerant of the model’s uncertainty
• Renamed them for clarity: definition()→search_symbol_definition(), etc.
• Reduced chat compression
❌Dropped multi-step planning
• & more

May 22, 2025 at 8:18 PM

Refact.ai

@refact-ai.bsky.social

🧠The strategic_planning() tool (powered by o3) steps in when deeper reasoning is needed.

It analyzes the debug_script() report, brainstorms the solution, and applies fixes directly — no patches or diffs.

One mandatory call per task, lean and focused.

May 22, 2025 at 8:18 PM

Refact.ai

@refact-ai.bsky.social

AI Agent needed to be more reliable to solve SWE-bench in pass@1.

🛡️We added automatic guardrails:
A script runs static checks on model outputs. If it detects Agent going off track, it injects mid-run helper messages (as from a “user”) to nudge it back in the right direction.

May 22, 2025 at 8:18 PM

Refact.ai

@refact-ai.bsky.social

We introduce a new sub-agent — debug_script().

It uses pdb to debug, modify, and generate scripts, gathering:
1. Which files are affected
2. What caused the failure
3. How it might be fixed.

We forced at least 1 and up to 3 calls per task.

May 22, 2025 at 8:18 PM

Refact.ai

@refact-ai.bsky.social

Let's break down what powered our result.

Models:
• Orchestration: Claude 3.7
• debug_script(): Claude 3.7 + o4-mini
• strategic_planning(): o3
• Temp: 0 for Claude

For each benchmark task, our AI Agent made one multi-step run to produce a single, correct final solution.

May 22, 2025 at 8:18 PM

Refact.ai

@refact-ai.bsky.social

As Refact.ai Agent is open-source, we made our full SWE-bench Verified pipeline live on GitHub.

You can run it end-to-end and reproduce our Agent’s approach and 69.8% score.

➡️ github.com/smallcloudai...

GitHub - smallcloudai/refact-bench: A benchmarking tool for evaluating AI coding assistants on real-world software engineering tasks from the SWE-Bench dataset.

A benchmarking tool for evaluating AI coding assistants on real-world software engineering tasks from the SWE-Bench dataset. - smallcloudai/refact-bench

github.com

May 22, 2025 at 8:18 PM

Refact.ai

@refact-ai.bsky.social

🙌Try Refact.ai Agent for programming in your IDE today:

• VS Code: marketplace.visualstudio.com/items?itemNa...

• JetBrains: plugins.jetbrains.com/plugin/20647...

May 7, 2025 at 9:56 AM

Refact.ai

@refact-ai.bsky.social

🧑‍💻Refact.ai's score of 59.7% is more than just a number — it demonstrates what our AI Agent brings to your everyday programming:

• Automates repetitive tasks
• Understands large codebases
• Integrates with GitHub, Docker, PostgreSQL, & more +
1000+ dev tools via MCP
• Learns from every interaction

May 7, 2025 at 9:56 AM

Refact.ai

@refact-ai.bsky.social

Like skilled developers, our Agent knows when to dive deeper and call tools.

🧠It uses deep_analysis() for reasoning in complex tasks: Solution generation→Critique→Refinement.

Refact.ai decides when tools are needed, creating custom strategies instead of following scripts.

May 7, 2025 at 9:56 AM

Refact.ai

@refact-ai.bsky.social

🪄How it approaches SWE-bench tasks [prompt]:

1. Understand the problem
2. Investigate the repo
3. Create & run the problem reproduction script
4. Plan & implement changes (applying reasoning)
5. Test & evaluate changes (incl. optional reasoning)
6. Repeat steps 4 and 5 until the problem is solved.

May 7, 2025 at 9:56 AM

Refact.ai

@refact-ai.bsky.social

Autonomy = our core strength.

AI Agent completes the entire dev workflow on its own: plans, executes, tests, self-corrects, and delivers a production-ready result.

For each task, it makes 1️⃣ multi-step run to generate a single correct solution through thoughtful iteration.

May 7, 2025 at 9:56 AM

Refact.ai

@refact-ai.bsky.social

Our mission:
Empower every dev with an Autonomous AI Agent that amplifies their capabilities & helps achieve 10x more.

✨Refact.ai is open-source: we believe coding tools should be transparent, customizable, and community-driven — building the future of programming together:
github.com/smallcloudai

May 7, 2025 at 9:56 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news