SWE-bench Full

Model % Resolved Org Date Logs Trajs Site
SWE-agent 1.0 (Claude 3.7 Sonnet)
33.83 - 2025-02-27
Amazon Q Developer Agent (v20241202-dev)
29.99 2025-01-31
OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022)
29.38 2024-11-03
AutoCodeRover-v2.0 (Claude-3.5-Sonnet-20241022)
24.89 2024-11-21
Honeycomb
22.06 2024-08-20
Amazon Q Developer Agent (v20240719-dev)
19.75 2024-07-21
Factory Code Droid
19.27 2024-06-17 -
AutoCodeRover (v20240620) + GPT 4o (2024-05-13)
18.83 2024-06-28 -
SWE-agent + Claude 3.5 Sonnet
18.13 2024-06-20 -
AppMap Navie + GPT 4o (2024-05-13)
14.60 2024-06-15 -
Amazon Q Developer Agent (v20240430-dev)
13.82 2024-05-09 -
SWE-agent + GPT 4 (1106)
12.47 2024-04-02
SWE-agent + GPT 4o (2024-05-13)
11.99 2024-07-28
SWE-agent + Claude 3 Opus
10.51 2024-04-02 -
RAG + Claude 3 Opus
3.79 2024-04-02 -
RAG + Claude 2
1.96 2023-10-10 - -
RAG + GPT 4 (1106)
1.31 2024-04-02 - -
RAG + SWE-Llama 13B
0.70 2023-10-10 - -
RAG + SWE-Llama 7B
0.70 2023-10-10 - -
RAG + ChatGPT 3.5
0.17 2023-10-10 - -

SWE-bench Verified

Model % Resolved Org Date Logs Trajs Site
🆕 OpenHands
65.80 2025-04-15
Augment Agent v0
65.40 2025-03-16
🆕 Amazon Q Developer Agent (v20250405-dev)
65.40 2025-04-05
W&B Programmer O1 crosscheck5
64.60 2025-01-17
🆕 PatchPilot-v1.1
64.60 2025-05-03 -
AgentScope
63.40 2025-02-06
Tools + Claude 3.7 Sonnet (2025-02-24)
63.20 2025-02-24
Blackbox AI Agent
62.80 - 2025-01-10
EPAM AI/Run Developer Agent v20250219 + Anthropic Claude 3.5 Sonnet
62.80 2025-02-28
SWE-agent + Claude 3.7 Sonnet w/ Review Heavy
62.40 2025-02-25
CodeStory Midwit Agent + swe-search
62.20 - 2024-12-21
OpenHands + 4x Scaled (2024-02-03)
60.80 2025-02-03
Learn-by-interact
60.20 2025-01-10
🆕 CORTEXA
58.20 2025-04-10
devlo
58.20 2024-12-13
Emergent E1 (v2024-12-23)
57.20 2024-12-23
Gru(2024-12-08)
57.00 2024-12-08
EPAM AI/Run Developer Agent v20241212 + Anthropic Claude 3.5 Sonnet
55.40 2024-12-12
Amazon Q Developer Agent (v20241202-dev)
55.00 2024-12-02
devlo
54.20 2024-11-08
Bracket.sh
53.20 2025-01-20
OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022)
53.00 2024-10-29
Google Jules + Gemini 2.0 Flash (v20241212-experimental)
52.20 2024-12-12
Engine Labs (2024-11-25)
51.80 2024-11-25
AutoCodeRover-v2.1 (Claude-3.5-Sonnet-20241022)
51.60 2025-01-22 - -
Agentless-1.5 + Claude-3.5 Sonnet (20241022)
50.80 2024-12-02
Solver (2024-10-28)
50.00 2024-10-28
Bytedance MarsCode Agent
50.00 2024-11-25
nFactorial (2024-11-05)
49.20 2024-11-05
Tools + Claude 3.5 Sonnet (2024-10-22)
49.00 2024-10-22
Composio SWE-Kit (2024-10-25)
48.60 2024-10-25
AppMap Navie v2
47.20 2024-11-06
Emergent E1 (v2024-10-12)
46.60 2024-10-23
AutoCodeRover-v2.0 (Claude-3.5-Sonnet-20241022)
46.20 2024-11-08
Solver (2024-09-12)
45.40 2024-09-24
Gru(2024-08-24)
45.20 2024-08-24
CodeShellAgent + Gemini 2.0 Flash (Experimental)
44.20 2025-01-18
Solver (2024-09-12)
43.60 2024-09-20
Agentless Lite + O3 Mini (20250214)
42.40 2025-02-14
ugaiforge
41.60 - 2025-01-12 -
nFactorial (2024-10-30)
41.60 2024-10-30
SWE-RL (Llama3-SWE-RL-70B + Agentless Mini) (20250226)
41.20 2025-02-26
Nebius AI Qwen 2.5 72B Generator + LLama 3.1 70B Critic
40.60 2024-11-13
Tools + Claude 3.5 Haiku
40.60 2024-10-22
Honeycomb
40.60 2024-08-20
Composio SWEkit + Claude 3.5 Sonnet (2024-10-16)
40.60 2024-10-16
🆕 SWE-agent + SWE-agent-LM-32B
40.20 2025-05-11
EPAM AI/Run Developer Agent v20241029 + Anthropic Claude 3.5 Sonnet
39.60 2024-10-29
Amazon Q Developer Agent (v20240719-dev)
38.80 2024-07-21
Agentless-1.5 + GPT 4o (2024-05-13)
38.80 2024-10-28
AutoCodeRover (v20240620) + GPT 4o (2024-05-13)
38.40 2024-06-28 -
Factory Code Droid
37.00 2024-06-17 -
SWE-agent + Claude 3.5 Sonnet
33.60 2024-06-20
SWE-Fixer (Qwen2.5-7b retriever + Qwen2.5-72b editor)
32.80 2025-03-06
MASAI + GPT 4o (2024-06-12)
32.60 - 2024-06-12
Artemis Agent v1 (2024-11-20)
32.00 2024-11-20
nFactorial (2024-10-07)
31.60 2024-10-07
SWE-Fixer (Qwen2.5-7b retriever + Qwen2.5-72b editor) 20241128
30.20 2024-11-28
Lingma Agent + Lingma SWE-GPT 72b (v0925)
28.80 2024-10-02
EPAM AI/Run Developer Agent + GPT4o
27.00 2024-10-16
AppMap Navie + GPT 4o (2024-05-13)
26.20 2024-06-15 -
nFactorial (2024-10-01)
25.80 2024-10-01
Amazon Q Developer Agent (v20240430-dev)
25.60 2024-05-09 -
Lingma Agent + Lingma SWE-GPT 72b (v0918)
25.00 2024-09-18
EPAM AI/Run Developer Agent + GPT4o
24.00 2024-08-20
SWE-agent + GPT 4o (2024-05-13)
23.20 2024-07-28
SWE-agent + GPT 4 (1106)
22.40 2024-04-02
SWE-agent + Claude 3 Opus
18.20 2024-04-02
Lingma Agent + Lingma SWE-GPT 7b (v0925)
18.20 2024-10-02
Lingma Agent + Lingma SWE-GPT 7b (v0918)
10.20 2024-09-18
RAG + Claude 3 Opus
7.00 2024-04-02 -
RAG + Claude 2
4.40 2023-10-10 - -
RAG + GPT 4 (1106)
2.80 2024-04-02 - -
RAG + SWE-Llama 7B
1.40 2023-10-10 - -
RAG + SWE-Llama 13B
1.20 2023-10-10 - -
RAG + ChatGPT 3.5
0.40 2023-10-10 - -

SWE-bench Lite

Model % Resolved Org Date Logs Trajs Site
Isoform
55.00 2025-01-14
Blackbox AI Agent
49.00 - 2024-12-20
Gru(2024-12-08)
48.67 2024-12-08
Globant Code Fixer Agent
48.33 2024-11-27
devlo
47.33 2024-11-22
DARS Agent
47.00 2025-02-05
Kodu-v1 + Claude-3.5 Sonnet (20241022)
44.67 2024-12-07
CodeStory Aide + Mixed Models
43.00 - 2024-07-02 -
🆕 Lingxi
42.67 2025-05-09
OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022)
41.67 2024-10-25
PatchKitty-0.9 + Claude-3.5 Sonnet (20241022)
41.33 2024-12-20 -
OrcaLoca + Agentless-1.5 + Claude-3.5 Sonnet (20241022)
41.00 2025-01-13 - -
Composio SWE-Kit (2024-10-30)
41.00 2024-10-30
Agentless-1.5 + Claude-3.5 Sonnet (20241022)
40.67 2024-12-02
OpenCSG Starship Agentic Coder + GPT 4 (0806)
39.67 2025-01-13
Bytedance MarsCode Agent
39.33 2024-09-12
Moatless Tools + Claude 3.5 Sonnet (20241022)
39.00 - 2025-01-14
Moatless Tools + Claude 3.5 Sonnet (20241022)
38.33 - 2024-11-17
Honeycomb
38.33 2024-08-20
AbanteAI MentatBot + GPT 4o (2024-05-13)
38.00 - 2024-06-27 -
Patched.Codes Patchwork
37.00 2025-01-04
AppMap Navie v2
36.00 2024-11-13
CodeFuse-AAIS
35.67 2025-01-04
Gru(2024-08-11)
35.67 2024-08-11
Isoform
35.00 - 2024-08-29
SuperCoder2.0
34.00 2024-08-06
Bytedance MarsCode Agent + GPT 4o (2024-05-13)
34.00 2024-07-23 -
Alibaba Lingma Agent
33.00 2024-06-22
Agentless Lite + O3 Mini (20250214)
32.33 2025-02-14 - -
Agentless-1.5 + GPT 4o (2024-05-13)
32.00 2024-10-28 - -
Factory Code Droid
31.33 2024-06-17 -
CodeShellTester + GPT 4o (2024-05-13)
31.33 2024-11-11
AutoCodeRover (v20240620) + GPT 4o (2024-05-13)
30.67 2024-06-21
Aegis - o3-mini_1.0
30.33 - 2025-02-07
AIGCode Infant-Coder(2024-08-30)
30.00 - 2024-09-08
Kortix AI (claude-3-5-sonnet-20241022)
30.00 2024-12-03
Amazon Q Developer Agent (v20240719-dev)
29.67 2024-07-21
Agentless + RepoGraph + GPT-4o
29.67 2024-08-08
CodeR + GPT 4 (1106)
28.33 2024-06-04
reproducedRG
28.00 - 2024-11-17 -
SIMA + GPT 4o (2024-05-13)
27.67 - 2024-07-06
MASAI + GPT 4o (2024-05-13)
27.33 - 2024-06-12
Agentless + GPT 4o (2024-05-13)
27.33 2024-06-30 -
Moatless Tools + Claude 3.5 Sonnet
26.67 - 2024-06-23
OpenHands + CodeAct v1.8
26.67 2024-07-25
IBM Research Agent-101
26.67 2024-06-12 -
Aider + GPT 4o & Claude 3 Opus
26.33 2024-05-23 -
HyperAgent
25.33 - 2024-09-25
SWE-Fixer (Qwen2.5-7b retriever + Qwen2.5-72b editor)
24.67 2025-03-06
Moatless Tools + GPT 4o (2024-05-13)
24.67 - 2024-06-17
IBM AI Agent SWE-1.0 (with open LLMs)
23.67 2024-10-16
OpenCSG StarShip CodeGenAgent + GPT 4 (0613)
23.67 - 2024-05-24 -
SWE-Fixer (Qwen2.5-7b retriever + Qwen2.5-72b editor) 20241128
23.33 2024-11-28
SWE-agent + Claude 3.5 Sonnet
23.00 2024-06-20
AppMap Navie + GPT 4o (2024-05-13)
21.67 2024-06-15 -
Bytedance AutoSE (based on SWE-Agent) + GPT4/GPT4o Mixed (20240828)
21.67 2024-08-28 -
Amazon Q Developer Agent (v20240430-dev)
20.33 2024-05-09 -
AutoCodeRover (v20240408) + GPT 4 (0125)
19.00 2024-05-30 -
SWE-agent + GPT 4o (2024-05-13)
18.33 2024-07-28
SWE-agent + GPT 4 (1106)
18.00 2024-04-02
SWE-agent + Claude 3 Opus
11.67 2024-04-02
RAG + Claude 3 Opus
4.33 2024-04-02 -
RAG + Claude 2
3.00 2023-10-10
RAG + GPT 4 (1106)
2.67 2024-04-02 -
RAG + SWE-Llama 7B
1.33 2023-10-10 -
RAG + SWE-Llama 13B
1.00 2023-10-10 -
RAG + ChatGPT 3.5
0.33 2023-10-10 -
Moatless Tools + Deepseek V3
0.00 - 2025-01-11

SWE-bench Multimodal

Model % Resolved Org Date Logs Trajs Site
🆕 Zencoder (2025-04-01)
30.56 2025-04-01 - -
Globant Code Fixer Agent
29.59 2025-03-25 - -
Zencoder (2025-03-10)
27.08 2025-03-11 - -
Agentless Lite + Claude-3.5 Sonnet
25.34 2025-02-26 - -
SWE-agent Multimodal + GPT 4o (2024-08-06)
12.19 2024-10-06 - -
SWE-agent + Claude Sonnet 3.5
12.19 2024-10-06 - -
SWE-agent JavaScript + Claude Sonnet 3.5
11.99 2024-10-06 - -
SWE-agent + GPT 4o (2024-08-06)
11.99 2024-10-06 - -
SWE-agent Multimodal + Claude 3.5 Sonnet
11.41 2024-10-06 - -
SWE-agent JavaScript + GPT 4o (2024-08-06)
9.28 2024-10-06 - -
Agentless + Claude 3.5 Sonnet
6.19 2024-10-06 - -
RAG + GPT 4o (2024-08-06)
6.00 2024-10-06 - -
RAG + Claude 3.5 Sonnet
5.03 2024-10-06 - -
Agentless + GPT 4o (2024-08-06)
3.09 2024-10-06 - -

SWE-bench Lite is a subset curated for less costly evaluation [Post].
SWE-bench Verified is a human-filtered subset [Post].
SWE-bench Multimodal features issues with visual elements [Post].

Each entry reports the % Resolved metric: the percentage of task instances the system resolved, out of 2294 (Full), 500 (Verified), 300 (Lite), or 517 (Multimodal).
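
As a rough illustration of how a % Resolved figure relates to a count of resolved instances, the sketch below converts between the two. The split sizes are the ones stated above; the function names and the two-decimal rounding are assumptions made for this example, not the leaderboard's own evaluation code.

```python
# Split sizes as stated on this page.
SPLIT_SIZES = {"Full": 2294, "Verified": 500, "Lite": 300, "Multimodal": 517}


def percent_resolved(resolved: int, split: str) -> float:
    """Percentage of instances resolved, rounded to two decimals (assumed rounding)."""
    total = SPLIT_SIZES[split]
    if not 0 <= resolved <= total:
        raise ValueError(f"resolved must be between 0 and {total} for {split}")
    return round(100 * resolved / total, 2)


def implied_resolved_count(percent: float, split: str) -> int:
    """Nearest whole number of instances consistent with a reported percentage."""
    return round(percent / 100 * SPLIT_SIZES[split])


if __name__ == "__main__":
    # A reported 65.80 on the 500-instance Verified split.
    count = implied_resolved_count(65.80, "Verified")
    print(count, percent_resolved(count, "Verified"))
```

Because the splits differ in size, one additional resolved instance moves the score by about 0.33 points on Lite but only about 0.04 points on Full, so small gaps between entries mean different things on different splits.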

News

  • [05/2025] Our new paper SWE-smith is out! Train your own models for software engineering agents.
  • [03/2025] SWE-agent 1.0 is the open source SOTA on SWE-bench Lite!
  • [10/2024] Introducing SWE-bench Multimodal! [Link]
  • [08/2024] SWE-bench x OpenAI = SWE-bench Verified [Report]
  • [06/2024] Docker-ized SWE-bench for easier evaluation [Report]
  • [03/2024] Check out SWE-agent (12.47% on SWE-bench) [Link]
  • [03/2024] Released SWE-bench Lite [Report]

Acknowledgements

We thank the following institutions for their generous support: Open Philanthropy, AWS, Modal, Andreessen Horowitz, OpenAI, and Anthropic.