We generated 50k+ task instances with SWE-smith to train SWE-agent-LM-32B (open-weight SOTA on SWE-bench Verified). More in the paper!

Model % Resolved Org Date Logs Trajs Site
SWE-agent 1.0 (Claude 3.7 Sonnet)
33.83 - 2025-02-27
Amazon Q Developer Agent (v20241202-dev)
29.99
2025-01-31
OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022)
29.38
2024-11-03
AutoCodeRover-v2.0 (Claude-3.5-Sonnet-20241022)
24.89
2024-11-21
Honeycomb
22.06
2024-08-20
Amazon Q Developer Agent (v20240719-dev)
19.75
2024-07-21
Factory Code Droid
19.27
2024-06-17 -
AutoCodeRover (v20240620) + GPT 4o (2024-05-13)
18.83
2024-06-28 -
SWE-agent + Claude 3.5 Sonnet
18.13
2024-06-20 -
AppMap Navie + GPT 4o (2024-05-13)
14.60
2024-06-15 -
Amazon Q Developer Agent (v20240430-dev)
13.82
2024-05-09 -
SWE-agent + GPT 4 (1106)
12.47
2024-04-02
SWE-agent + GPT 4o (2024-05-13)
11.99
2024-07-28
SWE-agent + Claude 3 Opus
10.51
2024-04-02 -
RAG + Claude 3 Opus
3.79
2024-04-02 -
RAG + Claude 2
1.96
2023-10-10 - -
RAG + GPT 4 (1106)
1.31
2024-04-02 - -
RAG + SWE-Llama 13B
0.70
2023-10-10 - -
RAG + SWE-Llama 7B
0.70
2023-10-10 - -
RAG + ChatGPT 3.5
0.17
2023-10-10 - -
Model % Resolved Org Date Logs Trajs Site
🆕
Nemotron-CORTEXA
68.20
2025-05-16
🆕
Aime-coder v1 + Anthropic Claude 3.7 Sonnet
66.40
2025-05-14
🆕
OpenHands
65.80
2025-04-15
Augment Agent v0
65.40
2025-03-16
🆕
Amazon Q Developer Agent (v20250405-dev)
65.40
2025-04-05
W&B Programmer O1 crosscheck5
64.60
2025-01-17
🆕
PatchPilot-v1.1
64.60
2025-05-03 -
AgentScope
63.40
2025-02-06
Tools + Claude 3.7 Sonnet (2025-02-24)
63.20
2025-02-24
Blackbox AI Agent
62.80 - 2025-01-10
EPAM AI/Run Developer Agent v20250219 + Anthropic Claude 3.5 Sonnet
62.80
2025-02-28
SWE-agent + Claude 3.7 Sonnet w/ Review Heavy
62.40
2025-02-25
CodeStory Midwit Agent + swe-search
62.20 - 2024-12-21
OpenHands + 4x Scaled (2024-02-03)
60.80
2025-02-03
Learn-by-interact
60.20
2025-01-10
🆕
Nemotron-CORTEXA
58.20
2025-04-10
devlo
58.20
2024-12-13
Emergent E1 (v2024-12-23)
57.20
2024-12-23
Gru(2024-12-08)
57.00
2024-12-08
EPAM AI/Run Developer Agent v20241212 + Anthropic Claude 3.5 Sonnet
55.40
2024-12-12
Amazon Q Developer Agent (v20241202-dev)
55.00
2024-12-02
devlo
54.20
2024-11-08
Bracket.sh
53.20
2025-01-20
OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022)
53.00
2024-10-29
Google Jules + Gemini 2.0 Flash (v20241212-experimental)
52.20
2024-12-12
Engine Labs (2024-11-25)
51.80
2024-11-25
AutoCodeRover-v2.1 (Claude-3.5-Sonnet-20241022)
51.60
2025-01-22 - -
Agentless-1.5 + Claude-3.5 Sonnet (20241022)
50.80
2024-12-02
Solver (2024-10-28)
50.00
2024-10-28
Bytedance MarsCode Agent
50.00
2024-11-25
nFactorial (2024-11-05)
49.20
2024-11-05
Tools + Claude 3.5 Sonnet (2024-10-22)
49.00
2024-10-22
Composio SWE-Kit (2024-10-25)
48.60
2024-10-25
AppMap Navie v2
47.20
2024-11-06
🆕
OpenHands + DevStral Small 2505
46.80
2025-05-20
Emergent E1 (v2024-10-12)
46.60
2024-10-23
AutoCodeRover-v2.0 (Claude-3.5-Sonnet-20241022)
46.20
2024-11-08
Solver (2024-09-12)
45.40
2024-09-24
Gru(2024-08-24)
45.20
2024-08-24
CodeShellAgent + Gemini 2.0 Flash (Experimental)
44.20
2025-01-18
Solver (2024-09-12)
43.60
2024-09-20
Agentless Lite + O3 Mini (20250214)
42.40
2025-02-14
ugaiforge
41.60 - 2025-01-12 -
nFactorial (2024-10-30)
41.60
2024-10-30
SWE-RL (Llama3-SWE-RL-70B + Agentless Mini) (20250226)
41.20
2025-02-26
Nebius AI Qwen 2.5 72B Generator + LLama 3.1 70B Critic
40.60
2024-11-13
Tools + Claude 3.5 Haiku
40.60
2024-10-22
Honeycomb
40.60
2024-08-20
Composio SWEkit + Claude 3.5 Sonnet (2024-10-16)
40.60
2024-10-16
🆕
SWE-agent + SWE-agent-LM-32B
40.20
2025-05-11
EPAM AI/Run Developer Agent v20241029 + Anthropic Claude 3.5 Sonnet
39.60
2024-10-29
Amazon Q Developer Agent (v20240719-dev)
38.80
2024-07-21
Agentless-1.5 + GPT 4o (2024-05-13)
38.80
2024-10-28
AutoCodeRover (v20240620) + GPT 4o (2024-05-13)
38.40
2024-06-28 -
Factory Code Droid
37.00
2024-06-17 -
SWE-agent + Claude 3.5 Sonnet
33.60
2024-06-20
SWE-Fixer (Qwen2.5-7b retriever + Qwen2.5-72b editor)
32.80
2025-03-06
MASAI + GPT 4o (2024-06-12)
32.60 - 2024-06-12
Artemis Agent v1 (2024-11-20)
32.00
2024-11-20
nFactorial (2024-10-07)
31.60
2024-10-07
SWE-Fixer (Qwen2.5-7b retriever + Qwen2.5-72b editor) 20241128
30.20
2024-11-28
Lingma Agent + Lingma SWE-GPT 72b (v0925)
28.80
2024-10-02
EPAM AI/Run Developer Agent + GPT4o
27.00
2024-10-16
AppMap Navie + GPT 4o (2024-05-13)
26.20
2024-06-15 -
nFactorial (2024-10-01)
25.80
2024-10-01
Amazon Q Developer Agent (v20240430-dev)
25.60
2024-05-09 -
Lingma Agent + Lingma SWE-GPT 72b (v0918)
25.00
2024-09-18
EPAM AI/Run Developer Agent + GPT4o
24.00
2024-08-20
SWE-agent + GPT 4o (2024-05-13)
23.20
2024-07-28
SWE-agent + GPT 4 (1106)
22.40
2024-04-02
SWE-agent + Claude 3 Opus
18.20
2024-04-02
Lingma Agent + Lingma SWE-GPT 7b (v0925)
18.20
2024-10-02
Lingma Agent + Lingma SWE-GPT 7b (v0918)
10.20
2024-09-18
RAG + Claude 3 Opus
7.00
2024-04-02 -
RAG + Claude 2
4.40
2023-10-10 - -
RAG + GPT 4 (1106)
2.80
2024-04-02 - -
RAG + SWE-Llama 7B
1.40
2023-10-10 - -
RAG + SWE-Llama 13B
1.20
2023-10-10 - -
RAG + ChatGPT 3.5
0.40
2023-10-10 - -
Model % Resolved Org Date Logs Trajs Site
Isoform
55.00
2025-01-14
Blackbox AI Agent
49.00 - 2024-12-20
Gru(2024-12-08)
48.67
2024-12-08
Globant Code Fixer Agent
48.33
2024-11-27
SWE-agent + Claude 3.7 Sonnet
48.00
2025-02-26
devlo
47.33
2024-11-22
DARS Agent
47.00
2025-02-05
Kodu-v1 + Claude-3.5 Sonnet (20241022)
44.67
2024-12-07
CodeStory Aide + Mixed Models
43.00 - 2024-07-02 -
🆕
Lingxi
42.67
2025-05-09
OpenHands + CodeAct v2.1 (claude-3-5-sonnet-20241022)
41.67
2024-10-25
🆕
Codart AI
41.67 - 2025-05-15
PatchKitty-0.9 + Claude-3.5 Sonnet (20241022)
41.33
2024-12-20 -
OrcaLoca + Agentless-1.5 + Claude-3.5 Sonnet (20241022)
41.00
2025-01-13 - -
Composio SWE-Kit (2024-10-30)
41.00
2024-10-30
Agentless-1.5 + Claude-3.5 Sonnet (20241022)
40.67
2024-12-02
OpenCSG Starship Agentic Coder + GPT 4 (0806)
39.67
2025-01-13
Bytedance MarsCode Agent
39.33
2024-09-12
Moatless Tools + Claude 3.5 Sonnet (20241022)
39.00 - 2025-01-14
Moatless Tools + Claude 3.5 Sonnet (20241022)
38.33 - 2024-11-17
Honeycomb
38.33
2024-08-20
AbanteAI MentatBot + GPT 4o (2024-05-13)
38.00 - 2024-06-27 -
Patched.Codes Patchwork
37.00
2025-01-04
AppMap Navie v2
36.00
2024-11-13
CodeFuse-AAIS
35.67
2025-01-04
Gru(2024-08-11)
35.67
2024-08-11
Isoform
35.00 - 2024-08-29
SuperCoder2.0
34.00
2024-08-06
Bytedance MarsCode Agent + GPT 4o (2024-05-13)
34.00
2024-07-23 -
Alibaba Lingma Agent
33.00
2024-06-22
Agentless Lite + O3 Mini (20250214)
32.33
2025-02-14 - -
Agentless-1.5 + GPT 4o (2024-05-13)
32.00
2024-10-28 - -
Factory Code Droid
31.33
2024-06-17 -
CodeShellTester + GPT 4o (2024-05-13)
31.33
2024-11-11
AutoCodeRover (v20240620) + GPT 4o (2024-05-13)
30.67
2024-06-21
Aegis - o3-mini_1.0
30.33 - 2025-02-07
AIGCode Infant-Coder(2024-08-30)
30.00 - 2024-09-08
Kortix AI (claude-3-5-sonnet-20241022)
30.00
2024-12-03
Amazon Q Developer Agent (v20240719-dev)
29.67
2024-07-21
Agentless + RepoGraph + GPT-4o
29.67
2024-08-08
CodeR + GPT 4 (1106)
28.33
2024-06-04
reproducedRG
28.00 - 2024-11-17 -
SIMA + GPT 4o (2024-05-13)
27.67 - 2024-07-06
MASAI + GPT 4o (2024-05-13)
27.33 - 2024-06-12
Agentless + GPT 4o (2024-05-13)
27.33
2024-06-30 -
Moatless Tools + Claude 3.5 Sonnet
26.67 - 2024-06-23
OpenHands + CodeAct v1.8
26.67
2024-07-25
IBM Research Agent-101
26.67
2024-06-12 -
Aider + GPT 4o & Claude 3 Opus
26.33
2024-05-23 -
HyperAgent
25.33 - 2024-09-25
SWE-Fixer (Qwen2.5-7b retriever + Qwen2.5-72b editor)
24.67
2025-03-06
Moatless Tools + GPT 4o (2024-05-13)
24.67 - 2024-06-17
IBM AI Agent SWE-1.0 (with open LLMs)
23.67
2024-10-16
OpenCSG StarShip CodeGenAgent + GPT 4 (0613)
23.67 - 2024-05-24 -
SWE-Fixer (Qwen2.5-7b retriever + Qwen2.5-72b editor) 20241128
23.33
2024-11-28
SWE-agent + Claude 3.5 Sonnet
23.00
2024-06-20
AppMap Navie + GPT 4o (2024-05-13)
21.67
2024-06-15 -
Bytedance AutoSE (based on SWE-Agent) + GPT4/GPT4o Mixed (20240828)
21.67
2024-08-28 -
Amazon Q Developer Agent (v20240430-dev)
20.33
2024-05-09 -
AutoCodeRover (v20240408) + GPT 4 (0125)
19.00
2024-05-30 -
SWE-agent + GPT 4o (2024-05-13)
18.33
2024-07-28
SWE-agent + GPT 4 (1106)
18.00
2024-04-02
SWE-agent + Claude 3 Opus
11.67
2024-04-02
RAG + Claude 3 Opus
4.33
2024-04-02 -
RAG + Claude 2
3.00
2023-10-10
RAG + GPT 4 (1106)
2.67
2024-04-02 -
RAG + SWE-Llama 7B
1.33
2023-10-10 -
RAG + SWE-Llama 13B
1.00
2023-10-10 -
RAG + ChatGPT 3.5
0.33
2023-10-10 -
Moatless Tools + Deepseek V3
0.00 - 2025-01-11
Model % Resolved Org Date Logs Trajs Site
🆕
Zencoder (2025-04-01)
30.56
2025-04-01 - -
Globant Code Fixer Agent
29.59
2025-03-25 - -
Zencoder (2025-03-10)
27.08
2025-03-11 - -
Agentless Lite + Claude-3.5 Sonnet
25.34
2025-02-26 - -
SWE-agent Multimodal + GPT 4o (2024-08-06)
12.19
2024-10-06 - -
SWE-agent + Claude Sonnet 3.5
12.19
2024-10-06 - -
SWE-agent JavaScript + Claude Sonnet 3.5
11.99
2024-10-06 - -
SWE-agent + GPT 4o (2024-08-06)
11.99
2024-10-06 - -
SWE-agent Multimodal + Claude 3.5 Sonnet
11.41
2024-10-06 - -
SWE-agent JavaScript + GPT 4o (2024-08-06)
9.28
2024-10-06 - -
Agentless + Claude 3.5 Sonnet
6.19
2024-10-06 - -
RAG + GPT 4o (2024-08-06)
6.00
2024-10-06 - -
RAG + Claude 3.5 Sonnet
5.03
2024-10-06 - -
Agentless + GPT 4o (2024-08-06)
3.09
2024-10-06 - -

SWE-bench Lite is a subset curated for less costly evaluation [Post].
SWE-bench Verified is a human-filtered subset [Post].
SWE-bench Multimodal features issues with visual elements [Post].

Each entry reports the % Resolved metric: the percentage of instances solved, out of 2294 for Full, 500 for Verified, 300 for Lite, and 517 for Multimodal.
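
As a rough illustration of how the metric relates to the split sizes above, the sketch below converts between a count of solved instances and % Resolved. The function names are hypothetical, and rounding to two decimals is an assumption based on how values appear in the tables; this is not the official evaluation code.

```python
# Instance counts per split, taken from the sentence above.
SPLIT_SIZES = {"Full": 2294, "Verified": 500, "Lite": 300, "Multimodal": 517}


def percent_resolved(resolved: int, split: str) -> float:
    """Return % Resolved for `resolved` solved instances in `split`.

    Rounding to two decimals is an assumption that mirrors the displayed values.
    """
    total = SPLIT_SIZES[split]
    if not 0 <= resolved <= total:
        raise ValueError(f"resolved must be between 0 and {total} for {split}")
    return round(100 * resolved / total, 2)


def resolved_count(percent: float, split: str) -> int:
    """Recover the nearest whole number of solved instances from a percentage."""
    return round(percent * SPLIT_SIZES[split] / 100)


if __name__ == "__main__":
    # 33.83% on Full corresponds to 776 of 2294 instances.
    print(resolved_count(33.83, "Full"))
    print(percent_resolved(776, "Full"))
```

Because the splits differ in size, a one-instance difference moves the score by 0.2 points on Verified but only about 0.04 points on Full, which is worth keeping in mind when comparing close entries.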

News

  • [05/2025] Our new paper SWE-smith is out! Train your own models for software engineering agents. [Link]
  • [03/2025] SWE-agent 1.0 is the open source SOTA on SWE-bench Lite! [Link]
  • [10/2024] Introducing SWE-bench Multimodal! [Link]
  • [08/2024] SWE-bench x OpenAI = SWE-bench Verified [Report]
  • [06/2024] Docker-ized SWE-bench for easier evaluation [Report]
  • [03/2024] Check out SWE-agent (12.47% on SWE-bench) [Link]
  • [03/2024] Released SWE-bench Lite [Report]

Acknowledgements

We thank the following institutions for their generous support: Open Philanthropy, AWS, Modal, Andreessen Horowitz, OpenAI, and Anthropic.