Blog
Latest updates, insights, and announcements from the SWE-bench team.
[SWE-bench Verified] Detecting cheating in submissions
2025-11-19 • by John Yang
How similar are agent solutions to the ground truth?
[mini-SWE-agent] Roulette mode!
2025-08-19 • by Kilian Lieret
Randomly switching between models at every step can boost performance