All official submissions to the SWE-bench leaderboard are maintained at SWE-bench/experiments.

Evaluating on SWE-bench

Check out the main SWE-bench repository docs for instructions on how to generate and evaluate predictions on SWE-bench [Lite, Verified, Multimodal].
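As a rough illustration of what "generating predictions" produces, the sketch below writes a predictions file in JSON Lines form. It assumes the prediction format uses the keys instance_id, model_name_or_path, and model_patch; confirm the exact schema against the main SWE-bench repository docs before relying on it. The helper name write_predictions, the instance ID, the model name, and the patch text are all hypothetical placeholders.

```python
import json
from pathlib import Path


def write_predictions(predictions, path):
    """Write one JSON object per line (JSON Lines) and return the path.

    Hypothetical helper for illustration; not part of SWE-bench.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for pred in predictions:
            f.write(json.dumps(pred) + "\n")
    return path


# Assumed schema: one record per task instance. All values are placeholders.
example_predictions = [
    {
        "instance_id": "example__repo-1234",   # hypothetical instance ID
        "model_name_or_path": "my-model",      # hypothetical model name
        "model_patch": "diff --git a/foo.py b/foo.py\n",  # placeholder patch
    }
]

if __name__ == "__main__":
    out = write_predictions(example_predictions, "preds.jsonl")
    print(f"wrote {len(example_predictions)} prediction(s) to {out}")
```

A file like this would then be passed to whichever evaluation route you choose; the exact commands and options are described in the main repository docs.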

SWE-bench evaluation can be carried out either locally or on cloud compute platforms, the latter via our sb-cli tool (recommended) or Modal.

For SWE-bench Multimodal, evaluation can only be carried out via sb-cli.

Submit to Leaderboard

To submit your system or model to any of our leaderboards (SWE-bench [Lite, Verified, Multimodal, Multilingual]), please follow the instructions posted at SWE-bench/experiments.