SWE-bench-Live

Evaluating your AI system on the latest software engineering tasks.

About

SWE-bench-Live is a live benchmark for issue resolution, designed to evaluate an AI system's ability to complete real-world software engineering tasks. Thanks to our automated dataset curation pipeline, we plan to update SWE-bench-Live on a monthly basis, providing the community with up-to-date task instances and supporting rigorous, contamination-free evaluation.

Note: If you believe your repository is not suitable for inclusion in our benchmark, please contact us and we will remove it.

News

Dec 2025

Multi-language and OS update

We upgraded RepoLaunch Agent to support building repositories in all mainstream languages (C, C++, C#, Python, Java, Go, JS/TS, Rust) and on both Linux and Windows. The MultiLang benchmark has been released on HuggingFace. On the leaderboard below, the Lite, Full, and Verified splits still cover Python tasks only.

Aug 2025

Dataset update (through Aug 2025)

We've finalized the update process for SWE-bench-Live: each month, we will add 50 newly verified, high-quality issues to the dataset. The Lite and Verified splits will remain frozen, ensuring fair leaderboard comparisons and keeping evaluation costs manageable. To access the latest issues, please use the Full split!
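For reference, the splits can be loaded with the Hugging Face datasets library. The sketch below is a minimal example, not an official snippet: the dataset path SWE-bench-Live/SWE-bench-Live, the split names, and the field name instance_id are assumptions based on the descriptions above and on SWE-bench conventions, so please check the dataset card for the exact identifiers.

# Minimal sketch of loading SWE-bench-Live splits with the Hugging Face
# `datasets` library. The dataset path and split names below are assumptions;
# consult the dataset card on HuggingFace for the exact identifiers.
from datasets import load_dataset

# "full" is assumed to hold all task instances, including the monthly additions.
full = load_dataset("SWE-bench-Live/SWE-bench-Live", split="full")

# "lite" and "verified" are assumed to be the frozen leaderboard splits.
lite = load_dataset("SWE-bench-Live/SWE-bench-Live", split="lite")

print(len(full), "task instances in the full split")
print(full[0]["instance_id"])  # assumes SWE-bench-style fields such as instance_id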

Jun 2025

Dataset update

We've updated the dataset! Now it includes 1,565 task instances, covering 164 repositories.

Leaderboard

[Interactive leaderboard table: Rank | Method | Resolved | Date]

Submit your results

We coordinate results submission via pull requests; see SWE-bench-Live/submissions for instructions.
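As a rough illustration only, predictions in the SWE-bench family are typically JSONL records containing an instance id, a model name, and a model patch. The field names and file name below follow SWE-bench conventions and are assumptions here; the authoritative submission layout is defined in SWE-bench-Live/submissions.

import json

# Sketch of writing a SWE-bench-style predictions file. The field names
# (instance_id, model_name_or_path, model_patch) and the file name are
# assumptions; see SWE-bench-Live/submissions for the required format.
predictions = [
    {
        "instance_id": "example__repo-1234",                  # hypothetical task instance id
        "model_name_or_path": "my-agent-v1",                  # hypothetical system name
        "model_patch": "diff --git a/foo.py b/foo.py\n...",   # unified diff produced by the system
    },
]

with open("all_preds.jsonl", "w") as f:
    for pred in predictions:
        f.write(json.dumps(pred) + "\n")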

Correspondence

For correspondence, please contact [email protected].

The GitHub Copilot Team (Microsoft US) is actively hiring FTEs and interns.
The DKI Group (Microsoft Shanghai) is actively hiring interns.

We also welcome external, part-time open-source collaborators to join us in updating the dataset tasks each month.

Acknowledgement

SWE-bench-Live is built upon the foundation of SWE-bench. We extend our gratitude to the original SWE-bench team for their pioneering work in software engineering evaluation benchmarks.

Citation

If you use SWE-bench-Live in your research, please cite:

@article{zhang2025swebenchgoeslive,
  title={SWE-bench Goes Live!},
  author={Linghao Zhang and Shilin He and Chaoyun Zhang and Yu Kang and Bowen Li and Chengxing Xie and Junhao Wang and Maoquan Wang and Yufan Huang and Shengyu Fu and Elsie Nallipogu and Qingwei Lin and Yingnong Dang and Saravan Rajmohan and Dongmei Zhang},
  journal={arXiv preprint arXiv:2505.23419},
  year={2025}
}