This repo hosts model results, trajectories, and evaluation logs on SWE-bench-Live. We coordinate result submissions via Pull Requests.
We provide the trajectories from the experiments conducted in the paper; see this link. For third-party submitted trajectories and logs, please refer to the corresponding submission directory in this repository. We temporarily host these files directly on GitHub and recommend using sparse checkout so you only check out the directories you care about.
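For instance, a sparse checkout that fetches only one submission directory might look like the sketch below; the subset (`full`) and folder name are placeholders, so substitute the directory you actually want.

```bash
# Minimal sparse-checkout sketch that fetches only one submission directory.
# The subset ("full") and folder name are placeholders.
git clone --depth 1 --filter=blob:none --sparse \
    https://github.com/SWE-bench-Live/submissions.git
cd submissions
git sparse-checkout set submissions/full/20250501-sweagent-claude37
```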
Thank you for your interest in submitting results to SWE-bench-Live. Please follow the submission process outlined below.
- Clone a fork of the repository; consider using `git clone --depth 1 --filter=blob:none --sparse` to speed up the process.
- In the folder corresponding to your evaluated subset (`submissions/{subset}`), create a new folder named in the format `YYYYMMDD-{YOUR_METHOD_NAME}`, e.g. `20250501-sweagent-claude37`.
- Place your predictions file in `preds.json`, which should include the patch for each instance, and place the evaluation report generated by the SWE-bench-Live evaluation script in `results.json` (see the sketches after this list).
- Optionally, create a `logs` folder to store logs from the evaluation process, and a `trajs` folder to store reasoning trajectories that reflect how your system solved the problems.
- Create a `README` to explain the agent scaffold you used and the experimental setting, including the number of rollouts, how results were sampled, the number of iterations, and other relevant details.
- Create a pull request to the `SWE-bench-Live/submissions` repository with the new submission folder.
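To make the expected contents concrete, here is a sketch of a finished submission folder together with a minimal `preds.json` entry. The prediction fields shown (`instance_id`, `model_name_or_path`, `model_patch`) are assumed to follow the upstream SWE-bench prediction format, and the instance ID and patch are made up; confirm the exact schema against the SWE-bench-Live evaluation tooling.

```bash
# Hypothetical layout of a complete submission (subset and method name are placeholders):
#
#   submissions/full/20250501-sweagent-claude37/
#   ├── README.md      # scaffold, model, rollouts, sampling, iteration details
#   ├── preds.json     # one prediction per instance
#   ├── results.json   # report produced by the SWE-bench-Live evaluation script
#   ├── logs/          # optional: evaluation logs
#   └── trajs/         # optional: reasoning trajectories
#
# Minimal preds.json entry, assuming the upstream SWE-bench prediction fields;
# the instance ID and patch below are fabricated for illustration only.
cat > preds.json <<'EOF'
[
  {
    "instance_id": "example-org__example-repo-1234",
    "model_name_or_path": "sweagent-claude37",
    "model_patch": "diff --git a/src/app.py b/src/app.py\n--- a/src/app.py\n+++ b/src/app.py\n..."
  }
]
EOF
```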
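If you produce `results.json` with the evaluation harness, the invocation is likely close to the upstream SWE-bench entry point. The command below is only a sketch under that assumption; the dataset name, worker count, and run ID are illustrative, so check the SWE-bench-Live repository for the exact command and flags.

```bash
# Sketch of generating results.json, assuming SWE-bench-Live keeps the upstream
# swebench.harness.run_evaluation interface; verify the exact entry point and
# flags against the SWE-bench-Live repository before running.
python -m swebench.harness.run_evaluation \
    --dataset_name SWE-bench-Live/SWE-bench-Live \
    --predictions_path preds.json \
    --max_workers 8 \
    --run_id 20250501-sweagent-claude37
```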
For any issues encountered during the submission process, please open an issue in the repository.