@timesler (Contributor)

Thank you for your great work putting together this benchmark and the leaderboard!

This PR submits benchmark results for the Amazon Q Developer Agent for feature development (v20240430-dev), a coding assistant tool recently launched by AWS.

The results obtained by running the SWE-bench evaluation harness are below.

| | SWE-bench | SWE-bench Lite |
|---|---|---|
| % Resolved | 13.82% | 20.33% |

This PR provides predictions, results, and logs for both the full test split (2,294 instances) and the Lite split (300 instances).

@john-b-yang (Member)

@timesler thanks so much for the submission, it looks great!

I'm traveling right now, so I haven't had a chance to look at it yet, but rest assured I'll get to it soon.

I'll check a few things out and have it merged by Tuesday this coming week.

john-b-yang merged commit 75dc98d into SWE-bench:main on May 14, 2024.

@john-b-yang (Member)

@timesler Just pulled the branch and verified the results - congratulations on setting SOTA on the full and lite splits of SWE-bench! 🎉

I have merged the results into the repository. I will make some minor naming tweaks to the log files and add the results/ folder containing statistics about the submission.

I will make a follow up comment once the results have been propagated to https://www.swebench.com/!

@timesler (Contributor, Author)

Fantastic, thank you!

john-b-yang added a commit that referenced this pull request Oct 15, 2024
…-dev

Submission for Amazon Q Developer Agent v20240430-dev