terminal-bench
View and compare agent performance across different Terminal-Bench versions.
The latest version of Terminal-Bench. Submissions must use [email protected] via Harbor.
The legacy version of Terminal-Bench. Submissions must use terminal-bench-core==0.1.1.