Summary
AgentBench's run_evaluation currently assumes a synchronous agent function. That keeps the first version simple, but many real agent stacks are async because they call tools, remote APIs, browsers, or long-running workflows.
Why this matters
Without async support, developers end up wrapping modern agents in awkward sync shims or skipping the library entirely for real evaluations.
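To make the pain point concrete, here is the kind of shim developers currently have to write. Everything here is illustrative: my_agent is a hypothetical coroutine-based agent, not part of AgentBench.

```python
import asyncio

# Hypothetical async agent; stands in for any coroutine-based stack.
async def my_agent(task: str) -> str:
    await asyncio.sleep(0)  # e.g. an awaited tool or API call
    return f"handled: {task}"

# The wrapper a developer must write today to feed an async agent
# into a sync-only run_evaluation:
def my_agent_sync(task: str) -> str:
    # Blocks the calling thread, and raises if an event loop
    # is already running (e.g. inside Jupyter or another async app).
    return asyncio.run(my_agent(task))
```

Beyond the boilerplate, asyncio.run per call is slow (a fresh event loop each time) and unusable from within an already-running loop, which is exactly where many agent stacks live.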
Proposed scope
- support async agent callables alongside sync ones
- add an async evaluation entry point, or transparently detect coroutine returns
- preserve the existing EvalResult structure
- keep compare_agents compatible with async evaluation flows
- add tests covering both sync and async agents
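The scope above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the run_evaluation signature, the EvalResult fields, and the string-task shape are all assumptions.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional, Union

# Hypothetical result shape; the real EvalResult may differ.
@dataclass
class EvalResult:
    output: Any
    error: Optional[Exception] = None

# An agent is a callable that returns either a value or an awaitable.
AgentFn = Callable[[str], Union[str, Awaitable[str]]]

async def _call_agent(agent: AgentFn, task: str) -> str:
    """Invoke a sync or async agent uniformly."""
    result = agent(task)
    # Transparently detect coroutine returns, per the proposed scope.
    if inspect.isawaitable(result):
        result = await result
    return result

async def run_evaluation_async(agent: AgentFn, tasks: list) -> list:
    """Async entry point: works for both sync and async agents."""
    results = []
    for task in tasks:
        try:
            results.append(EvalResult(output=await _call_agent(agent, task)))
        except Exception as exc:  # record async failures instead of raising
            results.append(EvalResult(output=None, error=exc))
    return results

def run_evaluation(agent: AgentFn, tasks: list) -> list:
    """Sync entry point: backward compatible, wraps the async path."""
    return asyncio.run(run_evaluation_async(agent, tasks))
```

One design note: routing the sync path through the async one keeps a single evaluation loop, and inspect.isawaitable handles both plain coroutines and other awaitables, so existing sync agents pass through unchanged.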
Acceptance criteria
- async agent functions can be benchmarked without custom wrappers
- sync usage remains backward compatible
- docs show one async evaluation example
- test coverage includes async success and async exception handling
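A test for the last two criteria might look like this sketch. The run_evaluation_async stub below merely stands in for the proposed entry point so the example is self-contained; its name, signature, and result shape are assumptions.

```python
import asyncio

# Stand-in for the proposed async entry point; the real API may differ.
async def run_evaluation_async(agent, tasks):
    results = []
    for task in tasks:
        try:
            results.append({"output": await agent(task), "error": None})
        except Exception as exc:
            results.append({"output": None, "error": exc})
    return results

async def failing_agent(task: str) -> str:
    """Async agent that succeeds or raises depending on the task."""
    if task == "boom":
        raise RuntimeError("tool call failed")
    return f"done: {task}"

def test_async_success_and_exception():
    results = asyncio.run(run_evaluation_async(failing_agent, ["ok", "boom"]))
    # Async success path: output captured, no error.
    assert results[0]["error"] is None
    assert results[0]["output"] == "done: ok"
    # Async exception path: error captured instead of propagating.
    assert isinstance(results[1]["error"], RuntimeError)
```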
Notes
This is one of the highest-leverage upgrades for making AgentBench feel production-relevant instead of demo-only.