Add async agent evaluation support #2

@MukundaKatta

Summary

AgentBench's run_evaluation currently assumes a synchronous agent function. That keeps the first version simple, but many real agent stacks are async because they call tools, remote APIs, browsers, or long-running workflows.

Why this matters

Without async support, developers end up wrapping modern agents in awkward sync shims or skipping the library entirely for real evaluations.

Proposed scope

  • support async agent callables alongside sync ones
  • add an async evaluation entry point, or transparently detect coroutine returns
  • preserve the existing EvalResult structure
  • keep compare_agents compatible with async evaluation flows
  • add tests covering both sync and async agents
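One way the "transparently detect coroutine returns" option might look is sketched below. This is a rough illustration only, not AgentBench's actual code: `SketchResult`, `call_agent`, `evaluate_async`, and `evaluate` are hypothetical names, and `SketchResult` is a stand-in because the real EvalResult fields are not shown in this issue.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List, Optional, Union


# Hypothetical stand-in: the real EvalResult fields are not shown in this issue.
@dataclass
class SketchResult:
    output: Any = None
    error: Optional[str] = None


# An agent takes a task string and returns either a value or an awaitable.
AgentFn = Callable[[str], Union[Any, Awaitable[Any]]]


async def call_agent(agent: AgentFn, task: str) -> SketchResult:
    """Call a sync or async agent and normalize the outcome."""
    try:
        value = agent(task)
        # Async agents return an awaitable; sync agents return the value directly.
        if inspect.isawaitable(value):
            value = await value
        return SketchResult(output=value)
    except Exception as exc:
        # Assumption: failures are recorded on the result rather than re-raised.
        return SketchResult(error=f"{type(exc).__name__}: {exc}")


async def evaluate_async(agent: AgentFn, tasks: List[str]) -> List[SketchResult]:
    """Async entry point: run all tasks concurrently, preserving task order."""
    return list(await asyncio.gather(*(call_agent(agent, t) for t in tasks)))


def evaluate(agent: AgentFn, tasks: List[str]) -> List[SketchResult]:
    """Sync entry point; assumes no event loop is already running."""
    return asyncio.run(evaluate_async(agent, tasks))
```

With this shape, existing sync callers keep calling the sync entry point unchanged, while async callers can await the async one directly. A blocking sync agent would still block the event loop in this sketch; offloading it to a thread is a separate design decision.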

Acceptance criteria

  • async agent functions can be benchmarked without custom wrappers
  • sync usage remains backward compatible
  • docs show one async evaluation example
  • test coverage includes async success and async exception handling
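The last criterion could be covered by tests along these lines. This is a sketch under assumptions: `run_agent` is a hypothetical harness standing in for whatever AgentBench ends up exposing, and it assumes agent exceptions are captured and reported rather than propagated to the caller.

```python
import inspect
import unittest


async def run_agent(agent, task):
    """Hypothetical harness: returns (output, error) for a sync or async agent."""
    try:
        value = agent(task)
        if inspect.isawaitable(value):
            value = await value
        return value, None
    except Exception as exc:
        # Assumption: errors are captured so one failing task does not abort a run.
        return None, repr(exc)


async def ok_agent(task):
    return f"done: {task}"


async def failing_agent(task):
    raise RuntimeError("tool timeout")


class AsyncAgentTests(unittest.IsolatedAsyncioTestCase):
    async def test_async_success(self):
        output, error = await run_agent(ok_agent, "t1")
        self.assertEqual(output, "done: t1")
        self.assertIsNone(error)

    async def test_async_exception_is_captured(self):
        output, error = await run_agent(failing_agent, "t1")
        self.assertIsNone(output)
        self.assertIn("tool timeout", error)
```

`unittest.IsolatedAsyncioTestCase` is in the standard library from Python 3.8; a project already using pytest might prefer an async plugin instead.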

Notes

This is one of the highest-leverage upgrades for making AgentBench feel production-relevant instead of demo-only.
