fix(a2a): add configurable drain timeout in AgentTaskProcessor to guard against agent loop crash mid-turn #2329

@bug-ops

Context

PR #2328 fixed the response-shift bug (#2326) by replacing try_recv with a drain loop in AgentTaskProcessor::process() that awaits recv() until a Flush event arrives.

Problem

The drain loop has no timeout:

loop {
    match handle.output_rx.recv().await {
        Some(LoopbackEvent::Flush) | None => break,
        Some(_) => {}
    }
}

If the agent loop panics mid-turn while the sender is held by an Arc that outlives the panic, the channel never closes, so recv() never returns None and no Flush arrives. The drain loop then blocks the current A2A request indefinitely.

Proposed fix

Add a drain_timeout_ms: u64 field to A2aConfig (default: 30_000, i.e. 30 s) and wrap the whole drain loop in tokio::time::timeout. On timeout, log a warning and break: degraded behavior (a possible stale event on the next request) is better than a request that hangs forever.
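A minimal sketch of the intended semantics, written against std::sync::mpsc so it compiles without external crates. It is not the project's code: the A2aConfig subset, the LoopbackEvent::Other variant, the DrainOutcome enum and the drain_until_flush function are all hypothetical names introduced for illustration. The point it shows is that the deadline covers the entire drain, not each individual recv. In the async code the equivalent would presumably be tokio::time::timeout(duration, async { loop { ... } }).await, breaking on the Err(Elapsed) branch.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

// Hypothetical stand-in for the real event type; only `Flush` is named in the issue.
#[derive(Debug)]
enum LoopbackEvent {
    Other,
    Flush,
}

// Hypothetical subset of A2aConfig containing only the proposed field.
struct A2aConfig {
    drain_timeout_ms: u64,
}

impl Default for A2aConfig {
    fn default() -> Self {
        // Default proposed in the issue: 30_000 ms.
        Self { drain_timeout_ms: 30_000 }
    }
}

// Hypothetical result type so callers (and tests) can tell why the drain ended.
#[derive(Debug, PartialEq)]
enum DrainOutcome {
    Flushed,
    Closed,
    TimedOut,
}

// Drains events until Flush, channel close, or the overall deadline.
fn drain_until_flush(rx: &Receiver<LoopbackEvent>, cfg: &A2aConfig) -> DrainOutcome {
    // One deadline for the whole loop, so a steady trickle of non-Flush
    // events cannot extend the wait indefinitely.
    let deadline = Instant::now() + Duration::from_millis(cfg.drain_timeout_ms);
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(LoopbackEvent::Flush) => return DrainOutcome::Flushed,
            Ok(_) => {} // discard intermediate events, as the original loop does
            Err(RecvTimeoutError::Disconnected) => return DrainOutcome::Closed,
            Err(RecvTimeoutError::Timeout) => {
                // Degraded path: warn and give up rather than hang the request.
                eprintln!(
                    "warning: drain timed out after {} ms",
                    cfg.drain_timeout_ms
                );
                return DrainOutcome::TimedOut;
            }
        }
    }
}
```

With a live sender that never emits Flush (the stuck-agent case described above), drain_until_flush returns DrainOutcome::TimedOut after roughly drain_timeout_ms instead of blocking. The normal Flush and channel-close paths behave as in the original loop.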

Priority

P3 — unlikely in practice given current agent lifecycle, but a latent correctness issue.

Labels

P3 (Research, medium-high complexity), bug (Something isn't working)
