Subagent timeout status race condition: workers show 'timed out' when successfully completed #53106

@FiredMosquito831

Bug Description

Subagent workers can show status: timed out in completion events even when they completed successfully with valid results. This is a race condition in the runtime outcome tracking.

Evidence

When running a subagent test suite with 5 parallel workers, workers 4 and 5 showed:

{
  "status": "timed out",
  "runtime": "1s",
  "tokens": { "in": 0, "out": 0 }
}

But the actual result content showed successful completion:

"Worker 4 executed successfully ✅"
"Worker 5 executed successfully ✅"
Session keys confirmed, full execution details present

Key indicator: the token counts are 0 even though result content is present, which indicates that the stats and status are out of sync with the result.
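The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming a completion event carries `status`, `tokens`, and a `result` string; the `result` and `worker` field names and the sample token values for worker 3 are assumptions for illustration, not taken from the actual event schema.

```javascript
// Hypothetical event shape, modeled on the completion event quoted above.
// The `result` and `worker` field names are assumptions.
function isInconsistentCompletion(event) {
  const tokens = event.tokens ?? { in: 0, out: 0 };
  const hasResult =
    typeof event.result === 'string' && event.result.trim().length > 0;
  const zeroTokens = tokens.in === 0 && tokens.out === 0;
  // Suspicious: reported as timed out, no token usage, yet a result exists.
  return event.status === 'timed out' && zeroTokens && hasResult;
}

// Sample events; the token values for worker 3 are made up for the example.
const events = [
  {
    worker: 3,
    status: 'completed successfully',
    tokens: { in: 120, out: 45 },
    result: 'Worker 3 executed successfully',
  },
  {
    worker: 4,
    status: 'timed out',
    tokens: { in: 0, out: 0 },
    result: 'Worker 4 executed successfully',
  },
];

const suspicious = events.filter(isInconsistentCompletion).map((e) => e.worker);
console.log(suspicious); // [ 4 ]
```

A check like this could run in a test suite to flag affected events without relying on the status field alone.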

Root Cause Analysis

Per the documentation:

"Status is not inferred from model output; it comes from runtime outcome signals."

The status is determined by the runtime, not by parsing output. The race condition occurs between:

  1. Worker completing successfully
  2. Timeout timer firing
  3. Status being recorded

Timeline:
T25: Worker completes task successfully
T26: Success result is being written to transcript
T30: Timeout timer fires ← RACE CONDITION
T31: Status set to "timeout" (overwrites or races with success)
T32: Completion event sent with status="timeout" but result="success"

The timeout mechanism uses a separate timer/async task. When it fires, it aborts the run and sets status to "timeout". If the worker completed just before the timeout, the success result is already in the transcript, but the status field gets set incorrectly.
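To make the suspected sequence concrete, here is a minimal simulation of it. This is a sketch under stated assumptions, not OpenClaw's runtime code: `runWorker`, `sleep`, the `run` object, and all delays are hypothetical, and the model assumes the timer is only cancelled after the transcript write finishes.

```javascript
// Minimal simulation of the suspected race. All names and delays here are
// hypothetical; this is not OpenClaw's actual runtime code.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runWorker({ timeoutMs, workMs, flushMs }) {
  const run = { status: 'running', result: null };

  // Timeout path: a separate timer that writes the status unconditionally.
  const timer = setTimeout(() => {
    run.status = 'timeout';
  }, timeoutMs);

  await sleep(workMs); // the worker finishes its task
  run.result = 'executed successfully';
  run.status = 'success';

  await sleep(flushMs); // stand-in for the transcript write; timer still armed
  clearTimeout(timer);
  return run; // the completion event would be built from this object
}

const demo = runWorker({ timeoutMs: 30, workMs: 10, flushMs: 50 });
demo.then((run) => console.log(run.status, '/', run.result));
// With these delays the timer fires during the flush window, so this
// prints: timeout / executed successfully
```

The result is intact, but the last writer of `status` wins, which matches the symptom of a successful result paired with a timeout status.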

Steps to Reproduce

  1. Spawn multiple subagents in parallel (5+ workers)
  2. Have some workers complete very quickly (< 5 seconds)
  3. Observe that some completion events show status: "timed out" with 0 tokens
  4. Compare with the actual result content, which shows successful execution

This is not easily reproducible on demand, because the race depends on timing.

Expected Behavior

  • Status should be "completed successfully" when the worker finishes before timeout
  • Token counts should reflect actual usage
  • Stats and status should be consistent with result content

Actual Behavior

  • Status shows "timed out" even when result content shows success
  • Token counts show 0 despite having result content
  • Stats and status are inconsistent with result

Impact

| Area          | Impact                                          |
|---------------|-------------------------------------------------|
| Functionality | Low: results are still delivered correctly      |
| Observability | High: status reporting is unreliable            |
| Monitoring    | High: can't trust timeout alerts                |
| Debugging     | Medium: false positives make debugging harder   |

Suggested Fixes

Option 1: Atomic Status Update

// Before setting timeout status, check if already completed
if (run.status === 'running') {
  run.status = 'timeout';
}
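Option 1 is most robust when both the success path and the timeout path go through the same guard, and when the success path also cancels the pending timer. A minimal sketch follows; `settle` and `settledAt` are hypothetical names. Because JavaScript callbacks run to completion on a single thread, a check followed by a write with no `await` in between cannot be interleaved with the timer callback.

```javascript
// Hypothetical helper: the first terminal transition wins, later ones are
// ignored. There is no await between the check and the write.
function settle(run, status) {
  if (run.status !== 'running') return false;
  run.status = status;
  run.settledAt = Date.now();
  return true;
}

const run = { status: 'running' };

// Timeout path: goes through the same guard as the success path.
const timer = setTimeout(() => settle(run, 'timeout'), 50);

// Success path: settle first, then cancel the timer that is no longer needed.
const won = settle(run, 'success');
if (won) clearTimeout(timer);

// A late timeout attempt is now a no-op.
const late = settle(run, 'timeout');
console.log(run.status, won, late); // success true false
```

The return value tells each caller whether its transition took effect, which makes it easy to skip follow-up work such as aborting a run that has already finished.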

Option 2: Completion Timestamp Check

// If completion was recorded before the timeout fired, report success
let status;
if (run.completedAt != null && run.completedAt < timeoutFiredAt) {
  status = 'success';
} else {
  status = 'timeout';
}
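The timestamp comparison can be wrapped in a small function that also covers the cases where only one of the two events has happened. This is a sketch with hypothetical names (`resolveStatus`); it assumes both timestamps come from the same monotonic clock, such as `performance.now()`, since wall-clock time can jump. The offsets 25, 30, and 35 mirror the timeline above.

```javascript
// Hypothetical helper: derive the final status from two optional timestamps
// taken from the same monotonic clock.
function resolveStatus({ completedAt, timeoutFiredAt }) {
  if (completedAt === undefined && timeoutFiredAt === undefined) return 'running';
  if (completedAt === undefined) return 'timeout';
  if (timeoutFiredAt === undefined) return 'success';
  // Both happened: whichever came first decides the outcome.
  return completedAt < timeoutFiredAt ? 'success' : 'timeout';
}

const t0 = performance.now();
console.log(resolveStatus({ completedAt: t0 + 25, timeoutFiredAt: t0 + 30 })); // success
console.log(resolveStatus({ completedAt: t0 + 35, timeoutFiredAt: t0 + 30 })); // timeout
console.log(resolveStatus({ timeoutFiredAt: t0 + 30 })); // timeout
```

Deriving the status from recorded timestamps keeps the decision independent of the order in which the two code paths happen to write their state.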

Option 3: Lock-Based Synchronization

// Use a lock/mutex when updating status
await run.lock.acquire();
try {
  if (run.status === 'running') {
    run.status = 'timeout';
  }
} finally {
  run.lock.release();
}
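Option 3 assumes a `run.lock` object that the report does not define. Below is a minimal sketch of what such a lock could look like in plain JavaScript; the `Mutex` class and `setTerminalStatus` are hypothetical, and unlike the snippet above, `acquire()` here resolves to a release function rather than exposing `release()` on the lock. A lock only matters in Node.js when there is an `await` between the status check and the write, so the example includes one.

```javascript
// Hypothetical minimal mutex: callers are served in FIFO order.
class Mutex {
  constructor() {
    this._tail = Promise.resolve();
  }

  // Resolves to a release function once all earlier holders have released.
  acquire() {
    let release;
    const next = new Promise((resolve) => {
      release = resolve;
    });
    const ready = this._tail.then(() => release);
    this._tail = next;
    return ready;
  }
}

async function setTerminalStatus(run, status) {
  const release = await run.lock.acquire();
  try {
    if (run.status !== 'running') return false;
    // Stand-in for an async write between the check and the status update.
    await new Promise((resolve) => setTimeout(resolve, 5));
    run.status = status;
    return true;
  } finally {
    release();
  }
}

const run = { status: 'running', lock: new Mutex() };
const outcome = Promise.all([
  setTerminalStatus(run, 'success'),
  setTerminalStatus(run, 'timeout'),
]).then((wins) => ({ wins, status: run.status }));

outcome.then((o) => console.log(o.status, o.wins)); // success [ true, false ]
```

Without the lock, both callers would pass the `running` check before either wrote, and the later write would win.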

Environment

  • OpenClaw version: 2026.3.22
  • Node.js: v22.22.0
  • Platform: Linux (WSL2)
  • Model provider: Alibaba Model Studio (via Anthropic-compatible endpoint)

Related

  • Investigation document: docs/research/subagent-deep-root-cause-analysis.md (in workspace)
  • Test results: docs/testing/subagent-bug-investigation.md (in workspace)

Additional Context

This was discovered during a comprehensive subagent test suite. The race condition appears to be intermittent: it affected workers 4 and 5 in a batch of 5 parallel workers, while workers 1-3 showed the correct "completed successfully" status.

The issue is in the runtime outcome tracking mechanism, not in model output parsing. The result content is correctly captured, but the status field is set incorrectly due to the race.
