Skip to content

Stall monitor fires too aggressively during active tool execution #105

@nathanschram

Description

Description

The stall monitor fires warnings when a subprocess has been idle for 5 minutes (300s), but it doesn't account for actively running tools. When Claude Code is executing a long-running Bash command or Agent/TaskOutput, the subprocess is alive and working but not emitting new JSONL events — triggering false stall warnings.

Observed in the auditor-toolkit chat: 3 stall warnings in ~10 minutes while Claude was actively running pytest and other tools. The process was clearly alive (CPU active, TCP connections open) but the stall monitor treated it as stuck.

Impact

  • False positive Telegram notifications alarm users
  • Stall auto-cancel could kill healthy runs with long tool executions
  • Noise from untether-issue-watcher creating issues for non-bugs

Fix

Add a third threshold tier: when a tool action is started but not completed, use _STALL_THRESHOLD_TOOL (600s / 10 minutes) instead of the normal 300s threshold. The three-tier system:

  • Normal: 300s (5 min)
  • Running tool: 600s (10 min) — new
  • Pending approval: 1800s (30 min)

Affected files

  • src/untether/runner_bridge.py_stall_monitor(), _has_running_tool(), _STALL_THRESHOLD_TOOL

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions