Stall monitor doesn't detect stalls when no events arrive after StartedEvent #95

@nathanschram

Description

Problem

When a Claude Code Bash tool runs a long-lived command (e.g. wrangler tail, which streams indefinitely), the entire session stalls because Claude is blocked waiting for the tool to return. The stall monitor in ProgressEdits fails to detect this, and no notification reaches the user.

Observed behaviour

  • Triage chat ran for 41 minutes with no progress events (stuck at step 75)
  • User saw no stall notification in Telegram
  • No progress_edits.stall_detected in journal logs
  • Had to manually investigate and kill the stuck wrangler tail process tree

Root cause

Two issues in the stall monitor (runner_bridge.py):

  1. _last_event_at starts at 0.0, and the monitor skips its check while the value is still 0 (line 549: if self._last_event_at == 0: continue). If events stop arriving early in the session (e.g. only a few events before the Bash tool blocks), _last_event_at may still be 0, or the monitor may never observe a non-zero timestamp, depending on timing, so the stall check may never run.

  2. Stall detection only logs a journal warning — the user gets no Telegram notification. For sessions controlled via Telegram (the entire point of Untether), a journal-only warning is invisible to the user.
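
To illustrate the first issue, here is a minimal sketch of how such a guard can hide a stall. It assumes the monitor's check looks roughly like the one quoted above; MonitorSketch, on_event and check are simplified stand-ins for illustration, not the actual code in runner_bridge.py.

```python
STALL_THRESHOLD_S = 300.0  # 5 min threshold mentioned in this issue


class MonitorSketch:
    """Hypothetical stand-in for the stall check in ProgressEdits."""

    def __init__(self) -> None:
        self._last_event_at = 0.0  # starts at 0.0, as described above
        self.stalls_detected = 0

    def on_event(self, now: float) -> None:
        # Record the time of the most recent progress event.
        self._last_event_at = now

    def check(self, now: float) -> None:
        # Mirrors the guard quoted from line 549: skip until an event is seen.
        if self._last_event_at == 0:
            return
        if now - self._last_event_at >= STALL_THRESHOLD_S:
            self.stalls_detected += 1


monitor = MonitorSketch()
# No event is ever recorded; 41 minutes of once-a-minute checks go by.
for tick in range(0, 41 * 60, 60):
    monitor.check(now=1_000.0 + tick)
print(monitor.stalls_detected)  # prints 0: the guard returns early every time
```

As long as on_event is never called, every check returns early, so the threshold comparison is never reached.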

Expected behaviour

  1. Stall monitor should initialise _last_event_at from session start time (not 0.0), so stalls are detected even if no events have arrived yet
  2. After detecting a stall (5 min threshold), send a Telegram notification to the chat so the user knows the session appears stuck
  3. Consider: offer a "Kill stuck process" or "Cancel" button in the stall notification
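
A rough sketch of points 1 and 2, under stated assumptions: StallTracker, notify and clock are hypothetical names introduced here, and the notify callback stands in for whatever actually sends the Telegram message. It is not the real ProgressEdits implementation, only one way the expected behaviour could look.

```python
import time
from typing import Callable

STALL_THRESHOLD_S = 300.0  # 5 min threshold from this issue


class StallTracker:
    """Hypothetical sketch; not the actual ProgressEdits code."""

    def __init__(
        self,
        notify: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
        threshold_s: float = STALL_THRESHOLD_S,
    ) -> None:
        self._clock = clock
        self._notify = notify
        self._threshold_s = threshold_s
        # Initialise from session start instead of 0.0, so the check
        # works even if no event ever arrives.
        self._last_event_at = clock()
        self._notified = False

    def on_event(self) -> None:
        self._last_event_at = self._clock()
        self._notified = False  # new progress re-arms the notification

    def check(self) -> bool:
        """Return True if the session currently looks stalled."""
        idle = self._clock() - self._last_event_at
        if idle < self._threshold_s:
            return False
        if not self._notified:
            # In Untether this would be a Telegram message; here it is
            # an injected callback (an assumption of this sketch).
            self._notify(f"Session appears stuck: no events for {idle:.0f}s")
            self._notified = True
        return True


# Usage with a fake clock: no events arrive for just over 5 minutes.
now = [0.0]
sent: list[str] = []
tracker = StallTracker(notify=sent.append, clock=lambda: now[0])
now[0] = 301.0
stalled = tracker.check()
```

The sketch sends at most one notification per stall and re-arms when a new event arrives, which avoids spamming the chat on every check. That is a design choice of the sketch, not something this issue prescribes.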

Affected files

  • src/untether/runner_bridge.py: ProgressEdits._stall_monitor(), _last_event_at initialisation

Reproduction

  1. Start a Claude session via Telegram with plan mode
  2. Have Claude run a long-lived Bash command (e.g. wrangler tail, tail -f, watch)
  3. Wait 5+ minutes — no stall notification appears in Telegram
  4. Check journal — no progress_edits.stall_detected entry either (if _last_event_at was still 0)

Labels: bug (Something isn't working)