Stall monitor loops forever after laptop sleep: no auto-cancel, /cancel requires reply (#99)

@nathanschram

Description

When a user starts a Claude task via Untether on a laptop, then closes the lid (sleep), the Claude subprocess is killed by the OS. When the laptop wakes, Untether's stall monitor resumes and fires repeated "No progress" warnings every 3 minutes indefinitely.

Impact

  • Zombie stall loop: 7+ stall warnings sent to Telegram with no auto-recovery
  • Dead process undetectable: All warnings show pid=None, process_alive=None, last_event_seq=0 — the monitor can't determine the process is dead
  • /cancel unusable: Sending /cancel as a standalone command (without replying to the progress message) is rejected with "reply to the progress message to cancel" — but on mobile, finding and replying to the right message is difficult
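
To make the last bullet concrete, here is a minimal sketch of how cancel-target resolution could behave with and without a single-run fallback. The function name `resolve_cancel_target`, its parameters, the run-ID strings, and the "no longer active" message are hypothetical and not taken from Untether's code; only the rejection text comes from this report.

```python
from typing import Optional, Sequence, Tuple

# Rejection text quoted in this report.
REJECTION = "reply to the progress message to cancel"


def resolve_cancel_target(
    replied_run_id: Optional[str],
    active_run_ids: Sequence[str],
    allow_fallback: bool,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (run_id_to_cancel, error_message); exactly one of them is None.

    All names here are illustrative assumptions, not Untether's API.
    """
    if replied_run_id is not None:
        # /cancel was sent as a reply to a progress message.
        if replied_run_id in active_run_ids:
            return replied_run_id, None
        return None, "that run is no longer active"  # hypothetical message
    # Standalone /cancel: only unambiguous when exactly one run is active.
    if allow_fallback and len(active_run_ids) == 1:
        return active_run_ids[0], None
    return None, REJECTION
```

With `allow_fallback=False` a standalone `/cancel` is always rejected, which matches the behaviour reported here; with `allow_fallback=True` it cancels the run only when exactly one run is active, so several concurrent runs still require a reply.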

Root causes

  1. No stall cap: _stall_monitor() loops forever — no maximum warning count or dead-process detection to trigger auto-cancel
  2. Late PID threading: ProgressEdits.pid is only set when StartedEvent arrives via JSONL. If the subprocess dies before emitting any events (e.g. killed during OS sleep), pid remains None and collect_proc_diag() returns None
  3. /cancel requires reply: handle_cancel() always requires a reply to the progress message, with no fallback for single-active-run scenarios
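
Root cause 2 comes down to when the PID is recorded. The sketch below shows the general idea of capturing it at spawn time instead of waiting for the first event. `ProgressState` is a hypothetical stand-in for `ProgressEdits`, `spawn_with_early_pid` is an invented helper, and the synchronous `subprocess.Popen` call is used only for brevity; Untether's actual spawn path may look quite different.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProgressState:
    """Hypothetical stand-in for ProgressEdits; only the pid field matters here."""
    pid: Optional[int] = None


def spawn_with_early_pid(cmd: List[str], state: ProgressState) -> subprocess.Popen:
    """Start the subprocess and record its PID before any output is read."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
    # The OS assigns the PID at spawn time, so it is available immediately,
    # even if the child dies before emitting a single event.
    state.pid = proc.pid
    return proc
```

Calling `spawn_with_early_pid([sys.executable, "-c", "pass"], state)` leaves `state.pid` set as soon as the call returns, so a later diagnostic step has something to probe even when no event was ever received.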

Timeline (from logs)

  • 13:15 UTC: subprocess.spawn (pid=5468)
  • MacBook lid closed, subprocess killed by macOS
  • ~21:00 UTC: MacBook wakes, stall monitor resumes
  • 21:00–21:21 UTC: 7× "No progress" warnings, all with pid=None
  • User sends /cancel standalone → rejected
  • Stall warnings continue indefinitely

Fix

  • Stall auto-cancel: dead process detection, no-PID zombie cap (3 warnings), absolute cap (10 warnings)
  • Early PID threading from subprocess spawn (before StartedEvent)
  • /cancel fallback: cancel single active run without requiring reply
  • queued_for_chat() method on scheduler for standalone cancel of queued jobs
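
The stall auto-cancel rules in the first bullet can be read as a small decision function. The following is a sketch under stated assumptions, not the actual patch: `StallAction`, `stall_action`, `process_alive`, and the constant names are hypothetical, and the caps are interpreted as "cancel once this many warnings have already been sent". The liveness probe uses `os.kill(pid, 0)`, which is POSIX-only (appropriate for the macOS case in this report, not for Windows).

```python
import os
from enum import Enum
from typing import Optional

NO_PID_WARNING_CAP = 3     # from the Fix list: no-PID zombie cap
ABSOLUTE_WARNING_CAP = 10  # from the Fix list: absolute cap


class StallAction(Enum):
    WARN = "warn"
    CANCEL = "cancel"


def process_alive(pid: Optional[int]) -> Optional[bool]:
    """Tri-state probe: True, False, or None when no PID is known (POSIX only)."""
    if pid is None:
        return None
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stall_action(
    warnings_sent: int, pid: Optional[int], alive: Optional[bool]
) -> StallAction:
    """Decide whether the next stall check should warn again or auto-cancel."""
    if alive is False:
        return StallAction.CANCEL  # dead process detected
    if pid is None and warnings_sent >= NO_PID_WARNING_CAP:
        return StallAction.CANCEL  # no PID to probe: give up early
    if warnings_sent >= ABSOLUTE_WARNING_CAP:
        return StallAction.CANCEL  # hard upper bound in every case
    return StallAction.WARN
```

Under these assumptions, the run in the timeline above (pid=None throughout) would have been cancelled after its third warning instead of producing seven. Note that `os.kill(pid, 0)` still reports an exited but unreaped child as existing, so a real implementation would probably also want to check the subprocess handle's exit status.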

Metadata

Labels

bug (Something isn't working)
