ExitPlanMode stuck after cancel + resume — stale outline_guard #93

@nathanschram

Description

Bug

When a user clicks "Pause & Outline Plan" on an ExitPlanMode control request and then cancels the run (via /cancel or the cancel button), the global registries _OUTLINE_PENDING, _DISCUSS_COOLDOWN, and _DISCUSS_APPROVED are not cleaned up. When the session is resumed, Claude's next ExitPlanMode call is permanently blocked by a stale outline_guard.

Production incident

Session fbce514b on @hetz_lba1_bot (v0.33.4), souliv.com.au chat route:

| Time | Event |
|---|---|
| 14:32:06 | ExitPlanMode control request shown |
| 14:32:24 | User clicks "Pause & Outline Plan" → set_discuss_cooldown() |
| 14:32:25 | cancel.requested: run cancelled |
| 14:32:30 | handle.cancelled: subprocess killed, finally runs → FDs closed but registries NOT cleared |
| 14:32:38 | User types "Approved" → new run with --resume (same session_id) |
| 14:33:15 | Claude calls ExitPlanMode → outline_guard=True (stale!) → auto-denied |
| 14:35:33+ | User clicks synthetic "Approve Plan" buttons; toasts are shown but Claude has already stopped trying |

Root cause

Registry cleanup exists only in process_error_events() and stream_end_events(), neither of which is called on cancellation. The run_impl finally block only closes file descriptors.

Fix

Extract a _cleanup_session_registries() helper and call it from the run_impl finally block, which covers the cancel path. Refactor process_error_events() and stream_end_events() to use the same helper. All operations are idempotent, so double cleanup on the normal path is safe.

Affected files

  • src/untether/runners/claude.py

Metadata

Labels: bug (Something isn't working); engine:claude (Claude Code CLI, Anthropic)
