Description
Bug
When a user clicks "Pause & Outline Plan" on an ExitPlanMode control request and then cancels the run (via /cancel or cancel button), the global registries _OUTLINE_PENDING, _DISCUSS_COOLDOWN, and _DISCUSS_APPROVED are not cleaned up. On resume, Claude's next ExitPlanMode call is permanently blocked by a stale outline_guard.
Production incident
Session fbce514b on @hetz_lba1_bot (v0.33.4), souliv.com.au chat route:
| Time | Event |
|---|---|
| 14:32:06 | ExitPlanMode control request shown |
| 14:32:24 | User clicks "Pause & Outline Plan" → set_discuss_cooldown() |
| 14:32:25 | cancel.requested — run cancelled |
| 14:32:30 | handle.cancelled — subprocess killed, finally runs → FDs closed but registries NOT cleared |
| 14:32:38 | User types "Approved" → new run with --resume (same session_id) |
| 14:33:15 | Claude calls ExitPlanMode → outline_guard=True (stale!) → auto-denied |
| 14:35:33+ | User clicks synthetic "Approve Plan" buttons — toasts shown but Claude already stopped trying |
Root cause
Registry cleanup only exists in process_error_events() and stream_end_events(), which are NOT called on cancellation. The run_impl finally block only closes file descriptors.
Fix
Extract a _cleanup_session_registries() helper and call it from the run_impl finally block, which covers the cancel path. Refactor process_error_events() and stream_end_events() to use the same helper. All operations are idempotent, so double cleanup on the normal path is safe.
Affected files
src/untether/runners/claude.py