Session stall diagnostics: process alive but no stream-json events #97

## Problem

In Untether v0.34.0 production, Claude Code sessions freeze: the subprocess stays alive but stops writing stream-json events to its stdout pipe. The stall monitor detects the stall after 5 min but logs no diagnostic information, so root-cause analysis is impossible from logs alone.

**Observed failure modes:**
1. **Triage**: 2 parallel Agent subagents → lost ALL TCP → 81% CPU, zero TCP, no stdout for 30+ min
2. **Auditor-toolkit**: Bash finished → Claude sleeping with 1 ESTABLISHED TCP → no stdout for 10+ min
3. **Triage (resumed)**: Same session resumed → immediately stalled again (tainted context)

**Current gaps:**
- Stall monitor logs only elapsed time — no process state, TCP, last action
- Subprocess watchdog only detects dead processes, not "alive but stalled"
- No auto-recovery mechanism
- No event timeline context for post-mortem analysis
- stderr captured but not accessible during stalls
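
The difference between "dead" and "alive but stalled" can be illustrated with a minimal sketch. It assumes the watchdog has the subprocess return code (as from `subprocess.Popen.poll()`, which returns `None` while the process is running) and the time since the last stream-json event. The function name `classify_session` and the constant name are hypothetical, not taken from the Untether code base.

```python
from typing import Optional

# The 5 min stall threshold mentioned in the problem description.
STALL_AFTER_SECONDS = 300.0


def classify_session(returncode: Optional[int], seconds_since_last_event: float) -> str:
    """Classify a session as 'dead', 'stalled', or 'healthy'.

    returncode is None while the subprocess is still running.
    """
    if returncode is not None:
        # The process has exited: this is the case a dead-process watchdog catches.
        return "dead"
    if seconds_since_last_event >= STALL_AFTER_SECONDS:
        # Process is alive but silent: invisible to a check based on exit status alone.
        return "stalled"
    return "healthy"
```

A check that looks only at the return code sees failure modes 1 and 2 above as running sessions; the event timestamp is what separates them from a healthy one.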

## Solution

Rich diagnostics on every stall for post-mortem, progressive warnings, and safe auto-recovery:

1. **Process diagnostics module** (`proc_diag.py`) — `/proc/{pid}/` reads for CPU, memory, TCP, FDs, children
2. **Event tracking on JsonlStreamState** — timestamp, type, tool name, ring buffer of recent events (all engines)
3. **PID injection into StartedEvent meta** — base class handles all engines automatically
4. **Stderr ring buffer** — `stream.stderr_capture` accessible from watchdog and stall monitor
5. **Progressive stall monitor** — repeating warnings with fresh `/proc` diagnostics each time
6. **Liveness watchdog** — 10 min timeout for "alive but silent" with optional auto-kill (zero TCP + zero CPU)
7. **Session completion summary** — one-line log for post-mortem pattern analysis
8. **Watchdog config** — `[watchdog]` section: `liveness_timeout`, `stall_auto_kill`, `stall_repeat_seconds`
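
As a rough illustration of how items 1, 2, 5, 6 and 8 could fit together, here is a minimal sketch. All class and function names (`WatchdogConfig`, `EventTracker`, `cpu_ticks_from_stat`, `stall_action`, `repeat_warning_due`) are hypothetical. The defaults are assumptions apart from the 10 min liveness timeout, and the ring buffer size is arbitrary. The TCP connection count is taken as an input because counting a process's own sockets requires matching the socket inodes under `/proc/{pid}/fd` against `/proc/net/tcp`, which is left out here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WatchdogConfig:
    # Field names mirror the [watchdog] keys; defaults other than the
    # 10 min liveness timeout are assumptions.
    liveness_timeout: float = 600.0
    stall_auto_kill: bool = False
    stall_repeat_seconds: float = 180.0


@dataclass
class EventTracker:
    """Hypothetical stand-in for the event tracking on JsonlStreamState."""
    recent: deque = field(default_factory=lambda: deque(maxlen=10))
    last_event_at: Optional[float] = None

    def record(self, now: float, event_type: str, tool: Optional[str] = None) -> None:
        # Keep the latest timestamp plus a bounded history for post-mortems.
        self.last_event_at = now
        self.recent.append((now, event_type, tool))


def cpu_ticks_from_stat(stat_text: str) -> int:
    """Return utime + stime (clock ticks) from the text of /proc/<pid>/stat."""
    # comm (field 2) may contain spaces or ')', so split on the last ')'.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def stall_action(idle_seconds: float, tcp_count: int, cpu_delta_ticks: int,
                 cfg: WatchdogConfig) -> str:
    """Return 'ok', 'warn', or 'kill' for a process that is still alive."""
    if idle_seconds < cfg.liveness_timeout:
        return "ok"
    # Auto-kill only when enabled and the process shows zero TCP and zero CPU.
    if cfg.stall_auto_kill and tcp_count == 0 and cpu_delta_ticks == 0:
        return "kill"
    return "warn"


def repeat_warning_due(now: float, last_warned_at: Optional[float],
                       cfg: WatchdogConfig) -> bool:
    """True when the next progressive stall warning should be emitted."""
    return last_warned_at is None or now - last_warned_at >= cfg.stall_repeat_seconds
```

In this sketch, CPU activity would be estimated by reading `/proc/{pid}/stat` twice and passing the difference in ticks to `stall_action`. Under these rules, failure mode 1 (zero TCP but 81% CPU) produces a warning rather than a kill, since only a process with neither TCP connections nor CPU activity qualifies for auto-kill.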
