[BUG] Claude Code sends SIGTERM to all healthy stdio MCP servers after 10-60s: root cause analysis with strace evidence (#40207)

## Bug Description

Claude Code sends SIGTERM to **all** stdio-based MCP servers simultaneously, 10–60 seconds after a successful connection and handshake. No errors precede the kill: the servers are healthy and actively responding to tool calls. The interval before the kill shrinks over the session lifetime (60s → 30s → 10s). The only recovery is a manual `/mcp` reconnection, after which the servers are killed again.

This is a systemic issue affecting every stdio MCP server configured in the session. Cloud-hosted MCPs (Gmail, Google Calendar via claude.ai) are unaffected because they use a different transport.

## Root Cause Analysis

I deployed three layers of instrumentation to trace the root cause:

### 1. strace on Claude Code's process tree

```
sudo strace -p <claude_pid> -e kill,tgkill -f -t
```

Captured:

```
1480540 21:57:45 kill(1501128, SIGINT)  = 0   # Main Claude PID kills one MCP
1501518 21:57:57 kill(1501129, SIGTERM) = 0   # Child wrapper kills another MCP
1480540 21:58:02 kill(1501627, SIGINT)  = 0   # Main Claude PID kills another
```

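**PID 1501518** is a short-lived child process of Claude Code that appears to be an MCP lifecycle wrapper. One is spawned around each MCP server and deliberately sends SIGTERM to kill it. The main Claude PID (1480540) also signals servers directly, using SIGINT.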
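To see at a glance which process sends which signal, the strace output can be tallied per sender. The following is a minimal sketch, assuming the log uses the same `<sender_pid> <time> kill(<target>, <SIGNAL>) = 0` layout as the excerpt above; `summarize_kills` is a hypothetical helper name, not part of strace or Claude Code.

```bash
#!/bin/bash
# Tally successful kill()/tgkill() calls per (sender PID, signal).
# Assumed input format (as in the strace excerpt above):
#   <sender_pid> <HH:MM:SS> kill(<target_pid>, <SIGNAL>) = 0
summarize_kills() {
    awk '
        /(tg)?kill\(/ && /= 0/ {
            sender = $1
            sig = $0
            sub(/\).*/, "", sig)     # drop everything from the first ")" onward
            sub(/.*[, ]/, "", sig)   # keep only the last argument (the signal name)
            count[sender " " sig]++
        }
        END { for (k in count) print k, count[k] }
    ' | sort
}

# Usage: summarize_kills < strace.log
```

Fed the three lines captured above, this prints `1480540 SIGINT 2` and `1501518 SIGTERM 1`, which makes the split between the main process and the wrapper explicit.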

### 2. Watchdog process monitor

A polling script that tracks MCP child processes by PID and logs when they appear or disappear:

```
[21:40:25] NEW: PID=1480580 (chrome-devtools-mcp) fd0=socket fd1=/dev/null
[21:40:25] NEW: PID=1480584 (typst-mcp) fd0=socket fd1=socket
[21:40:25] NEW: PID=1480766 (fli-mcp) fd0=socket fd1=socket
[21:40:25] NEW: PID=1480803 (mcp-stdio-proxy.sh) fd0=socket fd1=socket
[21:40:25] NEW: PID=1480840 (outlook-owa) fd0=socket fd1=socket

[21:41:05] GONE: PID=1480584 (typst-mcp) — exit code: 127
[21:41:05] GONE: PID=1480580 (chrome-devtools-mcp) — exit code: 127
[21:41:05] GONE: PID=1480766 (fli-mcp) — exit code: 127
[21:41:05] GONE: PID=1480803 (mcp-stdio-proxy.sh) — exit code: 127
[21:41:05] GONE: PID=1480840 (outlook-owa) — exit code: 127
```

All 5 MCP servers were killed in the same second, 40 seconds after startup.
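The lifetime of each server can be computed from the log rather than read off by eye. This is a small sketch that assumes the abbreviated `[HH:MM:SS] NEW:/GONE: PID=<n>` layout of the excerpt above and that the session does not cross midnight; `mcp_lifetimes` is a hypothetical helper name.

```bash
#!/bin/bash
# Print "<pid> <seconds alive>" for every PID that has both a NEW and a GONE line.
# Assumed input format (as in the watchdog excerpt above):
#   [HH:MM:SS] NEW: PID=<n> (...)
#   [HH:MM:SS] GONE: PID=<n> (...)
mcp_lifetimes() {
    awk '
        # Convert "[HH:MM:SS]" into seconds since midnight.
        function secs(ts,   p) {
            split(substr(ts, 2, 8), p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        $2 == "NEW:"  { sub(/^PID=/, "", $3); born[$3] = secs($1) }
        $2 == "GONE:" {
            sub(/^PID=/, "", $3)
            if ($3 in born) print $3, secs($1) - born[$3]
        }
    '
}

# Usage: mcp_lifetimes < watchdog-excerpt.log
```

For the excerpt above, every PID is reported with a lifetime of 40 seconds.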

### 3. JSON-RPC stdio proxy

A transparent bidirectional proxy that logs all JSON-RPC messages between Claude Code and an MCP server:

```
[21:40:20] C->S: initialize request
[21:40:20] S->C: initialize response (success, 16 tools listed)
[21:40:20] C->S: notifications/initialized
[21:40:20] C->S: tools/list
[21:40:20] S->C: tools/list response (success)
[21:41:01] PROXY: SIGTERM received
[21:41:01] PROXY: Server died with signal TERM (143)
```

No errors, no failed requests, no compaction event. Clean SIGTERM 41 seconds after a successful handshake.
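The "no failed requests" observation can be checked mechanically against the raw proxy log. The sketch below assumes the log lines have the form `[timestamp] C->S: <JSON>` and `[timestamp] S->C: <JSON>`, as written by the proxy script in this report, and that request ids are numeric and appear before any nested `"id"` keys. `unanswered_requests` is a hypothetical helper name, and the JSON matching is deliberately crude.

```bash
#!/bin/bash
# List JSON-RPC request ids sent by the client that never received a response.
# Notifications (no "id") are ignored.
unanswered_requests() {
    awk '
        # Return the first numeric "id" value on the line, or "" if none.
        function rpc_id(line,   s) {
            if (!match(line, /"id"[ ]*:[ ]*[0-9]+/)) return ""
            s = substr(line, RSTART, RLENGTH)
            sub(/.*:[ ]*/, "", s)
            return s
        }
        # Client requests carry both a method and an id.
        / C->S: / && /"method"/ {
            id = rpc_id($0)
            if (id != "") pending[id] = 1
        }
        # Server responses carry a result or an error for that id.
        / S->C: / && (/"result"/ || /"error"/) {
            id = rpc_id($0)
            if (id != "") delete pending[id]
        }
        END { for (id in pending) print "unanswered id=" id }
    ' | sort
}

# Usage: unanswered_requests < /path/to/proxy.log
```

Empty output means every logged request was answered before the SIGTERM arrived.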

## Hypotheses Ruled Out

| Hypothesis | Evidence | Verdict |
|---|---|---|
| Context compaction | No PostCompact hook fired; happens too early in session | ❌ Eliminated |
| Individual MCP crashes | All 5 die simultaneously with same exit code | ❌ Eliminated |
| MCP server idle timeout | Called tools right after reconnect — still killed 10s later | ❌ Eliminated |
| Hooks killing MCPs | Audited all hooks in `~/.claude/hooks/` — none target MCPs | ❌ Eliminated |
| External process (cron/reaper) | Only systemd timer runs at 2am, only targets orphans (PPID=1) | ❌ Eliminated |

## Conclusion

**Claude Code has an internal stdio timeout/lifecycle mechanism that kills healthy MCP servers.** Evidence:
1. strace confirms CC spawns a wrapper process per MCP that sends SIGTERM
2. CC changelog mentions a fix for "MCP stdio server timeout not killing child process" — confirming this timeout exists by design
3. `MCP_TIMEOUT` env var exists to configure it, but the default appears too aggressive
4. The timeout fires even when MCPs are healthy and actively responding

## Impact

This effectively breaks the MCP extensibility model for power users. Anyone running multiple stdio MCPs (browser automation, email, calendars, databases, custom tools) loses their entire tool surface repeatedly throughout a session. The failure is **silent** — no error message, no warning. Tools simply stop being available.

Prior issues reporting this symptom were auto-closed as duplicates or for inactivity, but none identified the root cause:
- #15758 — MCP tools silently disappear mid-session (closed/locked)
- #24350 — MCP connections drop silently, require manual /mcp (closed as dup of #15758, locked)
- #38395 — GitHub MCP disconnects during multi-file operations
- #7718 — SIGABRT crash during MCP shutdown (SIGINT → SIGTERM → SIGKILL cascade)
- #35287 — Stdio MCPs hang indefinitely when init fails

## Reproduction

1. Configure 3+ stdio MCP servers in `~/.claude.json`
2. Start a Claude Code session
3. Verify MCPs connect via `/mcp`
4. Wait 10–60 seconds
5. All stdio MCPs will disconnect simultaneously

## Reproduction instrumentation

<details>
<summary><b>mcp-watchdog.sh</b> — polls child processes, detects when MCPs appear/disappear</summary>

```bash
#!/bin/bash
# Usage: mcp-watchdog.sh <claude_pid>
# Run in a separate terminal. Auto-stops when Claude exits.

LOG="$HOME/.claude/logs/mcp-disconnect-debug.log"
INTERVAL=10

CLAUDE_PID="${1:?Usage: mcp-watchdog.sh <claude_pid>}"
MCP_PATTERNS="chrome-devtools-mcp|typst-mcp|fli-mcp|google-tasks|outlook-owa/server|discord.*server"

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] [watchdog] $*" >> "$LOG"; }

declare -A PREV_PIDS
log "=== WATCHDOG STARTED for Claude PID=$CLAUDE_PID ==="

while kill -0 "$CLAUDE_PID" 2>/dev/null; do
    declare -A CURR_PIDS
    for pid in $(pgrep -P "$CLAUDE_PID" 2>/dev/null); do
        CMDLINE=$(cat /proc/$pid/cmdline 2>/dev/null | tr '\0' ' ' | head -c 200)
        [ -z "$CMDLINE" ] && continue
        echo "$CMDLINE" | grep -qE "$MCP_PATTERNS" || continue
        FD0=$(readlink /proc/$pid/fd/0 2>/dev/null || echo "GONE")
        FD1=$(readlink /proc/$pid/fd/1 2>/dev/null || echo "GONE")
        STATE="$CMDLINE|$FD0|$FD1"
        CURR_PIDS[$pid]="$STATE"
        if [ -z "${PREV_PIDS[$pid]}" ]; then
            SHORT=$(echo "$CMDLINE" | grep -oE '[^ ]*mcp[^ ]*|chrome-devtools|typst|google-tasks|outlook|discord' | head -1)
            log "  NEW: PID=$pid ($SHORT) fd0=$FD0 fd1=$FD1"
        fi
    done
    for pid in "${!PREV_PIDS[@]}"; do
        if [ -z "${CURR_PIDS[$pid]}" ]; then
            SHORT=$(echo "${PREV_PIDS[$pid]}" | grep -oE '[^ ]*mcp[^ ]*|chrome-devtools|typst|google-tasks|outlook|discord' | head -1)
            log "  GONE: PID=$pid ($SHORT) — process disappeared!"
        fi
    done
    unset PREV_PIDS; declare -A PREV_PIDS
    for pid in "${!CURR_PIDS[@]}"; do PREV_PIDS[$pid]="${CURR_PIDS[$pid]}"; done
    unset CURR_PIDS
    sleep "$INTERVAL"
done
log "=== WATCHDOG STOPPED ==="
```

</details>

<details>
<summary><b>mcp-stdio-proxy.sh</b> — logs all bidirectional JSON-RPC traffic between CC and an MCP server</summary>

```bash
#!/bin/bash
# Usage: mcp-stdio-proxy.sh <logfile> <command> [args...]
# Configure in ~/.claude.json as the MCP command, wrapping the real server.

LOGFILE="${1:?Usage: mcp-stdio-proxy.sh <logfile> <command> [args...]}"
shift; COMMAND="${1:?Missing command}"; shift

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $1: $2" >> "$LOGFILE"; }
log "PROXY" "=== PROXY STARTED (PID=$$, PPID=$PPID) ==="
log "PROXY" "Command: $COMMAND $*"

# Not named TMPDIR: if TMPDIR is exported, the wrapped server would inherit
# a temp directory that this script deletes at cleanup.
WORKDIR=$(mktemp -d /tmp/mcp-proxy-XXXXXX)
C2S="$WORKDIR/c2s"; S2C="$WORKDIR/s2c"
mkfifo "$C2S" "$S2C"

cleanup() {
    log "PROXY" "=== CLEANUP (signal=${1:-EXIT}) ==="
    if kill -0 "$SERVER_PID" 2>/dev/null; then
        log "PROXY" "Server still alive at cleanup"
    else
        wait "$SERVER_PID" 2>/dev/null; ec=$?
        [ "$ec" -gt 128 ] && log "PROXY" "Server died with signal $((ec-128)) ($(kill -l $((ec-128)) 2>/dev/null))"
        [ "$ec" -le 128 ] && [ "$ec" -ne 0 ] && log "PROXY" "Server exited with code $ec"
    fi
    kill "$C2S_PID" "$S2C_PID" "$SERVER_PID" 2>/dev/null
    rm -rf "$WORKDIR"
}
trap 'cleanup TERM' TERM; trap 'cleanup INT' INT

exec 3<&0; exec 4>&1
"$COMMAND" "$@" < "$C2S" > "$S2C" 2>> "$LOGFILE" &
SERVER_PID=$!

( while IFS= read -r line <&3; do log "C->S" "$line"; echo "$line"; done > "$C2S" ) &
C2S_PID=$!
( while IFS= read -r line; do log "S->C" "$line"; echo "$line" >&4; done < "$S2C" ) &
S2C_PID=$!

wait "$SERVER_PID" 2>/dev/null; SERVER_EXIT=$?
cleanup "server-exit"; exit "$SERVER_EXIT"
```

</details>

## Expected Behavior

- Stdio MCP servers should remain connected for the lifetime of the session unless they crash or the user disconnects them
- If a timeout exists by design, it should only fire when the MCP server is genuinely unresponsive (not responding to ping/heartbeat), not on a wall-clock timer
- `MCP_TIMEOUT` default should be documented
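To illustrate what a responsiveness-based check could look like, here is a minimal sketch of a liveness probe built on the JSON-RPC `ping` request defined by the MCP specification. It only demonstrates the logic: it spawns its own copy of the server instead of reusing an existing connection, it skips the `initialize` handshake that a real server may require first, and the timeout applies per line read. `mcp_ping` and the id `probe-1` are hypothetical names, and nothing here reflects how Claude Code is actually implemented.

```bash
#!/bin/bash
# Usage: mcp_ping <timeout_seconds> <command> [args...]
# Returns 0 if the server answers a ping with a result, 1 otherwise.
mcp_ping() {
    local timeout="$1"; shift
    local reply status=1
    # Run the server as a coprocess so its stdin/stdout are available as fds.
    coproc SERVER { exec "$@"; }
    # Copy pid and fds immediately; bash unsets them once the coprocess exits.
    local pid="$SERVER_PID" to_server="${SERVER[1]}" from_server="${SERVER[0]}"
    printf '%s\n' '{"jsonrpc":"2.0","id":"probe-1","method":"ping"}' >&"$to_server"
    # Read lines until one is the response to our ping, or a read times out.
    while IFS= read -r -t "$timeout" reply <&"$from_server"; do
        if [[ $reply == *'"probe-1"'* && $reply == *'"result"'* ]]; then
            status=0
            break
        fi
    done
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    return "$status"
}

# Usage: mcp_ping 5 /path/to/server --flag && echo responsive || echo unresponsive
```

With a check of this shape, a server would be terminated only after it fails to answer within the timeout, not after a fixed wall-clock interval.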

## Environment

- **Platform:** Ubuntu Linux (x86_64)
- **Claude Code:** 2.1.86
- **MCP servers tested:** chrome-devtools-mcp, outlook-owa, google-tasks, typst-mcp, flights-mcp (all stdio)
- **Not affected:** Gmail, Google Calendar (cloud-hosted, different transport)
