[BUG] Claude Code sends SIGTERM to all healthy stdio MCP servers after 10-60s — root cause analysis with strace evidence #40207

@ignaciomella

Bug Description

Claude Code sends SIGTERM to all stdio-based MCP servers simultaneously, 10–60 seconds after successful connection and handshake. No errors precede the kill — servers are healthy and actively responding to tool calls. The timeout interval shrinks over the session lifetime (60s → 30s → 10s). The only recovery is manual /mcp reconnection, which itself gets killed again.

This is a systemic issue affecting every stdio MCP server configured in the session. Cloud-hosted MCPs (Gmail, Google Calendar via claude.ai) are unaffected because they use a different transport.

Root Cause Analysis

I deployed three layers of instrumentation to trace the root cause:

1. strace on Claude Code's process tree

sudo strace -p <claude_pid> -e kill,tgkill -f -t

Captured:

1480540 21:57:45 kill(1501128, SIGINT)  = 0   # Main Claude PID kills one MCP
1501518 21:57:57 kill(1501129, SIGTERM) = 0   # Child wrapper kills another MCP
1480540 21:58:02 kill(1501627, SIGINT)  = 0   # Main Claude PID kills another

PID 1501518 is a short-lived Claude child process (an MCP lifecycle wrapper). Claude Code spawns one such wrapper per MCP server, and the wrapper deliberately sends SIGTERM to that server.
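To make a longer capture easier to read, the kill() lines can be reduced to "who signalled whom". The following is a minimal sketch, assuming the strace output has the plain `PID TIME kill(TARGET, SIGNAL) = 0` shape shown above. `summarize_kills` is a hypothetical helper name, and it deliberately ignores tgkill() calls and failed kills.

```bash
#!/bin/bash
# summarize_kills: read strace lines on stdin and print one
# "TIME SENDER -> TARGET SIGNAL" line per successful kill() call.
# Assumption: each line starts with the calling PID, as in the capture above.
summarize_kills() {
    sed -nE 's/^([0-9]+) +([0-9:]+) +kill\(([0-9]+), (SIG[A-Z0-9]+)\) += 0.*/\2 \1 -> \3 \4/p'
}
```

Usage, with a hypothetical file name: `summarize_kills < strace-kills.log`. Lines that do not match the expected shape are silently dropped, so an empty result should be read as "nothing parsed", not "nothing killed".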

2. Watchdog process monitor

A polling script that tracks MCP child processes by PID and logs when they appear or disappear:

[21:40:25] NEW: PID=1480580 (chrome-devtools-mcp) fd0=socket fd1=/dev/null
[21:40:25] NEW: PID=1480584 (typst-mcp) fd0=socket fd1=socket
[21:40:25] NEW: PID=1480766 (fli-mcp) fd0=socket fd1=socket
[21:40:25] NEW: PID=1480803 (mcp-stdio-proxy.sh) fd0=socket fd1=socket
[21:40:25] NEW: PID=1480840 (outlook-owa) fd0=socket fd1=socket

[21:41:05] GONE: PID=1480584 (typst-mcp) — exit code: 143
[21:41:05] GONE: PID=1480580 (chrome-devtools-mcp) — exit code: 143
[21:41:05] GONE: PID=1480766 (fli-mcp) — exit code: 143
[21:41:05] GONE: PID=1480803 (mcp-stdio-proxy.sh) — exit code: 143
[21:41:05] GONE: PID=1480840 (outlook-owa) — exit code: 143

All 5 MCP servers were killed in the same second, 40 seconds after startup.
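The simultaneity claim can be checked mechanically by counting GONE events per timestamp. This is a rough sketch, assuming the condensed `[HH:MM:SS] GONE: ...` line format of the excerpt above, not the longer timestamp the script itself writes. `gone_per_timestamp` is a hypothetical helper name.

```bash
#!/bin/bash
# gone_per_timestamp: count GONE events per timestamp.
# Reads the files given as arguments, or stdin if none are given.
# Assumption: the first field of each line is a bracketed time like [21:41:05].
gone_per_timestamp() {
    awk '/GONE:/ { t = substr($1, 2, length($1) - 2); n[t]++ }
         END { for (t in n) print t, n[t] }' "$@" | sort
}
```

A single output line with a count equal to the number of configured stdio servers points to one coordinated kill. Several lines with small counts would point to independent exits.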

3. JSON-RPC stdio proxy

A transparent bidirectional proxy that logs all JSON-RPC messages between Claude Code and an MCP server:

[21:40:20] C->S: initialize request
[21:40:20] S->C: initialize response (success, 16 tools listed)
[21:40:20] C->S: notifications/initialized
[21:40:20] C->S: tools/list
[21:40:20] S->C: tools/list response (success)
[21:41:01] PROXY: SIGTERM received
[21:41:01] PROXY: Server died with signal TERM (143)

No errors, no failed requests, no compaction event. Clean SIGTERM 41 seconds after a successful handshake.
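The 41-second figure comes from subtracting the handshake timestamp from the SIGTERM timestamp in the proxy log. A small sketch of that arithmetic, assuming both timestamps fall on the same day (no midnight wrap); `to_seconds` and `elapsed_seconds` are hypothetical helper names.

```bash
#!/bin/bash
# to_seconds: convert HH:MM:SS to seconds since midnight.
# The 10# prefix forces base 10 so values like "08" are not read as octal.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# elapsed_seconds START END: seconds between two same-day timestamps.
elapsed_seconds() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

elapsed_seconds 21:40:20 21:41:01   # handshake to SIGTERM in the proxy log: 41
```

Applied to the watchdog timestamps (21:40:25 and 21:41:05), the same function gives 40. The one-second difference is expected because the two tools log different events.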

Hypotheses Ruled Out

| Hypothesis | Evidence | Verdict |
|---|---|---|
| Context compaction | No PostCompact hook fired; happens too early in session | ❌ Eliminated |
| Individual MCP crashes | All 5 die simultaneously with same exit code | ❌ Eliminated |
| MCP server idle timeout | Called tools right after reconnect; still killed 10s later | ❌ Eliminated |
| Hooks killing MCPs | Audited all hooks in ~/.claude/hooks/; none target MCPs | ❌ Eliminated |
| External process (cron/reaper) | Only systemd timer runs at 2am, only targets orphans (PPID=1) | ❌ Eliminated |
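The hook audit can be partly automated. Below is a minimal sketch, assuming hooks are plain scripts under ~/.claude/hooks/ and that a hook capable of killing an MCP would invoke kill, pkill, or killall directly. `audit_hooks` is a hypothetical helper, and a match is only a pointer for manual review, not proof.

```bash
#!/bin/bash
# audit_hooks [DIR]: list hook files that mention kill, pkill, or killall.
# DIR defaults to ~/.claude/hooks. Prints nothing and returns non-zero
# when no file matches. Indirect kills (via another tool) are not detected.
audit_hooks() {
    local dir="${1:-$HOME/.claude/hooks}"
    grep -rlEw 'kill|pkill|killall' "$dir" 2>/dev/null
}
```

An empty result is consistent with the "none target MCPs" finding above, but each hook still deserves a read, since a script can terminate processes without using these three commands.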

Conclusion

Claude Code has an internal stdio timeout/lifecycle mechanism that kills healthy MCP servers. Evidence:

  1. strace confirms CC spawns a wrapper process per MCP that sends SIGTERM
  2. CC changelog mentions a fix for "MCP stdio server timeout not killing child process" — confirming this timeout exists by design
  3. MCP_TIMEOUT env var exists to configure it, but the default appears too aggressive
  4. The timeout fires even when MCPs are healthy and actively responding

Impact

This effectively breaks the MCP extensibility model for power users. Anyone running multiple stdio MCPs (browser automation, email, calendars, databases, custom tools) loses their entire tool surface repeatedly throughout a session. The failure is silent — no error message, no warning. Tools simply stop being available.

Prior issues reporting this symptom were auto-closed as duplicates or for inactivity, but none identified the root cause.

Reproduction

  1. Configure 3+ stdio MCP servers in ~/.claude.json
  2. Start a Claude Code session
  3. Verify MCPs connect via /mcp
  4. Wait 10–60 seconds
  5. All stdio MCPs will disconnect simultaneously

Reproduction instrumentation

mcp-watchdog.sh: polls child processes and detects when MCPs appear or disappear

#!/bin/bash
# Usage: mcp-watchdog.sh <claude_pid>
# Run in a separate terminal. Auto-stops when Claude exits.

LOG="$HOME/.claude/logs/mcp-disconnect-debug.log"
mkdir -p "$(dirname "$LOG")"  # the log directory may not exist yet
INTERVAL=10

CLAUDE_PID="${1:?Usage: mcp-watchdog.sh <claude_pid>}"
MCP_PATTERNS="chrome-devtools-mcp|typst-mcp|fli-mcp|google-tasks|outlook-owa/server|discord.*server"

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] [watchdog] $*" >> "$LOG"; }

declare -A PREV_PIDS
log "=== WATCHDOG STARTED for Claude PID=$CLAUDE_PID ==="

while kill -0 "$CLAUDE_PID" 2>/dev/null; do
    declare -A CURR_PIDS
    for pid in $(pgrep -P "$CLAUDE_PID" 2>/dev/null); do
        CMDLINE=$(cat /proc/$pid/cmdline 2>/dev/null | tr '\0' ' ' | head -c 200)
        [ -z "$CMDLINE" ] && continue
        echo "$CMDLINE" | grep -qE "$MCP_PATTERNS" || continue
        FD0=$(readlink /proc/$pid/fd/0 2>/dev/null || echo "GONE")
        FD1=$(readlink /proc/$pid/fd/1 2>/dev/null || echo "GONE")
        STATE="$CMDLINE|$FD0|$FD1"
        CURR_PIDS[$pid]="$STATE"
        if [ -z "${PREV_PIDS[$pid]}" ]; then
            SHORT=$(echo "$CMDLINE" | grep -oE '[^ ]*mcp[^ ]*|chrome-devtools|typst|google-tasks|outlook|discord' | head -1)
            log "  NEW: PID=$pid ($SHORT) fd0=$FD0 fd1=$FD1"
        fi
    done
    for pid in "${!PREV_PIDS[@]}"; do
        if [ -z "${CURR_PIDS[$pid]}" ]; then
            SHORT=$(echo "${PREV_PIDS[$pid]}" | grep -oE '[^ ]*mcp[^ ]*|chrome-devtools|typst|google-tasks|outlook|discord' | head -1)
            log "  GONE: PID=$pid ($SHORT) — process disappeared!"
        fi
    done
    unset PREV_PIDS; declare -A PREV_PIDS
    for pid in "${!CURR_PIDS[@]}"; do PREV_PIDS[$pid]="${CURR_PIDS[$pid]}"; done
    unset CURR_PIDS
    sleep "$INTERVAL"
done
log "=== WATCHDOG STOPPED ==="

mcp-stdio-proxy.sh: logs all bidirectional JSON-RPC traffic between CC and an MCP server

#!/bin/bash
# Usage: mcp-stdio-proxy.sh <logfile> <command> [args...]
# Configure in ~/.claude.json as the MCP command, wrapping the real server.

LOGFILE="${1:?Usage: mcp-stdio-proxy.sh <logfile> <command> [args...]}"
shift; COMMAND="${1:?Missing command}"; shift

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $1: $2" >> "$LOGFILE"; }
log "PROXY" "=== PROXY STARTED (PID=$$, PPID=$PPID) ==="
log "PROXY" "Command: $COMMAND $*"

TMPDIR=$(mktemp -d /tmp/mcp-proxy-XXXXXX)
C2S="$TMPDIR/c2s"; S2C="$TMPDIR/s2c"
mkfifo "$C2S" "$S2C"

cleanup() {
    log "PROXY" "=== CLEANUP (signal=${1:-EXIT}) ==="
    if kill -0 "$SERVER_PID" 2>/dev/null; then
        log "PROXY" "Server still alive at cleanup"
    else
        wait "$SERVER_PID" 2>/dev/null; ec=$?
        [ "$ec" -gt 128 ] && log "PROXY" "Server died with signal $((ec-128)) ($(kill -l $((ec-128)) 2>/dev/null))"
        [ "$ec" -le 128 ] && [ "$ec" -ne 0 ] && log "PROXY" "Server exited with code $ec"
    fi
    kill "$C2S_PID" "$S2C_PID" "$SERVER_PID" 2>/dev/null
    rm -rf "$TMPDIR"
}
trap 'cleanup TERM' TERM; trap 'cleanup INT' INT

exec 3<&0; exec 4>&1
"$COMMAND" "$@" < "$C2S" > "$S2C" 2>> "$LOGFILE" &
SERVER_PID=$!

( while IFS= read -r line <&3; do log "C->S" "$line"; echo "$line"; done > "$C2S" ) &
C2S_PID=$!
( while IFS= read -r line; do log "S->C" "$line"; echo "$line" >&4; done < "$S2C" ) &
S2C_PID=$!

wait "$SERVER_PID" 2>/dev/null; SERVER_EXIT=$?
cleanup "server-exit"; exit "$SERVER_EXIT"
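To wrap a real server with the proxy, its entry in ~/.claude.json has to point at the proxy script and pass the log file and the original command as arguments. The helper below prints such an entry. It is a sketch under assumptions: it assumes the `mcpServers` layout with `command` and `args` keys, `proxy_entry` and all paths are hypothetical, and it does no JSON escaping, so arguments containing quotes or backslashes would need extra care.

```bash
#!/bin/bash
# proxy_entry NAME PROXY LOGFILE COMMAND [ARGS...]
# Print a JSON fragment for one server entry that runs COMMAND through PROXY.
# Assumption: the fragment is pasted under the "mcpServers" key by hand.
proxy_entry() {
    local name="$1" proxy="$2" logfile="$3"; shift 3
    local args="\"$logfile\"" a
    for a in "$@"; do args+=", \"$a\""; done
    printf '"%s": { "command": "%s", "args": [%s] }\n' "$name" "$proxy" "$args"
}
```

For example, `proxy_entry typst-mcp /opt/mcp-stdio-proxy.sh /tmp/typst.log typst-mcp` prints an entry whose command is the proxy and whose first two arguments are the log file and the real server command, matching the proxy's usage line.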

Expected Behavior

  • Stdio MCP servers should remain connected for the lifetime of the session unless they crash or the user disconnects them
  • If a timeout exists by design, it should only fire when the MCP server is genuinely unresponsive (not responding to ping/heartbeat), not on a wall-clock timer
  • MCP_TIMEOUT default should be documented
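The heartbeat idea in the second bullet can be sketched as a liveness probe. This is an illustration only, not how Claude Code works internally. It assumes bash 4+ (for `coproc`) and a server that answers the MCP `ping` request with a JSON-RPC result. `ping_ok` is a hypothetical helper, and it starts a fresh server process instead of probing an existing connection.

```bash
#!/bin/bash
# ping_ok TIMEOUT_SECONDS COMMAND [ARGS...]
# Start COMMAND as a stdio server, send one JSON-RPC ping, and succeed only
# if a line containing "result" comes back within TIMEOUT_SECONDS.
ping_ok() {
    local timeout="$1"; shift
    local reply rc pid to_srv from_srv
    coproc MCP { "$@"; }
    # Copy the coproc details now; bash clears MCP and MCP_PID once it exits.
    pid=$MCP_PID; from_srv=${MCP[0]}; to_srv=${MCP[1]}
    printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"ping"}' >&"$to_srv"
    IFS= read -r -t "$timeout" reply <&"$from_srv"
    rc=$?
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    [ "$rc" -eq 0 ] && [[ "$reply" == *'"result"'* ]]
}
```

Under this policy a server would only be terminated after a probe like this fails, so a server that keeps answering within the deadline is never killed by elapsed wall-clock time alone.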

Environment

  • Platform: Ubuntu Linux (x86_64)
  • Claude Code: 2.1.86
  • MCP servers tested: chrome-devtools-mcp, outlook-owa, google-tasks, typst-mcp, flights-mcp (all stdio)
  • Not affected: Gmail, Google Calendar (cloud-hosted, different transport)

Labels: area:mcp, bug, has repro, platform:linux
