fix: KillMode=process causes orphaned MCP server accumulation #88

@nathanschram

Description

KillMode=process in untether.service (and untether-dev.service) causes orphaned MCP server processes to accumulate across service restarts, eventually consuming all available memory.

Impact

  • 10.3 GB RSS consumed by 86 orphaned node/MCP processes on lba-1
  • 612 total processes (vs ~400 after cleanup)
  • Telegram progress updates feel "very slow" due to memory pressure and I/O contention
  • Gets progressively worse with each pipx upgrade && systemctl restart cycle
  • Restarting the service makes it worse, not better (orphans more processes)

Root Cause

Each Claude Code session spawns ~14 MCP server processes (brave-search, context7, apify, jina, github, telegram, trello, pal, etc.). With KillMode=process, systemd only kills the main untether Python process on restart — all child processes (Claude CLI + MCP servers) survive and get reparented to systemd --user.

On lba-1 with 64 production restarts in one day (development iteration on v0.30–v0.33), this accumulated 86+ orphaned MCP server processes consuming 10.3 GB.
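
One way to spot the orphans described above is to look for node processes whose parent is the `systemd --user` manager. The sketch below is illustrative only and is not part of untether: `Proc`, `parse_ps`, `find_orphans`, and `total_rss_mb` are hypothetical names, the input is assumed to be the output of `ps -eo pid=,ppid=,rss=,args=`, and matching on the substring `node` is a heuristic that can also catch legitimate processes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    rss_kb: int  # resident set size in KiB, as reported by ps
    command: str


def parse_ps(output: str) -> list[Proc]:
    """Parse the output of `ps -eo pid=,ppid=,rss=,args=` (no header row)."""
    procs = []
    for line in output.splitlines():
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # skip blank or truncated lines
        pid, ppid, rss, command = parts
        procs.append(Proc(int(pid), int(ppid), int(rss), command))
    return procs


def find_orphans(procs: list[Proc], manager_pid: int, needle: str = "node") -> list[Proc]:
    """Return processes parented directly by the user manager whose command
    line contains `needle`. This is a heuristic, not a definitive test."""
    return [p for p in procs if p.ppid == manager_pid and needle in p.command]


def total_rss_mb(procs: list[Proc]) -> float:
    """Sum the resident memory of the given processes, in MiB."""
    return sum(p.rss_kb for p in procs) / 1024
```

The caller supplies `manager_pid`, the PID of the `systemd --user` instance. Review the result by hand before killing anything, since a process started directly under the user manager looks the same as a reparented one.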

Fix

  1. Changed KillMode=process → KillMode=control-group in both service files
    • systemd now kills the entire cgroup (main process + all children) on stop
    • The existing TimeoutStopSec=150 gives the graceful drain mechanism 2.5 minutes to finish active runs before SIGKILL
  2. Killed 78 orphaned MCP processes, recovering 8.5 GB of memory
  3. Added render debouncing (min_render_interval) and non-blocking approval notifications as performance improvements
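
The render debouncing mentioned in item 3 can be sketched as follows. This is a minimal illustration of the general technique, assuming `min_render_interval` is a number of seconds; the class name `RenderDebouncer` and its methods are hypothetical and do not reflect the actual code in `runner_bridge.py`.

```python
import time
from typing import Callable, Optional


class RenderDebouncer:
    """Let at most one render through per `min_render_interval` seconds,
    keeping only the most recent deferred update."""

    def __init__(
        self,
        min_render_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_render_interval = min_render_interval
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_render: Optional[float] = None
        self._pending: Optional[str] = None

    def submit(self, text: str) -> Optional[str]:
        """Return the text to send now, or None if the update was deferred."""
        now = self._clock()
        if self._last_render is None or now - self._last_render >= self.min_render_interval:
            self._last_render = now
            self._pending = None  # a newer render supersedes anything deferred
            return text
        self._pending = text  # overwrite: only the latest state matters
        return None

    def flush(self) -> Optional[str]:
        """Return the deferred update, if any (for example at the end of a run)."""
        text, self._pending = self._pending, None
        if text is not None:
            self._last_render = self._clock()
        return text
```

Dropping intermediate updates is acceptable here because each progress render replaces the previous one; only the final state has to reach the chat, which is what `flush` is for.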

Affected files

  • ~/.config/systemd/user/untether.service: local service file (not in repo)
  • ~/.config/systemd/user/untether-dev.service: local service file (not in repo)
  • src/untether/settings.py: min_render_interval, group_chat_rps config fields
  • src/untether/runner_bridge.py: debounce + non-blocking notifications
  • src/untether/telegram/backend.py: config wiring
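
Because the two service files live outside the repo, the KillMode change has to be checked on each machine. A minimal sketch of such a check follows; the `kill_mode` helper is hypothetical, and it assumes the unit file is simple enough for Python's `configparser` to read. When the key is absent it falls back to `control-group`, which is systemd's default.

```python
import configparser


def kill_mode(unit_text: str) -> str:
    """Return the KillMode= value from a systemd unit file's [Service] section."""
    parser = configparser.ConfigParser(
        strict=False,        # unit files may repeat keys such as Environment=
        interpolation=None,  # leave % specifiers like %h untouched
    )
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(unit_text)
    return parser.get("Service", "KillMode", fallback="control-group")
```

Passing the contents of each service file to `kill_mode` and comparing the result against `"control-group"` confirms that the fix is in place.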

Verification

After cleanup:

  • Processes: 612 → 401
  • Memory: 16 GB → 7.0 GB
  • MCP/node remaining: 8 (legitimate, under active claude sessions)
  • All 6 engines tested via @untether_dev_bot with 0 errors

Metadata

Labels: bug