[Bug]: macOS launchd: gateway restart can leave stale parent process and runtime/RPC state inconsistent #39074

@smile-xuc

Bug type

Regression (worked before, now fails)

Summary

On macOS, openclaw gateway can end up in an inconsistent state after a network disruption followed by a LaunchAgent restart.

openclaw gateway status reports Runtime: running, but RPC probe: failed, and the actual process listening on the gateway port is a different PID than the one reported as the runtime PID.

This appears to be a restart / shutdown cleanup issue under launchd: the previous gateway process seems to survive the restart and keep holding the port.

Steps to reproduce

  1. Install and run OpenClaw as a macOS LaunchAgent-managed gateway.
  2. Confirm the healthy baseline:
    • openclaw gateway status
    • expected: Runtime: running and RPC probe: ok
  3. Trigger a network disruption or route/interface switch while the gateway is active.
    • In my case this happened when the machine’s network path changed after disconnecting an external display / dock.
  4. Restart the gateway service:
    • openclaw gateway restart
  5. Check gateway state again:
    • openclaw gateway status
    • lsof -nP -iTCP:18789
  6. Observe that:
    • gateway status may show Runtime: running
    • RPC probe may fail
    • the PID reported by the service runtime may differ from the PID actually listening on the gateway port
  7. In this state, replies may fail or the gateway may require another restart / manual cleanup before becoming healthy again.
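
To make the PID mismatch in step 6 easier to capture, a small helper along these lines may be useful. It is only a sketch: the runtime PID is passed in by hand (read it from whatever your service manager or status output reports), the port defaults to the 18789 used in the lsof command above, and the function names and result labels are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the service's runtime PID with the PID listening on the port.
# Usage (hypothetical file name): sh gateway_pid_check.sh <runtime-pid>

PORT="${PORT:-18789}"   # port taken from the lsof command in this report

# Print the PIDs listening on TCP port $1, one per line.
listener_pids() {
  lsof -nP -t -iTCP:"$1" -sTCP:LISTEN 2>/dev/null
}

# $1 = runtime PID reported by the service, $2 = listener PID(s).
# Prints "match", "mismatch", or "no-listener" (labels are illustrative).
compare_pids() {
  if [ -z "$2" ]; then
    echo "no-listener"
  elif [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "mismatch"
  fi
}

# Only run the live checks when a runtime PID was given on the command line.
if [ "$#" -ge 1 ]; then
  listeners=$(listener_pids "$PORT")
  echo "runtime PID: $1"
  echo "listener PID(s): ${listeners:-none}"
  echo "result: $(compare_pids "$1" "$listeners")"
  # Show each listener's parent PID to help spot a stale parent/child pair.
  for pid in $listeners; do
    ps -o pid= -o ppid= -o command= -p "$pid"
  done
fi
```

A "mismatch" result together with a failed RPC probe would correspond to the state described in this report. More than one listener PID is also reported as a mismatch.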

Expected behavior

After a network disruption and a service restart, the gateway should recover to a single healthy state automatically.

Specifically:

  • openclaw gateway status should report:
    • Runtime: running
    • RPC probe: ok
  • the LaunchAgent runtime PID should match the actual process listening on the gateway port
  • no stale parent/child gateway processes should remain
  • the gateway should resume handling requests without requiring manual cleanup or repeated restarts
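
The expected state above could be checked mechanically with something like the following sketch. It assumes the status text contains the literal strings "Runtime: running" and "RPC probe: ok" quoted in this report. The rest of the status format is not shown here, so the matching is a plain substring test, and the three result labels are invented for illustration.

```shell
#!/bin/sh
# Sketch: classify gateway status text read from stdin.
# Prints "healthy", "inconsistent", or "not-running" (illustrative labels).
classify_status() {
  status_text=$(cat)
  case "$status_text" in
    *"Runtime: running"*)
      case "$status_text" in
        *"RPC probe: ok"*) echo "healthy" ;;
        *)                 echo "inconsistent" ;;  # running, but RPC not ok
      esac
      ;;
    *) echo "not-running" ;;
  esac
}

# Example use against the real CLI (not run here):
#   openclaw gateway status | classify_status
```

The "inconsistent" result corresponds to the failure mode reported in this issue: the runtime is reported as running while the RPC probe is not ok.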

Actual behavior

After the network disruption and restart, the gateway could enter an inconsistent state where:

  • openclaw gateway status reported Runtime: running
  • RPC probe failed
  • the PID reported by the LaunchAgent runtime differed from the PID actually listening on 127.0.0.1:18789
  • replies could fail or stall until the service was manually cleaned up and restarted again

I also observed status output like:

RPC probe: failed
gateway closed (1006 abnormal closure (no close frame))
Port 18789 is already in use.
Another process is listening on this port.
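
To distinguish a slow recovery from the stuck state shown above, one option is to poll the status for a bounded time after the restart. This is a rough sketch, not part of OpenClaw: `retry` and `rpc_ok` are hypothetical helper names, and the check again relies only on the "RPC probe: ok" string quoted in this report.

```shell
#!/bin/sh
# Sketch: retry a command a fixed number of times with a delay in between.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0          # command succeeded, stop polling
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"    # wait before the next attempt
    fi
    i=$((i + 1))
  done
  return 1              # never succeeded within the allowed attempts
}

# Succeeds when the status output contains "RPC probe: ok".
rpc_ok() {
  openclaw gateway status 2>&1 | grep -q 'RPC probe: ok'
}

# Example use (not run here):
#   openclaw gateway restart
#   retry 10 3 rpc_ok || echo "gateway still unhealthy after restart"
```

If the probe never turns ok within the polling window, the gateway is likely in the stuck state rather than merely slow to come back.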

OpenClaw version

OpenClaw: `2026.3.2`

Operating system

macOS 26.3

Install method

Node: `/opt/homebrew/Cellar/node/25.6.1/bin/node`

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

Labels

bug, regression