[Bug]: Config-change-triggered restart fails via launchctl (ETIMEDOUT), gateway enters degraded state, memory indexes stuck at zero for hours

### Bug type

Behavior bug (incorrect output/state without crash)

### Summary

When a config change triggers an automatic gateway restart on macOS (LaunchAgent), the spawnSync launchctl call can time out. The fallback in-process restart also times out ("shutdown timed out; exiting without full cleanup"). The gateway continues running in a degraded state where memory reindex commands (openclaw memory index) complete successfully but indexes don't actually recover — leaving agent memory indexes at zero for hours until the process eventually self-heals.

### Steps to reproduce

1. Run OpenClaw gateway via macOS LaunchAgent (ai.openclaw.gateway)
2. Have multiple agents with large memory indexes (1000+ files across 5 agents)
3. Make a config change that requires gateway restart (e.g., ACP settings: acp.enabled, acp.defaultAgent, acp.stream)
4. Gateway detects config change, sends SIGUSR1, attempts spawnSync launchctl kickstart
5. Spawn times out → falls back to in-process restart → shutdown also times out

### Expected behavior

Gateway should cleanly restart. If the primary restart mechanism fails, the fallback should either: - Successfully complete the in-process restart, OR - Exit cleanly so LaunchAgent's KeepAlive restarts a fresh process

### Actual behavior

Gateway enters a degraded state where: - Telegram message routing continues working (agents can chat) - Memory indexer is stuck — openclaw memory index CLI reports success but indexes don't update - Agent memory indexes drop to 0/N or freeze at stale counts - Watchdog monitoring detects the issue but repeated reindex attempts don't fix it - Self-heals after ~12-19 hours (likely when LaunchAgent eventually restarts the process)

### OpenClaw version

2026.3.2

### Operating system

macOS (Darwin 25.3.0, arm64, M1 Pro)

### Install method

npm global

### Logs, screenshots, and evidence

```shell
2026-03-04T04:18:16.943Z [reload] config change requires gateway restart (acp)
2026-03-04T04:18:19.023Z [gateway] full process restart failed (spawnSync launchctl ETIMEDOUT); falling back to in-process restart
2026-03-04T04:18:20.010Z [reload] config change requires gateway restart (acp.defaultAgent)
2026-03-04T04:19:01.170Z [gateway] shutdown timed out; exiting without full cleanup
Timeline: - 04:18 — Failed restart. Gateway degraded. - 07:10 — First memory index gap detected (indexed frozen, total growing) - 11:07 — One agent's index drops to ZERO. Stays at zero despite reindex every 30 min. - 23:19 — Self-healed (~19 hours later)
```

### Impact and severity

**Affected:** Multi-agent deployments on macOS (LaunchAgent) with large memory indexes (1000+ files across 5+ agents)
**Severity:** High — one agent's memory index stuck at zero for 12 hours, blocking all memory-dependent functionality
**Frequency:** Triggered once during embedding provider migration (config change → failed restart). Likely reproducible whenever a config change requiring restart occurs during heavy indexing.
**Consequence:** Agent loses all memory search capability. External watchdog reindex cannot recover while gateway is in degraded state. Self-heals only after eventual clean process restart (~12-19 hours).

### Additional information

**Additional Environment Information:**
Gateway: LaunchAgent (ai.openclaw.gateway)
Memory provider: Ollama (nomic-embed-text, 768-dim)
Agents: 5 (main + 4 staff)
Total indexed files: ~1,300 across all agents

**Related Issues:**
#10600 — Desktop app triggers periodic SIGUSR1 restarts (same SIGUSR1 → restart path)
#24279 / #24301 — PID mismatch in restart health check (restart mechanism bugs)
#32048 — "shutdown timed out; exiting without full cleanup" (same error, different trigger)
#32277 — Memory indexer doesn't detect embedding dimension mismatch (related — we were mid-migration)
#11728 — runSafeReindex partial reindex after DB deletion (related reindex bug)
#3573 — openclaw status always reports memory as dirty (makes debugging harder)

### Workaround:
External memory watchdog (LaunchAgent, runs every 30 min) that detects index drops and runs openclaw memory index. Works for normal cases but cannot recover from the degraded gateway state described here. Adding logic to detect repeated failures and trigger a full process restart (via launchctl kickstart -k) would mitigate this.

### Suggested Fix:
1. The in-process restart fallback should force-exit the process (e.g., process.exit(1)) after shutdown timeout, rather than continuing in a degraded state. Let LaunchAgent/systemd handle the restart cleanly.
2. Consider adding a health check that detects "indexes stuck" state and triggers self-restart.





Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Bug]: Config-change-triggered restart fails via launchctl (ETIMEDOUT), gateway enters degraded state, memory indexes stuck at zero for hours #36822

Bug type

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Logs, screenshots, and evidence

Impact and severity

Additional information

Workaround:

Suggested Fix:

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]: Config-change-triggered restart fails via launchctl (ETIMEDOUT), gateway enters degraded state, memory indexes stuck at zero for hours #36822

Description

Bug type

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Logs, screenshots, and evidence

Impact and severity

Additional information

Workaround:

Suggested Fix:

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions