
fix(openclaw): fix gateway status detection and improve error reporting#13433

Merged
kangfenmao merged 15 commits into main from DeJeune/fix-checkportopen on Mar 14, 2026

@DeJeune (Collaborator) commented Mar 13, 2026

What this PR does

Before this PR:

  • getStatus only probed health when status was stopped or error. If the gateway crashed while status was running, the UI would never detect the crash and keep showing "运行中" (running).
  • checkHealth probed the gateway but never updated gatewayStatus, so a failed health check had no lasting effect.
  • startAndWaitForGateway spawned the process with stdio: 'ignore', discarding stderr. On startup failure, the user only saw "gateway exited with code 1" with no actionable detail.
  • parseUpdateStatus matched any update channel (npm, pkg, binary), causing false positives for binary-installed users when only an npm/pkg update was available.

After this PR:

  • getStatus probes health in all non-starting states, transitioning running → stopped when the gateway is unreachable.
  • checkHealth syncs gatewayStatus to stopped when the probe returns unhealthy.
  • startAndWaitForGateway pipes stderr and extracts the first 3 lines of error output for meaningful error messages.
  • parseUpdateStatus only matches binary-channel updates, ignoring npm/pkg channels.
  • 29 new state-machine tests cover all gateway status transitions and lifecycle scenarios.
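The status rules above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the OpenClawService code: the `GatewayStatus` values, the `GatewayStatusTracker` class, and the injected `probe` function are hypothetical names. The sketch also leaves an `error` status untouched on an unhealthy probe so its details are not lost. That choice is an assumption.

```typescript
// Hypothetical status names for this sketch; the real service may differ.
type GatewayStatus = "stopped" | "starting" | "running" | "error";

// The health probe is injected so transitions can be driven without a real gateway.
type HealthProbe = () => Promise<boolean>;

class GatewayStatusTracker {
  constructor(
    private readonly probe: HealthProbe,
    public status: GatewayStatus = "stopped",
  ) {}

  // Probe in every state except "starting", so a crash while "running" is noticed.
  async getStatus(): Promise<GatewayStatus> {
    if (this.status === "starting") return this.status;
    const healthy = await this.probe();
    if (healthy) {
      this.status = "running"; // also picks up a gateway started outside the app
    } else if (this.status === "running") {
      this.status = "stopped"; // running -> stopped when the gateway is unreachable
    }
    return this.status;
  }

  // An unhealthy probe now has a lasting effect: a "running" status is downgraded.
  async checkHealth(): Promise<boolean> {
    const healthy = await this.probe();
    if (!healthy && this.status === "running") this.status = "stopped";
    return healthy;
  }
}
```

Injecting the probe means each transition can be exercised in a test by flipping a boolean rather than starting and killing a real process.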

Fixes #

Why we need it and why it was done in this way

The gateway status was a one-way latch: once set to running, it could only be changed by an explicit stopGateway call. External crashes (process killed, config errors, port conflicts) left the UI in an incorrect state.

The following tradeoffs were made:

  • getStatus now always calls probeGatewayHealth (except while starting). This adds a small overhead to each status poll, but it is necessary for correctness because the gateway runs as a detached process whose lifecycle is independent of Cherry Studio.
  • stderr capture uses ['ignore', 'ignore', 'pipe'] instead of full 'pipe' to minimize resource usage — we only need stderr for error diagnostics.
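A minimal sketch of the stderr-only pipe, assuming Node's `child_process.spawn`. `runWithStderr` is a hypothetical helper: it covers only the failure path (a non-zero exit), not the readiness wait that `startAndWaitForGateway` also performs, and it keeps the first three non-empty stderr lines as the error message.

```typescript
import { spawn } from "node:child_process";

// Spawn with only stderr piped, mirroring the ['ignore', 'ignore', 'pipe'] choice.
// Resolves on exit code 0; otherwise rejects with the first `maxLines` non-empty stderr lines.
function runWithStderr(command: string, args: string[], maxLines = 3): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["ignore", "ignore", "pipe"] });
    let stderr = "";
    child.stderr!.setEncoding("utf8");
    child.stderr!.on("data", (chunk: string) => {
      stderr += chunk;
    });
    child.on("error", reject); // e.g. the executable was not found
    // "close" fires after the stdio streams are closed, so all stderr has been read.
    child.on("close", (code) => {
      if (code === 0) return resolve();
      const detail = stderr
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line.length > 0)
        .slice(0, maxLines)
        .join("\n");
      // Fall back to the old generic message only when stderr was empty.
      reject(new Error(detail || `gateway exited with code ${code}`));
    });
  });
}
```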

The following alternatives were considered:

  • Keeping a persistent health-check interval timer — rejected as over-engineered for the current polling-based UI.
  • Parsing the full openclaw update status table structure — rejected in favor of simple regex matching on the binary channel keyword.
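The keyword-matching approach might look like the following. The output format of `openclaw update status` is not shown in this PR, so this sketch assumes one channel per line and the literal phrase `update available` on lines that have an update. Both are assumptions, and this `parseUpdateStatus` is illustrative rather than the merged implementation.

```typescript
// Hypothetical sketch: assumes one channel per line and an "update available" marker.
// Only a line naming the binary channel counts, so npm/pkg updates are ignored.
function parseUpdateStatus(output: string): boolean {
  return output
    .split(/\r?\n/)
    .some((line) => /\bbinary\b/i.test(line) && /update available/i.test(line));
}
```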

Breaking changes

None.

Special notes for your reviewer

  • The parseUpdateStatus change means users who installed OpenClaw via npm will no longer see update notifications in Cherry Studio. This is intentional — Cherry Studio only manages binary installations.
  • The test file grew significantly (from 90 lines to 595 lines) because this is the first time the OpenClawService class has state-machine test coverage.

Checklist

Release note

fix(openclaw): gateway status now correctly detects crashed/externally-stopped gateways; startup errors show detailed messages instead of generic exit codes; update checker only reports binary-channel updates

- getStatus now probes health in all states (not just stopped/error),
  detecting crashed gateways that still show as 'running'
- checkHealth syncs gatewayStatus to 'stopped' when probe returns unhealthy
- startAndWaitForGateway captures stderr (via pipe) to surface meaningful
  error messages instead of generic "gateway exited with code 1"
- parseUpdateStatus only matches binary-channel updates, ignoring npm/pkg
  channels that require a different upgrade path
- Add 29 state-machine tests covering getStatus, checkHealth, startGateway,
  stopGateway, restartGateway, and full lifecycle transitions

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
@DeJeune force-pushed the DeJeune/fix-checkportopen branch from a12224e to af22702 on March 13, 2026 06:58
@DeJeune marked this pull request as draft on March 13, 2026 09:12
@DeJeune marked this pull request as ready for review on March 13, 2026 10:13
@kangfenmao (Collaborator) commented:

[image]

…erflow

Add max-h-25, overflow-y-auto, and break-all to the error message span
so long error texts (stack traces, config errors) don't blow out the
layout vertically or horizontally.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
@DeJeune (Collaborator, Author) commented Mar 13, 2026

[image]

fixed

@kangfenmao (Collaborator) commented Mar 13, 2026

Note

This issue/comment/review was translated by Claude.

  1. Environment variable configuration
  2. Use openclaw.json as configuration file
  3. Process state management

The gateway is not registered as a system service, so restart is not
applicable. Remove the restart method, IPC channel, UI button, and
related tests.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
@DeJeune requested a review from 0xfullex as a code owner on March 13, 2026 10:53
DeJeune and others added 5 commits March 13, 2026 19:04
…herry config

Migrate from openclaw.cherry.json back to openclaw.json so that the
CLI can also read the config without needing the OPENCLAW_CONFIG_PATH
env var override.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
Add migration logic to handle existing users who have the old
openclaw.cherry.json config file. Renames it to openclaw.json if
no default config exists, otherwise removes the legacy file.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
…nstall

On macOS/Linux, symlink the openclaw binary to /usr/local/bin so it is
accessible from the command line. On Windows, add the bin directory to
the user PATH via the registry. Both are cleaned up on uninstall.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
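For the macOS/Linux half, an idempotent symlink install and a cautious uninstall could be sketched with `fs.promises`. `installCliLink` and `uninstallCliLink` are hypothetical helpers, and the Windows registry PATH update is not sketched. Writing to `/usr/local/bin` usually requires elevated permissions, so the target directory is a parameter here.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Resolve a symlink's target relative to the directory that holds the link.
async function linkTarget(linkPath: string): Promise<string> {
  return path.resolve(path.dirname(linkPath), await fs.readlink(linkPath));
}

// Create or refresh <binDir>/<name> -> target. Refuses to replace a regular file
// (readlink fails with EINVAL and the error is rethrown).
async function installCliLink(target: string, binDir: string, name = "openclaw"): Promise<string> {
  const linkPath = path.join(binDir, name);
  try {
    if ((await linkTarget(linkPath)) === path.resolve(target)) return linkPath; // already correct
    await fs.unlink(linkPath); // stale link pointing somewhere else
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
  }
  await fs.symlink(path.resolve(target), linkPath);
  return linkPath;
}

// Remove the link only if it still points at our binary.
async function uninstallCliLink(target: string, binDir: string, name = "openclaw"): Promise<void> {
  const linkPath = path.join(binDir, name);
  try {
    if ((await linkTarget(linkPath)) === path.resolve(target)) await fs.unlink(linkPath);
  } catch (err) {
    const code = (err as NodeJS.ErrnoException).code;
    if (code !== "ENOENT" && code !== "EINVAL") throw err; // missing or not a link: nothing to remove
  }
}
```

Checking where an existing link points before removing it avoids deleting an `openclaw` link that a different installation created.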
…SS class order

When both openclaw.json and openclaw.cherry.json exist, back up the
default config and use the cherry config since it contains the user's
Cherry Studio settings. Also fix Biome CSS class sort order.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
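Taken together, the two migration commits describe three cases: no legacy file, legacy file only, and both files present. A sketch under stated assumptions: the `.bak` backup name and the `migrateLegacyConfig` helper are hypothetical, and only the final behavior (back up the default config, then promote the Cherry Studio config) is modeled, not the earlier remove-the-legacy-file variant.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

type MigrationResult = "none" | "renamed" | "replaced";

async function exists(p: string): Promise<boolean> {
  try {
    await fs.access(p);
    return true;
  } catch {
    return false;
  }
}

// Migrate <dir>/openclaw.cherry.json to <dir>/openclaw.json.
async function migrateLegacyConfig(dir: string): Promise<MigrationResult> {
  const legacy = path.join(dir, "openclaw.cherry.json");
  const current = path.join(dir, "openclaw.json");
  if (!(await exists(legacy))) return "none"; // nothing to migrate
  if (await exists(current)) {
    // Both exist: back up the default config (hypothetical ".bak" name),
    // then promote the legacy file since it holds the Cherry Studio settings.
    await fs.rename(current, `${current}.bak`);
    await fs.rename(legacy, current);
    return "replaced";
  }
  await fs.rename(legacy, current); // only the legacy file exists
  return "renamed";
}
```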
@kangfenmao (Collaborator) left a comment

Note

This review was translated by Claude.

Review Summary

✅ Approved

Core Improvements:

  1. Gateway Status Detection - Switched from socket probing to HTTP requests, which is more reliable
  2. Health Check Sync - Correctly updates gatewayStatus on failure
  3. Error Messages - Captures stderr to display detailed errors instead of just "exited with code 1"
  4. Update Detection - Only matches binary channels to avoid false positives
  5. Configuration Migration - Automatically handles openclaw.cherry.json → openclaw.json
  6. CLI Installation - Supports macOS/Linux symlinks and Windows PATH

Test Coverage:

  • ✅ Added 29 state machine tests
  • ✅ Covers complete lifecycle

Code Quality:

  • Clear commit messages
  • Comprehensive backward compatibility handling
  • Biome CSS class sorting fixes

LGTM! 👍
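The switch from socket probing to an HTTP health check, as summarized in the review, could look roughly like this. The `/health` path, the treat-2xx-as-healthy rule, and the one-second timeout are assumptions for illustration; the PR does not show the endpoint. The probe uses the literal `127.0.0.1` address to sidestep IPv4/IPv6 resolution differences for `localhost`.

```typescript
import * as http from "node:http";

// Hypothetical probe: healthy means a 2xx response from 127.0.0.1:<port><pathName>.
function probeGatewayHealth(port: number, timeoutMs = 1000, pathName = "/health"): Promise<boolean> {
  return new Promise((resolve) => {
    const req = http.get(
      { host: "127.0.0.1", port, path: pathName, timeout: timeoutMs, agent: false },
      (res) => {
        res.resume(); // drain the body so the connection can close
        const code = res.statusCode ?? 0;
        resolve(code >= 200 && code < 300);
      },
    );
    // A socket timeout is turned into an error, which resolves false below.
    req.on("timeout", () => req.destroy(new Error("timeout")));
    req.on("error", () => resolve(false)); // refused, reset, or timed out
  });
}
```

`agent: false` opts out of connection pooling so each probe opens and closes its own connection.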


…ompatibility on Windows

fix(process): ensure console window is hidden on Windows during cross-platform spawn

fix(openclaw): enhance process spawning on Windows to hide console window

fix(openclaw): update Windows process spawning to use PowerShell for hidden console window

fix(openclaw): update gateway process spawning to avoid console window on Windows
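The commit messages above show several attempts at hiding the console window. As one point of reference, Node's `spawn` has a built-in `windowsHide` option; this sketch combines it with `detached` and a stderr pipe. It is not necessarily what was merged (one commit mentions PowerShell), and `gatewaySpawnOptions` and `spawnGateway` are hypothetical helpers.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess, SpawnOptions } from "node:child_process";

// Options for a detached gateway; windowsHide has no effect outside Windows.
function gatewaySpawnOptions(): SpawnOptions {
  return {
    detached: true, // let the gateway outlive the parent process
    stdio: ["ignore", "ignore", "pipe"], // keep stderr for startup diagnostics
    windowsHide: true, // ask Windows not to show a console window
  };
}

function spawnGateway(command: string, args: string[]): ChildProcess {
  return spawn(command, args, gatewaySpawnOptions());
}
```

A long-lived gateway would typically also call `child.unref()` once startup diagnostics are no longer needed, so the parent's event loop does not wait on it.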
@kangfenmao force-pushed the DeJeune/fix-checkportopen branch from a895545 to f116d71 on March 13, 2026 13:13
@GeorgeDong32 (Collaborator) left a comment

Note

This review was translated by Claude.

Code review passed.

Review Conclusion: Approve

Key Fixes:

  1. ✅ Gateway state machine fixes - Now correctly detects external crashes/stops
  2. ✅ Startup error reporting improvements - Captures stderr for detailed error information
  3. ✅ Update detection refinement - Only detects binary channel updates

Code Quality:

  • Added 29 state machine tests with comprehensive coverage
  • Follows project coding standards
  • No security risks

Optional Optimizations (non-blocking):

  • Consider adding a stderr size limit (~10 KB) to prevent unbounded memory growth
  • Consider downgrading checkHealth failures to the info log level

Review by kimi-k2.5
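The suggested stderr cap could be as simple as refusing to grow past a fixed size. A minimal sketch: `appendCapped` is a hypothetical helper, the limit counts JavaScript string length (UTF-16 code units) rather than bytes, and the 10 KB default comes from the review comment.

```typescript
// Append `chunk` to `buffer` without letting the result exceed `limit` characters.
// Keeps the earliest output and drops anything after the cap is reached.
function appendCapped(buffer: string, chunk: string, limit = 10 * 1024): string {
  if (buffer.length >= limit) return buffer;
  return buffer + chunk.slice(0, limit - buffer.length);
}
```

Keeping the head rather than the tail fits the PR's approach of reporting the first three stderr lines.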


DeJeune and others added 4 commits March 14, 2026 00:13
When openclaw respawns itself internally, the original process
reference becomes stale. Now stopGateway falls back to killing
whatever process occupies the gateway port via lsof/netstat.
startGateway also stops stale gateways before starting a new one
instead of reporting port-in-use errors.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
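The port-based fallback needs the PIDs listening on the gateway port. Assuming the commands are `lsof -t -i tcp:<port> -sTCP:LISTEN` on macOS/Linux and `netstat -ano` on Windows, the parsing side can be sketched as pure functions. The exact commands and flags the PR uses are not shown; `parseLsofPids` and `parseNetstatPids` are hypothetical, and matching the `LISTENING` state assumes an English-locale Windows.

```typescript
// Parse terse lsof output: one PID per line. Duplicates are removed.
function parseLsofPids(output: string): number[] {
  const pids = output
    .split(/\s+/)
    .filter(Boolean)
    .map(Number)
    .filter((n) => Number.isInteger(n) && n > 0);
  return Array.from(new Set(pids));
}

// Parse Windows `netstat -ano` output and return PIDs listening on `port`.
function parseNetstatPids(output: string, port: number): number[] {
  const pids = new Set<number>();
  for (const line of output.split(/\r?\n/)) {
    // Expected columns: Proto, Local Address, Foreign Address, State, PID
    const cols = line.trim().split(/\s+/);
    if (cols.length < 5 || cols[0].toUpperCase() !== "TCP") continue;
    if (!cols[1].endsWith(`:${port}`) || cols[3] !== "LISTENING") continue;
    const pid = Number(cols[4]);
    if (Number.isInteger(pid) && pid > 0) pids.add(pid); // skip PID 0 (System Idle)
  }
  return Array.from(pids);
}
```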
checkPortOpen used 'localhost' which may resolve to IPv6 [::1],
causing false positives when the gateway only listens on IPv4.
This made startGateway report "port already in use" even when
the port was actually free. Align with health check by using
127.0.0.1 explicitly. Also add debug logging to port checks.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
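A connect-based `checkPortOpen` that pins the host to `127.0.0.1` might look like this. It is a sketch with an assumed 500 ms timeout; the PR does not show whether the real function connects to the port or tries to listen on it.

```typescript
import * as net from "node:net";

// Resolve true if something accepts a TCP connection on host:port.
// Defaulting to the literal IPv4 loopback avoids 'localhost' resolving to ::1.
function checkPortOpen(port: number, host = "127.0.0.1", timeoutMs = 500): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (open: boolean) => {
      socket.destroy();
      resolve(open);
    };
    socket.setTimeout(timeoutMs, () => finish(false)); // no answer in time
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false)); // e.g. ECONNREFUSED
  });
}
```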
@kangfenmao merged commit 774cb7a into main on Mar 14, 2026
7 checks passed
@kangfenmao deleted the DeJeune/fix-checkportopen branch on March 14, 2026 13:25

3 participants