fix(openclaw): fix gateway status detection and improve error reporting#13433
Merged
kangfenmao merged 15 commits into main on Mar 14, 2026
- getStatus now probes health in all states (not just stopped/error), detecting crashed gateways that still show as 'running'
- checkHealth syncs gatewayStatus to 'stopped' when probe returns unhealthy
- startAndWaitForGateway captures stderr (via pipe) to surface meaningful error messages instead of generic "gateway exited with code 1"
- parseUpdateStatus only matches binary-channel updates, ignoring npm/pkg channels that require a different upgrade path
- Add 29 state-machine tests covering getStatus, checkHealth, startGateway, stopGateway, restartGateway, and full lifecycle transitions

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
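The status reconciliation described in this commit can be sketched as a pure transition function. This is a minimal illustration, not the code from the PR: `reconcileStatus` and the `GatewayStatus` type are hypothetical names, and the handling of a healthy probe (moving to `running`) and of an unhealthy probe in the `error` state (left unchanged) are assumptions inferred from the description.

```typescript
// Hypothetical status type; the PR text mentions these four states.
type GatewayStatus = 'stopped' | 'starting' | 'running' | 'error'

/**
 * Combine the cached status with a fresh health probe result.
 * - 'starting' is never overridden (the probe is skipped during startup).
 * - A healthy probe yields 'running' (assumption: this is how an
 *   externally started gateway would be picked up).
 * - An unhealthy probe turns 'running' into 'stopped'.
 * - Otherwise the cached status is kept (assumption).
 */
function reconcileStatus(current: GatewayStatus, healthy: boolean): GatewayStatus {
  if (current === 'starting') return current
  if (healthy) return 'running'
  if (current === 'running') return 'stopped'
  return current
}
```

Keeping the transition in one pure function makes it straightforward to cover each state with a table of small tests, which is the style of coverage the commit describes.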
…erflow Add max-h-25, overflow-y-auto, and break-all to the error message span so long error texts (stack traces, config errors) don't blow out the layout vertically or horizontally. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: suyao <[email protected]>
The gateway is not registered as a system service, so restart is not applicable. Remove the restart method, IPC channel, UI button, and related tests. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: suyao <[email protected]>
…herry config Migrate from openclaw.cherry.json back to openclaw.json so that the CLI can also read the config without needing the OPENCLAW_CONFIG_PATH env var override. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: suyao <[email protected]>
Add migration logic to handle existing users who have the old openclaw.cherry.json config file. Renames it to openclaw.json if no default config exists, otherwise removes the legacy file. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: suyao <[email protected]>
…nstall On macOS/Linux, symlink the openclaw binary to /usr/local/bin so it is accessible from the command line. On Windows, add the bin directory to the user PATH via the registry. Both are cleaned up on uninstall. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: suyao <[email protected]>
…SS class order When both openclaw.json and openclaw.cherry.json exist, back up the default config and use the cherry config since it contains the user's Cherry Studio settings. Also fix Biome CSS class sort order. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: suyao <[email protected]>
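Taken together, the two migration commits above describe a small decision table. The sketch below expresses it as a pure planning function so the file operations stay separate from the decision; `planConfigMigration`, `MigrationStep`, and the backup file name `openclaw.json.bak` are hypothetical and not taken from the PR.

```typescript
// File names from the commit messages; the backup name is an assumption.
const DEFAULT_CONFIG = 'openclaw.json'
const LEGACY_CONFIG = 'openclaw.cherry.json'
const BACKUP_CONFIG = 'openclaw.json.bak' // hypothetical

interface MigrationStep {
  from: string
  to: string
}

/**
 * Decide which renames are needed, given which config files exist.
 * - No legacy file: nothing to do.
 * - Legacy only: rename it to the default name.
 * - Both present: back up the default, then promote the legacy file,
 *   since it holds the user's Cherry Studio settings.
 */
function planConfigMigration(hasDefault: boolean, hasLegacy: boolean): MigrationStep[] {
  if (!hasLegacy) return []
  const promote: MigrationStep = { from: LEGACY_CONFIG, to: DEFAULT_CONFIG }
  if (!hasDefault) return [promote]
  return [{ from: DEFAULT_CONFIG, to: BACKUP_CONFIG }, promote]
}
```

A caller would apply the returned steps in order with a rename call of its choice; the order matters in the last case because the default file must be moved out of the way first.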
…requests for improved performance
kangfenmao
approved these changes
Mar 13, 2026
Note
This review was translated by Claude.
Review Summary
✅ Approved
Core Improvements:
- Gateway Status Detection - Changed from Socket detection to HTTP request, more reliable
- Health Check Sync - Correctly updates gatewayStatus on failure
- Error Messages - Captures stderr to display detailed errors, no longer just code 1
- Update Detection - Only matches binary channels to avoid false positives
- Configuration Migration - Automatically handles openclaw.cherry.json → openclaw.json
- CLI Installation - Supports macOS/Linux symlinks and Windows PATH
Test Coverage:
- ✅ Added 29 state machine tests
- ✅ Covers complete lifecycle
Code Quality:
- Clear commit messages
- Comprehensive backward compatibility handling
- Biome CSS class sorting fixes
**LGTM!** 👍
…ompatibility on Windows
fix(process): ensure console window is hidden on Windows during cross-platform spawn
fix(openclaw): enhance process spawning on Windows to hide console window
fix(openclaw): update Windows process spawning to use PowerShell for hidden console window
fix(openclaw): update gateway process spawning to avoid console window on Windows
…ference and updating stop logic
GeorgeDong32
approved these changes
Mar 13, 2026
Note
This review was translated by Claude.
Code review passed.
Review Conclusion: Approve
Key Fixes:
- ✅ Gateway state machine fixes - Now correctly detects external crashes/stops
- ✅ Startup error reporting improvements - Captures stderr for detailed error information
- ✅ Update detection refinement - Only detects binary channel updates
Code Quality:
- Added 29 state machine tests with comprehensive coverage
- Follows project coding standards
- No security risks
Optional Optimizations (non-blocking):
- Consider adding stderr size limit (~10KB) to prevent memory accumulation
- Consider downgrading `checkHealth` failures to `info` log level
Review by kimi-k2.5
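The optional stderr size limit suggested in this review could be implemented roughly as follows. This is only a sketch of the idea: `appendCapped` and `MAX_STDERR_CHARS` are hypothetical names, the limit counts string characters rather than bytes, and keeping the head of the output (rather than the tail) is an assumption chosen because the PR reports the first lines of stderr.

```typescript
// Roughly 10 KB for ASCII output; counts UTF-16 code units, not bytes.
const MAX_STDERR_CHARS = 10 * 1024

/**
 * Append a chunk to the collected stderr text without letting the
 * result grow past `limit` characters. Once the limit is reached,
 * later chunks are dropped.
 */
function appendCapped(buffer: string, chunk: string, limit: number = MAX_STDERR_CHARS): string {
  if (buffer.length >= limit) return buffer
  return (buffer + chunk).slice(0, limit)
}
```

In a stderr `data` handler this would be used as `collected = appendCapped(collected, chunk.toString())`, so a gateway that logs continuously cannot grow the buffer without bound.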
When openclaw respawns itself internally, the original process reference becomes stale. Now stopGateway falls back to killing whatever process occupies the gateway port via lsof/netstat. startGateway also stops stale gateways before starting a new one instead of reporting port-in-use errors. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: suyao <[email protected]>
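Finding the process that occupies the gateway port comes down to parsing command output. The sketch below shows one way to extract PIDs, assuming the commands are `lsof -t -i tcp:<port>` on macOS/Linux (which prints one PID per line) and `netstat -ano` on Windows (which prints the PID in the last column). The commit only says "lsof/netstat", so the exact flags, the function names, and the port number in the usage example are assumptions.

```typescript
/** Parse `lsof -t` style output: one PID per line. */
function parseLsofPids(output: string): number[] {
  return output
    .split(/\r?\n/)
    .map((line) => Number.parseInt(line.trim(), 10))
    .filter((pid) => Number.isInteger(pid) && pid > 0)
}

/**
 * Parse `netstat -ano` style output and return the PIDs of TCP
 * listeners whose local address ends with the given port.
 * Expected columns: Proto, Local Address, Foreign Address, State, PID.
 */
function parseNetstatPids(output: string, port: number): number[] {
  const pids = new Set<number>()
  for (const line of output.split(/\r?\n/)) {
    const cols = line.trim().split(/\s+/)
    if (cols.length < 5 || cols[0] !== 'TCP') continue
    const local = cols[1]
    const state = cols[3]
    if (!local.endsWith(`:${port}`) || state !== 'LISTENING') continue
    const pid = Number.parseInt(cols[4], 10)
    if (Number.isInteger(pid) && pid > 0) pids.add(pid)
  }
  return Array.from(pids)
}
```

For example, with a hypothetical port of 18789, `parseNetstatPids(stdout, 18789)` would return the PIDs to pass to a kill call; deduplicating through a `Set` avoids killing the same process twice when it listens on both IPv4 and IPv6.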
checkPortOpen used 'localhost' which may resolve to IPv6 [::1], causing false positives when the gateway only listens on IPv4. This made startGateway report "port already in use" even when the port was actually free. Align with health check by using 127.0.0.1 explicitly. Also add debug logging to port checks. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: suyao <[email protected]>


What this PR does
Before this PR:
- `getStatus` only probed health when status was `stopped` or `error`. If the gateway crashed while status was `running`, the UI would never detect the crash and keep showing "运行中" (running).
- `checkHealth` probed the gateway but never updated `gatewayStatus`, so a failed health check had no lasting effect.
- `startAndWaitForGateway` spawned the process with `stdio: 'ignore'`, discarding stderr. On startup failure, the user only saw "gateway exited with code 1" with no actionable detail.
- `parseUpdateStatus` matched any update channel (npm, pkg, binary), causing false positives for binary-installed users when only an npm/pkg update was available.
After this PR:
- `getStatus` probes health in all non-`starting` states, transitioning `running → stopped` when the gateway is unreachable.
- `checkHealth` syncs `gatewayStatus` to `stopped` when the probe returns unhealthy.
- `startAndWaitForGateway` pipes stderr and extracts the first 3 lines of error output for meaningful error messages.
- `parseUpdateStatus` only matches binary-channel updates, ignoring npm/pkg channels.
Fixes #
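The "first 3 lines of error output" step can be illustrated with a small helper. This is a sketch under assumptions rather than the PR's code: `summarizeStartupError` is a hypothetical name, and skipping blank lines and falling back to the generic exit-code message when stderr is empty are choices made for the example.

```typescript
/**
 * Build a user-facing startup error from captured stderr.
 * Takes the first `maxLines` non-empty lines; if stderr is empty,
 * falls back to the generic exit-code message.
 */
function summarizeStartupError(stderr: string, exitCode: number | null, maxLines: number = 3): string {
  const lines = stderr
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .slice(0, maxLines)
  if (lines.length === 0) {
    return `gateway exited with code ${exitCode ?? 'unknown'}`
  }
  return lines.join('\n')
}
```

The stderr text would come from a child process spawned with stderr piped, collected in a `data` handler, and passed to this helper when the process exits before becoming healthy.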
Why we need it and why it was done in this way
The gateway status was a one-way latch: once set to `running`, it could only be changed by an explicit `stopGateway` call. External crashes (process killed, config errors, port conflicts) left the UI in an incorrect state.
The following tradeoffs were made:
- `getStatus` now always calls `probeGatewayHealth` (except during `starting`). This adds a small overhead per status poll, but is necessary for correctness since the gateway process lifecycle is independent (detached).
- stdio is set to `['ignore', 'ignore', 'pipe']` instead of full `'pipe'` to minimize resource usage; only stderr is needed for error diagnostics.
The following alternatives were considered:
- Parsing the `openclaw update status` table structure: rejected in favor of simple regex matching on the `binary` channel keyword.
Breaking changes
None.
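As an illustration of the regex-based channel matching chosen over table parsing, the sketch below scans command output for a line mentioning the `binary` channel. The real output format of `openclaw update status` is not shown in this PR, so the function name `findBinaryChannelLine` and the sample text in the usage note are invented for the example.

```typescript
/**
 * Return the first output line that mentions the binary channel,
 * or null if no such line exists. Lines for other channels
 * (npm, pkg) are ignored.
 */
function findBinaryChannelLine(output: string): string | null {
  const binaryChannel = /\bbinary\b/i
  for (const line of output.split(/\r?\n/)) {
    if (binaryChannel.test(line)) return line.trim()
  }
  return null
}
```

With an invented input such as `'npm  1.2.0 -> 1.3.0'`, the function returns `null`, so an npm-only update would not trigger a notification; a line containing the word `binary` is returned for further inspection.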
Special notes for your reviewer
- The `parseUpdateStatus` change means users who installed OpenClaw via npm will no longer see update notifications in Cherry Studio. This is intentional: Cherry Studio only manages binary installations.
- The `OpenClawService` class has state-machine test coverage.
Checklist
- …`/gh-pr-review`, `gh pr diff`, or GitHub UI) before requesting review from others
Release note