fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock by 1998xiangcai-cmd · Pull Request #42027 · openclaw/openclaw

1998xiangcai-cmd · 2026-03-10T09:59:10Z

Summary

Describe the problem and fix in 2�C4 bullets:

Problem: fix(exec): resolve login shell PATH when exec host defaults to "sandbox" without sandbox runtime #41549 left exec in a phantom-sandbox state that skipped login-shell PATH recovery, [UX] Surface layer-specific diagnostics for Control UI auth/origin vs remote CDP reachability in WSL2 + Windows setups #41553 made WSL2 + Windows browser failures hard to triage across Control UI/auth/origin/CDP layers, and [Bug]: `Recurring cron jobs with isolated agentTurn + LLM time out when force-run, while equivalent one-shot at jobs succeed #41558 let detached cron force-runs block on their own cron lane until timeout.
Why it matters: these failures look like broken tooling even when the underlying install, browser profile, or model path is otherwise valid.
What changed: exec now probes login-shell PATH when sandbox host falls back locally; browser status now emits layered control-ui/profile/CDP diagnostics and cleaner HTML upstream errors; cron force-runs now use a detached manual lane so nested cron work can still reuse the cron execution lane.
What did NOT change (scope boundary): this does not add new browser capabilities, does not change sandbox policy beyond PATH recovery for local fallback, and does not redesign cron scheduling semantics.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

User-visible / Behavior Changes

Exec tool can now find login-shell-managed binaries when the default sandbox host falls back to direct local execution.
openclaw browser status now surfaces layered diagnostics for Control UI auth/origin configuration, browser profile attach mode, and remote/local CDP reachability.
Browser HTTP errors now summarize proxy HTML error pages instead of dumping raw markup.
Detached cron.run --force no longer self-deadlocks when isolated cron work reuses the cron execution lane.

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) Yes
Data access scope changed? (Yes/No) Yes
If any Yes, explain risk + mitigation:
- Exec PATH recovery now also runs for local fallback from the default sandbox host, but only when no sandbox runtime exists and no explicit PATH override was provided.
- Browser CLI status now reads the existing redacted config.get snapshot through the authenticated gateway path to derive operator diagnostics; it does not expose unredacted secrets.

Repro + Verification

Environment

OS: Windows workstation, repo validation run locally
Runtime/container: Node 22 / pnpm
Model/provider: N/A for repro validation
Integration/channel (if any): Browser control + cron
Relevant config (redacted): default exec host, browser remote CDP profile, recurring isolated cron jobs

Steps

Leave tools.exec.host unset, disable sandbox runtime, and run an exec command that depends on login-shell PATH.
Run openclaw browser status against a remote CDP profile in a WSL2 + Windows style setup.
Force-run a recurring isolated cron job whose nested work reuses the cron lane.

Expected

Exec finds login-shell-managed binaries.
Browser status points to the failing layer instead of a generic connection failure.
Force-run completes instead of timing out behind its own lane.

Actual

Covered by the targeted tests below; all now pass.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: pnpm check; pnpm test -- --run src/agents/bash-tools.exec.path.test.ts src/browser/status-diagnostics.test.ts src/browser/server.status-diagnostics.test.ts src/cli/browser-cli-manage.status-diagnostics.test.ts src/browser/client.test.ts src/cron/service.issue-regressions.test.ts; pnpm build.
Edge cases checked: phantom sandbox PATH fallback, remote/local CDP diagnostics ordering, HTML proxy error summarization, detached cron force-run lane reuse.
What you did not verify: live remote CDP against a real Windows browser host, or a live LLM-backed recurring cron on production credentials.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps:

Failure Recovery (if this breaks)

How to disable/revert this change quickly: revert the three commits in this PR.
Files/config to restore: src/agents/bash-tools.exec.ts, src/browser/*status*, src/cli/browser-cli-*, src/cron/service/ops.ts, src/process/lanes.ts, gateway lane config wiring.
Known bad symptoms reviewers should watch for: missing PATH recovery on macOS launch agents, noisy browser diagnostics ordering, cron force-runs stuck until timeout.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

Risk: Browser status now includes more diagnostics and merges CLI-derived gateway config checks with server-side CDP checks.
- Mitigation: diagnostics are additive, optional in the payload, redacted, and covered by route + CLI tests.
Risk: The new detached cron manual lane could drift from cron concurrency settings.
- Mitigation: gateway startup and hot-reload now set CommandLane.Cron and CommandLane.CronManual together, with regression coverage for the deadlock case.

greptile-apps · 2026-03-10T10:04:39Z

Greptile Summary

This PR addresses three targeted bug fixes: (1) login-shell PATH recovery now triggers when host === "sandbox" falls back to direct local execution (not just host === "gateway"), (2) openclaw browser status gains layered diagnostics covering Control UI auth/origin config, profile attach-mode, and CDP reachability, with HTML proxy error pages now summarized instead of dumped raw, and (3) cron.run --force is moved to a new CommandLane.CronManual lane so isolated cron jobs can still enqueue on CommandLane.Cron without self-deadlocking.

Key changes:

bash-tools.exec.ts: extends host === "gateway" PATH probe guard to also cover host === "sandbox" when no sandbox runtime exists
status-diagnostics.ts (new): pure-logic module for deriveBrowserStatusDiagnostics, deriveGatewayControlUiDiagnostics, and combineBrowserStatusDiagnostics; no network calls, well-typed, well-tested
client-fetch.ts: summarizeBrowserServiceHttpError correctly identifies and summarizes nginx/proxy HTML error pages; the /<[^>]+>/g tag-stripping regex can be tricked by unencoded > in HTML attribute values and the fallback branch (no <title> or <h1>) has no length cap — minor robustness concerns worth addressing
cron/service/ops.ts + process/lanes.ts + gateway wiring: CronManual lane cleanly separates manual force-runs from scheduled cron execution; both startup and hot-reload set CronManual concurrency in lockstep with Cron, preventing config drift
Test coverage is thorough: unit tests for all three diagnostic paths, a regression test that directly exercises the deadlock scenario, and a server integration test for CDP reachability diagnostics

Confidence Score: 4/5

Safe to merge with one minor robustness improvement suggested for HTML error message handling.
All three fixes are narrowly scoped, backward-compatible, and covered by regression tests. The exec PATH change is guarded by existing conditions. Browser diagnostics are additive and optional. The cron lane split resolves a deadlock without altering scheduling semantics. One style/robustness concern identified in stripHtmlErrorText: the tag-stripping regex can be confused by unencoded > in attribute values, and the fallback path has no length cap, but this is a minor improvement opportunity rather than a blocking issue for merge.
src/browser/client-fetch.ts — consider applying the suggested robustness improvement to the stripHtmlErrorText function (cap output length, improve regex handling of attributes with > characters)

_{Last reviewed commit: 3e8ab2b}

greptile-apps · 2026-03-10T10:04:42Z

+function stripHtmlErrorText(input: string): string {
+  return input
+    .replace(/<script[\s\S]*?<\/script>/gi, " ")
+    .replace(/<style[\s\S]*?<\/style>/gi, " ")
+    .replace(/<[^>]+>/g, " ")
+    .replace(/\s+/g, " ")
+    .trim();


> in attribute values can defeat the tag-stripping regex.

/<[^>]+>/g stops at the first > it encounters, so an attribute value that contains an unencoded > (e.g., aria-label="a > b", which some proxy error pages include) will leave the tail of the attribute and the closing > in the output. This is the fallback path that fires when neither <title> nor <h1> is found, so the resulting "summary" could contain raw attribute fragments and could be unboundedly long if the page body has significant text content.

Consider using a tighter pattern and capping the output length in the fallback:

Suggested change

function stripHtmlErrorText(input: string): string {

return input

.replace(/<script[\s\S]*?<\/script>/gi, " ")

.replace(/<style[\s\S]*?<\/style>/gi, " ")

.replace(/<[^>]+>/g, " ")

.replace(/\s+/g, " ")

.trim();

function stripHtmlErrorText(input: string): string {

return input

.replace(/<script[\s\S]*?<\/script>/gi, " ")

.replace(/<style[\s\S]*?<\/style>/gi, " ")

.replace(/<[^>]*>/g, " ")

.replace(/&[a-z]+;/gi, " ")

.replace(/\s+/g, " ")

.trim()

.slice(0, 200);

}

Note: [^>]* (zero-or-more instead of one-or-more) is a minor alignment fix, but the more actionable change is adding .slice(0, 200) so that an HTML page without a title or heading can't push a multi-kilobyte text dump into the thrown error message.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/browser/client-fetch.ts Line: 157-163 Comment: `>` in attribute values can defeat the tag-stripping regex. `/<[^>]+>/g` stops at the first `>` it encounters, so an attribute value that contains an unencoded `>` (e.g., `aria-label="a > b"`, which some proxy error pages include) will leave the tail of the attribute and the closing `>` in the output. This is the fallback path that fires when neither `<title>` nor `<h1>` is found, so the resulting "summary" could contain raw attribute fragments and could be unboundedly long if the page body has significant text content. Consider using a tighter pattern and capping the output length in the fallback: ```suggestion function stripHtmlErrorText(input: string): string { return input .replace(/<script[\s\S]*?<\/script>/gi, " ") .replace(/<style[\s\S]*?<\/style>/gi, " ") .replace(/<[^>]*>/g, " ") .replace(/&[a-z]+;/gi, " ") .replace(/\s+/g, " ") .trim() .slice(0, 200); } ``` Note: `[^>]*` (zero-or-more instead of one-or-more) is a minor alignment fix, but the more actionable change is adding `.slice(0, 200)` so that an HTML page without a title or heading can't push a multi-kilobyte text dump into the thrown error message. How can I resolve this? If you propose a fix, please make it concise.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e8ab2b689

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-10T10:06:55Z

+    };
+  };
+}> {
+  const payload = await callGatewayFromCli("config.get", opts, {});


Reuse short status timeout for config snapshot call

The new browser status path now always does config.get via callGatewayConfigSnapshot, but this helper does not pass a bounded timeout and therefore inherits the CLI default timeout from callGatewayFromCli (30s), unlike fetchBrowserStatus which explicitly uses 1500ms; when config.get is slow or partially unavailable, openclaw browser status can hang for tens of seconds before falling back to existing diagnostics, which is a responsiveness regression introduced by this change.

Useful? React with 👍 / 👎.

Owlock · 2026-03-24T20:36:29Z

Additional real-world validation from today (Mar 24): this PR’s layered diagnostics would have helped a lot in a WSL2 + Windows CDP outage triggered by virtualization mode toggling.

Repro chain:

bcdedit /set hypervisorlaunchtype off (game mode) -> reboot
bcdedit /set hypervisorlaunchtype auto -> reboot
Chrome CDP still healthy on Windows localhost (127.0.0.1:9222), but unreachable from WSL host endpoint (172.30.144.1:9222) until bridge recovery

Recovery that restored remote tabs:

ensure iphlpsvc running
recreate portproxy (0.0.0.0:9222 -> 127.0.0.1:9222)
allow inbound TCP 9222 in firewall
verify both /json/version checks (localhost and WSL-reachable host address)

I posted full field notes in #41553 as follow-up evidence. +1 on this direction.

1998xiangcai-cmd added 3 commits March 10, 2026 17:45

fix(exec): resolve login-shell PATH for phantom sandbox fallback

7ea4d85

ux(browser): surface layered control-ui and CDP diagnostics

25deb76

fix(cron): avoid force-run cron lane self-deadlock

3e8ab2b

openclaw-barnacle bot added gateway Gateway runtime cli CLI command changes agents Agent runtime and tooling size: L labels Mar 10, 2026

greptile-apps bot reviewed Mar 10, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Mar 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock#42027

fix: resolve exec PATH fallback, layered browser diagnostics, and cron force-run deadlock#42027
1998xiangcai-cmd wants to merge 3 commits intoopenclaw:mainfrom
1998xiangcai-cmd:xiangcai/exec-browser-cron-fixes-41549-41553-41558

1998xiangcai-cmd commented Mar 10, 2026

Uh oh!

greptile-apps bot commented Mar 10, 2026

Uh oh!

greptile-apps bot Mar 10, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 10, 2026

Uh oh!

Owlock commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

1998xiangcai-cmd commented Mar 10, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

User-visible / Behavior Changes

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Failure Recovery (if this breaks)

Risks and Mitigations

Uh oh!

greptile-apps bot commented Mar 10, 2026

Greptile Summary

Confidence Score: 4/5

Uh oh!

greptile-apps bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Owlock commented Mar 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants