Skip to content

process: fix Windows runExec garbled CJK output [AI-assisted]#50885

Open
zhangyongjie1997 wants to merge 3 commits intoopenclaw:mainfrom
zhangyongjie1997:fix/50519-windows-exec-utf8
Open

process: fix Windows runExec garbled CJK output [AI-assisted]#50885
zhangyongjie1997 wants to merge 3 commits intoopenclaw:mainfrom
zhangyongjie1997:fix/50519-windows-exec-utf8

Conversation

@zhangyongjie1997
Copy link
Copy Markdown

@zhangyongjie1997 zhangyongjie1997 commented Mar 20, 2026

Summary

  • Problem: Windows .cmd/.bat execution through cmd.exe wrapper could produce garbled CJK output because the active code page is often not UTF-8.
  • Why it matters: Windows users running commands with Chinese/Japanese/Korean output (or non-ASCII paths/args) get unreadable results.
  • What changed:
    • runExec now prepends chcp 65001>nul && on Windows cmd-wrapper paths when output decoding is UTF-8.
    • runCommandWithTimeout gets the same UTF-8 code-page prefix for cmd-wrapper consistency.
    • Fixed a regression by moving cmd command-line building back behind the useCmdWrapper gate (avoids spurious throws on non-wrapper paths).
    • Added UTF-8 alias handling so both utf8 and utf-8 trigger the UTF-8 code-page path.
    • Added/updated regression tests in src/process/exec.windows.test.ts.
  • Scope boundary: this PR only updates Windows cmd-wrapper encoding behavior and related guardrails; it does not change non-wrapper execution semantics.

Change Type

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Linked Issue/PR

User-visible / Behavior Changes

  • CJK output from Windows .cmd/.bat wrapper execution is no longer mojibake in UTF-8 decode scenarios.
  • runExec(..., { encoding: "utf8" }) and runExec(..., { encoding: "utf-8" }) now behave consistently for wrapper code-page setup.
  • Non-wrapper execution no longer accidentally trips cmd.exe argument safety checks.

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux CI + Windows lanes in repo CI
  • Runtime: Node 22 / pnpm workspace
  • Area: process execution (runExec, runCommandWithTimeout)

Steps

  1. Execute a .cmd command through Windows cmd-wrapper path with CJK output.
  2. Verify output decoding in UTF-8 mode.
  3. Verify non-wrapper commands are not validated as cmd command-lines.
  4. Verify UTF-8 alias handling (utf8 and utf-8).

Result

  • Regression tests pass: pnpm test -- src/process/exec.windows.test.ts
  • PR CI checks are green.

Risks and Mitigations

  • Risk: behavior change is specific to Windows cmd-wrapper path and could alter edge-case shell output formatting.
    • Mitigation: changes are tightly scoped and covered by focused regression tests for wrapper/non-wrapper and encoding alias cases.

Closes #50519

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR fixes Windows CJK (mojibake) output when running .cmd/.bat commands via runExec by prepending chcp 65001>nul && to the cmd wrapper command line when the encoding is UTF-8, and adds a regression test for this behaviour.

Key findings:

  • Regression — unconditional buildCmdExeCommandLine call (P1): buildCmdExeCommandLine (which calls the argument-safety validator escapeForCmdExe) is now evaluated eagerly for every runExec call, not just those that go through the cmd wrapper. Arguments containing &, |, <, >, ^, or % will now throw "Unsafe Windows cmd.exe argument detected" on non-Windows paths and on Windows non-.cmd invocations, which is a behaviour regression from the original code.

  • runCommandWithTimeout not updated (P2): The spawn-based runCommandWithTimeout function has an identical cmd wrapper path that does not receive the chcp 65001>nul && fix, leaving CJK output garbled in that code path.

  • The encoding option addition and the chcp injection logic itself are correct; the UTF-8 code page change is guarded appropriately by process.platform === "win32", useCmdWrapper, and the encoding value.

Confidence Score: 2/5

  • Not safe to merge as-is: the eager buildCmdExeCommandLine call introduces a regression that breaks non-cmd command invocations with shell-metacharacter arguments.
  • The chcp injection concept is sound, but moving buildCmdExeCommandLine outside the useCmdWrapper guard is a correctness bug that causes spurious throws for previously-working call sites. This must be fixed before merging.
  • src/process/exec.ts — specifically lines 135–139 where buildCmdExeCommandLine is called unconditionally.

Comments Outside Diff (1)

  1. src/process/exec.ts, line 235-239 (link)

    P2 runCommandWithTimeout not updated with the same fix

    runCommandWithTimeout uses spawn with a cmd wrapper path (lines 235–239) that passes its command line to cmd.exe /c without setting the UTF-8 code page. If a caller invokes a .cmd/.bat command that produces CJK output through this path the same mojibake issue will appear.

    Consider applying the same chcp 65001>nul && prefix here, or at minimum leaving a // TODO comment so the inconsistency is intentional and visible.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/process/exec.ts
    Line: 235-239
    
    Comment:
    **`runCommandWithTimeout` not updated with the same fix**
    
    `runCommandWithTimeout` uses `spawn` with a cmd wrapper path (lines 235–239) that passes its command line to `cmd.exe /c` without setting the UTF-8 code page. If a caller invokes a `.cmd`/`.bat` command that produces CJK output through this path the same mojibake issue will appear.
    
    Consider applying the same `chcp 65001>nul &&` prefix here, or at minimum leaving a `// TODO` comment so the inconsistency is intentional and visible.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 135-139

Comment:
**`buildCmdExeCommandLine` now called unconditionally, causing spurious throws**

`buildCmdExeCommandLine` calls `escapeForCmdExe`, which throws for any argument containing `&|<>^%\r\n`. In the original code this function was only invoked inside the `useCmdWrapper` ternary branch — JavaScript's lazy evaluation meant it was a no-op for non-`.cmd` commands. Now it is eagerly evaluated for *every* `runExec` call, including non-Windows paths and Windows calls to plain `.exe` commands.

For example, running a command like `runExec("grep", ["-P", "a|b"], ...)` on Linux, or `runExec("node.exe", ["--eval", "1+1"], ...)` on Windows, would now throw with `"Unsafe Windows cmd.exe argument detected"` even though neither path uses the cmd wrapper at all.

Move the call inside the branch where it is actually needed:

```suggestion
    const useCmdWrapper = isWindowsBatchCommand(execCommand);
    const wrappedCommandLine = (() => {
      if (!useCmdWrapper) return null;
      const cmdCommandLine = buildCmdExeCommandLine(execCommand, execArgs);
      return process.platform === "win32" && encoding.toLowerCase() === "utf8"
        ? `chcp 65001>nul && ${cmdCommandLine}`
        : cmdCommandLine;
    })();
```

Then reference `wrappedCommandLine!` in the `useCmdWrapper` branch and keep the `else` path unchanged.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/process/exec.ts
Line: 235-239

Comment:
**`runCommandWithTimeout` not updated with the same fix**

`runCommandWithTimeout` uses `spawn` with a cmd wrapper path (lines 235–239) that passes its command line to `cmd.exe /c` without setting the UTF-8 code page. If a caller invokes a `.cmd`/`.bat` command that produces CJK output through this path the same mojibake issue will appear.

Consider applying the same `chcp 65001>nul &&` prefix here, or at minimum leaving a `// TODO` comment so the inconsistency is intentional and visible.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "process: fix Windows..."

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a66fca1b37

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 79c1d8d9f2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@zhangyongjie1997 zhangyongjie1997 changed the title process: fix Windows runExec garbled CJK output process: fix Windows runExec garbled CJK output [AI-assisted] Mar 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Windows exec tool produces garbled Chinese characters due to hardcoded UTF-8 encoding

1 participant