fix(exec): use gbk encoding for Windows cmd.exe wrapper [AI-assisted]#50586
fix(exec): use gbk encoding for Windows cmd.exe wrapper [AI-assisted]#50586kevinten10 wants to merge 3 commits intoopenclaw:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6bfe46ae33
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/process/exec.ts
Outdated
| const stdout = typeof rawStdout === "string" ? rawStdout : (rawStdout?.toString() ?? ""); | ||
| const stderr = typeof rawStderr === "string" ? rawStderr : (rawStderr?.toString() ?? ""); |
There was a problem hiding this comment.
Decode cmd.exe output buffers before returning stdout/stderr
When runExec takes the Windows .cmd/.bat wrapper path and sets encoding: "gbk", Node does not decode that code page for execFile; it returns Buffers instead. The new fallback here then calls rawStdout.toString() / rawStderr.toString() with no encoding, which reinterprets those bytes as UTF-8, so any non-ASCII cmd.exe output (including the Chinese text from #50519) is still mojibake. This needs a real code-page decode step, like the existing Windows buffer-decoding logic elsewhere in the repo, rather than the default Buffer.toString().
Useful? React with 👍 / 👎.
Greptile SummaryThis PR addresses the Windows cmd.exe encoding problem raised in issue #50519. Instead of the previously-reviewed (and broken) approach of hardcoding
Confidence Score: 5/5
Reviews (3): Last reviewed commit: "fix(exec): use gbk encoding for Windows ..." | Re-trigger Greptile |
src/process/exec.ts
Outdated
| const encoding = useCmdWrapper ? ("gbk" as const) : ("utf8" as const); | ||
| const options = | ||
| typeof opts === "number" | ||
| ? { timeout: opts, encoding: "utf8" as const } | ||
| ? { timeout: opts, encoding } | ||
| : { | ||
| timeout: opts.timeoutMs, | ||
| maxBuffer: opts.maxBuffer, | ||
| cwd: opts.cwd, | ||
| encoding: "utf8" as const, | ||
| encoding, | ||
| }; | ||
| try { | ||
| const argv = [command, ...args]; | ||
| let execCommand: string; | ||
| let execArgs: string[]; | ||
| if (process.platform === "win32") { | ||
| const resolved = resolveNpmArgvForWindows(argv); | ||
| if (resolved) { | ||
| execCommand = resolved[0] ?? ""; | ||
| execArgs = resolved.slice(1); | ||
| } else { | ||
| execCommand = resolveCommand(command); | ||
| execArgs = args; | ||
| } | ||
| } else { | ||
| execCommand = resolveCommand(command); | ||
| execArgs = args; | ||
| } | ||
| const useCmdWrapper = isWindowsBatchCommand(execCommand); | ||
| const { stdout, stderr } = useCmdWrapper | ||
| const { stdout: rawStdout, stderr: rawStderr } = useCmdWrapper | ||
| ? await execFileAsync( | ||
| process.env.ComSpec ?? "cmd.exe", | ||
| ["/d", "/s", "/c", buildCmdExeCommandLine(execCommand, execArgs)], | ||
| { ...options, windowsVerbatimArguments: true }, | ||
| ) | ||
| : await execFileAsync(execCommand, execArgs, options); | ||
| // Ensure stdout/stderr are strings (encoding may return Buffer for some encodings like 'gbk') | ||
| const stdout = typeof rawStdout === "string" ? rawStdout : (rawStdout?.toString() ?? ""); | ||
| const stderr = typeof rawStderr === "string" ? rawStderr : (rawStderr?.toString() ?? ""); |
There was a problem hiding this comment.
GBK encoding fix is fundamentally broken
'gbk' is not a valid Node.js Buffer encoding — Buffer.isEncoding('gbk') returns false. The encoding option of execFileAsync (which wraps execFile) is only effective for the Node.js-native encoding list ('utf8', 'latin1', 'ascii', 'utf16le', 'base64', 'hex', etc.). Passing an unrecognized encoding like 'gbk' leads to one of these outcomes depending on the Node.js version:
ERR_UNKNOWN_ENCODINGis thrown — breaking all cmd.exe executions entirely.- The encoding is silently ignored — output is decoded as UTF-8, same as before.
- A raw Buffer is returned — which the code handles via the
.toString()fallback below.
The third case is what the author observed (hence the comment on line 144), but then:
const stdout = typeof rawStdout === "string" ? rawStdout : (rawStdout?.toString() ?? "");Buffer.prototype.toString() with no argument defaults to 'utf8'. This means GBK bytes are still decoded as UTF-8 — the original garbling problem is not fixed.
The correct approach is to request raw buffers explicitly and then decode with an encoding-aware API:
// Use 'buffer' encoding to always receive raw Buffers from execFileAsync
const bufferOptions = typeof opts === "number"
? { timeout: opts, encoding: "buffer" as const }
: { timeout: opts.timeoutMs, maxBuffer: opts.maxBuffer, cwd: opts.cwd, encoding: "buffer" as const };
const { stdout: rawStdout, stderr: rawStderr } = useCmdWrapper
? await execFileAsync(process.env.ComSpec ?? "cmd.exe", [...], { ...bufferOptions, windowsVerbatimArguments: true })
: await execFileAsync(execCommand, execArgs, bufferOptions);
// Decode using the appropriate encoding
// TextDecoder supports 'gbk' when Node.js is built with full ICU data
const decode = (buf: Buffer) =>
useCmdWrapper ? new TextDecoder("gbk").decode(buf) : buf.toString("utf8");
const stdout = decode(rawStdout);
const stderr = decode(rawStderr);Alternatively, use the iconv-lite package (already common in Node.js tooling) for robust cross-platform GBK decoding without ICU dependency.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 126-146
Comment:
**GBK encoding fix is fundamentally broken**
`'gbk'` is not a valid Node.js Buffer encoding — `Buffer.isEncoding('gbk')` returns `false`. The `encoding` option of `execFileAsync` (which wraps `execFile`) is only effective for the Node.js-native encoding list (`'utf8'`, `'latin1'`, `'ascii'`, `'utf16le'`, `'base64'`, `'hex'`, etc.). Passing an unrecognized encoding like `'gbk'` leads to one of these outcomes depending on the Node.js version:
1. **`ERR_UNKNOWN_ENCODING` is thrown** — breaking all cmd.exe executions entirely.
2. **The encoding is silently ignored** — output is decoded as UTF-8, same as before.
3. **A raw Buffer is returned** — which the code handles via the `.toString()` fallback below.
The third case is what the author observed (hence the comment on line 144), but then:
```typescript
const stdout = typeof rawStdout === "string" ? rawStdout : (rawStdout?.toString() ?? "");
```
`Buffer.prototype.toString()` with no argument defaults to `'utf8'`. This means GBK bytes are _still_ decoded as UTF-8 — the original garbling problem is not fixed.
**The correct approach** is to request raw buffers explicitly and then decode with an encoding-aware API:
```typescript
// Use 'buffer' encoding to always receive raw Buffers from execFileAsync
const bufferOptions = typeof opts === "number"
? { timeout: opts, encoding: "buffer" as const }
: { timeout: opts.timeoutMs, maxBuffer: opts.maxBuffer, cwd: opts.cwd, encoding: "buffer" as const };
const { stdout: rawStdout, stderr: rawStderr } = useCmdWrapper
? await execFileAsync(process.env.ComSpec ?? "cmd.exe", [...], { ...bufferOptions, windowsVerbatimArguments: true })
: await execFileAsync(execCommand, execArgs, bufferOptions);
// Decode using the appropriate encoding
// TextDecoder supports 'gbk' when Node.js is built with full ICU data
const decode = (buf: Buffer) =>
useCmdWrapper ? new TextDecoder("gbk").decode(buf) : buf.toString("utf8");
const stdout = decode(rawStdout);
const stderr = decode(rawStderr);
```
Alternatively, use the `iconv-lite` package (already common in Node.js tooling) for robust cross-platform GBK decoding without ICU dependency.
How can I resolve this? If you propose a fix, please make it concise.
src/process/exec.ts
Outdated
| // 'gbk' encoding when wrapping commands via cmd.exe to properly handle | ||
| // non-ASCII characters (e.g., Chinese characters). | ||
| // See: https://github.com/openclaw/openclaw/issues/50519 | ||
| const encoding = useCmdWrapper ? ("gbk" as const) : ("utf8" as const); |
There was a problem hiding this comment.
GBK hardcoded for all Windows — breaks non-Chinese locales
Windows cmd.exe's default code page depends on the system locale, not just whether it's Windows. Hardcoding 'gbk' (CP936) here means:
- Chinese (Simplified) Windows: correct fix
- Japanese Windows (CP932 / Shift-JIS): Japanese non-ASCII output is now garbled with GBK decoding
- Western European Windows (CP1252): accented characters like
é,ü,ñare misread - Eastern European (CP1250), Cyrillic (CP1251), etc.: similarly broken
The code page should be read dynamically. A robust approach is to run chcp 65001 before the command to force UTF-8 in cmd.exe (a common technique used by many tools), or detect the current code page:
// Append 'chcp 65001 > nul & ' to the cmd.exe command line to force UTF-8
const cmdLine = useCmdWrapper
? `chcp 65001 > nul & ${buildCmdExeCommandLine(execCommand, execArgs)}`
: buildCmdExeCommandLine(execCommand, execArgs);With chcp 65001, encoding: 'utf8' (the existing default) would work correctly for all locales, including Chinese.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 126
Comment:
**GBK hardcoded for all Windows — breaks non-Chinese locales**
Windows cmd.exe's default code page depends on the system locale, not just whether it's Windows. Hardcoding `'gbk'` (CP936) here means:
- **Chinese (Simplified) Windows**: correct fix
- **Japanese Windows (CP932 / Shift-JIS)**: Japanese non-ASCII output is now garbled with GBK decoding
- **Western European Windows (CP1252)**: accented characters like `é`, `ü`, `ñ` are misread
- **Eastern European (CP1250)**, **Cyrillic (CP1251)**, etc.: similarly broken
The code page should be read dynamically. A robust approach is to run `chcp 65001` before the command to force UTF-8 in cmd.exe (a common technique used by many tools), or detect the current code page:
```typescript
// Append 'chcp 65001 > nul & ' to the cmd.exe command line to force UTF-8
const cmdLine = useCmdWrapper
? `chcp 65001 > nul & ${buildCmdExeCommandLine(execCommand, execArgs)}`
: buildCmdExeCommandLine(execCommand, execArgs);
```
With `chcp 65001`, `encoding: 'utf8'` (the existing default) would work correctly for all locales, including Chinese.
How can I resolve this? If you propose a fix, please make it concise.
src/process/exec.windows.test.ts
Outdated
| it("uses gbk encoding when wrapping commands via cmd.exe on Windows", async () => { | ||
| const platformSpy = vi.spyOn(process, "platform", "get").mockReturnValue("win32"); | ||
|
|
||
| execFileMock.mockImplementation( | ||
| ( | ||
| _command: string, | ||
| _args: string[], | ||
| options: Record<string, unknown>, | ||
| cb: (err: Error | null, stdout: string, stderr: string) => void, | ||
| ) => { | ||
| // Verify that gbk encoding is used when cmd.exe wrapper is used | ||
| if (_command.toLowerCase().includes("cmd")) { | ||
| expect(options.encoding).toBe("gbk"); | ||
| } | ||
| cb(null, "ok", ""); | ||
| }, | ||
| ); | ||
|
|
||
| try { | ||
| // pnpm.cmd will trigger the cmd.exe wrapper | ||
| await runExec("pnpm", ["--version"], 1000); | ||
| expect(execFileMock).toHaveBeenCalled(); | ||
| } finally { | ||
| platformSpy.mockRestore(); | ||
| } | ||
| }); |
There was a problem hiding this comment.
Test verifies encoding option value but not actual output decoding
This test confirms that options.encoding is set to 'gbk' when the cmd.exe wrapper is used, but it doesn't validate that the output is correctly decoded from GBK bytes into a proper JavaScript string. Since 'gbk' is not a valid Node.js buffer encoding, setting options.encoding = 'gbk' has no meaningful effect — the test passes regardless of whether the GBK decoding actually works.
A meaningful test would:
- Mock
execFileto return a Buffer containing known GBK-encoded bytes (e.g., the GBK encoding of a Chinese string). - Assert that
runExecreturns those characters as the correct Unicode string.
Without such a test, the actual decoding behavior is unverified, and the test gives a false sense of correctness.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.windows.test.ts
Line: 122-147
Comment:
**Test verifies encoding option value but not actual output decoding**
This test confirms that `options.encoding` is set to `'gbk'` when the cmd.exe wrapper is used, but it doesn't validate that the output is _correctly decoded_ from GBK bytes into a proper JavaScript string. Since `'gbk'` is not a valid Node.js buffer encoding, setting `options.encoding = 'gbk'` has no meaningful effect — the test passes regardless of whether the GBK decoding actually works.
A meaningful test would:
1. Mock `execFile` to return a Buffer containing known GBK-encoded bytes (e.g., the GBK encoding of a Chinese string).
2. Assert that `runExec` returns those characters as the correct Unicode string.
Without such a test, the actual decoding behavior is unverified, and the test gives a false sense of correctness.
How can I resolve this? If you propose a fix, please make it concise.6bfe46a to
e37dee1
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e37dee152e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/process/exec.ts
Outdated
| const cmdExeCommandLine = useCmdWrapper | ||
| ? `chcp 65001 > nul && ${buildCmdExeCommandLine(execCommand, execArgs)}` | ||
| : buildCmdExeCommandLine(execCommand, execArgs); |
There was a problem hiding this comment.
Avoid cmd.exe escaping when the wrapper is not used
cmdExeCommandLine is computed unconditionally here, so buildCmdExeCommandLine/escapeForCmdExe now runs even when useCmdWrapper is false. That regresses ordinary execFile calls on every platform: src/link-understanding/runner.ts:60-68 and src/media-understanding/runner.entries.ts:634-642 both pass templated URLs/prompts through runExec, and common values like ...?a=1&b=2 or 100% will now throw Unsafe Windows cmd.exe argument detected before the process is launched. The cmd-specific escaping needs to stay inside the .cmd/.bat branch only.
Useful? React with 👍 / 👎.
|
@greptile-apps Thanks for the detailed review! I've updated the PR based on your feedback: Changes made:
The implementation now uses Node.js-native 'utf8' encoding with the chcp command to set the console code page, which is a standard approach used by many tools. Could you please re-review the updated commit `e37dee1`? |
…e character garbling Fixes openclaw#50519 Windows cmd.exe defaults to GBK/CP936 encoding, but the exec tool was hardcoding utf8 encoding. This caused Chinese characters to render as garbled text (mojibake) when executing commands through the cmd.exe wrapper. Changes: - Detect when cmd.exe wrapper is used (for .cmd/.bat files) - Use 'gbk' encoding for cmd.exe wrapper, 'utf8' otherwise - Handle Buffer vs string return types safely Test plan: - [x] Unit tests added for encoding selection - [x] All existing tests pass - [x] Type checking passes - [x] Formatting passes Co-Authored-By: Claude Code <[email protected]>
e37dee1 to
76ed7b7
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76ed7b7579
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/process/exec.ts
Outdated
| const cmdExeCommandLine = useCmdWrapper | ||
| ? `chcp 65001 > nul && ${buildCmdExeCommandLine(execCommand, execArgs)}` |
There was a problem hiding this comment.
Avoid switching arbitrary batch files to UTF-8 before launch
This helper is not limited to the ASCII pnpm.cmd shim covered by the new tests: exported callers like src/plugin-sdk/run-command.ts:16-30 can pass an actual user/plugin-provided .cmd/.bat path through runExec/runCommandWithTimeout. Prepending chcp 65001 changes the active code page before cmd.exe starts reading that batch file, so legacy wrappers saved in the machine code page (common for localized Windows scripts with non-ASCII literals or paths) are now parsed as UTF-8 and can mis-execute or fail before the real command runs. Decoding captured output after execution avoids mojibake without changing how existing batch files are interpreted.
Useful? React with 👍 / 👎.
|
@greptile-apps I've applied the same fix to \ as well. Both functions now prepend \ when using the cmd.exe wrapper. Please re-review the latest commit (76ed7b7). |
Remove unnecessary cmdExeCommandLine variable that was computed twice - once when useCmdWrapper was true and once when false. Now the command is only built when useCmdWrapper is true. Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
exec uses powershell on Windows by default so this fix likely doesn't work. |
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Summary
Fixes #50519
Windows cmd.exe defaults to a locale-specific code page (e.g., GBK on Chinese Windows, CP932 on Japanese Windows), causing garbled output for non-ASCII characters.
This PR fixes the issue by prepending
chcp 65001 > nul &&to force UTF-8 mode in cmd.exe, which works correctly with Node.js's utf8 encoding.Changes
chcp 65001 > nul &&Why chcp 65001?
AI Disclosure 🤖
Test Plan
pnpm test src/process/exec.windows.test.ts- 5 tests passpnpm check- All checks passpnpm oxfmt- Formatting passesRelated
Review Response
Updated based on reviewer feedback from @greptile-apps:
runExecandrunCommandWithTimeoutfor consistency