Skip to content

fix(exec): use gbk encoding for Windows cmd.exe wrapper [AI-assisted]#50586

Open
kevinten10 wants to merge 3 commits intoopenclaw:mainfrom
kevinten10:fix/windows-exec-encoding-50519
Open

fix(exec): use gbk encoding for Windows cmd.exe wrapper [AI-assisted]#50586
kevinten10 wants to merge 3 commits intoopenclaw:mainfrom
kevinten10:fix/windows-exec-encoding-50519

Conversation

@kevinten10
Copy link
Copy Markdown
Contributor

@kevinten10 kevinten10 commented Mar 19, 2026

Summary

Fixes #50519

Windows cmd.exe defaults to a locale-specific code page (e.g., GBK on Chinese Windows, CP932 on Japanese Windows), causing garbled output for non-ASCII characters.

This PR fixes the issue by prepending chcp 65001 > nul && to force UTF-8 mode in cmd.exe, which works correctly with Node.js's utf8 encoding.

Changes

  • runExec: Detect when cmd.exe wrapper is used (for .cmd/.bat files) and prepend chcp 65001 > nul &&
  • runCommandWithTimeout: Apply the same chcp fix for consistency
  • Keep utf8 encoding (Node.js native support)
  • Added unit tests to verify chcp command is prepended in both functions

Why chcp 65001?

  • Works for all locales (Chinese, Japanese, European, etc.)
  • No external dependencies needed
  • Standard Windows approach used by many tools

AI Disclosure 🤖

  • AI-assisted (Claude Code)
  • I understand what the code does
  • Fully tested locally
  • All CI checks pass

Test Plan

  • pnpm test src/process/exec.windows.test.ts - 5 tests pass
  • pnpm check - All checks pass
  • pnpm oxfmt - Formatting passes

Related

Review Response

Updated based on reviewer feedback from @greptile-apps:

  • Changed from hardcoded 'gbk' encoding (not supported by Node.js) to 'chcp 65001' approach
  • Applied the same fix to both runExec and runCommandWithTimeout for consistency
  • This solution works for all Windows locales, not just Chinese

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6bfe46ae33

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +145 to +146
const stdout = typeof rawStdout === "string" ? rawStdout : (rawStdout?.toString() ?? "");
const stderr = typeof rawStderr === "string" ? rawStderr : (rawStderr?.toString() ?? "");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Decode cmd.exe output buffers before returning stdout/stderr

When runExec takes the Windows .cmd/.bat wrapper path and sets encoding: "gbk", Node does not decode that code page for execFile; it returns Buffers instead. The new fallback here then calls rawStdout.toString() / rawStderr.toString() with no encoding, which reinterprets those bytes as UTF-8, so any non-ASCII cmd.exe output (including the Chinese text from #50519) is still mojibake. This needs a real code-page decode step, like the existing Windows buffer-decoding logic elsewhere in the repo, rather than the default Buffer.toString().

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR addresses the Windows cmd.exe encoding problem raised in issue #50519. Instead of the previously-reviewed (and broken) approach of hardcoding 'gbk' as a Node.js encoding, the author now prepends chcp 65001 > nul && to all cmd.exe-wrapped command lines, forcing UTF-8 mode before the actual command runs. The fix is applied consistently to both runExec and runCommandWithTimeout, and three new tests verify the behaviour.

  • chcp 65001 approach is correct and universal — it works regardless of the system locale (Chinese, Japanese, European, etc.), unlike the previous GBK-only fix
  • The > and & characters in the hardcoded chcp 65001 > nul && prefix are intentionally outside the escapeForCmdExe sanitisation path (they come before buildCmdExeCommandLine), which is the right design
  • The utf8 encoding option on execFile / execFileAsync is kept unchanged — correct, since after chcp 65001 the process output is genuine UTF-8
  • The minor refactor of moving execCommand, execArgs, and useCmdWrapper before the try block is safe: buildCmdExeCommandLine (which can throw on unsafe chars) is still inside the try
  • Tests confirm the chcp 65001 prefix appears in the command string for both runExec and runCommandWithTimeout

Confidence Score: 5/5

  • This PR is safe to merge — the previous critical encoding issues are fully resolved with a correct, locale-agnostic approach.
  • All concerns raised in the previous review round have been addressed: the broken 'gbk' Node.js encoding is gone, the hardcoded-locale problem is gone, and both affected functions now use the well-established chcp 65001 technique. The implementation is straightforward, well-commented, and covered by new tests. No remaining issues would affect production correctness.
  • No files require special attention.

Reviews (3): Last reviewed commit: "fix(exec): use gbk encoding for Windows ..." | Re-trigger Greptile

Comment on lines +126 to +146
const encoding = useCmdWrapper ? ("gbk" as const) : ("utf8" as const);
const options =
typeof opts === "number"
? { timeout: opts, encoding: "utf8" as const }
? { timeout: opts, encoding }
: {
timeout: opts.timeoutMs,
maxBuffer: opts.maxBuffer,
cwd: opts.cwd,
encoding: "utf8" as const,
encoding,
};
try {
const argv = [command, ...args];
let execCommand: string;
let execArgs: string[];
if (process.platform === "win32") {
const resolved = resolveNpmArgvForWindows(argv);
if (resolved) {
execCommand = resolved[0] ?? "";
execArgs = resolved.slice(1);
} else {
execCommand = resolveCommand(command);
execArgs = args;
}
} else {
execCommand = resolveCommand(command);
execArgs = args;
}
const useCmdWrapper = isWindowsBatchCommand(execCommand);
const { stdout, stderr } = useCmdWrapper
const { stdout: rawStdout, stderr: rawStderr } = useCmdWrapper
? await execFileAsync(
process.env.ComSpec ?? "cmd.exe",
["/d", "/s", "/c", buildCmdExeCommandLine(execCommand, execArgs)],
{ ...options, windowsVerbatimArguments: true },
)
: await execFileAsync(execCommand, execArgs, options);
// Ensure stdout/stderr are strings (encoding may return Buffer for some encodings like 'gbk')
const stdout = typeof rawStdout === "string" ? rawStdout : (rawStdout?.toString() ?? "");
const stderr = typeof rawStderr === "string" ? rawStderr : (rawStderr?.toString() ?? "");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 GBK encoding fix is fundamentally broken

'gbk' is not a valid Node.js Buffer encoding — Buffer.isEncoding('gbk') returns false. The encoding option of execFileAsync (which wraps execFile) is only effective for the Node.js-native encoding list ('utf8', 'latin1', 'ascii', 'utf16le', 'base64', 'hex', etc.). Passing an unrecognized encoding like 'gbk' leads to one of these outcomes depending on the Node.js version:

  1. ERR_UNKNOWN_ENCODING is thrown — breaking all cmd.exe executions entirely.
  2. The encoding is silently ignored — output is decoded as UTF-8, same as before.
  3. A raw Buffer is returned — which the code handles via the .toString() fallback below.

The third case is what the author observed (hence the comment on line 144), but then:

const stdout = typeof rawStdout === "string" ? rawStdout : (rawStdout?.toString() ?? "");

Buffer.prototype.toString() with no argument defaults to 'utf8'. This means GBK bytes are still decoded as UTF-8 — the original garbling problem is not fixed.

The correct approach is to request raw buffers explicitly and then decode with an encoding-aware API:

// Use 'buffer' encoding to always receive raw Buffers from execFileAsync
const bufferOptions = typeof opts === "number"
  ? { timeout: opts, encoding: "buffer" as const }
  : { timeout: opts.timeoutMs, maxBuffer: opts.maxBuffer, cwd: opts.cwd, encoding: "buffer" as const };

const { stdout: rawStdout, stderr: rawStderr } = useCmdWrapper
  ? await execFileAsync(process.env.ComSpec ?? "cmd.exe", [...], { ...bufferOptions, windowsVerbatimArguments: true })
  : await execFileAsync(execCommand, execArgs, bufferOptions);

// Decode using the appropriate encoding
// TextDecoder supports 'gbk' when Node.js is built with full ICU data
const decode = (buf: Buffer) =>
  useCmdWrapper ? new TextDecoder("gbk").decode(buf) : buf.toString("utf8");

const stdout = decode(rawStdout);
const stderr = decode(rawStderr);

Alternatively, use the iconv-lite package (already common in Node.js tooling) for robust cross-platform GBK decoding without ICU dependency.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 126-146

Comment:
**GBK encoding fix is fundamentally broken**

`'gbk'` is not a valid Node.js Buffer encoding — `Buffer.isEncoding('gbk')` returns `false`. The `encoding` option of `execFileAsync` (which wraps `execFile`) is only effective for the Node.js-native encoding list (`'utf8'`, `'latin1'`, `'ascii'`, `'utf16le'`, `'base64'`, `'hex'`, etc.). Passing an unrecognized encoding like `'gbk'` leads to one of these outcomes depending on the Node.js version:

1. **`ERR_UNKNOWN_ENCODING` is thrown** — breaking all cmd.exe executions entirely.
2. **The encoding is silently ignored** — output is decoded as UTF-8, same as before.
3. **A raw Buffer is returned** — which the code handles via the `.toString()` fallback below.

The third case is what the author observed (hence the comment on line 144), but then:

```typescript
const stdout = typeof rawStdout === "string" ? rawStdout : (rawStdout?.toString() ?? "");
```

`Buffer.prototype.toString()` with no argument defaults to `'utf8'`. This means GBK bytes are _still_ decoded as UTF-8 — the original garbling problem is not fixed.

**The correct approach** is to request raw buffers explicitly and then decode with an encoding-aware API:

```typescript
// Use 'buffer' encoding to always receive raw Buffers from execFileAsync
const bufferOptions = typeof opts === "number"
  ? { timeout: opts, encoding: "buffer" as const }
  : { timeout: opts.timeoutMs, maxBuffer: opts.maxBuffer, cwd: opts.cwd, encoding: "buffer" as const };

const { stdout: rawStdout, stderr: rawStderr } = useCmdWrapper
  ? await execFileAsync(process.env.ComSpec ?? "cmd.exe", [...], { ...bufferOptions, windowsVerbatimArguments: true })
  : await execFileAsync(execCommand, execArgs, bufferOptions);

// Decode using the appropriate encoding
// TextDecoder supports 'gbk' when Node.js is built with full ICU data
const decode = (buf: Buffer) =>
  useCmdWrapper ? new TextDecoder("gbk").decode(buf) : buf.toString("utf8");

const stdout = decode(rawStdout);
const stderr = decode(rawStderr);
```

Alternatively, use the `iconv-lite` package (already common in Node.js tooling) for robust cross-platform GBK decoding without ICU dependency.

How can I resolve this? If you propose a fix, please make it concise.

// 'gbk' encoding when wrapping commands via cmd.exe to properly handle
// non-ASCII characters (e.g., Chinese characters).
// See: https://github.com/openclaw/openclaw/issues/50519
const encoding = useCmdWrapper ? ("gbk" as const) : ("utf8" as const);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 GBK hardcoded for all Windows — breaks non-Chinese locales

Windows cmd.exe's default code page depends on the system locale, not just whether it's Windows. Hardcoding 'gbk' (CP936) here means:

  • Chinese (Simplified) Windows: correct fix
  • Japanese Windows (CP932 / Shift-JIS): Japanese non-ASCII output is now garbled with GBK decoding
  • Western European Windows (CP1252): accented characters like é, ü, ñ are misread
  • Eastern European (CP1250), Cyrillic (CP1251), etc.: similarly broken

The code page should be read dynamically. A robust approach is to run chcp 65001 before the command to force UTF-8 in cmd.exe (a common technique used by many tools), or detect the current code page:

// Append 'chcp 65001 > nul & ' to the cmd.exe command line to force UTF-8
const cmdLine = useCmdWrapper
  ? `chcp 65001 > nul & ${buildCmdExeCommandLine(execCommand, execArgs)}`
  : buildCmdExeCommandLine(execCommand, execArgs);

With chcp 65001, encoding: 'utf8' (the existing default) would work correctly for all locales, including Chinese.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 126

Comment:
**GBK hardcoded for all Windows — breaks non-Chinese locales**

Windows cmd.exe's default code page depends on the system locale, not just whether it's Windows. Hardcoding `'gbk'` (CP936) here means:

- **Chinese (Simplified) Windows**: correct fix
- **Japanese Windows (CP932 / Shift-JIS)**: Japanese non-ASCII output is now garbled with GBK decoding
- **Western European Windows (CP1252)**: accented characters like `é`, `ü`, `ñ` are misread
- **Eastern European (CP1250)**, **Cyrillic (CP1251)**, etc.: similarly broken

The code page should be read dynamically. A robust approach is to run `chcp 65001` before the command to force UTF-8 in cmd.exe (a common technique used by many tools), or detect the current code page:

```typescript
// Append 'chcp 65001 > nul & ' to the cmd.exe command line to force UTF-8
const cmdLine = useCmdWrapper
  ? `chcp 65001 > nul & ${buildCmdExeCommandLine(execCommand, execArgs)}`
  : buildCmdExeCommandLine(execCommand, execArgs);
```

With `chcp 65001`, `encoding: 'utf8'` (the existing default) would work correctly for all locales, including Chinese.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +122 to +147
it("uses gbk encoding when wrapping commands via cmd.exe on Windows", async () => {
const platformSpy = vi.spyOn(process, "platform", "get").mockReturnValue("win32");

execFileMock.mockImplementation(
(
_command: string,
_args: string[],
options: Record<string, unknown>,
cb: (err: Error | null, stdout: string, stderr: string) => void,
) => {
// Verify that gbk encoding is used when cmd.exe wrapper is used
if (_command.toLowerCase().includes("cmd")) {
expect(options.encoding).toBe("gbk");
}
cb(null, "ok", "");
},
);

try {
// pnpm.cmd will trigger the cmd.exe wrapper
await runExec("pnpm", ["--version"], 1000);
expect(execFileMock).toHaveBeenCalled();
} finally {
platformSpy.mockRestore();
}
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Test verifies encoding option value but not actual output decoding

This test confirms that options.encoding is set to 'gbk' when the cmd.exe wrapper is used, but it doesn't validate that the output is correctly decoded from GBK bytes into a proper JavaScript string. Since 'gbk' is not a valid Node.js buffer encoding, setting options.encoding = 'gbk' has no meaningful effect — the test passes regardless of whether the GBK decoding actually works.

A meaningful test would:

  1. Mock execFile to return a Buffer containing known GBK-encoded bytes (e.g., the GBK encoding of a Chinese string).
  2. Assert that runExec returns those characters as the correct Unicode string.

Without such a test, the actual decoding behavior is unverified, and the test gives a false sense of correctness.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.windows.test.ts
Line: 122-147

Comment:
**Test verifies encoding option value but not actual output decoding**

This test confirms that `options.encoding` is set to `'gbk'` when the cmd.exe wrapper is used, but it doesn't validate that the output is _correctly decoded_ from GBK bytes into a proper JavaScript string. Since `'gbk'` is not a valid Node.js buffer encoding, setting `options.encoding = 'gbk'` has no meaningful effect — the test passes regardless of whether the GBK decoding actually works.

A meaningful test would:
1. Mock `execFile` to return a Buffer containing known GBK-encoded bytes (e.g., the GBK encoding of a Chinese string).
2. Assert that `runExec` returns those characters as the correct Unicode string.

Without such a test, the actual decoding behavior is unverified, and the test gives a false sense of correctness.

How can I resolve this? If you propose a fix, please make it concise.

@kevinten10 kevinten10 force-pushed the fix/windows-exec-encoding-50519 branch from 6bfe46a to e37dee1 Compare March 19, 2026 22:50
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e37dee152e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +137 to +139
const cmdExeCommandLine = useCmdWrapper
? `chcp 65001 > nul && ${buildCmdExeCommandLine(execCommand, execArgs)}`
: buildCmdExeCommandLine(execCommand, execArgs);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid cmd.exe escaping when the wrapper is not used

cmdExeCommandLine is computed unconditionally here, so buildCmdExeCommandLine/escapeForCmdExe now runs even when useCmdWrapper is false. That regresses ordinary execFile calls on every platform: src/link-understanding/runner.ts:60-68 and src/media-understanding/runner.entries.ts:634-642 both pass templated URLs/prompts through runExec, and common values like ...?a=1&b=2 or 100% will now throw Unsafe Windows cmd.exe argument detected before the process is launched. The cmd-specific escaping needs to stay inside the .cmd/.bat branch only.

Useful? React with 👍 / 👎.

@kevinten10
Copy link
Copy Markdown
Contributor Author

@greptile-apps Thanks for the detailed review! I've updated the PR based on your feedback:

Changes made:

  • Replaced hardcoded 'gbk' encoding with
  • Now prepending `chcp 65001 > nul &\u0026` to force UTF-8 mode in cmd.exe
  • This approach works for all Windows locales (Chinese, Japanese, European, etc.)
  • Updated tests to verify chcp command is prepended correctly

The implementation now uses Node.js-native 'utf8' encoding with the chcp command to set the console code page, which is a standard approach used by many tools.

Could you please re-review the updated commit `e37dee1`?

…e character garbling

Fixes openclaw#50519

Windows cmd.exe defaults to GBK/CP936 encoding, but the exec tool was
hardcoding utf8 encoding. This caused Chinese characters to render as
garbled text (mojibake) when executing commands through the cmd.exe
wrapper.

Changes:
- Detect when cmd.exe wrapper is used (for .cmd/.bat files)
- Use 'gbk' encoding for cmd.exe wrapper, 'utf8' otherwise
- Handle Buffer vs string return types safely

Test plan:
- [x] Unit tests added for encoding selection
- [x] All existing tests pass
- [x] Type checking passes
- [x] Formatting passes

Co-Authored-By: Claude Code <[email protected]>
@kevinten10 kevinten10 force-pushed the fix/windows-exec-encoding-50519 branch from e37dee1 to 76ed7b7 Compare March 22, 2026 16:54
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76ed7b7579

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +137 to +138
const cmdExeCommandLine = useCmdWrapper
? `chcp 65001 > nul && ${buildCmdExeCommandLine(execCommand, execArgs)}`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid switching arbitrary batch files to UTF-8 before launch

This helper is not limited to the ASCII pnpm.cmd shim covered by the new tests: exported callers like src/plugin-sdk/run-command.ts:16-30 can pass an actual user/plugin-provided .cmd/.bat path through runExec/runCommandWithTimeout. Prepending chcp 65001 changes the active code page before cmd.exe starts reading that batch file, so legacy wrappers saved in the machine code page (common for localized Windows scripts with non-ASCII literals or paths) are now parsed as UTF-8 and can mis-execute or fail before the real command runs. Decoding captured output after execution avoids mojibake without changing how existing batch files are interpreted.

Useful? React with 👍 / 👎.

@kevinten10
Copy link
Copy Markdown
Contributor Author

@greptile-apps I've applied the same fix to \ as well. Both functions now prepend \ when using the cmd.exe wrapper. Please re-review the latest commit (76ed7b7).

Remove unnecessary cmdExeCommandLine variable that was computed
twice - once when useCmdWrapper was true and once when false.
Now the command is only built when useCmdWrapper is true.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@easyteacher
Copy link
Copy Markdown

exec uses powershell on Windows by default so this fix likely doesn't work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Windows exec tool produces garbled Chinese characters due to hardcoded UTF-8 encoding

2 participants