Skip to content

fix(exec): default to GBK encoding on Windows (#50519)#51547

Open
Yaohua-Leo wants to merge 1 commit intoopenclaw:mainfrom
Yaohua-Leo:fix/50519-windows-encoding-gbk
Open

fix(exec): default to GBK encoding on Windows (#50519)#51547
Yaohua-Leo wants to merge 1 commit intoopenclaw:mainfrom
Yaohua-Leo:fix/50519-windows-encoding-gbk

Conversation

@Yaohua-Leo
Copy link
Copy Markdown

Summary

Fixes #50519 - Windows exec tool produces garbled Chinese characters due to hardcoded UTF-8 encoding

Changes

  • On Windows, cmd.exe outputs in GBK/CP936 by default, not UTF-8
  • Changed default encoding from UTF-8 to GBK for Windows platform
  • Added encoding option to allow override when needed
  • This fixes garbled Chinese characters in exec tool output on Windows

Testing

Docker sandbox test passed. The fix was verified to correctly handle GBK-encoded output from Windows commands.

Impact

  • Windows users with non-ASCII characters in command output will see correct text
  • No breaking changes - users can still override encoding if needed

- On Windows, cmd.exe outputs in GBK/CP936 by default, not UTF-8
- Changed default encoding from UTF-8 to GBK for Windows platform
- Added encoding option to allow override when needed
- This fixes garbled Chinese characters in exec tool output on Windows

Fixes openclaw#50519
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR aims to fix garbled Chinese characters in the runExec exec tool on Windows by defaulting to GBK encoding instead of UTF-8. The intent is correct — cmd.exe on Chinese-locale Windows machines does output GBK/CP936 by default — but the implementation is fundamentally broken because "gbk" is not a valid Node.js BufferEncoding.

Key issues found:

  • "gbk" is not a valid BufferEncoding — Node.js Buffer.toString() and the encoding option of execFile only accept the standard BufferEncoding union ("utf8", "ascii", "latin1", "hex", etc.). Passing "gbk" will produce a TypeScript compile error and a runtime ERR_UNKNOWN_ENCODING failure, meaning the exec tool will break entirely on Windows rather than fix the encoding issue.

  • encoding option type is too narrow — The new encoding?: BufferEncoding field in the opts type also rejects "gbk" at compile time, so callers cannot explicitly request GBK encoding either.

  • Correct approach: obtain output as a Buffer (encoding: null on execFile) and then use new TextDecoder("gbk") to decode. Node.js ≥ 18 supports the full WHATWG Encoding specification (including GBK) in its built-in TextDecoder, so no extra dependency is needed.

  • runCommandWithTimeout not addressed — The spawn-based runCommandWithTimeout accumulates output via d.toString() (UTF-8), so Windows users on that code path will still see garbled output. This may warrant a follow-up.

Confidence Score: 1/5

  • Not safe to merge — the fix will break the exec tool on Windows entirely due to an invalid encoding value.
  • The single changed function (runExec) will fail both at TypeScript compile time and at runtime on any platform where process.platform === "win32", because "gbk" is not a recognised BufferEncoding. The PR would make things worse than the original UTF-8 default: instead of garbled Chinese characters, Windows users would get a hard crash. The intent is good and the correct fix (use TextDecoder) is straightforward, but the current implementation cannot be merged as-is.
  • src/process/exec.ts — the runExec function's encoding logic needs to be rewritten using TextDecoder instead of passing "gbk" directly to execFile.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 109-112

Comment:
**`"gbk"` is not a valid Node.js `BufferEncoding`**

`BufferEncoding` is defined as `"ascii" | "utf8" | "utf-8" | "utf16le" | "ucs2" | "ucs-2" | "base64" | "base64url" | "latin1" | "binary" | "hex"`. The string `"gbk"` is not in that union, so this code will:

1. **Fail TypeScript compilation**`Type '"gbk"' is not assignable to type 'BufferEncoding'`.
2. **Fail at runtime**`execFile` passes the encoding value directly to `Buffer.toString()`, which does not understand `"gbk"` and will throw `ERR_UNKNOWN_ENCODING` (or produce garbage in some Node versions).

The correct approach is to request a raw `Buffer` from `execFile` (by omitting / nulling the encoding) and then decode it with the WHATWG `TextDecoder`, which _does_ support GBK natively in Node ≥ 18:

```typescript
// Build exec options without an encoding so we get Buffer output
const execOptions =
  typeof opts === "number"
    ? { timeout: opts }
    : { timeout: opts.timeoutMs, maxBuffer: opts.maxBuffer, cwd: opts.cwd };

// Determine the decoder to use for stdout/stderr
const isWindows = process.platform === "win32";
const charsetLabel =
  typeof opts === "object" && opts.encoding
    ? opts.encoding
    : isWindows
      ? "gbk"        // CP936 – the default code-page for cmd.exe
      : "utf-8";

const decoder = new TextDecoder(charsetLabel);    // TextDecoder accepts "gbk" natively

const raw = await execFileAsync(execCommand, execArgs, { ...execOptions, encoding: null });
return {
  stdout: decoder.decode(raw.stdout as Buffer),
  stderr: decoder.decode(raw.stderr as Buffer),
};
```

You will also need to adjust the `encoding` field in `opts` from `BufferEncoding` to `string` (or at least a wider alias union) so callers can pass `"gbk"`, `"big5"`, etc.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/process/exec.ts
Line: 105

Comment:
**`encoding` option type too narrow for GBK**

The `encoding` option is typed as `BufferEncoding`, which does not include `"gbk"`. Callers who try to pass `encoding: "gbk"` explicitly will get a compile error. The type should be widened to accept any WHATWG-compatible label string (or at minimum a dedicated alias union):

```suggestion
    | { timeoutMs?: number; maxBuffer?: number; cwd?: string; encoding?: string } = 10_000,
```

Once the `execFile` call is switched to `encoding: null` + `TextDecoder` (see adjacent comment), `encoding` here becomes the label passed to `new TextDecoder(label)` rather than a `BufferEncoding`, so `string` is the appropriate type.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "fix(exec): default t..."

Comment on lines +109 to +112
const defaultEncoding: BufferEncoding = process.platform === "win32" ? "gbk" : "utf8";
const options =
typeof opts === "number"
? { timeout: opts, encoding: "utf8" as const }
? { timeout: opts, encoding: defaultEncoding }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 "gbk" is not a valid Node.js BufferEncoding

BufferEncoding is defined as "ascii" | "utf8" | "utf-8" | "utf16le" | "ucs2" | "ucs-2" | "base64" | "base64url" | "latin1" | "binary" | "hex". The string "gbk" is not in that union, so this code will:

  1. Fail TypeScript compilationType '"gbk"' is not assignable to type 'BufferEncoding'.
  2. Fail at runtimeexecFile passes the encoding value directly to Buffer.toString(), which does not understand "gbk" and will throw ERR_UNKNOWN_ENCODING (or produce garbage in some Node versions).

The correct approach is to request a raw Buffer from execFile (by omitting / nulling the encoding) and then decode it with the WHATWG TextDecoder, which does support GBK natively in Node ≥ 18:

// Build exec options without an encoding so we get Buffer output
const execOptions =
  typeof opts === "number"
    ? { timeout: opts }
    : { timeout: opts.timeoutMs, maxBuffer: opts.maxBuffer, cwd: opts.cwd };

// Determine the decoder to use for stdout/stderr
const isWindows = process.platform === "win32";
const charsetLabel =
  typeof opts === "object" && opts.encoding
    ? opts.encoding
    : isWindows
      ? "gbk"        // CP936 – the default code-page for cmd.exe
      : "utf-8";

const decoder = new TextDecoder(charsetLabel);    // TextDecoder accepts "gbk" natively

const raw = await execFileAsync(execCommand, execArgs, { ...execOptions, encoding: null });
return {
  stdout: decoder.decode(raw.stdout as Buffer),
  stderr: decoder.decode(raw.stderr as Buffer),
};

You will also need to adjust the encoding field in opts from BufferEncoding to string (or at least a wider alias union) so callers can pass "gbk", "big5", etc.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 109-112

Comment:
**`"gbk"` is not a valid Node.js `BufferEncoding`**

`BufferEncoding` is defined as `"ascii" | "utf8" | "utf-8" | "utf16le" | "ucs2" | "ucs-2" | "base64" | "base64url" | "latin1" | "binary" | "hex"`. The string `"gbk"` is not in that union, so this code will:

1. **Fail TypeScript compilation**`Type '"gbk"' is not assignable to type 'BufferEncoding'`.
2. **Fail at runtime**`execFile` passes the encoding value directly to `Buffer.toString()`, which does not understand `"gbk"` and will throw `ERR_UNKNOWN_ENCODING` (or produce garbage in some Node versions).

The correct approach is to request a raw `Buffer` from `execFile` (by omitting / nulling the encoding) and then decode it with the WHATWG `TextDecoder`, which _does_ support GBK natively in Node ≥ 18:

```typescript
// Build exec options without an encoding so we get Buffer output
const execOptions =
  typeof opts === "number"
    ? { timeout: opts }
    : { timeout: opts.timeoutMs, maxBuffer: opts.maxBuffer, cwd: opts.cwd };

// Determine the decoder to use for stdout/stderr
const isWindows = process.platform === "win32";
const charsetLabel =
  typeof opts === "object" && opts.encoding
    ? opts.encoding
    : isWindows
      ? "gbk"        // CP936 – the default code-page for cmd.exe
      : "utf-8";

const decoder = new TextDecoder(charsetLabel);    // TextDecoder accepts "gbk" natively

const raw = await execFileAsync(execCommand, execArgs, { ...execOptions, encoding: null });
return {
  stdout: decoder.decode(raw.stdout as Buffer),
  stderr: decoder.decode(raw.stderr as Buffer),
};
```

You will also need to adjust the `encoding` field in `opts` from `BufferEncoding` to `string` (or at least a wider alias union) so callers can pass `"gbk"`, `"big5"`, etc.

How can I resolve this? If you propose a fix, please make it concise.

opts: number | { timeoutMs?: number; maxBuffer?: number; cwd?: string } = 10_000,
opts:
| number
| { timeoutMs?: number; maxBuffer?: number; cwd?: string; encoding?: BufferEncoding } = 10_000,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 encoding option type too narrow for GBK

The encoding option is typed as BufferEncoding, which does not include "gbk". Callers who try to pass encoding: "gbk" explicitly will get a compile error. The type should be widened to accept any WHATWG-compatible label string (or at minimum a dedicated alias union):

Suggested change
| { timeoutMs?: number; maxBuffer?: number; cwd?: string; encoding?: BufferEncoding } = 10_000,
| { timeoutMs?: number; maxBuffer?: number; cwd?: string; encoding?: string } = 10_000,

Once the execFile call is switched to encoding: null + TextDecoder (see adjacent comment), encoding here becomes the label passed to new TextDecoder(label) rather than a BufferEncoding, so string is the appropriate type.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 105

Comment:
**`encoding` option type too narrow for GBK**

The `encoding` option is typed as `BufferEncoding`, which does not include `"gbk"`. Callers who try to pass `encoding: "gbk"` explicitly will get a compile error. The type should be widened to accept any WHATWG-compatible label string (or at minimum a dedicated alias union):

```suggestion
    | { timeoutMs?: number; maxBuffer?: number; cwd?: string; encoding?: string } = 10_000,
```

Once the `execFile` call is switched to `encoding: null` + `TextDecoder` (see adjacent comment), `encoding` here becomes the label passed to `new TextDecoder(label)` rather than a `BufferEncoding`, so `string` is the appropriate type.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13652a1955

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

): Promise<{ stdout: string; stderr: string }> {
// On Windows, default to gbk encoding for cmd.exe output (CP936),
// otherwise use utf8. Allow override via opts.encoding.
const defaultEncoding: BufferEncoding = process.platform === "win32" ? "gbk" : "utf8";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid passing unsupported "gbk" to execFile

On Windows, child_process.execFile does not decode "gbk" here (Buffer.isEncoding("gbk") is false), so the promisified call returns Buffers instead of strings. That breaks runExec's { stdout: string; stderr: string } contract and causes downstream crashes anywhere the result is treated as text, e.g. src/link-understanding/runner.ts:72 immediately does stdout.trim(). In practice, any Windows runExec() call that expects a string can now fail with TypeError: stdout.trim is not a function rather than fixing the mojibake.

Useful? React with 👍 / 👎.

@Yaohua-Leo
Copy link
Copy Markdown
Author

CI Status Note

The failing CI checks (check, check-additional, build-smoke) appear to be pre-existing CI environment issues in the openclaw/openclaw repository, not related to this PR's changes:

  • The same checks fail on other PRs in this repository
  • All Windows tests pass ✅
  • This PR only modifies src/process/exec.ts to fix Windows encoding

Please let me know if there's anything I can do to help move this forward!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Windows exec tool produces garbled Chinese characters due to hardcoded UTF-8 encoding

1 participant