fix(exec): default to GBK encoding on Windows (#50519)#51547
fix(exec): default to GBK encoding on Windows (#50519)#51547Yaohua-Leo wants to merge 1 commit intoopenclaw:mainfrom
Conversation
- On Windows, cmd.exe outputs in GBK/CP936 by default, not UTF-8 - Changed default encoding from UTF-8 to GBK for Windows platform - Added encoding option to allow override when needed - This fixes garbled Chinese characters in exec tool output on Windows Fixes openclaw#50519
Greptile SummaryThis PR aims to fix garbled Chinese characters in the Key issues found:
Confidence Score: 1/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/process/exec.ts
Line: 109-112
Comment:
**`"gbk"` is not a valid Node.js `BufferEncoding`**
`BufferEncoding` is defined as `"ascii" | "utf8" | "utf-8" | "utf16le" | "ucs2" | "ucs-2" | "base64" | "base64url" | "latin1" | "binary" | "hex"`. The string `"gbk"` is not in that union, so this code will:
1. **Fail TypeScript compilation** — `Type '"gbk"' is not assignable to type 'BufferEncoding'`.
2. **Fail at runtime** — `execFile` passes the encoding value directly to `Buffer.toString()`, which does not understand `"gbk"` and will throw `ERR_UNKNOWN_ENCODING` (or produce garbage in some Node versions).
The correct approach is to request a raw `Buffer` from `execFile` (by omitting / nulling the encoding) and then decode it with the WHATWG `TextDecoder`, which _does_ support GBK natively in Node ≥ 18:
```typescript
// Build exec options without an encoding so we get Buffer output
const execOptions =
typeof opts === "number"
? { timeout: opts }
: { timeout: opts.timeoutMs, maxBuffer: opts.maxBuffer, cwd: opts.cwd };
// Determine the decoder to use for stdout/stderr
const isWindows = process.platform === "win32";
const charsetLabel =
typeof opts === "object" && opts.encoding
? opts.encoding
: isWindows
? "gbk" // CP936 – the default code-page for cmd.exe
: "utf-8";
const decoder = new TextDecoder(charsetLabel); // TextDecoder accepts "gbk" natively
const raw = await execFileAsync(execCommand, execArgs, { ...execOptions, encoding: null });
return {
stdout: decoder.decode(raw.stdout as Buffer),
stderr: decoder.decode(raw.stderr as Buffer),
};
```
You will also need to adjust the `encoding` field in `opts` from `BufferEncoding` to `string` (or at least a wider alias union) so callers can pass `"gbk"`, `"big5"`, etc.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 105
Comment:
**`encoding` option type too narrow for GBK**
The `encoding` option is typed as `BufferEncoding`, which does not include `"gbk"`. Callers who try to pass `encoding: "gbk"` explicitly will get a compile error. The type should be widened to accept any WHATWG-compatible label string (or at minimum a dedicated alias union):
```suggestion
| { timeoutMs?: number; maxBuffer?: number; cwd?: string; encoding?: string } = 10_000,
```
Once the `execFile` call is switched to `encoding: null` + `TextDecoder` (see adjacent comment), `encoding` here becomes the label passed to `new TextDecoder(label)` rather than a `BufferEncoding`, so `string` is the appropriate type.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: "fix(exec): default t..." |
| const defaultEncoding: BufferEncoding = process.platform === "win32" ? "gbk" : "utf8"; | ||
| const options = | ||
| typeof opts === "number" | ||
| ? { timeout: opts, encoding: "utf8" as const } | ||
| ? { timeout: opts, encoding: defaultEncoding } |
There was a problem hiding this comment.
"gbk" is not a valid Node.js BufferEncoding
BufferEncoding is defined as "ascii" | "utf8" | "utf-8" | "utf16le" | "ucs2" | "ucs-2" | "base64" | "base64url" | "latin1" | "binary" | "hex". The string "gbk" is not in that union, so this code will:
- Fail TypeScript compilation —
Type '"gbk"' is not assignable to type 'BufferEncoding'. - Fail at runtime —
execFilepasses the encoding value directly toBuffer.toString(), which does not understand"gbk"and will throwERR_UNKNOWN_ENCODING(or produce garbage in some Node versions).
The correct approach is to request a raw Buffer from execFile (by omitting / nulling the encoding) and then decode it with the WHATWG TextDecoder, which does support GBK natively in Node ≥ 18:
// Build exec options without an encoding so we get Buffer output
const execOptions =
typeof opts === "number"
? { timeout: opts }
: { timeout: opts.timeoutMs, maxBuffer: opts.maxBuffer, cwd: opts.cwd };
// Determine the decoder to use for stdout/stderr
const isWindows = process.platform === "win32";
const charsetLabel =
typeof opts === "object" && opts.encoding
? opts.encoding
: isWindows
? "gbk" // CP936 – the default code-page for cmd.exe
: "utf-8";
const decoder = new TextDecoder(charsetLabel); // TextDecoder accepts "gbk" natively
const raw = await execFileAsync(execCommand, execArgs, { ...execOptions, encoding: null });
return {
stdout: decoder.decode(raw.stdout as Buffer),
stderr: decoder.decode(raw.stderr as Buffer),
};You will also need to adjust the encoding field in opts from BufferEncoding to string (or at least a wider alias union) so callers can pass "gbk", "big5", etc.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 109-112
Comment:
**`"gbk"` is not a valid Node.js `BufferEncoding`**
`BufferEncoding` is defined as `"ascii" | "utf8" | "utf-8" | "utf16le" | "ucs2" | "ucs-2" | "base64" | "base64url" | "latin1" | "binary" | "hex"`. The string `"gbk"` is not in that union, so this code will:
1. **Fail TypeScript compilation** — `Type '"gbk"' is not assignable to type 'BufferEncoding'`.
2. **Fail at runtime** — `execFile` passes the encoding value directly to `Buffer.toString()`, which does not understand `"gbk"` and will throw `ERR_UNKNOWN_ENCODING` (or produce garbage in some Node versions).
The correct approach is to request a raw `Buffer` from `execFile` (by omitting / nulling the encoding) and then decode it with the WHATWG `TextDecoder`, which _does_ support GBK natively in Node ≥ 18:
```typescript
// Build exec options without an encoding so we get Buffer output
const execOptions =
typeof opts === "number"
? { timeout: opts }
: { timeout: opts.timeoutMs, maxBuffer: opts.maxBuffer, cwd: opts.cwd };
// Determine the decoder to use for stdout/stderr
const isWindows = process.platform === "win32";
const charsetLabel =
typeof opts === "object" && opts.encoding
? opts.encoding
: isWindows
? "gbk" // CP936 – the default code-page for cmd.exe
: "utf-8";
const decoder = new TextDecoder(charsetLabel); // TextDecoder accepts "gbk" natively
const raw = await execFileAsync(execCommand, execArgs, { ...execOptions, encoding: null });
return {
stdout: decoder.decode(raw.stdout as Buffer),
stderr: decoder.decode(raw.stderr as Buffer),
};
```
You will also need to adjust the `encoding` field in `opts` from `BufferEncoding` to `string` (or at least a wider alias union) so callers can pass `"gbk"`, `"big5"`, etc.
How can I resolve this? If you propose a fix, please make it concise.| opts: number | { timeoutMs?: number; maxBuffer?: number; cwd?: string } = 10_000, | ||
| opts: | ||
| | number | ||
| | { timeoutMs?: number; maxBuffer?: number; cwd?: string; encoding?: BufferEncoding } = 10_000, |
There was a problem hiding this comment.
encoding option type too narrow for GBK
The encoding option is typed as BufferEncoding, which does not include "gbk". Callers who try to pass encoding: "gbk" explicitly will get a compile error. The type should be widened to accept any WHATWG-compatible label string (or at minimum a dedicated alias union):
| | { timeoutMs?: number; maxBuffer?: number; cwd?: string; encoding?: BufferEncoding } = 10_000, | |
| | { timeoutMs?: number; maxBuffer?: number; cwd?: string; encoding?: string } = 10_000, |
Once the execFile call is switched to encoding: null + TextDecoder (see adjacent comment), encoding here becomes the label passed to new TextDecoder(label) rather than a BufferEncoding, so string is the appropriate type.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/process/exec.ts
Line: 105
Comment:
**`encoding` option type too narrow for GBK**
The `encoding` option is typed as `BufferEncoding`, which does not include `"gbk"`. Callers who try to pass `encoding: "gbk"` explicitly will get a compile error. The type should be widened to accept any WHATWG-compatible label string (or at minimum a dedicated alias union):
```suggestion
| { timeoutMs?: number; maxBuffer?: number; cwd?: string; encoding?: string } = 10_000,
```
Once the `execFile` call is switched to `encoding: null` + `TextDecoder` (see adjacent comment), `encoding` here becomes the label passed to `new TextDecoder(label)` rather than a `BufferEncoding`, so `string` is the appropriate type.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13652a1955
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ): Promise<{ stdout: string; stderr: string }> { | ||
| // On Windows, default to gbk encoding for cmd.exe output (CP936), | ||
| // otherwise use utf8. Allow override via opts.encoding. | ||
| const defaultEncoding: BufferEncoding = process.platform === "win32" ? "gbk" : "utf8"; |
There was a problem hiding this comment.
Avoid passing unsupported "gbk" to execFile
On Windows, child_process.execFile does not decode "gbk" here (Buffer.isEncoding("gbk") is false), so the promisified call returns Buffers instead of strings. That breaks runExec's { stdout: string; stderr: string } contract and causes downstream crashes anywhere the result is treated as text, e.g. src/link-understanding/runner.ts:72 immediately does stdout.trim(). In practice, any Windows runExec() call that expects a string can now fail with TypeError: stdout.trim is not a function rather than fixing the mojibake.
Useful? React with 👍 / 👎.
CI Status NoteThe failing CI checks (check, check-additional, build-smoke) appear to be pre-existing CI environment issues in the openclaw/openclaw repository, not related to this PR's changes:
Please let me know if there's anything I can do to help move this forward! |
Summary
Fixes #50519 - Windows exec tool produces garbled Chinese characters due to hardcoded UTF-8 encoding
Changes
Testing
Docker sandbox test passed. The fix was verified to correctly handle GBK-encoded output from Windows commands.
Impact