fix: comprehensive Windows and CJK support for MCP server#512
fix: comprehensive Windows and CJK support for MCP server#512JerryLiu720 wants to merge 1 commit intoMemPalace:developfrom
Conversation
web3guru888
left a comment
There was a problem hiding this comment.
Solid fix. The three-part approach is the right one:
stdin surrogateescape / stdout replace: This is the correct pairing. surrogateescape on input is tolerant — it lets you ingest bytes that aren't valid UTF-8 without crashing, and you can propagate or discard them later. replace on output is safe — it won't produce invalid JSON (lone surrogates in a JSON string will break parsers on the other end). The hasattr guard for non-buffered environments (tests, IDEs) is also good defensive practice.
ensure_ascii=False in both serialization points: Important to get both — if you fix one and miss the other you get inconsistent behavior. Catching both handle_request (tool result) and the main loop response write was the right call.
One minor note: in the MCP protocol, the JSON-RPC layer is supposed to be clean UTF-8. If errors='replace' substitutes a \uFFFD character somewhere inside a content string, that's technically correct from an encoding standpoint, but clients may not expect replacement characters in tool outputs. It might be worth a brief note in the PR description (or a comment in the code) explaining that replace is the fallback for unrecoverable surrogates and that clean UTF-8 content will pass through unmodified.
But that's a documentation concern, not a correctness issue. The behavior is right. This closes #503 cleanly.
|
@web3guru888 Thank you for the detailed review! Good point about the replace documentation. The code comment already explains that replace prevents lone surrogates from breaking JSON-RPC client parsers, and clean UTF-8 content passes through unmodified - only unrecoverable surrogates get replaced with �. Appreciate the confirmation that the approach is correct and that this closes #503 cleanly! |
|
Fair point — I missed that the code comment already explains the rationale. The logic is sound: |
b32b506 to
f619ae9
Compare
9696911 to
6aee219
Compare
This PR addresses Windows encoding issues and improves non-ASCII character handling throughout the MCP server. **Changes:** 1. **stdin/stdout UTF-8 enforcement (MemPalace#503)** - Windows defaults to cp936/GBK, causing crashes with CJK content - Wrapped stdin with 'surrogateescape' error handler (tolerant input) - Wrapped stdout with 'replace' error handler (safe output) - Added hasattr() checks for edge cases (IDE/test environments) 2. **MCP tool result serialization** - Added ensure_ascii=False to tool result json.dumps() (line 907) - Preserves Unicode characters in tool responses 3. **JSON-RPC response serialization (MemPalace#359)** - Added ensure_ascii=False to main loop json.dumps() (line 949) - Prevents Chinese characters from being escaped to \uXXXX **Technical Details:** - stdin uses 'surrogateescape': handles malformed input gracefully - stdout uses 'replace': prevents lone surrogates in JSON-RPC responses (per recommendation from MemPalace#503 discussion to avoid breaking client parsers) - All json.dumps() calls now preserve Unicode with ensure_ascii=False Fixes MemPalace#503 Related to MemPalace#359 (already has PR MemPalace#425 with tests) Made-with: Cursor
6aee219 to
4ecc085
Compare
Summary
Comprehensive fix for Windows encoding issues and improved non-ASCII character handling in the MCP server.
Problem
\uXXXXformat, making output hard to readSolution
This PR addresses three critical areas in
mcp_server.py:1. stdin/stdout UTF-8 enforcement (lines 925-947)
errors='surrogateescape'for tolerant input handlingerrors='replace'for safe output (prevents lone surrogates in JSON-RPC)hasattr()checks for edge cases (IDE/test environments)2. MCP tool result serialization (line 907)
ensure_ascii=Falseto preserve Unicode in tool responses3. JSON-RPC response serialization (line 961)
ensure_ascii=Falseto prevent character escapingRelationship to PR #400
This PR complements #400 but addresses different aspects:
PR #400:
reconfigure()witherrors="strict"This PR:
TextIOWrapperwith differentiated error handlerssurrogateescape(tolerant input, per Bug: MCP server fails with surrogate error on Windows when writing CJK content #503 discussion recommendations)replace(safe output, prevents JSON-RPC client crashes)ensure_ascii=False)Both approaches are valid, but this PR follows the error handling strategy recommended by experts in the #503 discussion thread.
Technical Details
surrogateescapeto handle malformed input gracefullyreplaceto prevent lone surrogates from breaking JSON-RPC client parsersjson.dumps()calls now preserve Unicode charactersTesting
python3 -m py_compileImpact
\uXXXXescapes)Related Issues
Checklist