Description
Environment
Microsoft Windows [Version 10.0.18363.657]
conhost.exe builtin console, V2
wt.exe terminal, V0.9.433.0
Steps to reproduce
Extract, compile, and run the attached readsp.c program under the V2 console. This program exercises writing a non-BMP character directly to the input buffer via WriteConsoleInputW and reading it back via ReadConsoleW, first with echo enabled and then with it disabled. Run the program with -v (e.g. readsp -v) to show the input key-event records that each step tries to read. It tries a normal key down/up event pair as well as the Alt+Numpad sequence that the console uses for pasted text. The latter uses 6 key events per wide character and thus 12 key events for a surrogate pair. I included the paste sequence to try to clarify a related issue in which manually pasting a non-BMP character produces a different incorrect result, but it didn't help. I'll discuss that related issue in a comment, in case both problems share the same underlying cause.
Expected behavior
ReadConsoleW should be able to correctly read supplementary-plane (i.e. non-BMP) characters such as "😞" (U+1F61E), regardless of whether they are typed or pasted into the terminal window, or written directly to the input buffer, or whether echo is enabled. Since the wide-character API uses 16-bit characters, the non-BMP character should be read as a UTF-16 surrogate pair, e.g. U+1F61E should be encoded as {0xD83D, 0xDE1E}.
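For reference, the surrogate-pair arithmetic behind the {0xD83D, 0xDE1E} encoding can be sketched in portable C as follows. This is only an illustration of the standard UTF-16 rule, not code taken from readsp.c; the helper names encode_surrogate_pair and decode_surrogate_pair are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a supplementary-plane code point
   (U+10000..U+10FFFF) as a UTF-16 surrogate pair.
   out[0] receives the high (lead) surrogate, out[1] the low (trail) one. */
static void encode_surrogate_pair(uint32_t cp, uint16_t out[2])
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    uint32_t v = cp - 0x10000;                  /* 20-bit offset */
    out[0] = (uint16_t)(0xD800 + (v >> 10));    /* top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (v & 0x3FF));  /* bottom 10 bits */
}

/* Hypothetical helper: the inverse mapping. The caller is assumed to pass
   a valid high surrogate followed by a valid low surrogate. */
static uint32_t decode_surrogate_pair(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + ((uint32_t)(hi - 0xD800) << 10) + (uint32_t)(lo - 0xDC00);
}
```

Calling encode_surrogate_pair(0x1F61E, pair) fills pair with 0xD83D and 0xDE1E, which are the two code units a correct ReadConsoleW call should return for this character.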
ReadConsoleW works as expected with the legacy (V1) console. For example:
Test normal with ECHO ON
😞
stream (4): L"\ud83d\ude1e\u000d\u000a"
screen: L"\ud83d\ude1e "
Test paste with ECHO ON
😞
stream (4): L"\ud83d\ude1e\u000d\u000a"
screen: L"\ud83d\ude1e "
Test normal with ECHO OFF
stream (4): L"\ud83d\ude1e\u000d\u000a"
Test paste with ECHO OFF
stream (4): L"\ud83d\ude1e\u000d\u000a"
It almost works correctly with Windows Terminal version 0.9.433.0:
Test normal with ECHO ON
��
stream (4) = L"\ud83d\ude1e\u000d\u000a"
screen = L"\ufffd\ufffd "
Test paste with ECHO ON
��
stream (4) = L"\ud83d\ude1e\u000d\u000a"
screen = L"\ufffd\ufffd "
Test normal with ECHO OFF
stream (4) = L"\ud83d\ude1e\u000d\u000a"
Test paste with ECHO OFF
stream (4) = L"\ud83d\ude1e\u000d\u000a"
Apparently a cooked read under Windows Terminal has a bug in which a non-BMP character gets echoed as two replacement characters, U+FFFD. But at least the ReadConsoleW result is correct.
Actual behavior
In the output below from the V2 console, not only does the cooked read fail with ERROR_INVALID_PARAMETER (87) when echo is enabled, but the echoed text also contains only the high (first) surrogate of the pair, 0xD83D.
Test normal with ECHO ON
�
ReadConsoleW failed (87)
screen: L"\ud83d "
Test paste with ECHO ON
�
ReadConsoleW failed (87)
screen: L"\ud83d "
Test normal with ECHO OFF
stream (4): L"\ud83d\ude1e\u000d\u000a"
Test paste with ECHO OFF
stream (4): L"\ud83d\ude1e\u000d\u000a"
Since a lone surrogate is not a valid Unicode character, I've replaced it in the text pasted above with the Unicode replacement character, U+FFFD. However, the "screen" text, which is read directly from the screen buffer, shows that the code unit actually displayed on the console is 0xD83D.
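A check like the following could be used to flag this kind of result automatically. It is a minimal sketch in portable C, assuming the text read from the stream or the screen buffer is available as an array of 16-bit code units; the function name first_lone_surrogate is hypothetical and is not part of readsp.c or the console API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: return the index of the first unpaired surrogate
   in a UTF-16 buffer of n code units, or -1 if every surrogate is paired. */
static long first_lone_surrogate(const uint16_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (is_high_surrogate(s[i])) {
            if (i + 1 < n && is_low_surrogate(s[i + 1])) {
                i++;            /* skip the low half of a valid pair */
                continue;
            }
            return (long)i;     /* high surrogate with no low half */
        }
        if (is_low_surrogate(s[i]))
            return (long)i;     /* low surrogate with no preceding high half */
    }
    return -1;
}
```

Applied to the V1 stream {0xD83D, 0xDE1E, 0x000D, 0x000A} this returns -1, while the V2 screen text {0xD83D, 0x0020} yields index 0, matching the lone 0xD83D reported above.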