Introduce breaking changes to ReadConsoleOutput#13321
Conversation
714dbf9 to
f1eb06a
Compare
d74ced1 to
444d44f
Compare
zadjii-msft
left a comment
There was a problem hiding this comment.
So, from what I remember about the discussion in #13339, the tl;dr was "this never worked right, so changing what it does in the broken case shouldn't make anyone more broke", then
:ship_it:
| // If .AsciiChar and .UnicodeChar have the same offset (since they're a union), | ||
| // we can just write the latter with a byte-sized value to set the former | ||
| // _and_ simultaneously clear the upper byte of .UnicodeChar to 0. Nice! | ||
| static_assert(offsetof(CHAR_INFO, Char.AsciiChar) == offsetof(CHAR_INFO, Char.UnicodeChar)); |
There was a problem hiding this comment.
This might just be me being obtuse, but it seems weird to me that this static_assert is inside both loops, and not like, at the top of the method, or just at the top of the file or something.
There was a problem hiding this comment.
The other assertion is in the DBCS test file unfortunately. I felt like keeping those assertions close to the code is better since there are alternative ways to write this code (for instance assigning UnicodeChar and AsciiChar independently).
|
@msftbot do not merge unless @DHowett approves |
|
Hello @lhecker! Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:
If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you". |
|
🎉 Handy links: |
#13626 contains a small "regression" compared to #13321: It now began to store trailers in the buffer wherever possible to allow a region of the buffer to be backed up and restored via Read/WriteConsoleOutput. But we're unfortunately still ill-equipped to handle anything but UCS-2 via WriteConsoleOutput, so it's best to again ignore trailers just like in #13321. ## Validation Steps Performed * Added unit test ✅
#13626 contains a small "regression" compared to #13321: It now began to store trailers in the buffer wherever possible to allow a region of the buffer to be backed up and restored via Read/WriteConsoleOutput. But we're unfortunately still ill-equipped to handle anything but UCS-2 via WriteConsoleOutput, so it's best to again ignore trailers just like in #13321. ## Validation Steps Performed * Added unit test ✅ (cherry picked from commit 9dcdcac) Service-Card-Id: 88719297 Service-Version: 1.17
#8000 will change the way we store text from a strict grid/matrix where
one UTF16 character or surrogate pair always equals 1 column with the
possibility of joining exactly 2 to a wide character pair, to a dynamic
buffer where 1 or more characters can form 1 or more columns in any
arbitrary combination. Our long term goal is to properly support both
complex grapheme clusters like Emojis and complex ligatures that a wider
than 2 columns. This change requires us to break our API as
ReadConsoleOutputA/Wassumes the existence of exactly this grid/matrixstorage. Since we store wide characters like "い" as a single codepoint
that is simply marked as being 2 columns wide in the future, we cannot
reconstruct trailing DBCS characters that were written to the buffer
like we used to. On the other hand this new behavior allows us to
implement better Unicode support and most likely significantly improve
our performance.
Minor breaking changes
ReadConsoleOutputAwill now always zero the high byte in(CHAR_INFO).Char.UnicodeChar. Only the.AsciiCharcan be usedthen. This prevents users from storing "additional" data in the
terminal buffer.
ReadConsoleOutputAwill now zero the.AsciiCharif it fails toconvert the Unicode character into an appropriate DBCS.
being a wide character. In these cases
WriteConsoleOutputAwillnow return
0x00instead of0x44(the lower half of い's codepoint
0x3044).Major breaking changes
ReadConsoleOutputWwill now repeat the leading Unicode charactertwice and ignore the trailing one.
0x3044 0xabcdwithWriteConsoleOutputWused to yield the same0x3044 0xabcdif readback with
ReadConsoleOutputW. This worked because conhosteffectively ignored the trailing codepoint, allowing one to
"smuggle" data. In the future this trailing character will be
discarded and produce
0x3044 0x3044instead.WriteConsoleOutputAcan be done withcode page 932 (Shift-JIS) and the DBCS
0x82 0xa2. If read backwith
ReadConsoleOutputWthis would previously yield the twoUnicode characters
0x3044 0xffff. After this commit it'll yield0x3044 0x3044.Alternative approaches
It's possible to "tag"/"mark" written data as originating from
WriteConsoleOutputA/Wso that it can be reconstructed accurately lateron. However this lead to implementation complexities that we're actively
trying to avoid in the new buffer implementation. Effectively
everything that touches the buffer's text would have to handle these
marks and either write or clear them. Given the most likely small amount
of users who depend on the current quirky behavior, it'd be an
unwarranted maintenance and performance burden and prevent Windows
Terminal to ever truly migrate to full Unicode support.
Validation Steps Performed