-
Notifications
You must be signed in to change notification settings - Fork 9.1k
Description
Windows Terminal version (or Windows build number)
10.0.19043.1110
Other Software
No response
Steps to reproduce
Compile and run the following code:
#include <iostream>
#include <string>
#include <windows.h>
int main()
{
const auto Out = GetStdHandle(STD_OUTPUT_HANDLE);
const std::wstring Text = L"[👨👩👧👦]";
system("cls");
DWORD n;
WriteConsoleW(Out, Text.data(), Text.size(), &n, {});
CHAR_INFO Buffer[10];
const COORD BufferCoord{ 10, 1 };
SMALL_RECT ReadRegion{ 0, 0, 9, 0 };
ReadConsoleOutputW(Out, Buffer, BufferCoord, {}, &ReadRegion);
SMALL_RECT WriteRegion{ 0, 2, 9, 2 };
WriteConsoleOutputW(Out, Buffer, BufferCoord, {}, &WriteRegion);
int In;
std::cin >> In;
}Expected Behavior
Actual Behavior
Observations
The problem is here:
terminal/src/host/directio.cpp
Line 848 in 3f5f37d
| *targetIter = gci.AsCharInfo(*sourceIter); |
AsCharInfo calls Utf16ToUcs2 for the same string over and over:
terminal/src/host/consoleInformation.cpp
Line 408 in 3f5f37d
| ci.Char.UnicodeChar = Utf16ToUcs2(cell.Chars()); |
which returns UNICODE_REPLACEMENT:
terminal/src/types/convert.cpp
Line 172 in 3f5f37d
| return UNICODE_REPLACEMENT; |
Additionally, AsCharInfo incorrectly sets COMMON_LVB_LEADING_BYTE in this case:
terminal/src/host/consoleInformation.cpp
Line 415 in 3f5f37d
| ci.Attributes |= cell.DbcsAttr().GeneratePublicApiAttributeFormat(); |
I assume that the host might use Attribute::Leading and Attribute::Trailing internally for its own purposes, but COMMON_LVB_LEADING_BYTE & COMMON_LVB_TRAILING_BYTE have a different purpose, not applicable in this case and shouldn't leak into the public interfaces for surrogates.
A possible fix
diff --git a/src/host/directio.cpp b/src/host/directio.cpp
index 760a7ecf..4ed1ab0d 100644
--- a/src/host/directio.cpp
+++ b/src/host/directio.cpp
@@ -844,21 +844,34 @@ void EventsToUnicode(_Inout_ std::deque<std::unique_ptr<IInputEvent>>& inEvents,
// If the point we're trying to write is inside the limited buffer write zone...
if (targetLimit.IsInBounds(targetPos))
{
+ auto charInfo = gci.AsCharInfo(*sourceIter);
+
+ if (sourceIter->Chars().size() > 1)
+ charInfo.Attributes &= ~COMMON_LVB_LEADING_BYTE;
+
// Copy the data into position...
- *targetIter = gci.AsCharInfo(*sourceIter);
- // ... and advance the read iterator.
- sourceIter++;
- }
+ for (const auto i: sourceIter->Chars())
+ {
+ targetIter->Attributes = charInfo.Attributes;
+ targetIter->Char.UnicodeChar = i;
- // Always advance the write iterator, we might have skipped it due to clipping.
- targetIter++;
+ // Always advance the write iterator, we might have skipped it due to clipping.
+ ++targetIter;
- // Increment the target
- targetPos.X++;
- if (targetPos.X >= targetSize.X)
- {
- targetPos.X = 0;
- targetPos.Y++;
+ // Increment the target
+ ++targetPos.X;
+ if (targetPos.X >= targetSize.X)
+ {
+ targetPos.X = 0;
+ ++targetPos.Y;
+ }
+
+ // ... and advance the read iterator.
+ sourceIter++;
+
+ if (!targetLimit.IsInBounds(targetPos))
+ break;
+ }
}
}P.S. I remember that there's #8000 that should improve the situation with surrogates in general. This one is about the API issue: when a surrogate pair occupies two columns, there's really no reason to return two replacement characters if we could return the original pair. The situation with a pair that occupies one column is less obvious, but that's a different issue.

