Skip to content

ReadConsoleOutputW doesn't work with surrogate pairs (the buffer layers thread) #10810

@alabuzhev

Description

@alabuzhev

Windows Terminal version (or Windows build number)

10.0.19043.1110

Other Software

No response

Steps to reproduce

Compile and run the following code:

#include <iostream>

#include <string>
#include <windows.h>

int main()
{
	const auto Out = GetStdHandle(STD_OUTPUT_HANDLE);
	const std::wstring Text = L"[👨👩👧👦]";

	system("cls");

	DWORD n;
	WriteConsoleW(Out, Text.data(), Text.size(), &n, {});
	CHAR_INFO Buffer[10];

	const COORD BufferCoord{ 10, 1 };

	SMALL_RECT ReadRegion{ 0, 0, 9, 0 };
	ReadConsoleOutputW(Out, Buffer, BufferCoord, {}, &ReadRegion);

	SMALL_RECT WriteRegion{ 0, 2, 9, 2 };

	WriteConsoleOutputW(Out, Buffer, BufferCoord, {}, &WriteRegion);

	int In;
	std::cin >> In;
}

Expected Behavior

image

Actual Behavior

image

Observations

The problem is here:

*targetIter = gci.AsCharInfo(*sourceIter);

AsCharInfo calls Utf16ToUcs2 for the same string over and over:

ci.Char.UnicodeChar = Utf16ToUcs2(cell.Chars());

which returns UNICODE_REPLACEMENT:

return UNICODE_REPLACEMENT;

Additionally, AsCharInfo incorrectly sets COMMON_LVB_LEADING_BYTE in this case:

ci.Attributes |= cell.DbcsAttr().GeneratePublicApiAttributeFormat();

I assume that the host might use Attribute::Leading and Attribute::Trailing internally for its own purposes, but COMMON_LVB_LEADING_BYTE & COMMON_LVB_TRAILING_BYTE have a different purpose, not applicable in this case and shouldn't leak into the public interfaces for surrogates.

A possible fix

diff --git a/src/host/directio.cpp b/src/host/directio.cpp
index 760a7ecf..4ed1ab0d 100644
--- a/src/host/directio.cpp
+++ b/src/host/directio.cpp
@@ -844,21 +844,34 @@ void EventsToUnicode(_Inout_ std::deque<std::unique_ptr<IInputEvent>>& inEvents,
             // If the point we're trying to write is inside the limited buffer write zone...
             if (targetLimit.IsInBounds(targetPos))
             {
+                auto charInfo = gci.AsCharInfo(*sourceIter);
+
+                if (sourceIter->Chars().size() > 1)
+                    charInfo.Attributes &= ~COMMON_LVB_LEADING_BYTE;
+
                 // Copy the data into position...
-                *targetIter = gci.AsCharInfo(*sourceIter);
-                // ... and advance the read iterator.
-                sourceIter++;
-            }
+                for (const auto i: sourceIter->Chars())
+                {
+                    targetIter->Attributes = charInfo.Attributes;
+                    targetIter->Char.UnicodeChar = i;
 
-            // Always advance the write iterator, we might have skipped it due to clipping.
-            targetIter++;
+                    // Always advance the write iterator, we might have skipped it due to clipping.
+                    ++targetIter;
 
-            // Increment the target
-            targetPos.X++;
-            if (targetPos.X >= targetSize.X)
-            {
-                targetPos.X = 0;
-                targetPos.Y++;
+                    // Increment the target
+                    ++targetPos.X;
+                    if (targetPos.X >= targetSize.X)
+                    {
+                        targetPos.X = 0;
+                        ++targetPos.Y;
+                    }
+
+                    // ... and advance the read iterator.
+                    sourceIter++;
+
+                    if (!targetLimit.IsInBounds(targetPos))
+                        break;
+                }
             }
         }

P.S. I remember that there's #8000 that should improve the situation with surrogates in general. This one is about the API issue: when a surrogate pair occupies two columns, there's really no reason to return two replacement characters if we could return the original pair. The situation with a pair that occupies one column is less obvious, but that's a different issue.

Metadata

Metadata

Assignees

Labels

Area-VTVirtual Terminal sequence supportHelp WantedWe encourage anyone to jump in on these.Issue-FeatureComplex enough to require an in depth planning process and actual budgeted, scheduled work.Product-ConhostFor issues in the Console codebase

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions