Regression vs legacy conhost: different behavior from ReadConsoleOutputCharacterW when surrogate pair(s) are present. #16892

@chrisant996

Windows Terminal version

1.19.10573.0

Windows build number

10.0.19045.4046

Other Software

This is an API issue, and it affects any software that calls ReadConsoleOutputCharacterW.
The issue reproduces only in Windows Terminal, not in legacy conhost (nor in ConEmu, ConsoleZ, etc.).

For example, clink was affected by this when used together with eza or dirx (see here for details of how this was encountered during real-world usage).

Steps to reproduce

When a line of text in the console screen buffer contains one or more surrogate pairs, the behavior of the ReadConsoleOutputCharacterW API does not match the documented contract, and it differs from the behavior in legacy conhost.

The attached repro program demonstrates the behavior:

  • It works as expected in legacy conhost (and ConEmu, ConsoleZ, etc).
  • It malfunctions in Windows Terminal.

repro_surrogate_pairs_issue.zip

Repro:

  • Use WriteConsoleW to print a line of text that includes one or more surrogate pairs.
  • Use ReadConsoleOutputCharacterW to read back the same line.
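
Independently of the attached repro program, the two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in the .zip file: it uses Python with ctypes rather than C/C++, the helper names (utf16_units, describe, write_then_read) are hypothetical, and U+F0000 is an assumed stand-in for an icon code point above U+FFFF. The Win32 part only runs on Windows with a real console attached, and the row comparison is a simplification that ignores trailing blanks.

```python
import ctypes
import sys

STD_OUTPUT_HANDLE = -11  # Win32 constant for the standard output handle


class COORD(ctypes.Structure):
    _fields_ = [("X", ctypes.c_short), ("Y", ctypes.c_short)]


class SMALL_RECT(ctypes.Structure):
    _fields_ = [("Left", ctypes.c_short), ("Top", ctypes.c_short),
                ("Right", ctypes.c_short), ("Bottom", ctypes.c_short)]


class CONSOLE_SCREEN_BUFFER_INFO(ctypes.Structure):
    _fields_ = [("dwSize", COORD), ("dwCursorPosition", COORD),
                ("wAttributes", ctypes.c_ushort), ("srWindow", SMALL_RECT),
                ("dwMaximumWindowSize", COORD)]


def utf16_units(text):
    """Number of UTF-16 code units (WCHARs) needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


def describe(line_no, ok, chars_read, read_text, written_text):
    """Summarize one read-back; a 'successful' read of 0 chars counts as a mismatch."""
    same = ok and chars_read > 0 and read_text.rstrip(" ") == written_text
    return f"Line {line_no} len {chars_read} {'matches' if same else 'DIFFERS'}"


def write_then_read(text):
    """Windows only: write text at the cursor row, then read that row back."""
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    dword_p = ctypes.POINTER(ctypes.c_uint32)
    k32.GetStdHandle.restype = ctypes.c_void_p
    k32.GetConsoleScreenBufferInfo.argtypes = [
        ctypes.c_void_p, ctypes.POINTER(CONSOLE_SCREEN_BUFFER_INFO)]
    k32.WriteConsoleW.argtypes = [
        ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_uint32, dword_p, ctypes.c_void_p]
    k32.ReadConsoleOutputCharacterW.argtypes = [
        ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_uint32, COORD, dword_p]

    handle = k32.GetStdHandle(STD_OUTPUT_HANDLE)
    info = CONSOLE_SCREEN_BUFFER_INFO()
    if not k32.GetConsoleScreenBufferInfo(handle, ctypes.byref(info)):
        raise ctypes.WinError(ctypes.get_last_error())
    row, width = info.dwCursorPosition.Y, info.dwSize.X

    # Step 1: write the line; the length is given in UTF-16 code units.
    written = ctypes.c_uint32(0)
    if not k32.WriteConsoleW(handle, text, utf16_units(text),
                             ctypes.byref(written), None):
        raise ctypes.WinError(ctypes.get_last_error())

    # Step 2: read the full row back, starting at column 0.
    buf = ctypes.create_unicode_buffer(width)
    read = ctypes.c_uint32(0)
    ok = k32.ReadConsoleOutputCharacterW(handle, buf, width, COORD(0, row),
                                         ctypes.byref(read))
    return bool(ok), read.value, buf[:read.value]


if __name__ == "__main__" and sys.platform == "win32":
    sample = "-a--- 11k 13 Mar 14:44 \U000F0000 Bbb.xml"
    ok, count, text = write_then_read(sample)
    print()  # finish the row that was just written
    print(describe(2, ok, count, text, sample))
```

The describe helper deliberately treats "returned success but read 0 characters" as a mismatch, since that is the combination reported under Actual Behavior.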

Easy demonstration program:

  1. Create a folder and extract the files from the .zip file linked above.
  2. Run the signed Repro.exe file.
  3. Alternatively, build the source files and run the locally built copy of Repro.exe.

Expected Behavior

ReadConsoleOutputCharacterW should:

  1. Fill the out param lpNumberOfCharsRead with the number of characters read (e.g. the width of the console).
  2. Fill the out param lpCharacter with characters read from the console.
  3. Return true (success).

The demo program should first write 4 lines, then verify that reading the lines back matches what was originally written.
Each line contains Unicode codepoints that correspond to certain nerdfonts icons.
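
For context on why only some lines are affected: a code point at or below U+FFFF fits in one UTF-16 code unit, while a code point above U+FFFF is stored as a surrogate pair (two code units). The sketch below shows the standard UTF-16 arithmetic. The function name is hypothetical, and U+E000 and U+F0000 are assumed stand-ins (the first code points of the BMP Private Use Area and of Supplementary Private Use Area-A), not necessarily the exact icons used in the repro.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units for one code point as a list of ints."""
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]  # fits in a single 16-bit code unit
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


bmp_icon = "\uE000"         # assumed example: BMP private-use code point
astral_icon = "\U000F0000"  # assumed example: code point above U+FFFF

for ch in (bmp_icon, astral_icon):
    units = " ".join(f"{u:04X}" for u in utf16_code_units(ch))
    print(f"U+{ord(ch):04X} -> {units}")
# prints:
# U+E000 -> E000
# U+F0000 -> DB80 DC00
```

A line containing only the first kind of icon needs no surrogate pairs, which would be consistent with some lines in the results reading back correctly while others do not.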

Sample expected output:

OUTPUT:

-a--- 17k 13 Mar 14:38  Aaa.cpp
-a--- 11k 13 Mar 14:44 󱗀 Bbb.xml
-a--- 10k 13 Mar 14:43 󰅲 Ccc.lisp
-a--- 11k 13 Mar 14:44  Ddd.zip

RESULTS:

Line 1 len 120 matches : -a--- 17k 13 Mar 14:38  Aaa.cpp
Line 2 len 120 matches : -a--- 11k 13 Mar 14:44 󱗀 Bbb.xml
Line 3 len 120 matches : -a--- 10k 13 Mar 14:43 󰅲 Ccc.lisp
Line 4 len 120 matches : -a--- 11k 13 Mar 14:44  Ddd.zip

Actual Behavior

Only in Windows Terminal (all versions; 1.19, 1.20, 1.20 canary), ReadConsoleOutputCharacterW:

  1. Fills the out param lpNumberOfCharsRead with 0.
  2. Does not fill the out param lpCharacter.
  3. Returns true (success).

Problem 1: It reads nothing; it should read text successfully, the same as in legacy conhost.
Problem 2: It reports success; that's inaccurate since it failed to read the text that was present.

Sample actual output:

OUTPUT:

-a--- 17k 13 Mar 14:38  Aaa.cpp
-a--- 11k 13 Mar 14:44 󱗀 Bbb.xml
-a--- 10k 13 Mar 14:43 󰅲 Ccc.lisp
-a--- 11k 13 Mar 14:44  Ddd.zip

RESULTS:

Line 1 len 120 matches : -a--- 17k 13 Mar 14:38  Aaa.cpp
Line 2 len 0   DIFFERS :
Line 3 len 0   DIFFERS :
Line 4 len 120 matches : -a--- 11k 13 Mar 14:44  Ddd.zip

Metadata

Labels

  • Area-Output: Related to output processing (inserting text into buffer, retrieving buffer text, etc.)
  • Issue-Bug: It either shouldn't be doing this or needs an investigation.
  • Needs-Triage: It's a new issue that the core contributor team needs to triage at the next triage meeting
  • Priority-1: A description (P1)
  • Product-Conhost: For issues in the Console codebase
