Some Unicode characters are improperly accepted or dropped #9052

@chitoku-k

Environment

Windows build number: 10.0.19042.0
Windows Terminal version (if applicable): 1.5.x and 1.6.x
PowerShell: 5.1.19041.610 and 7.1.1

Steps to reproduce

  1. Run a Windows Terminal build that includes "Fully regenerate CodepointWidthDetector from Unicode 13.0" (#8035).
  2. Input ０１２３４５６７８９ (FULLWIDTH DIGIT characters) by any method, such as right-click paste, Ctrl + V, or the keyboard.

Expected behavior

  • ０１２３４５６７８９ is input.

Actual behavior

  • Only the first character (０ in this case) is input.

Detailed Explanation

The implementation of GetQuickCharWidth was changed in #8035, which affected two call sites: the one in CodepointWidthDetector::GetWidth() and the one in CharToKeyEvents.

The former is fine because it falls back to a lookup in the Unicode table in CodepointWidthDetector later on; the latter, however, was broken by that PR.

When the scanned key is considered invalid in CharToKeyEvents, the function checks the character width by calling GetQuickCharWidth and processes the character as keyboard events if the width is CodepointWidth::Wide. Since #8035, however, GetQuickCharWidth no longer returns CodepointWidth::Wide; it returns CodepointWidth::Invalid for every character outside ASCII, so SynthesizeNumpadEvents ends up being called for ０１２３４５６７８９. Because that function processes the given characters as if they were typed with the Alt key plus the numpad, the result depends on the receiving application: cmd.exe, for example, handles the input normally, whereas powershell.exe accepts only the first character.

if (WI_IsFlagSet(CharType, C3_ALPHA) || GetQuickCharWidth(wch) == CodepointWidth::Wide)
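The effect of that condition can be modeled in isolation. The following is only a minimal sketch, not the actual Windows Terminal code: QuickWidthBefore, QuickWidthAfter, ChoosePath, and kFullwidthDigitZero are hypothetical names, and the "before" function models just one wide range (the fullwidth ASCII variants, U+FF01 to U+FF5E) as an assumption about the earlier behavior.

```cpp
#include <string>

// Simplified stand-in for the enum quoted in this issue.
enum class CodepointWidth { Narrow, Wide, Ambiguous, Invalid };

// U+FF10 FULLWIDTH DIGIT ZERO, the first character of the repro input.
constexpr wchar_t kFullwidthDigitZero = static_cast<wchar_t>(0xFF10);

// Hypothetical model of the quick check before #8035 (assumption):
// a wide range is answered directly, without consulting the full table.
CodepointWidth QuickWidthBefore(wchar_t wch) noexcept
{
    if (0x20 <= wch && wch <= 0x7e) return CodepointWidth::Narrow;
    if (0xff01 <= wch && wch <= 0xff5e) return CodepointWidth::Wide;
    return CodepointWidth::Invalid;
}

// Model of the quick check as described in this issue after #8035:
// only printable ASCII gets an answer, everything else is Invalid.
CodepointWidth QuickWidthAfter(wchar_t wch) noexcept
{
    if (0x20 <= wch && wch <= 0x7e) return CodepointWidth::Narrow;
    return CodepointWidth::Invalid;
}

// Mirrors the shape of the condition in CharToKeyEvents: alphabetic or
// wide characters become key events, anything else becomes numpad events.
std::string ChoosePath(bool isAlpha, CodepointWidth quickWidth)
{
    return (isAlpha || quickWidth == CodepointWidth::Wide) ? "key events" : "numpad events";
}
```

With these stand-ins, ChoosePath(false, QuickWidthBefore(kFullwidthDigitZero)) selects the key-event path, while ChoosePath(false, QuickWidthAfter(kFullwidthDigitZero)) selects the numpad path, which matches the regression described above.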

One option is to return a new value like CodepointWidth::Unknown when GetQuickCharWidth cannot detect the character width immediately,

  CodepointWidth GetQuickCharWidth(const wchar_t wch) noexcept
  {
      if (0x20 <= wch && wch <= 0x7e)
      {
          /* ASCII */
          return CodepointWidth::Narrow;
      }
+     else if (wch < 0xffff)
+     {
+         return CodepointWidth::Unknown;
+     }
      return CodepointWidth::Invalid;
  }

because CodepointWidth::Invalid is documented as meaning "not a valid unicode codepoint":

enum class CodepointWidth : BYTE
{
    Narrow,
    Wide,
    Ambiguous, // could be narrow or wide depending on the current codepage and font
    Invalid // not a valid unicode codepoint
};

In CharToKeyEvents, the expression should be corrected to:

- if (WI_IsFlagSet(CharType, C3_ALPHA) || GetQuickCharWidth(wch) == CodepointWidth::Wide)
+ if (WI_IsFlagSet(CharType, C3_ALPHA) || GetQuickCharWidth(wch) == CodepointWidth::Unknown)

and in CodepointWidthDetector::GetWidth(), it has to support the new value:

-         // If it's invalid, the quick width had no opinion, so go to the lookup table.
-         if (width == CodepointWidth::Invalid)
+         // If it's unknown or invalid, the quick width had no opinion, so go to the lookup table.
+         if (width == CodepointWidth::Unknown || width == CodepointWidth::Invalid)
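Taken together, the three proposed changes could look roughly like the following self-contained sketch. It is an illustration under assumptions, not a patch: LookupTable is a hypothetical stand-in for the real Unicode table in CodepointWidthDetector (it only knows the fullwidth ASCII variants), TreatedAsKeyEvents is a hypothetical wrapper around the condition in CharToKeyEvents, and the position of Unknown within the enum is a guess.

```cpp
// Enum from the issue, extended with the proposed Unknown value (placement assumed).
enum class CodepointWidth { Narrow, Wide, Ambiguous, Unknown, Invalid };

// Proposed quick check: ASCII is Narrow, other code units below 0xffff are
// Unknown ("no quick answer"), and only the remainder is Invalid.
CodepointWidth GetQuickCharWidth(const wchar_t wch) noexcept
{
    if (0x20 <= wch && wch <= 0x7e)
    {
        return CodepointWidth::Narrow;
    }
    else if (wch < 0xffff)
    {
        return CodepointWidth::Unknown;
    }
    return CodepointWidth::Invalid;
}

// Hypothetical stand-in for the full table lookup: only U+FF01..U+FF5E is Wide.
CodepointWidth LookupTable(const wchar_t wch) noexcept
{
    return (0xff01 <= wch && wch <= 0xff5e) ? CodepointWidth::Wide : CodepointWidth::Narrow;
}

// Shape of the GetWidth() change: Unknown and Invalid both defer to the table.
CodepointWidth GetWidth(const wchar_t wch) noexcept
{
    const auto width = GetQuickCharWidth(wch);
    if (width == CodepointWidth::Unknown || width == CodepointWidth::Invalid)
    {
        return LookupTable(wch);
    }
    return width;
}

// Shape of the CharToKeyEvents change: Unknown now selects the key-event path.
bool TreatedAsKeyEvents(const bool isAlpha, const wchar_t wch) noexcept
{
    return isAlpha || GetQuickCharWidth(wch) == CodepointWidth::Unknown;
}
```

In this sketch, U+FF10 yields Unknown from the quick check, takes the key-event path, and still resolves to Wide through GetWidth(). One consequence of the `wch < 0xffff` test as written is that control characters below 0x20 are also reported as Unknown.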

I can open a PR if the fix I've suggested is acceptable. Thanks in advance.

Appendix

Note that this issue is unrelated to the issue in PSReadLine, because it is also reproducible in applications launched from PowerShell.

Labels

  • Area-Input: Related to input processing (key presses, mouse, etc.)
  • Issue-Bug: It either shouldn't be doing this or needs an investigation.
  • Needs-Tag-Fix: Doesn't match tag requirements
  • Product-Conhost: For issues in the Console codebase
  • Product-Terminal: The new Windows Terminal.