CodepointWidthDetector: reclassify U+25FB, U+25FC as Narrow#5914
Conversation
This seems to be in line with the emoji-sequences table in the latest version of the unicode standard: those glyphs require U+FE0F to activate their emoji presentation. Since we don't support composing U+FE0F, we should not present them as emoji by default. Fixes #5910.
leonMSFT
left a comment
There was a problem hiding this comment.
I can get behind these two, but just looking at emoji-sequences.txt, there's a lot of emojis that have FE0F added on, should all of those be classified as narrow too?
|
@leonMSFT honestly not sure. Looking at that list I see a bunch of things I can imagine people want to be emoji. I sorta hate this problem. :) |
|
I was somewhat informed by printing them all out and seeing what Cascadia Code did. If they were monochrome, I made them narrow. Sorta hate it. |
zadjii-msft
left a comment
There was a problem hiding this comment.
I hate everything - we're never going to have a definitive answer to this, are we?
|
I made #5407 back then to “have a definitive answer” to most of these. I think we can revisit this when 1.0 is out. |
|
@msftbot merge this in 1 minute |
|
Hello @DHowett-MSFT! Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:
If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you". |
This seems to be in line with the emoji-sequences table in the latest version of the Unicode standard: those glyphs require U+FE0F to activate their emoji presentation. Since we don't support composing U+FE0F, we should not present them as emoji by default. Fixes #5910. Yes, I hate this. (cherry picked from commit c39f9c6)
I suspect they should be. I just did a quick test of the first 65536 codepoints in XTerm, Gnome Terminal, and Mintty to see which characters they treated as wide. There were 90 or so characters that we had as wide, that nobody else did. I haven't checked them all, but my impression is that they're mostly (if not entirely) from the FE0F list. U+2139, U+2328, U+23CF, U+23ED..U+23EF, U+23F1..U+23F2, U+23F8..U+23FA, U+24C2, U+25AA..U+25AB, U+25FB..U+25FC, U+2600..U+2604, U+260E, U+2611, U+2618, U+261D, U+2620, U+2622..U+2623, U+2626, U+262A, U+262E..U+262F, U+2638..U+263A, U+2640, U+2642, U+265F..U+2660, U+2663, U+2665..U+2666, U+2668, U+267B, U+267E, U+2692, U+2694..U+2697, U+2699, U+269B..U+269C, U+26A0, U+26A7, U+26B0..U+26B1, U+26C8, U+26CF, U+26D1, U+26D3, U+26E9, U+26F0..U+26F1, U+26F4, U+26F7..U+26F9, U+2702, U+2708..U+2709, U+270C..U+270D, U+270F, U+2712, U+2714, U+2716, U+271D, U+2721, U+2733..U+2734, U+2744, U+2747, U+2763..U+2764, U+27A1, U+302A..U+302D, U+3099..U+309A, U+321F I wasn't using the very latest build, so I think a couple of those have already been fixed, but that gives you a general idea of the range of characters that might need to be updated. |
|
Apart from the ideograph tone marks (U+302A..U+302D, which look like combining characters) and the other combining ones (3099, 309A), it looks like we've gotten the width right on the ones that we (DirectWrite(!)) have given a default emoji presentation. While I agree that it's non-ideal, we need to get support into our renderer for variations (and default variations) to display them as monochrome glyphs instead of squashed single-column color glyphs if we wanna fix that 😄 |
|
(the combining character problem is a whole other can of worms) |
|
Okay, James and Leon, you’re both correct. All of the lower codepoints that were retroactively reclassified (and require FE0F as a variation selector) must be set to 1-wide and we’ll deal with what fallout comes from that. I’m putting together this change for last minute inclusion in v1. |
|
I’d rather be correct and look slightly wrong than incorrect and introduce a way for third party app developers to take a dependency on some silly way Windows differs from everyone else. |
This removes all glyphs from the emoji list that do not default to "emoji presentation" (EPres). It removes all local overrides, but retain the comments about the emoji we left out that are Microsoft-specific. This brings us fully in line with the most popular Terminals on OS X, except that we squash our emoji down to fit in one cell and they let them hang over the edges and damage other characters. Oh well. Refs #900, #5914.
This removes all glyphs from the emoji list that do not default to "emoji presentation" (EPres). It removes all local overrides, but retains the comments about the emoji we left out that are Microsoft-specific. This brings us fully in line with the most popular Terminals on OS X, except that we squash our emoji down to fit in one cell and they let them hang over the edges and damage other characters. Oh well. ## Detailed Description of the Pull Request / Additional comments Late Friday evening, I tested my emoji test file on iTerm2. In so doing, I realized that @j4james and @leonMSFT were right the entire time in #5914: Emoji that require `U+FE0F` must not be double-width by default. I finally banged up a powershell script that parses the UCD and emits a codepoint width table. Once checked in, this will be definitive. Refs #900, #5914. Fixes #5941.
This removes all glyphs from the emoji list that do not default to "emoji presentation" (EPres). It removes all local overrides, but retains the comments about the emoji we left out that are Microsoft-specific. This brings us fully in line with the most popular Terminals on OS X, except that we squash our emoji down to fit in one cell and they let them hang over the edges and damage other characters. Oh well. ## Detailed Description of the Pull Request / Additional comments Late Friday evening, I tested my emoji test file on iTerm2. In so doing, I realized that @j4james and @leonMSFT were right the entire time in #5914: Emoji that require `U+FE0F` must not be double-width by default. I finally banged up a powershell script that parses the UCD and emits a codepoint width table. Once checked in, this will be definitive. Refs #900, #5914. Fixes #5941. (cherry picked from commit ba1a298)
…t#5914) This seems to be in line with the emoji-sequences table in the latest version of the Unicode standard: those glyphs require U+FE0F to activate their emoji presentation. Since we don't support composing U+FE0F, we should not present them as emoji by default. Fixes microsoft#5910. Yes, I hate this.
This removes all glyphs from the emoji list that do not default to "emoji presentation" (EPres). It removes all local overrides, but retains the comments about the emoji we left out that are Microsoft-specific. This brings us fully in line with the most popular Terminals on OS X, except that we squash our emoji down to fit in one cell and they let them hang over the edges and damage other characters. Oh well. ## Detailed Description of the Pull Request / Additional comments Late Friday evening, I tested my emoji test file on iTerm2. In so doing, I realized that @j4james and @leonMSFT were right the entire time in microsoft#5914: Emoji that require `U+FE0F` must not be double-width by default. I finally banged up a powershell script that parses the UCD and emits a codepoint width table. Once checked in, this will be definitive. Refs microsoft#900, microsoft#5914. Fixes microsoft#5941.
This PR introduces `compatibility.ambiguousWidth` as a **global** compatibility setting (`narrow` default, `wide` optional). The default remains `narrow`. Why global-only in this PR: - Width detection is currently process-wide (`CodepointWidthDetector` singleton). - True profile-level ambiguous-width behavior would require broader architectural changes and is intentionally deferred to a follow-up design/PR. What this PR guarantees: - Terminal-side handling is consistent end-to-end for the selected ambiguous-width policy (rendering path + ConPTY/host propagation). Known limitation: - Some client applications (for example PSReadLine/readline-based apps) may still compute character widths independently. - In such cases, cursor movement or Backspace behavior can differ from visual cell width even when terminal-side policy is consistent. This is a compatibility/readability trade-off feature: - `narrow`: prioritize cross-application compatibility. - `wide`: prioritize readability with many CJK fonts. Closes #153 Closes #370 Refs #2928 Refs #2049, #2066, #2375, #900, #5910, #5914 Co-authored-by: Leonard Hecker <[email protected]>

This seems to be in line with the emoji-sequences table in the latest
version of the Unicode standard: those glyphs require U+FE0F to activate
their emoji presentation. Since we don't support composing U+FE0F, we
should not present them as emoji by default.
Fixes #5910.
Yes, I hate this.