Skip to content

Introduce UnicodeAttribute for better Unicode support#5407

Closed
skyline75489 wants to merge 6 commits intomicrosoft:mainfrom
skyline75489:feature/unicode-attribute
Closed

Introduce UnicodeAttribute for better Unicode support#5407
skyline75489 wants to merge 6 commits intomicrosoft:mainfrom
skyline75489:feature/unicode-attribute

Conversation

@skyline75489
Copy link
Collaborator

@skyline75489 skyline75489 commented Apr 18, 2020

Summary of the Pull Request

This is the successor of #3578 which aims to improve Unicode support in the perspective of M:N cell rendering and ZWJ/ZWNJ.

References

#1472 #3546 #3578

PR Checklist

Detailed Description of the Pull Request / Additional comments

I'm trying to add the possibility of Unicode support while leaving good old cmd.exe along. So I added a UnicodeAttribute class, which is specifically crafted for the all kinds of Unicode thingy.

The code is in its very early stage and the tests are left broken. Just wanna share the idea.

I managed to get many features including Combing/ZWJ/ZWNJ to work. Emoji with VSS-16 also works . But without VSS-16 those emoji will still be half-sized.

Hangul composition requires more interaction with clusters inside CustomTextLayout which I believe is out of the scope of this PR.

I'm not really sure what I did in the PR is the correct direction overall. Cluster category and adhesive logic (which @reli-msft mentioned) may also leads to the same result.

Validation Steps Performed

WIP

@skyline75489
Copy link
Collaborator Author

This is what I got by far using the reproducing code in #3546 :

图片

There's still something missing (the umbrella emoji is small) . Considering how many lines of code this PR actually adds, I feel it's quite worth the effort. I'll take a look and check that if it also fixes the other Unicode related issues.

By the way jialli is my alias, if you're wondering.


std::wstring* currentText = new std::wstring(it->Chars());

while (moveForward)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is of course a very naive implementation. But I think it's a good start.

bool inJoiner = nextCell->UnicodeAttr().IsJoiner();
bool moveForward = nextCell && (nextCell->UnicodeAttr().IsBackwordAdhesive());

std::wstring* currentText = new std::wstring(it->Chars());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just purely wrong. I couldn't find a way to concat two std::string_view. Neither can I access the underlying text buffer. I had to allocate a new string to get the content I need. I'd love to hear some opinions.

Copy link
Contributor

@WSLUser WSLUser Apr 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


void SetGlyph(std::wstring_view glyph) noexcept
{
const auto codepoint = CodepointWidthDetector::ExtractCodepoint(glyph);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically just pulled these from Unicode.org Website

@github-actions
Copy link

New misspellings found, please review:

  • FITZPATRICK
  • jialli
  • Unspacing
To accept these changes, run the following commands
remove_obsolete_words=$(mktemp)
echo '#!/usr/bin/perl -ni
my $re=join "|", qw('"
aaaaa
aaaaaa
aaaaaaaaaaaaaaaaaaa
ABCDEFGHIJKLMNOPQRSTQ
ABCDEFGHIJPQRST
ABCDEFGHIJPQRSTQ
APPLOGIC
backgrounder
BBBB
BBBBBBBBBBBBBBBBBBBBBBBBBB
CCCC
codeofconduct
controlkeystates
customframe
cvd
dataprotection
dataprotectionprovider
DDDD
directx
EEEE
FAILOUT
ffff
findlocalename
getboundingrectangles
getglyphs
getlocalename
getlocalenamelength
getstring
getstringlength
ggggg
gggggg
ggggggg
hackathon
hidpi
idwritefontcollection
idwritelocalizedstrings
idwritetextanalyzer
implementingtextandtextrange
inputdev
itextprovider
itextrangeprovider
iuiautomationtextrange
LLLLLLLL
nchittest
nf
nn
pgp
pointerpoint
polytextoutw
QQQQ
RRRRRRR
Spc
sysinfo
tounicodeex
uiauto
uiautomationclient
unicodechar
usr
varenum
vsconfig
winauto
windowsinsiderpreview
windowsterminal
wpt
XChars
xxxx
YChars
"');
next if /^($re)(?:$| .*)/;
print;' > $remove_obsolete_words
chmod +x $remove_obsolete_words
for file in .github/actions/spell-check/whitelist/alphabet.txt .github/actions/spell-check/whitelist/web.txt .github/actions/spell-check/whitelist/whitelist.txt; do $remove_obsolete_words $file; done
rm $remove_obsolete_words
(
echo "
applogic
Backgrounder
DIRECTX
FITZPATRICK
Hackathon
jialli
NCHITTEST
PGP
Unspacing
"
) | sort -u -f | perl -ne 'next unless /./; print' > new_whitelist.txt && mv new_whitelist.txt '.github/actions/spell-check/whitelist/bed3c47c4036ef62be34c34e37dcd50514a96c0b.txt'
✏️ Contributor please read this
  • If the items listed above are names, please add them to .github/actions/spell-check/dictionary/names.txt.
  • If they're APIs, you can add them to a file in .github/actions/spell-check/dictionary/.
  • If they're just things you're using, please add them to an appropriate file in .github/actions/spell-check/whitelist/.
  • If you need to use a specific token in one place and it shouldn't generally be used, you can
    add an item in an appropriate file in .github/actions/spell-check/patterns/.

See the README.md in each directory for more information.

⚠️ Reviewers

At present, the action that triggered this message will not show its ❌ in this PR unless the branch is within this repository.
Thus, you should make sure that this comment has been addressed before encouraging the merge bot to merge this PR.

@github-actions
Copy link

New misspellings found, please review:

  • FITZPATRICK
  • jialli
  • JONGSEONG
  • JUNGSEONG
  • KIYEOK
  • SSANGNIEUN
  • Unspacing
To accept these changes, run the following commands
remove_obsolete_words=$(mktemp)
echo '#!/usr/bin/perl -ni
my $re=join "|", qw('"
aaaaa
aaaaaa
aaaaaaaaaaaaaaaaaaa
ABCDEFGHIJKLMNOPQRSTQ
ABCDEFGHIJPQRST
ABCDEFGHIJPQRSTQ
APPLOGIC
backgrounder
BBBB
BBBBBBBBBBBBBBBBBBBBBBBBBB
CCCC
codeofconduct
controlkeystates
customframe
cvd
dataprotection
dataprotectionprovider
DDDD
directx
EEEE
FAILOUT
ffff
findlocalename
getboundingrectangles
getglyphs
getlocalename
getlocalenamelength
getstring
getstringlength
ggggg
gggggg
ggggggg
hackathon
hidpi
idwritefontcollection
idwritelocalizedstrings
idwritetextanalyzer
implementingtextandtextrange
inputdev
itextprovider
itextrangeprovider
iuiautomationtextrange
LLLLLLLL
nchittest
nf
nn
pgp
pointerpoint
polytextoutw
QQQQ
RRRRRRR
Spc
sysinfo
tounicodeex
uiauto
uiautomationclient
unicodechar
usr
varenum
vsconfig
winauto
windowsinsiderpreview
windowsterminal
wpt
XChars
xxxx
YChars
"');
next if /^($re)(?:$| .*)/;
print;' > $remove_obsolete_words
chmod +x $remove_obsolete_words
for file in .github/actions/spell-check/whitelist/alphabet.txt .github/actions/spell-check/whitelist/web.txt .github/actions/spell-check/whitelist/whitelist.txt; do $remove_obsolete_words $file; done
rm $remove_obsolete_words
(
echo "
applogic
Backgrounder
DIRECTX
FITZPATRICK
Hackathon
jialli
JONGSEONG
JUNGSEONG
KIYEOK
NCHITTEST
PGP
SSANGNIEUN
Unspacing
"
) | sort -u -f | perl -ne 'next unless /./; print' > new_whitelist.txt && mv new_whitelist.txt '.github/actions/spell-check/whitelist/eb2000bb79caf4608098ec9a728535c75ba85ebc.txt'
✏️ Contributor please read this
  • If the items listed above are names, please add them to .github/actions/spell-check/dictionary/names.txt.
  • If they're APIs, you can add them to a file in .github/actions/spell-check/dictionary/.
  • If they're just things you're using, please add them to an appropriate file in .github/actions/spell-check/whitelist/.
  • If you need to use a specific token in one place and it shouldn't generally be used, you can
    add an item in an appropriate file in .github/actions/spell-check/patterns/.

See the README.md in each directory for more information.

⚠️ Reviewers

At present, the action that triggered this message will not show its ❌ in this PR unless the branch is within this repository.
Thus, you should make sure that this comment has been addressed before encouraging the merge bot to merge this PR.

@skyline75489 skyline75489 changed the title Introduce Unicode Attribute for better Unicode support Introduce UnicodeAttribute for better Unicode support Apr 18, 2020
@zadjii-msft
Copy link
Member

Hey for the record, we're all pretty heads down with getting the 1.0 release ready at the moment - it might be a bit before we can get to reviewing this. Thanks for your patience!

@microsoft-github-updates microsoft-github-updates bot changed the base branch from master to main October 22, 2020 00:27
@ghost ghost added Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Terminal The new Windows Terminal. labels Oct 22, 2020
@skyline75489
Copy link
Collaborator Author

This is a stale PR. Closing it for now.

@skyline75489 skyline75489 deleted the feature/unicode-attribute branch January 25, 2021 02:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Terminal The new Windows Terminal.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The text layout code is expecting the quantity of glyphs is equal to the cells allocated for each line

3 participants