Skip to content

Parse UTF-16 surrogates pairs for calculating pattern's position#11915

Merged
1 commit merged intomicrosoft:mainfrom
comzyh:fix-utf-16-surrogates-parsing-for-get-patterns
Dec 9, 2021
Merged

Parse UTF-16 surrogates pairs for calculating pattern's position#11915
1 commit merged intomicrosoft:mainfrom
comzyh:fix-utf-16-surrogates-parsing-for-get-patterns

Conversation

@comzyh
Copy link
Contributor

@comzyh comzyh commented Dec 9, 2021

Summary of the Pull Request

Properly handle UTF-16 surrogates when calculating the position of matched pattern.

Fix #8709

References

const auto charData = Utf16Parser::Parse(wstr);
std::vector<std::vector<wchar_t>> cells;
for (const auto chars : charData)
{
if (IsGlyphFullWidth(std::wstring_view{ chars.data(), chars.size() }))

PR Checklist

  • Closes Links/URLs get offset when certain unicode characters are printed #8709
  • CLA signed. If not, go over here and sign the CLA
  • Tests added/passed
  • Documentation updated. If checked, please file a pull request on our docs repo and link it here: #xxx
  • Schema updated.
  • I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

use Utf16Parser::Parse to handle code points from U+010000 to U+10FFFF in UTF-16.

Validation Steps Performed

image

also the case by @mas90 #8709 (comment):

image

@ghost ghost added Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal. labels Dec 9, 2021
@comzyh comzyh changed the title parse UTF-16 surrogates pair for calculating pattern position Parse UTF-16 surrogates pairs for calculating pattern's position Dec 9, 2021
Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks!

@zadjii-msft
Copy link
Member

@msftbot make sure @PankajBhojwani signs off on this one

@ghost ghost added the AutoMerge Marked for automatic merge by the bot when requirements are met label Dec 9, 2021
@ghost
Copy link

ghost commented Dec 9, 2021

Hello @zadjii-msft!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".

size_t prefixSize = 0;

for (const auto ch : i->prefix().str())
for (const std::vector<wchar_t> parsedGlyph : Utf16Parser::Parse(i->prefix().str()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have left auto for the left side here and below... but I'm not going to block over it.

@zadjii-msft
Copy link
Member

@msftbot merge this in 1 minute

@ghost
Copy link

ghost commented Dec 9, 2021

Hello @zadjii-msft!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

  • I'll only merge this pull request if it's approved by @PankajBhojwani
  • I won't merge this pull request until after the UTC date Thu, 09 Dec 2021 18:41:11 GMT, which is in 1 minute

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".

@ghost ghost merged commit a2d96d6 into microsoft:main Dec 9, 2021
DHowett pushed a commit that referenced this pull request Dec 13, 2021
)

<!-- Enter a brief description/summary of your PR here. What does it fix/what does it change/how was it tested (even manually, if necessary)? -->
## Summary of the Pull Request

Properly handle UTF-16 surrogates when calculating the position of matched pattern.

Fix #8709

<!-- Other than the issue solved, is this relevant to any other issues/existing PRs? -->
## References
https://github.com/microsoft/terminal/blob/b88ffb21b0725331877ba76bac5a79a4c21eaa03/src/buffer/out/search.cpp#L335-L339

<!-- Please review the items on the PR checklist before submitting-->
## PR Checklist
* [ ] Closes #8709
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

<!-- Provide a more detailed description of the PR, other things fixed or any additional comments/features here -->
## Detailed Description of the Pull Request / Additional comments
use `Utf16Parser::Parse` to handle code points from U+010000 to U+10FFFF in UTF-16.

<!-- Describe how you validated the behavior. Add automated tests wherever possible, but list manual validation steps taken as well -->
## Validation Steps Performed

![image](https://user-images.githubusercontent.com/1068203/145421736-c842c7d4-0136-42d0-ad72-f004f58d9e3b.png)

also the case by @mas90  #8709 (comment):

![image](https://user-images.githubusercontent.com/1068203/145420264-3fe220b4-42c5-44ac-aa94-4e604b164ed3.png)

(cherry picked from commit a2d96d6)
DHowett pushed a commit that referenced this pull request Dec 13, 2021
)

<!-- Enter a brief description/summary of your PR here. What does it fix/what does it change/how was it tested (even manually, if necessary)? -->
## Summary of the Pull Request

Properly handle UTF-16 surrogates when calculating the position of matched pattern.

Fix #8709

<!-- Other than the issue solved, is this relevant to any other issues/existing PRs? -->
## References
https://github.com/microsoft/terminal/blob/b88ffb21b0725331877ba76bac5a79a4c21eaa03/src/buffer/out/search.cpp#L335-L339

<!-- Please review the items on the PR checklist before submitting-->
## PR Checklist
* [ ] Closes #8709
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

<!-- Provide a more detailed description of the PR, other things fixed or any additional comments/features here -->
## Detailed Description of the Pull Request / Additional comments
use `Utf16Parser::Parse` to handle code points from U+010000 to U+10FFFF in UTF-16.

<!-- Describe how you validated the behavior. Add automated tests wherever possible, but list manual validation steps taken as well -->
## Validation Steps Performed

![image](https://user-images.githubusercontent.com/1068203/145421736-c842c7d4-0136-42d0-ad72-f004f58d9e3b.png)

also the case by @mas90  #8709 (comment):

![image](https://user-images.githubusercontent.com/1068203/145420264-3fe220b4-42c5-44ac-aa94-4e604b164ed3.png)

(cherry picked from commit a2d96d6)
@ghost
Copy link

ghost commented Dec 14, 2021

🎉Windows Terminal Preview v1.12.3472.0 has been released which incorporates this pull request.:tada:

Handy links:

@ghost
Copy link

ghost commented Dec 14, 2021

🎉Windows Terminal v1.11.3471.0 has been released which incorporates this pull request.:tada:

Handy links:

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Links/URLs get offset when certain unicode characters are printed

4 participants