
Initial implementation of fine-grained text analysis #9202

Merged
12 commits merged into microsoft:main from skyline75489:chesterliu/dev/analysis-text-complexity
Apr 28, 2021


@skyline75489
Collaborator

@skyline75489 skyline75489 commented Feb 18, 2021

This PR optimizes the DirectWrite text analysis by breaking the text
into simple and complex runs based on the result of
GetTextComplexity. For simple runs, we can skip the bidi, script and
number substitution analysis steps to improve performance.
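A minimal sketch of the splitting step. It assumes a hypothetical FakeGetTextComplexity helper in place of IDWriteTextAnalyzer1::GetTextComplexity (the real API also takes a font face and fills glyph indices); the stand-in simply treats printable ASCII as simple. Like the real call, it reports whether the leading text is simple and how many characters share that complexity. Run and SplitByComplexity are illustrative names, not the PR's code.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in for GetTextComplexity: printable ASCII is "simple",
// everything else is "complex". Reports the complexity of the first character
// and how many consecutive characters share it.
static void FakeGetTextComplexity(std::u16string_view text, bool& isSimple, uint32_t& lengthRead)
{
    auto simpleChar = [](char16_t c) { return c >= 0x20 && c < 0x7F; };
    lengthRead = 0;
    if (text.empty())
    {
        isSimple = true;
        return;
    }
    isSimple = simpleChar(text[0]);
    while (lengthRead < text.size() && simpleChar(text[lengthRead]) == isSimple)
    {
        ++lengthRead;
    }
}

struct Run
{
    uint32_t textStart;
    uint32_t textLength;
    bool isTextSimple;
};

// Break the text into alternating simple/complex runs; these become the
// initial runs before any bidi/script/number substitution analysis.
std::vector<Run> SplitByComplexity(std::u16string_view text)
{
    std::vector<Run> runs;
    uint32_t pos = 0;
    while (pos < text.size())
    {
        bool isSimple = false;
        uint32_t lengthRead = 0;
        FakeGetTextComplexity(text.substr(pos), isSimple, lengthRead);
        runs.push_back({ pos, lengthRead, isSimple });
        pos += lengthRead; // always >= 1 here because the remaining text is non-empty
    }
    return runs;
}
```

For example, u"abc" followed by two CJK characters and then u"de" yields three runs: simple (0, 3), complex (3, 2), simple (5, 2).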

Prior to this PR, we relied on the results of AnalyzeBidi,
AnalyzeScript and AnalyzeNumberSubstitution both to break the text
into runs and to attach the corresponding
bidi/script/number_substitution information to each run. Thanks to #6695,
we can already skip the expensive analysis when the entire text is
determined to be simple.
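For contrast, a sketch of the all-or-nothing shortcut from #6695 as described above. IsSimpleChar is a hypothetical simplicity test (printable ASCII only), not the real complexity check; the point is that a single complex character anywhere disables the fast path for the whole string.

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>

// Hypothetical simplicity test standing in for GetTextComplexity.
constexpr bool IsSimpleChar(char16_t c) { return c >= 0x20 && c < 0x7F; }

// Whole-text shortcut: the bidi/script/number substitution analysis can be
// skipped only if every character in the string is simple.
bool CanSkipAnalysisForWholeText(std::u16string_view text)
{
    return std::all_of(text.begin(), text.end(), IsSimpleChar);
}
```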

Inspired by microsoft/cascadia-code#411 and the
discussion in #9156, I found that "entire text simplicity" is often
hard to achieve in practice. To make full use of the complexity information of
the text, we first break the text into simple and complex ranges.
These ranges also serve as the initial runs before the
bidi/script/number_substitution analysis, so we can skip the text
analysis for simple runs and speed up the process.
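Once the simple and complex ranges are the initial runs, the expensive analysis only has to visit the complex ones. A sketch of that dispatch, assuming a hypothetical analyzeRange callback that stands in for the AnalyzeBidi/AnalyzeScript/AnalyzeNumberSubstitution calls; Run and AnalyzeComplexRunsOnly are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

struct Run
{
    uint32_t textStart;
    uint32_t textLength;
    bool isTextSimple;
};

// Hands only the complex runs to the (hypothetical) analysis callback.
// Returns the number of characters that were actually analyzed.
uint32_t AnalyzeComplexRunsOnly(const std::vector<Run>& runs,
                                const std::function<void(uint32_t, uint32_t)>& analyzeRange)
{
    uint32_t analyzed = 0;
    for (const auto& run : runs)
    {
        if (run.isTextSimple)
        {
            continue; // simple run: skip the expensive analysis entirely
        }
        analyzeRange(run.textStart, run.textLength);
        analyzed += run.textLength;
    }
    return analyzed;
}
```

On a line that is mostly ASCII with a short CJK span, only that span reaches analyzeRange, whereas the whole-text shortcut would have analyzed the entire line.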

VALIDATION
Built and ran cmatrix, cacafire and cat big.txt with this change.

Initial simple run PR: #6695
Closes #9156

@skyline75489
Collaborator Author

@miniksa FYI

@zadjii-msft zadjii-msft added the Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues label Feb 18, 2021
Member

@miniksa miniksa left a comment


Generally speaking, this is exactly what I was meaning yesterday when I described the first attempt at this. Thanks for taking a shot at it.

@ghost ghost added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Feb 18, 2021
@ghost ghost removed the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Feb 19, 2021
@skyline75489 skyline75489 marked this pull request as ready for review February 19, 2021 01:32
RETURN_IF_FAILED(hr);
_SetCurrentRun(pos);
_SplitCurrentRun(pos);
pos += std::max(uiLengthRead, 1u);
Collaborator Author


This handles the case where uiLengthRead is 0: advancing pos by at least 1 guarantees that the loop always makes progress.
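A sketch of why the std::max guard matters, assuming the surrounding code is a loop of the form while (pos < textLength). WalkText and lengthReadAt are hypothetical names; lengthReadAt plays the role of the call that produces uiLengthRead. Without the guard, a reported length of 0 would leave pos unchanged and the loop would never finish.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Walks the text in chunks reported by lengthReadAt(pos). Advancing by at
// least 1 guarantees termination even if a chunk length of 0 is reported.
// Returns the number of iterations performed.
uint32_t WalkText(uint32_t textLength, const std::function<uint32_t(uint32_t)>& lengthReadAt)
{
    uint32_t pos = 0;
    uint32_t iterations = 0;
    while (pos < textLength)
    {
        const uint32_t uiLengthRead = lengthReadAt(pos);
        ++iterations;
        pos += std::max<uint32_t>(uiLengthRead, 1u);
    }
    return iterations;
}
```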

@skyline75489
Collaborator Author

skyline75489 commented Feb 19, 2021

Overall I think this is ready, even without the optimization for AnalyzeFontFallback.

I’ve been testing it with cmatrix, cacafire, CJK content and more, and found no obvious regressions in the rendering. That said, I’d hate to see another regression roller coaster like the one after #6695. @miniksa if you’re OK with landing this, please guide me through the process so I can avoid any pitfalls I’m not aware of.

@skyline75489 skyline75489 added the Area-Performance Performance-related issue label Feb 19, 2021
@ghost ghost added Issue-Task It's a feature request, but it doesn't really need a major design. Product-Conhost For issues in the Console codebase Product-Terminal The new Windows Terminal. labels Feb 19, 2021
@skyline75489
Collaborator Author

Weird, why did the bot tag this PR with Product-Conhost when none of the issues/PRs mentioned belong to that area?

@skyline75489
Collaborator Author

Oh I forgot to mention. This PR will save ~3% of CPU when running cmatrix.

@zadjii-msft
Member

Weird, why did the bot tag this PR with Product-Conhost when none of the issues/PRs mentioned belong to that area?

Uh, it is tagged as conhost?



Oh I forgot to mention. This PR will save ~3% of CPU when running cmatrix.


@skyline75489
Collaborator Author

Ha, you got me. Well, if we're eventually going to use the DX renderer in conhost, I guess the tag makes sense.

Also, to be more specific about the CPU usage: about 3% of CPU is saved in the analysis step. The overall CPU usage stays roughly the same, but more cycles are left for the rendering itself, so the terminal is theoretically faster.

Member

@miniksa miniksa left a comment


Thank you!

@miniksa miniksa removed their assignment Apr 15, 2021
@miniksa miniksa added the Needs-Second It's a PR that needs another sign-off label Apr 15, 2021
@ghost ghost requested a review from lhecker April 28, 2021 08:34
Member

@zadjii-msft zadjii-msft left a comment


I can't believe I never gave this the ✔️

Member

@DHowett DHowett left a comment


If Michael and Mike trust this, I am comfortable with it. Thank you, and sorry for the delay.

Comment on lines +188 to +192
while (uiLengthRead > 0)
{
auto& run = _FetchNextRun(uiLengthRead);
run.isTextSimple = isTextSimple;
}
Member


nit: This might deserve a comment calling out that _FetchNextRun decrements uiLengthRead. I thought this was an infinite loop!

@DHowett DHowett added the AutoMerge Marked for automatic merge by the bot when requirements are met label Apr 28, 2021
@ghost

ghost commented Apr 28, 2021

Hello @DHowett!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 1c414a7 into microsoft:main Apr 28, 2021
@skyline75489 skyline75489 deleted the chesterliu/dev/analysis-text-complexity branch April 29, 2021 22:05
skyline75489 added a commit to skyline75489/terminal that referenced this pull request May 5, 2021
DHowett added a commit that referenced this pull request May 11, 2021
This pull request was closed.

Labels

Area-Performance Performance-related issue Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Task It's a feature request, but it doesn't really need a major design. Needs-Second It's a PR that needs another sign-off Product-Conhost For issues in the Console codebase Product-Terminal The new Windows Terminal.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fine-grained DWrite text analysis based on text complexity

4 participants