Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Conversation

@mdebbar
Copy link
Contributor

@mdebbar mdebbar commented Jun 3, 2020

Switch to using the unicode data instead of checking manually for new lines and whitespace.

Note: this PR uses minimal parts of the unicode data just to cover the main things (e.g. whitespace, NL, BK, LF, CR, etc). There's another PR coming that will implement the full algorithm that'll take advantage of the entire unicode data.

Didn't add any tests because I'm relying on existing tests to verify that the switch to unicode data doesn't break existing behavior.

TODO:

  • Check if there are any performance implications.

@mdebbar mdebbar added the platform-web Code specifically for the web engine label Jun 3, 2020
@mdebbar mdebbar requested a review from yjbanov June 3, 2020 20:48
@mdebbar mdebbar self-assigned this Jun 3, 2020
}

/// Normalizes properties that behave the same way into one common property.
LineCharProperty _normalizeLineProperty(LineCharProperty prop) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this substitution be made in the _packedLineBreakProperties so we never need to convert at run time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This normalization is part of the algorithm. I want to keep codegen independent.

The other reason is that in some cases we still need to know the original property (pre-normalization). e.g. rawCurr == LineCharProperty.LF.

There are a few optimizations that I can think of. I expect them to have marginal perf improvements but nothing drastic.

Some ideas for my future reference:

  • Normalize properties during codegen.
  • Reduce the amount of binary searches by either caching at runtime, or codegen'ing a hash map for the most common characters, or both.
  • For paragraphs that can fit in a single line, we can avoid a whole bunch of computations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about inside UnicodePropertyLookup.fromPackedData?

I'm just throwing ideas out. The PR LGTM. This comment is non-blocking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnicodePropertyLookup.fromPackedData is used by both line breaker and word breaker. They have different properties and different rules for normalization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your general point though. And I agree we could do some optimizations there.

// TODO: Use the 2d table now. See https://www.unicode.org/reports/tr14/tr14-22.html#ExampleTable

// TODO: After using the 2d table, do:
// hasSpaces = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these todos be github issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are coming in the next PR.

@mdebbar mdebbar force-pushed the line_break_algorithm1 branch from 3b5dcb7 to b2b172c Compare June 8, 2020 22:25
@mdebbar mdebbar force-pushed the line_break_algorithm1 branch from b2b172c to db7f8d5 Compare June 9, 2020 16:26
@mdebbar mdebbar merged commit 3e9f8f3 into flutter:master Jun 9, 2020
@fluttergithubbot
Copy link
Contributor

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat.

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 10, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 10, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 10, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 10, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 10, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 10, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 11, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jun 11, 2020
@mdebbar mdebbar deleted the line_break_algorithm1 branch April 15, 2021 17:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla: yes platform-web Code specifically for the web engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants