Optimize editing long text by caching each paragraph#5411
Optimize editing long text by caching each paragraph#5411emilk merged 77 commits intoemilk:masterfrom
Conversation
|
Preview available at https://egui-pr-preview.github.io/pr/5411-cachegalleylines |
|
Okay, so I did the thing and implemented the Currently this "somewhat" works, I've managed to fix the selection issues I was seeing, but there is some layout issues you can actually see in the live demo above where the text is overlapping sometimes when wrapped (since this was present before the whole |
|
So I've managed to make this look pretty correct, these are the remaining issues:
|
|
I think I've made some good progress:
The remaining test failures are:
|
06e639e to
139f286
Compare
There was a problem hiding this comment.
This looks good! I haven't reviewed all of GalleyCache yet, because it is a very dense and complex function.
I wonder how we best test and/or benchmark this though 🤔
I defiantly feel we should have a benchmark to justify this code, i.e. something that shows a big speedup before/after this PR.
| // Note we **don't** ignore the leading/trailing whitespace here! | ||
| row.size.x = target_max_x - target_min_x; |
There was a problem hiding this comment.
Ah, now that I think about it changing this comment was a mistake, leading and trailing whitespace is actually still ignored (I don't know why I thought otherwise). I'll change it back.
However ignoring whitespace here does seem kind of strange since the current docs for Row::rect say that it includes whitespace. Wouldn't this be wrong after it passes through this function? Maybe at least that documentation could be adjusted to clarify this is only true for left-aligned non-justified text.
I also noticed a more important issue though I think, this function actually makes Rows have glyphs that are outside their PlacedRow::pos + Row::size rectangle (because it just repositions the glyphs and leaves the position unchanged). This currently breaks selecting such text. I'll try to fix this soon.
There was a problem hiding this comment.
Fixed the issue found above and the comment. Although I still want to know what you think about changing the Row::size comment, in general halign is surprising with how it suddenly totally changes how the coordinates work even though it makes perfect sense in hindsight.
I agree having a benchmark would be nice, currently the only thing I have is "it looks quicker" when you have a lot of text (deletion of small segments works instantly for example).
As for tests, I've encountered issues with selection drawing multiple times while working on this at this point, so maybe it would be beneficial if this was actually checked by the test suite, it looks like mouse events are supported in kittest so maybe that's a good start. Especially since I also think selection visuals might be another good place for optimization because currently selecting the whole text will force all It also looks like the |
f406701 to
40f237d
Compare
)" This reverts commit 557bd56.
)" This reverts commit 557bd56.
)" This reverts commit 557bd56.
)" This reverts commit 557bd56.
)" This reverts commit 557bd56.
)" This reverts commit 557bd56.
)" This reverts commit 557bd56.
|
I think this broke multi-line |
* Closes <#7397> * [X] I have followed the instructions in the PR template I do admit I got a peak NixOS `RequestDeviceError` and deemed it entirely not worth it to think about that. #5411 broke rotation of multi-line `TextShape`s because `PlacedRow::pos` was no longer being rotated, so let's rotate it. --------- Co-authored-by: Emil Ernerfeldt <[email protected]> Co-authored-by: Lucas Meurer <[email protected]>
* Closes <#7397> * [X] I have followed the instructions in the PR template I do admit I got a peak NixOS `RequestDeviceError` and deemed it entirely not worth it to think about that. #5411 broke rotation of multi-line `TextShape`s because `PlacedRow::pos` was no longer being rotated, so let's rotate it. --------- Co-authored-by: Emil Ernerfeldt <[email protected]> Co-authored-by: Lucas Meurer <[email protected]>
## What (written by @emilk) When editing long text (thousands of line), egui would previously re-layout the entire text on each edit. This could be slow. With this PR, we instead split the text into paragraphs (split on `\n`) and then cache each such paragraph. When editing text then, only the changed paragraph needs to be laid out again. Still, there is overhead from splitting the text, hashing each paragraph, and then joining the results, so the runtime complexity is still O(N). In our benchmark, editing a 2000 line string goes from ~8ms to ~300 ms, a speedup of ~25x. In the future, we could also consider laying out each paragraph in parallel, to speed up the initial layout of the text. ## Details This is an ~~almost complete~~ implementation of the approach described by emilk [in this comment](<emilk#3086 (comment)>), excluding CoW semantics for `LayoutJob` (but including them for `Row`). It supersedes the previous unsuccessful attempt here: emilk#4000. Draft because: - [X] ~~Currently individual rows will have `ends_with_newline` always set to false. This breaks selection with Ctrl+A (and probably many other things)~~ - [X] ~~The whole block for doing the splitting and merging should probably become a function (I'll do that later).~~ - [X] ~~I haven't run the check script, the tests, and haven't made sure all of the examples build (although I assume they probably don't rely on Galley internals).~~ - [x] ~~Layout is sometimes incorrect (missing empty lines, wrapping sometimes makes text overlap).~~ - A lot of text-related code had to be changed so this needs to be properly tested to ensure no layout issues were introduced, especially relating to the now row-relative coordinate system of `Row`s. Also this requires that we're fine making these very breaking changes. It does significantly improve the performance of rendering large blocks of text (if they have many newlines), this is the test program I used to test it (adapted from <emilk#3086>): <details> <summary>code</summary> ```rust use eframe::egui::{self, CentralPanel, TextEdit}; use std::fmt::Write; fn main() -> Result<(), eframe::Error> { let options = eframe::NativeOptions { ..Default::default() }; eframe::run_native( "editor big file test", options, Box::new(|_cc| Ok(Box::<MyApp>::new(MyApp::new()))), ) } struct MyApp { text: String, } impl MyApp { fn new() -> Self { let mut string = String::new(); for line_bytes in (0..50000).map(|_| (0u8..50)) { for byte in line_bytes { write!(string, " {byte:02x}").unwrap(); } write!(string, "\n").unwrap(); } println!("total bytes: {}", string.len()); MyApp { text: string } } } impl eframe::App for MyApp { fn update(&mut self, ctx: &egui::Context, _frame: &mut eframe::Frame) { CentralPanel::default().show(ctx, |ui| { let start = std::time::Instant::now(); egui::ScrollArea::vertical().show(ui, |ui| { let code_editor = TextEdit::multiline(&mut self.text) .code_editor() .desired_width(f32::INFINITY) .desired_rows(40); let response = code_editor.show(ui).response; if response.changed() { println!("total bytes now: {}", self.text.len()); } }); let end = std::time::Instant::now(); let time_to_update = end - start; if time_to_update.as_secs_f32() > 0.5 { println!("Long update took {:.3}s", time_to_update.as_secs_f32()) } }); } } ``` </details> I think the way to proceed would be to make a new type, something like `PositionedRow`, that would wrap an `Arc<Row>` but have a separate `pos` ~~and `ends_with_newline`~~ (that would mean `Row` only holds a `size` instead of a `rect`). This type would of course have getters that would allow you to easily get a `Rect` from it and probably a `Deref` to the underlying `Row`. ~~I haven't done this yet because I wanted to get some opinions whether this would be an acceptable API first.~~ This is now implemented, but of course I'm still open to discussion about this approach and whether it's what we want to do. Breaking changes (currently): - The `Galley::rows` field has a different type. - There is now a `PlacedRow` wrapper for `Row`. - `Row` now uses a coordinate system relative to itself instead of the `Galley`. * Closes <emilk#3086> * [X] I have followed the instructions in the PR template --------- Co-authored-by: Emil Ernerfeldt <[email protected]>
* Closes <emilk#6904> * [x] I have followed the instructions in the PR template This was broken in emilk#5411. Not sure if this is the best fix or if `PlacedRow::rect` should be updated, but I think it makes sense that PlacedRow::rect ignores leading space.
emilk#7031) Fixes a regression introduced in emilk#5411 (possibly emilk@d74bee5) that breaks `leading_space` handling. I think this is what the condition should be but I haven't touched this code in a while.
<!-- Please read the "Making a PR" section of [`CONTRIBUTING.md`](https://github.com/emilk/egui/blob/main/CONTRIBUTING.md) before opening a Pull Request! * Keep your PR:s small and focused. * The PR title is what ends up in the changelog, so make it descriptive! * If applicable, add a screenshot or gif. * If it is a non-trivial addition, consider adding a demo for it to `egui_demo_lib`, or a new example. * Do NOT open PR:s from your `master` branch, as that makes it hard for maintainers to test and add commits to your PR. * Remember to run `cargo fmt` and `cargo clippy`. * Open the PR as a draft until you have self-reviewed it and run `./scripts/check.sh`. * When you have addressed a PR comment, mark it as resolved. Please be patient! I will review your PR, but my time is limited! --> Moves `ends_with_newline` into `PlacedRow` to avoid clones during layout. I don't think there was a rationale stronger than "don't change too much" for not doing this in #5411, so I should've just done this from the start. This was a significant part of the profile for text layout (as it cloned almost every `Row`, even though it only needed to change a single boolean). Before: <img width="757" height="250" alt="image" src="https://github.com/user-attachments/assets/d1c2afd1-f1ec-4cf5-9d05-f5a5a78052df" /> After: <img width="615" height="249" alt="image" src="https://github.com/user-attachments/assets/c70966da-c892-4e84-adba-494d0f37f263" /> (note that these profiles focus solely on the top-level `Galley::layout_inline` subtree, also don't compare sample count as the duration of these tests was completely arbitrary) egui_demo_lib `*text_layout*` benches: <img width="791" height="461" alt="image" src="https://github.com/user-attachments/assets/4f97ce84-2768-4876-9488-d42f8f358ed1" /> * [X] I have followed the instructions in the PR template (As usual, the tests fail for me even on master but the failures on master and with these changes seem the same :))
* Closes <emilk#7397> * [X] I have followed the instructions in the PR template I do admit I got a peak NixOS `RequestDeviceError` and deemed it entirely not worth it to think about that. emilk#5411 broke rotation of multi-line `TextShape`s because `PlacedRow::pos` was no longer being rotated, so let's rotate it. --------- Co-authored-by: Emil Ernerfeldt <[email protected]> Co-authored-by: Lucas Meurer <[email protected]>
<!-- Please read the "Making a PR" section of [`CONTRIBUTING.md`](https://github.com/emilk/egui/blob/main/CONTRIBUTING.md) before opening a Pull Request! * Keep your PR:s small and focused. * The PR title is what ends up in the changelog, so make it descriptive! * If applicable, add a screenshot or gif. * If it is a non-trivial addition, consider adding a demo for it to `egui_demo_lib`, or a new example. * Do NOT open PR:s from your `master` branch, as that makes it hard for maintainers to test and add commits to your PR. * Remember to run `cargo fmt` and `cargo clippy`. * Open the PR as a draft until you have self-reviewed it and run `./scripts/check.sh`. * When you have addressed a PR comment, mark it as resolved. Please be patient! I will review your PR, but my time is limited! --> Moves `ends_with_newline` into `PlacedRow` to avoid clones during layout. I don't think there was a rationale stronger than "don't change too much" for not doing this in emilk#5411, so I should've just done this from the start. This was a significant part of the profile for text layout (as it cloned almost every `Row`, even though it only needed to change a single boolean). Before: <img width="757" height="250" alt="image" src="https://github.com/user-attachments/assets/d1c2afd1-f1ec-4cf5-9d05-f5a5a78052df" /> After: <img width="615" height="249" alt="image" src="https://github.com/user-attachments/assets/c70966da-c892-4e84-adba-494d0f37f263" /> (note that these profiles focus solely on the top-level `Galley::layout_inline` subtree, also don't compare sample count as the duration of these tests was completely arbitrary) egui_demo_lib `*text_layout*` benches: <img width="791" height="461" alt="image" src="https://github.com/user-attachments/assets/4f97ce84-2768-4876-9488-d42f8f358ed1" /> * [X] I have followed the instructions in the PR template (As usual, the tests fail for me even on master but the failures on master and with these changes seem the same :))
What
(written by @emilk)
When editing long text (thousands of line), egui would previously re-layout the entire text on each edit. This could be slow.
With this PR, we instead split the text into paragraphs (split on
\n) and then cache each such paragraph. When editing text then, only the changed paragraph needs to be laid out again.Still, there is overhead from splitting the text, hashing each paragraph, and then joining the results, so the runtime complexity is still O(N).
In our benchmark, editing a 2000 line string goes from ~8ms to ~300 ms, a speedup of ~25x.
In the future, we could also consider laying out each paragraph in parallel, to speed up the initial layout of the text.
Details
This is an
almost completeimplementation of the approach described by emilk in this comment, excluding CoW semantics forLayoutJob(but including them forRow).It supersedes the previous unsuccessful attempt here: #4000.
Draft because:
Currently individual rows will haveends_with_newlinealways set to false.This breaks selection with Ctrl+A (and probably many other things)
The whole block for doing the splitting and merging should probably become a function (I'll do that later).I haven't run the check script, the tests, and haven't made sure all of the examples build (although I assume they probably don't rely on Galley internals).Layout is sometimes incorrect (missing empty lines, wrapping sometimes makes text overlap).Rows. Also this requires that we're fine making these very breaking changes.It does significantly improve the performance of rendering large blocks of text (if they have many newlines), this is the test program I used to test it (adapted from #3086):
code
I think the way to proceed would be to make a new type, something like
PositionedRow, that would wrap anArc<Row>but have a separateposand(that would meanends_with_newlineRowonly holds asizeinstead of arect). This type would of course have getters that would allow you to easily get aRectfrom it and probably aDerefto the underlyingRow.I haven't done this yet because I wanted to get some opinions whether this would be an acceptable API first.This is now implemented, but of course I'm still open to discussion about this approach and whether it's what we want to do.Breaking changes (currently):
Galley::rowsfield has a different type.PlacedRowwrapper forRow.Rownow uses a coordinate system relative to itself instead of theGalley.TextEdit#3086