Skip to content

perf: Replace std::list with std::vector in text layout#802

Closed
BlindBat wants to merge 1 commit intocrosspoint-reader:masterfrom
BlindBat:perf/list-to-vector
Closed

perf: Replace std::list with std::vector in text layout#802
BlindBat wants to merge 1 commit intocrosspoint-reader:masterfrom
BlindBat:perf/list-to-vector

Conversation

@BlindBat
Copy link
Contributor

@BlindBat BlindBat commented Feb 9, 2026

Summary

  • Replace std::list with std::vector for word storage in TextBlock and ParsedText
  • Use index-based access (words[i]) instead of iterator advancement (std::advance(it, n))
  • Remove the separate continuesVec copy that was built from wordContinues for O(1) access — now unnecessary since std::vector<bool> already provides O(1) indexing

Why

std::list allocates each node individually on the heap with 16 bytes of prev/next pointer overhead per node. For text layout with many small words, this means:

  • Scattered heap allocations instead of contiguous memory
  • Poor cache locality during iteration (each node can be anywhere in memory)
  • Per-node malloc/free overhead during construction and destruction

std::vector stores elements contiguously, giving better cache performance during the tight rendering and layout loops. The extractLine function also benefits: list splice was O(1) but required maintaining three parallel iterators, while vector range construction with move iterators is simpler and still efficient for the small line-sized chunks involved.

Files changed

  • lib/Epub/Epub/blocks/TextBlock.h / .cpp
  • lib/Epub/Epub/ParsedText.h / .cpp

AI Usage

YES

Test plan

  • Open an EPUB with mixed formatting (bold, italic, underline) — verify text renders correctly
  • Open a book with justified text — verify word spacing is correct
  • Open a book with hyphenation enabled — verify words break correctly at hyphens
  • Navigate through pages rapidly — verify no rendering glitches or crashes
  • Open a book with long paragraphs — verify text layout matches pre-change behavior

Replace std::list with std::vector for the words, wordStyles,
wordXpos, and wordContinues containers in TextBlock and ParsedText.

Vectors provide contiguous memory layout for better cache locality
and O(1) random access, eliminating per-node heap allocation and
the 16-byte prev/next pointer overhead of doubly-linked list nodes.
The indexed access also removes the need for a separate continuesVec
copy that was previously built from the list for O(1) layout access.
@Uri-Tauber
Copy link
Contributor

Thanks @BlindBat!
Sounds very useful.

Did you verify the render time reduction on the real device? (Serial logs show exact ms for rendering.)

@BlindBat
Copy link
Contributor Author

BlindBat commented Feb 9, 2026

@Uri-Tauber will do. Converting the PR to a draft until the performance tests are done (it requires time because the tests are semi-manual).

@BlindBat BlindBat marked this pull request as draft February 9, 2026 19:48
std::list<bool> lineContinues;
lineContinues.splice(lineContinues.begin(), wordContinues, wordContinues.begin(), wordContinuesEndIt);
// Build line data by moving from the original vectors using index range
std::vector<std::string> lineWords(std::make_move_iterator(words.begin() + lastBreakAt),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the missing piece I couldn't quite figure out when writing the original implementation. I wanted to use vectors, but didn't know about move iterators, so didn't know I could move items without needing to double allocate. That's basically the only reason I was using list, just to use splice.

znelson added a commit to znelson/crosspoint-reader that referenced this pull request Feb 12, 2026
Copy link
Contributor

@znelson znelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense to me, and I've been using it for a while on device with no issues.

Copy link
Member

@daveallie daveallie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In testing, there are some issues here. Notably, we are never clearing the empty nodes from the words and wordStyles vectors onces the items are moved, so they only ever grow. This is a problem as we check self->currentTextBlock->size() > 750 (where currentTextBlock->size() is just words.size()) in ChapterHtmlSlimParser to call self->currentTextBlock->layoutAndExtractLines early.

For paragraphs with over 750 words (e.g. I have been using Intermezzo as a test book which has several 800-1000 word paragraphs), this is problematic. Near the end of the paragraph layoutAndExtractLines gets called very frequently, slowing down indexing.

daveallie pushed a commit that referenced this pull request Feb 22, 2026
## Summary

_Revision to @BlindBat's #802. Description comes from the original PR._

- Replace `std::list` with `std::vector` for word storage in `TextBlock`
and `ParsedText`
- Use index-based access (`words[i]`) instead of iterator advancement
(`std::advance(it, n)`)
- Remove the separate `continuesVec` copy that was built from
`wordContinues` for O(1) access — now unnecessary since
`std::vector<bool>` already provides O(1) indexing

## Why

`std::list` allocates each node individually on the heap with 16 bytes
of prev/next pointer overhead per node. For text layout with many small
words, this means:
- Scattered heap allocations instead of contiguous memory
- Poor cache locality during iteration (each node can be anywhere in
memory)
- Per-node malloc/free overhead during construction and destruction

`std::vector` stores elements contiguously, giving better cache
performance during the tight rendering and layout loops. The
`extractLine` function also benefits: list splice was O(1) but required
maintaining three parallel iterators, while vector range construction
with move iterators is simpler and still efficient for the small
line-sized chunks involved.

## Files changed

- `lib/Epub/Epub/blocks/TextBlock.h` / `.cpp`
- `lib/Epub/Epub/ParsedText.h` / `.cpp`

## AI Usage

YES

## Test plan

- [ ] Open an EPUB with mixed formatting (bold, italic, underline) —
verify text renders correctly
- [ ] Open a book with justified text — verify word spacing is correct
- [ ] Open a book with hyphenation enabled — verify words break
correctly at hyphens
- [ ] Navigate through pages rapidly — verify no rendering glitches or
crashes
- [ ] Open a book with long paragraphs — verify text layout matches
pre-change behavior

---------

Co-authored-by: Kuanysh Bekkulov <[email protected]>
@znelson
Copy link
Contributor

znelson commented Feb 22, 2026

Merged from updated #1038, closing this PR.

@znelson znelson closed this Feb 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants