feat: Add CSS parsing and CSS support in EPUBs #411

Merged
daveallie merged 26 commits into crosspoint-reader:master from jdk2pq:feature/add-epub-css-parsing
Feb 5, 2026

Contributor

@jdk2pq jdk2pq commented Jan 17, 2026

Summary

  • What is the goal of this PR?
  • Adds basic CSS parsing to EPUBs and determines which CSS rules apply when rendering to the screen, so that text is styled correctly. Currently supports bold, underline, italics, margin, padding, and text alignment.

Additional Context

  • My main reason for wanting this is that the book I'm currently reading, Carl's Doomsday Scenario (2nd in the Dungeon Crawler Carl series), relies a lot on styled text for telling parts of the story. When text is bolded, it's supposed to be a message that's rendered "on-screen" in the story. When characters are "chatting" with each other, the text is bolded and their names are underlined. Plus, normal emphasis is provided with italicizing words here and there. So, this greatly improves my experience reading this book on the Xteink, and I figured it was useful enough for others too.
  • For transparency: I'm a software engineer, but I'm mostly frontend and TypeScript/JavaScript. It's been years since I did any C/C++, so I would not be surprised if I'm doing something dumb along the way in this code. Please don't hesitate to ask for changes if something looks off. I heavily relied on Claude Code for help, and I had a lot of inspiration from how microreader achieves their CSS parsing and styling. I did give this as good of a code review as I could and went through everything, and it works on my machine 😄

Before

IMG_6271
IMG_6272

After

IMG_6268
IMG_6269


AI Usage

Did you use AI tools to help write this code? YES, Claude Code

jdk2pq and others added 7 commits January 19, 2026 22:37
* origin:
  fix: truncate chapter names that are too long (crosspoint-reader#422)
  feat: dict based Hyphenation (crosspoint-reader#305)
  fix: render U+FFFD replacement character instead of ? (crosspoint-reader#366)
  fix: Invert colors on home screen cover overlay when recent book is selected (crosspoint-reader#390)
  Adds KOReader Sync support (crosspoint-reader#232)
  feat: Change keyboard "caps" to "shift" & Wrap Keyboard (crosspoint-reader#377)
  fix: XTC 1-bit thumb BMP polarity inversion (crosspoint-reader#373)
* master:
  chore: Cut release 0.15.0
  fix: OPDS browser OOM (crosspoint-reader#403)
  docs: Add detailed webserver documentation (crosspoint-reader#446)
  feat: invalidate cache on web uploads and opds downloads and add Clear Cache action (crosspoint-reader#393)
  fix: hard reset via RTS pin after flashing firmware (crosspoint-reader#437)
  fix: Skip negative screen coordinates only after we read the bitmap row. (crosspoint-reader#431)
  Reclaim space if we don't show battery Percentage (crosspoint-reader#352)
  feat: Include superscripts and subscripts in fonts (crosspoint-reader#463)
  My Library: Tab bar w/ Recent Books + File Browser (crosspoint-reader#250)
  feat: adding categories to settings screen (crosspoint-reader#331)
* master: (33 commits)
  feat: add HalDisplay and HalGPIO (crosspoint-reader#522)
  feat: Display epub metadata on Recents (crosspoint-reader#511)
  chore: Cut release 0.16.0
  fix: Correctly render italics on image alt placeholders (crosspoint-reader#569)
  chore: .gitignore: add compile_commands.json & .cache (crosspoint-reader#568)
  fix: Render keyboard entry over multiple lines (crosspoint-reader#567)
  fix: missing front layout in mapLabels() (crosspoint-reader#564)
  refactor: Re-work for OTA feature (crosspoint-reader#509)
  perf: optimize large EPUB indexing from O(n^2) to O(n) (crosspoint-reader#458)
  feat: Add Spanish hyphenation support (crosspoint-reader#558)
  feat: Add support to B&W filters to image covers (crosspoint-reader#476)
  feat(ux): page turning on button pressed if long-press chapter skip is disabled (crosspoint-reader#451)
  feat: Add status bar option "Full w/ Progress Bar" (crosspoint-reader#438)
  fix: Validate settings on read. (crosspoint-reader#492)
  fix: rotate origin in drawImage (crosspoint-reader#557)
  feat: Extract author from XTC/XTCH files (crosspoint-reader#563)
  fix: add txt books to recent tab (crosspoint-reader#526)
  docs: add font generation commands to builtin font headers (crosspoint-reader#547)
  docs: Update README with supported languages for EPUB  (crosspoint-reader#530)
  fix: Fix KOReader document md5 calculation for binary matching progress sync (crosspoint-reader#529)
  ...

roceb commented Jan 28, 2026

Can you fix the conflicts on this PR?

Contributor Author

jdk2pq commented Jan 28, 2026

> Can you fix the conflicts on this PR?

I was in the process of working on that when I got this notification! It should be fixed up now.

ttil commented Jan 29, 2026

@jdk2pq Thank you for this PR. The handling of the paragraph indentation seems to be off.
text-indent: 0; results in an indentation with EmSpace character via the fallback logic. But something like text-indent: 1.2em; is not indented at all.
While I have a fix for this, there are different ways to address this. So I'll let you decide how you want to handle it in the PR.

- margin, padding, and text-indent now all support ems, rems, and px values
- shorthand margin/padding CSS is also supported
- margin/padding/indent values of 0 should no longer erroneously produce additional spacing
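
The shorthand support mentioned in this commit follows the standard CSS rule for expanding one to four values into top, right, bottom, and left. A minimal sketch, assuming a hypothetical `BoxEdges` struct and `expandShorthand` helper (neither name comes from the PR) and lengths that have already been converted to pixels:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the four sides of a margin or padding box.
struct BoxEdges {
  float top = 0, right = 0, bottom = 0, left = 0;
};

// Expand CSS shorthand values (1 to 4 lengths, already in pixels) following
// the standard CSS rules: a missing side defaults to its opposite side.
// Returns false when the number of values is invalid.
bool expandShorthand(const std::vector<float>& values, BoxEdges& out) {
  if (values.empty() || values.size() > 4) return false;
  out.top = values[0];
  out.right = values.size() > 1 ? values[1] : out.top;
  out.bottom = values.size() > 2 ? values[2] : out.top;
  out.left = values.size() > 3 ? values[3] : out.right;
  return true;
}
```

With this shape, `margin: 10px 20px` yields 10 for top and bottom and 20 for left and right, and an explicit `0` stays `0` rather than falling through to a default.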
Contributor Author

jdk2pq commented Jan 30, 2026

> @jdk2pq Thank you for this PR. The handling of the paragraph indentation seems to be off. text-indent: 0; results in an indentation with EmSpace character via the fallback logic. But something like text-indent: 1.2em; is not indented at all. While I have a fix for this, there are different ways to address this. So I'll let you decide how you want to handle it in the PR.

Thanks for testing this out, @ttil! I just pushed a commit that should fix that. There are some additional fixes and improvements around margin and padding as well.

I had Claude help me out with generating a test EPUB for all the supported CSS, and it's rendering correctly on my device as far as I can tell. Here's a repo with the code and file: https://github.com/jdk2pq/css-test-epub in case anyone else wants to test it out!

Contributor Author

jdk2pq commented Jan 30, 2026

Additionally, here's another good test EPUB from the MobileRead wiki https://wiki.mobileread.com/wiki/EPub_Reader_Test

Their test EPUB tests out far more CSS properties than are currently supported by this PR, but the text renders correctly and the CSS properties that are supported also work as expected on my device.

@ttil ttil left a comment

I've picked up the latest changes and it looks great so far on my books! Thank you so much for driving this. Would love to see this merged.

static CssFontStyle interpretFontStyle(const std::string& val);
static CssFontWeight interpretFontWeight(const std::string& val);
static CssTextDecoration interpretDecoration(const std::string& val);
static float interpretLength(const std::string& val, float emSize = 16.0f);

Consider increasing this size. To my eye, 32 looks closer to what I see in other EPUB viewers. But I also recognize that this might be a matter of preference.

Contributor Author

Thanks @ttil! I updated this to use the current font selection to determine the line height, and then we'll use that as the default emSize instead of assigning it to 16 by default. That should help with dynamically scaling the length if people choose a smaller/larger text size as well.
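
To illustrate why the em size matters here, the following is a rough sketch of converting a CSS length to pixels. `resolveLength` is a hypothetical name, not the PR's `interpretLength`, and two assumptions are made: `rem` is treated the same as `em`, and the input has already been trimmed of whitespace.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical sketch: convert a CSS length such as "1.2em", "24px", or "0"
// to pixels. emSize is the pixel size of 1em, for example derived from the
// current font's line height instead of a fixed 16px.
// Assumption: rem is treated the same as em.
float resolveLength(const std::string& val, float emSize) {
  const char* begin = val.c_str();
  char* end = nullptr;
  const float number = std::strtof(begin, &end);
  if (end == begin) return 0.0f;  // no numeric prefix, e.g. "auto"

  const std::string unit(end);
  if (unit == "em" || unit == "rem") return number * emSize;
  if (unit.empty() || unit == "px") return number;  // unitless (e.g. "0") or px
  return 0.0f;  // unsupported unit (%, pt, ...)
}
```

Because em-based values are multiplied by `emSize`, an indent of `2em` grows from 32 to 40 pixels when the em size goes from 16 to 20, which is the scaling behaviour described in the reply above.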

Checked it out and it looks great.

Member

@daveallie daveallie left a comment

Awesome work @jdk2pq! Left some comments inline, but overall looking really promising!

Comment on lines +398 to +401
// Count gaps: each word after the first creates a gap, unless it's attaching punctuation
if (wordIdx > 0 && !isAttachingPunctuationWord(*countWordIt)) {
  actualGapCount++;
}
Member

Can you please split this one out for some testing? I believe it'll address some of #182. However, there are likely some wider changes that could be made to avoid adding spacing after inline tags.

bool isCssWhitespace(const char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f'; }

// Read entire file into string (with size limit)
std::string readFileContent(FsFile& file) {
Member

This can be done in a future PR, but since CSS files are linear, reaching the end of a style block or rule line should be enough to process the content read so far. That means we could process the file in chunks without loading the whole thing.
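
One way the chunked approach suggested here might look is sketched below. `ChunkedRuleSplitter` is a hypothetical class, not part of the PR; it buffers input only until a top-level closing brace arrives and then hands the complete rule to a callback. Comments and quoted strings are deliberately not handled.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical sketch: feed a CSS file in fixed-size chunks and emit each
// complete top-level rule ("selector { ... }") as soon as its closing brace
// arrives, so only one rule needs to be buffered at a time.
// Simplification: comments and quoted strings are not handled here.
class ChunkedRuleSplitter {
 public:
  explicit ChunkedRuleSplitter(std::function<void(const std::string&)> onRule)
      : onRule_(std::move(onRule)) {}

  void feed(const char* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
      const char c = data[i];
      pending_ += c;
      if (c == '{') {
        ++depth_;
      } else if (c == '}' && depth_ > 0 && --depth_ == 0) {
        onRule_(pending_);  // a full top-level rule is buffered
        pending_.clear();
      }
    }
  }

 private:
  std::function<void(const std::string&)> onRule_;
  std::string pending_;
  int depth_ = 0;
};
```

A caller would read the file in small blocks and pass each block to `feed`, so peak memory depends on the longest single rule and not on the size of the stylesheet.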

Comment on lines +125 to +131
while (bodyEnd < css.size() && depth > 0) {
  if (css[bodyEnd] == '{')
    ++depth;
  else if (css[bodyEnd] == '}')
    --depth;
  ++bodyEnd;
}
Member

@daveallie daveallie Feb 1, 2026

This could be a little sketchy if there are brace characters in string literals in the CSS values; a similar issue exists in your parseDeclarations implementation. I think this is OK for now as it's relatively defensively coded, but it may need a refactor where we better tokenize rules as part of the parsing.
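
To make the concern concrete, here is a sketch of brace matching that ignores braces inside quoted strings, so a value such as `content: "}"` does not end the block early. `findBlockEnd` is a hypothetical helper, not code from the PR, and it still does not skip CSS comments.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: find the index just past the '}' that closes the block
// whose '{' is at openPos, ignoring braces inside quoted strings.
// Returns std::string::npos if the block is never closed.
// Simplification: CSS comments (/* ... */) are not skipped.
std::size_t findBlockEnd(const std::string& css, std::size_t openPos) {
  int depth = 0;
  char quote = '\0';  // active quote character, or '\0' when outside a string
  for (std::size_t i = openPos; i < css.size(); ++i) {
    const char c = css[i];
    if (quote != '\0') {
      if (c == '\\') {
        ++i;  // skip the escaped character
      } else if (c == quote) {
        quote = '\0';
      }
    } else if (c == '"' || c == '\'') {
      quote = c;
    } else if (c == '{') {
      ++depth;
    } else if (c == '}') {
      if (--depth == 0) return i + 1;
    }
  }
  return std::string::npos;
}
```

A full tokenizer would be more robust, but tracking the active quote character is a small step that covers the string-literal case raised in this comment.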

* master:
  feat: Debugging monitor script (crosspoint-reader#555)
  fix: truncating chapter titles using UTF-8 safe function (crosspoint-reader#599)
  fix: don't wake up after USB connect (crosspoint-reader#644)
  Revert "fix: don't wake up after USB connect" (crosspoint-reader#643)
  fix: custom sleep not showing image at index 0 (crosspoint-reader#639)
  docs: Update USER_GUIDE.md (crosspoint-reader#625)
  fix: Hide button hints in landscape CW mode (crosspoint-reader#637)
  fix: WiFi error screen text clarifications (crosspoint-reader#612)
  fix: don't wake up after USB connect (crosspoint-reader#576)
  feat(ui): change popup logic (crosspoint-reader#442)
  feat: Add reading menu and delete cache function (crosspoint-reader#433)
* master:
  fix: webserver folder creation regex change (crosspoint-reader#653)
Contributor Author

jdk2pq commented Feb 3, 2026

> Awesome work @jdk2pq! Left some comments inline, but overall looking really promising!

@daveallie Thank you for the thorough review! I've addressed everything from the review, and hopefully simplified things along the way too, and I'll plan to put up another PR for the spurious spacing issue later this week.

@jdk2pq jdk2pq requested a review from daveallie February 3, 2026 04:06
The function measures the advance width of arbitrary text (specifically
em-space prefix), not just indentation. getTextAdvanceX better reflects
its actual purpose.

Convert Style enum to true bitflags by adding UNDERLINE=4. Update getFont()
to use bitwise operations instead of equality checks, allowing styles like
BOLD|UNDERLINE to work correctly.

This is preparation for encoding underline state directly in the Style
rather than tracking it separately.
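
As an illustration of the bitflag idea, the sketch below uses `UNDERLINE = 4` from the commit message; the remaining enumerator values, `hasStyle`, and `fontVariantIndex` are assumptions made for the example, including the idea that underline is drawn separately and does not select a font variant.

```cpp
#include <cstdint>

// Sketch of a bitflag style enum. UNDERLINE = 4 comes from the commit message;
// the other values and the helper names are assumptions for illustration.
enum Style : uint8_t {
  REGULAR = 0,
  BOLD = 1,
  ITALIC = 2,
  UNDERLINE = 4,
};

// True if every bit in `flag` is set in `style`.
constexpr bool hasStyle(uint8_t style, Style flag) { return (style & flag) == flag; }

// Hypothetical font lookup: assuming underline is drawn separately, only the
// BOLD and ITALIC bits select a font variant (0..3).
constexpr int fontVariantIndex(uint8_t style) { return style & (BOLD | ITALIC); }
```

With equality checks, a value such as `BOLD | UNDERLINE` (5) would match neither `BOLD` nor `UNDERLINE`; masking the bits lets both be detected independently.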
- TextAlign enum → CssTextAlign (reordered to match settings: Justify=0, Left=1, Center=2, Right=3)
- alignment → textAlign
- indent → textIndent
- decoration → textDecoration
- Update CssPropertyFlags field names to match
- Remove TextAlign::None; default to CssTextAlign::Left

This aligns internal naming with actual CSS property names for clarity.

- Add alignment and textAlignDefined fields to BlockStyle
- Add getCombinedBlockStyle(child) for merging parent/child styles
- Add static fromCssStyle(cssStyle, emSize, paragraphAlignment) factory

These methods centralize the CSS→BlockStyle conversion logic (previously
duplicated in createBlockStyleFromCss) and provide a clean API for
handling nested block element style inheritance.
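
A simplified sketch of what merging a parent and child block style could look like follows. The field set is reduced, and the merge policy (margins accumulate, alignment is inherited unless the child sets it) is an assumption for illustration, not a description of the merged code.

```cpp
// Hypothetical, simplified BlockStyle; the real struct has more fields.
enum class CssTextAlign { Justify = 0, Left = 1, Center = 2, Right = 3 };

struct BlockStyle {
  CssTextAlign alignment = CssTextAlign::Left;
  bool textAlignDefined = false;  // true when CSS set text-align explicitly
  int marginLeft = 0;
  int marginRight = 0;

  // Merge this (parent) style with a nested child's style.
  // Assumption: horizontal margins accumulate, while alignment is inherited
  // from the parent unless the child defines its own.
  BlockStyle getCombinedBlockStyle(const BlockStyle& child) const {
    BlockStyle combined = child;
    combined.marginLeft += marginLeft;
    combined.marginRight += marginRight;
    if (!child.textAlignDefined) {
      combined.alignment = alignment;
      combined.textAlignDefined = textAlignDefined;
    }
    return combined;
  }
};
```

The `textAlignDefined` flag is what lets the merge tell "the child explicitly asked for left alignment" apart from "the child said nothing and should inherit".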
Major consolidation of styling infrastructure:

- Remove TextBlock::Style enum (JUSTIFIED, LEFT_ALIGN, etc.)
  Alignment is now stored in BlockStyle.alignment using CssTextAlign

- Remove wordUnderlines list from TextBlock and ParsedText
  Underline state is now encoded in EpdFontFamily::Style via UNDERLINE bitflag

- Use BlockStyle::fromCssStyle() and getCombinedBlockStyle() in parser
  Removes duplicated createBlockStyleFromCss() and mergeBlockStyles()

- Simplify text block rendering to check style bitflag for underlines

- Revert spurious spaces handling (isAttachingPunctuation logic)
  The actualGapCount approach had issues; using standard word gaps

This reduces code duplication and simplifies the style inheritance model.

Remove intermediary variables (top, right, bottom, left) in margin and
padding shorthand parsing. Directly assign to style fields and reference
previously assigned values for defaulting logic.

No functional change - purely code simplification.

Add saveToCache() and loadFromCache() methods to CssParser for persisting
parsed CSS rules to disk. The cache format includes:
- Version byte for cache invalidation
- Rule count
- For each rule: length-prefixed selector string + CssStyle fields

This allows skipping CSS file parsing on subsequent book opens by loading
pre-parsed rules from cache.
- Add cssFiles member to Epub class (moved from BookMetadataCache)
- Add getCssRulesCache() and loadCssRulesFromCache() methods
- Update parseCssFiles() to save parsed rules to cache
- Try loading from css_rules.cache before parsing CSS files
- Add skipLoadingCss parameter to Epub::load() for performance
- Remove cssFiles from BookMetadataCache (no longer needed)
- Revert BookMetadataCache version to 5 (pre-CSS-files format)

When loading an EPUB:
1. Try to load cached CSS rules first
2. If cache miss, parse CSS files and save to cache
3. If skipLoadingCss=true, skip CSS entirely (for cover display)
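
The load order above can be sketched as a small decision function. The callables stand in for the real cache and parse routines, and `loadCss` and `CssLoadResult` are illustrative names rather than the PR's API.

```cpp
#include <functional>

// Hypothetical sketch of the load order described above. The three callables
// stand in for the real cache/parse routines; their names are illustrative.
enum class CssLoadResult { Skipped, FromCache, Parsed };

CssLoadResult loadCss(bool skipLoadingCss,
                      const std::function<bool()>& loadFromCache,
                      const std::function<void()>& parseCssFiles,
                      const std::function<void()>& saveToCache) {
  if (skipLoadingCss) return CssLoadResult::Skipped;     // e.g. cover display
  if (loadFromCache()) return CssLoadResult::FromCache;  // cache hit
  parseCssFiles();  // cache miss: parse, then persist for next time
  saveToCache();
  return CssLoadResult::Parsed;
}
```

Checking the skip flag first keeps metadata-only callers from touching the cache file at all.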
Both activities only need book metadata (title, author) and cover image.
Pass skipLoadingCss=true to Epub::load() to avoid unnecessary CSS
parsing and caching operations.
* master:
  fix: Correct `debugging_monitor.py` script instructions (crosspoint-reader#676)
  fix: Correct instruction text to match actual button text (crosspoint-reader#672)
  fix: Increase network SSID display length (crosspoint-reader#670)
@jdk2pq jdk2pq force-pushed the feature/add-epub-css-parsing branch from f88fbe3 to 2f8a40d Compare February 4, 2026 01:21
Contributor Author

jdk2pq commented Feb 4, 2026

Coming back to this today, I felt like my singular giant commit addressing the feedback was a bit challenging to read through all at once, so I broke it up into smaller commits and force-pushed over top of my changes from yesterday. Exact same content and changes from yesterday for anyone who was partially through reviewing, but now it's hopefully a bit easier to read through.

Member

@daveallie daveallie left a comment

Looks good! Thanks!

@daveallie daveallie merged commit 2cf799f into crosspoint-reader:master Feb 5, 2026
1 check passed
@daveallie
Member

Thanks for your contribution here @jdk2pq, I've sent you an invite to add you to the maintainers team if you're interested in helping out with directing the firmware and managing PRs and issues.

jdk2pq added a commit to jdk2pq/crosspoint-reader that referenced this pull request Feb 5, 2026
* master:
  feat: add shift lock to KeyboardEntryActivity (crosspoint-reader#513)
  feat: rename and move in file manager (crosspoint-reader#630)
  feat: Implement fix for sunlight fading issue (crosspoint-reader#603)
  chore: Add PR title check on sync (crosspoint-reader#698)
  feat: Go To Position for epubs (crosspoint-reader#666)
  feat: Calibre Web Automated (CWA) koreader sync server support (crosspoint-reader#594)
  chore: Add CI check job to consolidate status (crosspoint-reader#696)
  chore: CI Build Summary - firmware stats, firmware artifact (crosspoint-reader#601)
  feat: quick rotate option in epub reader menu (crosspoint-reader#685)
  feat(settings): add "Cover + Custom" sleep screen mode (crosspoint-reader#582)
  fix: Artifacts on Thumb on Home Screen (crosspoint-reader#662)
  feat: holding back button while booting, boots to home screen as a mean of escaping boot loop (crosspoint-reader#587)
  docs: Add small SCOPE.md and GOVERNANCE.md documents (crosspoint-reader#640)
  feat: front button remapper (crosspoint-reader#664)
  feat: UI themes, Lyra (crosspoint-reader#528)
  feat: Add CSS parsing and CSS support in EPUBs (crosspoint-reader#411)
  fix: move http upload state to heap (crosspoint-reader#657)
daveallie pushed a commit that referenced this pull request Feb 5, 2026
Fixes issue #182

## Summary

**What is the goal of this PR?** 
When inline styles change mid-paragraph, words like periods, commas, and
quotes could end up as separate tokens. The justified text algorithm was
treating these as regular words, adding space before them.

**What changes are included?**

Now tracks which words are "attaching punctuation" (., , ! ? ; : " ' and
smart quotes) and excludes them from gap counting. These punctuation
marks attach directly to the preceding word without spacing.
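
A minimal sketch of the gap-counting rule described here is shown below. `isAttachingPunctuationWord` matches the name in the reviewed snippet, but this body is an assumption: it covers only the ASCII marks and leaves out smart quotes, and `countGaps` is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: a token "attaches" when it consists only of ASCII
// punctuation from the list above. Smart quotes (multi-byte UTF-8) are
// omitted here for brevity.
bool isAttachingPunctuationWord(const std::string& word) {
  if (word.empty()) return false;
  return word.find_first_not_of(".,!?;:\"'") == std::string::npos;
}

// Count the inter-word gaps that justification may stretch: every word after
// the first opens a gap unless it attaches to the preceding word.
std::size_t countGaps(const std::vector<std::string>& words) {
  std::size_t gaps = 0;
  for (std::size_t i = 1; i < words.size(); ++i) {
    if (!isAttachingPunctuationWord(words[i])) ++gaps;
  }
  return gaps;
}
```

For the tokens `Hello`, `,`, `world`, `!` this counts one gap instead of three, so the extra justification space is only distributed between `Hello,` and `world!`.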

## Additional Context

This is split out from code in #411 to address this comment
#411 (comment)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? _**YES**_, Claude Code
lukestein pushed a commit to lukestein/crosspoint-reader that referenced this pull request Feb 5, 2026
Unintendedsideeffects pushed a commit to Unintendedsideeffects/crosspoint-reader that referenced this pull request Feb 17, 2026
## Summary

* **What is the goal of this PR?**

- Adds basic CSS parsing to EPUBs and determines which CSS rules apply when
rendering to the screen, so that text is styled correctly. Currently
supports bold, underline, italics, margin, padding, and text alignment.

## Additional Context

- My main reason for wanting this is that the book I'm currently
reading, Carl's Doomsday Scenario (2nd in the Dungeon Crawler Carl
series), relies _a lot_ on styled text for telling parts of the story.
When text is bolded, it's supposed to be a message that's rendered
"on-screen" in the story. When characters are "chatting" with each
other, the text is bolded and their names are underlined. Plus, normal
emphasis is provided with italicizing words here and there. So, this
greatly improves my experience reading this book on the Xteink, and I
figured it was useful enough for others too.
- For transparency: I'm a software engineer, but I'm mostly frontend and
TypeScript/JavaScript. It's been _years_ since I did any C/C++, so I
would not be surprised if I'm doing something dumb along the way in this
code. Please don't hesitate to ask for changes if something looks off. I
heavily relied on Claude Code for help, and I had a lot of inspiration
from how [microreader](https://github.com/CidVonHighwind/microreader)
achieves their CSS parsing and styling. I did give this as good of a
code review as I could and went through everything, and _it works on my
machine_ 😄

### Before

![IMG_6271](https://github.com/user-attachments/assets/dba7554d-efb6-4d13-88bc-8b83cd1fc615)

![IMG_6272](https://github.com/user-attachments/assets/61ba2de0-87c9-4f39-956f-013da4fe20a4)

### After

![IMG_6268](https://github.com/user-attachments/assets/ebe11796-cca9-4a46-b9c7-0709c7932818)

![IMG_6269](https://github.com/user-attachments/assets/e89c33dc-ff47-4bb7-855e-863fe44b3202)

---

### AI Usage

Did you use AI tools to help write this code? **YES**, Claude Code
Unintendedsideeffects pushed a commit to Unintendedsideeffects/crosspoint-reader that referenced this pull request Feb 17, 2026

4 participants