
fix: Account for nbsp; character as non-breaking space#757

Merged
jonasdiemer merged 9 commits intocrosspoint-reader:masterfrom
jdk2pq:fix/account-for-nbsp-character-as-non-breaking-space
Feb 13, 2026

Conversation

@jdk2pq
Contributor

@jdk2pq jdk2pq commented Feb 7, 2026

Summary

Closes #743.

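What is the goal of this PR?

  • Add back handling for HTML entities in expat. This was originally part of the code that got removed here (crosspoint-reader#274).
  • Handle &nbsp; characters to resolve issue #743.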

What changes are included?

  • Brought back HTML entity table from previous commit and refactored it to use a static const char * table with linear lookup to reduce heap allocations.
  • Used XML_SetDefaultHandlerExpand in expat to parse out the entities correctly, without needing them defined in DOCTYPE
  • Added handling for &nbsp; so that the text stays together and doesn't break onto a new line with text separated by an &nbsp;
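
To illustrate the first bullet, here is a minimal sketch of a static `const char*` entity table with a linear, allocation-free lookup. The names `EntityPair`, `ENTITY_LOOKUP`, and `lookupHtmlEntity` appear elsewhere in this PR, but the exact signature, the choice to store keys with the leading `&` and trailing `;`, and the four entries shown are assumptions made for illustration, not the merged code.

```cpp
#include <cstddef>
#include <cstring>

// Assumed layout: one entity string and its UTF-8 replacement per entry.
struct EntityPair {
  const char* key;    // entity as written in the text, e.g. "&nbsp;"
  const char* value;  // UTF-8 replacement text
};

// Illustrative entries only; a real table would cover many more entities.
static const EntityPair ENTITY_LOOKUP[] = {
    {"&nbsp;", "\xC2\xA0"},        // U+00A0 NO-BREAK SPACE
    {"&mdash;", "\xE2\x80\x94"},   // U+2014 EM DASH
    {"&hellip;", "\xE2\x80\xA6"},  // U+2026 HORIZONTAL ELLIPSIS
    {"&copy;", "\xC2\xA9"},        // U+00A9 COPYRIGHT SIGN
};

// Returns the replacement for entity[0..len), or nullptr if it is unknown.
// The input does not need to be null-terminated and nothing is allocated.
const char* lookupHtmlEntity(const char* entity, int len) {
  if (entity == nullptr || len <= 0) return nullptr;
  for (const EntityPair& pair : ENTITY_LOOKUP) {
    const std::size_t keyLen = std::strlen(pair.key);
    // Compare lengths first so a prefix such as "&nbsp" never matches.
    if (static_cast<std::size_t>(len) == keyLen &&
        std::memcmp(entity, pair.key, keyLen) == 0) {
      return pair.value;
    }
  }
  return nullptr;
}
```

Because both the table and the returned pointers live in static storage, a lookup costs no heap memory, which is the property the bullet above is after.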

Additional Context

  • This supersedes this PR (crosspoint-reader#751) that simply handled nbsp; as whitespace. Instead, we want that character to serve its true purpose and affect the line-breaking algorithm.
  • Updated my test EPUB here (https://github.com/jdk2pq/css-test-epub) with &nbsp; character examples at the end of the book

AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? YES, Claude Code

- Added `WordAttach` enum with the 3 kinds of attachments: Normal, Continues (for when tags close and there shouldn't be an extra space), and NonBreaking for non-breaking space characters
- Extended existing logic with `continuesVec` to support using `WordAttach` instead of boolean
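
As a rough sketch of how the three attachment kinds could drive line breaking: the enum and its value names come from the commit message above, while the `Word` struct, the `breakLines` function, and the character-count width measure are hypothetical stand-ins for the real layout code, which works with pixel widths.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class WordAttach {
  Normal,       // a space precedes the word and a line may break before it
  Continues,    // glued to the previous word with no space (e.g. after a closing tag)
  NonBreaking,  // a space precedes the word, but no line break is allowed there
};

// Hypothetical word record for this sketch.
struct Word {
  std::string text;
  WordAttach attach;
};

// Greedy line filling; width is counted in characters for simplicity.
std::vector<std::string> breakLines(const std::vector<Word>& words, std::size_t maxWidth) {
  std::vector<std::string> lines;
  std::string line;
  std::size_t i = 0;
  while (i < words.size()) {
    // Collect one unbreakable group: a word plus everything attached to it.
    std::string group = words[i].text;
    std::size_t j = i + 1;
    while (j < words.size() && words[j].attach != WordAttach::Normal) {
      if (words[j].attach == WordAttach::NonBreaking) group += ' ';
      group += words[j].text;
      ++j;
    }
    if (line.empty()) {
      line = group;
    } else if (line.size() + 1 + group.size() <= maxWidth) {
      line += ' ';
      line += group;
    } else {
      lines.push_back(line);  // the whole group moves to the next line
      line = group;
    }
    i = j;
  }
  if (!line.empty()) lines.push_back(line);
  return lines;
}
```

With a width of 7, the words `walk`, `w`, `lesie` break as `walk w` / `lesie` when all three are `Normal`, but as `walk` / `w lesie` when `lesie` is marked `NonBreaking`, so the single-letter word stays with the word that follows it.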
@jdk2pq added the reader (Related to the core reader experience) and language (Related to language or character set) labels Feb 7, 2026
@jdk2pq jdk2pq requested review from lukestein and osteotek February 7, 2026 21:31
@lukestein
Contributor

Looks good on my books but I'd love if @IjonFryderyk can test or else share an ebook file that caused his problem so we can check.

osteotek
osteotek previously approved these changes Feb 8, 2026
@IjonFryderyk

Is there a way to test this locally without flashing to my X4? Sorry, I'm a noob with embedded development - not sure if that's possible with ESP32 firmware.

lukestein
lukestein previously approved these changes Feb 9, 2026
…king-space

* master:
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Add release candidate workflow
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  fix(ui): Add Back label in KOReader Sync screen (crosspoint-reader#770)
  fix: Add EPUB 3 cover image detection (crosspoint-reader#760)
  feat: A web editor for settings (crosspoint-reader#667)
  feat: add HalStorage (crosspoint-reader#656)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  feat: optimize fillRectDither (crosspoint-reader#737)
@jdk2pq
Contributor Author

jdk2pq commented Feb 9, 2026

> Is there a way to test this locally without flashing to my X4? Sorry, I'm a noob with embedded development - not sure if that's possible with ESP32 firmware.

It's all good! Currently, there isn't a great way of testing CrossPoint without flashing on-device. There are some steps being taken in other PRs/forks to create an emulated testing environment, but they're still works in progress and a bit limited right now.

If you're comfortable with flashing a new firmware, you can download the firmware.bin artifact from here and flash it with the flash tool found here, with a fair warning that I would still consider this "pre-release" code, and it may contain bugs (though I've tested it thoroughly on my device, and all's working well from what I can tell).

@IjonFryderyk

@osteotek
Member

osteotek commented Feb 9, 2026

I've tested this on the provided Polish EPUB, and words including nbsp are still merged together
image

@osteotek osteotek dismissed their stale review February 9, 2026 22:38

request for testing

…king-space

* master:
  feat: use natural sort in file browser (crosspoint-reader#722)
  fix: issue if book href are absolute url and not relative to server (crosspoint-reader#741)
  feat: unify navigation handling with system-wide continuous navigation (crosspoint-reader#600)
  feat: Add Italian hyphenation support (crosspoint-reader#584)
  feat: Add percentage support to CSS properties (crosspoint-reader#738)
  Use GITHUB_REF_NAME over GITHUB_HEAD_REF in release candidate workflow
  Move release candidate workflow to manual dispatch
  fix: Allow OTA update from RC build to full release (crosspoint-reader#778)
  refactor: Rename "Embedded Style" to "Book's Embedded Style" (crosspoint-reader#746)
  perf: optimize drawPixel() (crosspoint-reader#748)
  feat: wakeup target detection (crosspoint-reader#731)
  fix: Scrolling page items calculation (crosspoint-reader#716)
  feat: optimize fillRectDither (crosspoint-reader#737)
  fix: increase lyra sideButtonHintsWidth to 30 (crosspoint-reader#727)
  fix: Remove separations after style changes (crosspoint-reader#720)
  fix: Lag before displaying covers on home screen (crosspoint-reader#721)
  feat: Add Settings for toggling CSS on or off (crosspoint-reader#717)
  Use GITHUB_HEAD_REF
  release: 1.0.0
…king-space

* master:
  fix: Reduce MIN_SIZE_FOR_POPUP to 10KB (crosspoint-reader#809)
  docs: Update USER_GUIDE.md (crosspoint-reader#817)
  fix: Prevent sleeping when in OPDS browser / downloading books (crosspoint-reader#818)
  feat: Extend python debugging monitor functionality (keyword filter / suppress) (crosspoint-reader#810)
  docs: Update USER_GUIDE.md (crosspoint-reader#808)
  feat: Connect to last wifi by default (crosspoint-reader#752)
- Brought back HTML entity table from previous commit and refactored it to use a static const char * table with linear lookup to reduce heap allocations.
- Used `XML_SetDefaultHandlerExpand` in expat to parse out the entities correctly, without needing them defined in DOCTYPE
- Added handling for `&nbsp;` so that the text stays together and doesn't break onto a new line with text separated by an `&nbsp;`
@jdk2pq
Contributor Author

jdk2pq commented Feb 11, 2026

@osteotek @IjonFryderyk I've just pushed up a new commit that I've thoroughly tested with the Polish EPUB and my own test EPUBs. Let me know if this does the trick to fix things!

@daveallie I followed your suggestion of bringing back the HTML entity parsing code, and I made some minor changes to reduce heap allocations, since I was hitting heap issues when loading the Polish EPUB (chapters are ~200+ pages on my device, and it would hit an error when it was processing all the chapter's pages at the beginning of a cacheless load).

Contributor

Copilot AI left a comment

Pull request overview

This PR restores HTML entity handling in the EPUB chapter HTML parser to correctly interpret entities like &nbsp; and treat non-breaking spaces as unbreakable word joins in the line-breaking algorithm (addressing issue #743).

Changes:

  • Added an Expat XML_SetDefaultHandlerExpand handler to intercept and expand HTML entity references during parsing.
  • Introduced an HTML entity lookup table (lookupHtmlEntity) and integrated it into chapter parsing.
  • Implemented explicit handling of U+00A0 (NBSP) in text parsing and adjusted word-width measurement for standalone space tokens.
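
A minimal sketch of what such a default-handler callback might look like. The parameter list matches Expat's `XML_DefaultHandler` type (with `XML_Char` being `char`), and per the PR description the callback would be registered with `XML_SetDefaultHandlerExpand(parser, handler)`. The `ParserState` struct, the function name, and the single hard-coded `&nbsp;` case are assumptions for illustration; the block deliberately avoids linking against Expat so it stays self-contained.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical parser state; the real parser tracks far more than a text buffer.
struct ParserState {
  std::string text;
};

// Same parameter shape as Expat's XML_DefaultHandler: the span s[0..len) is
// markup that no other handler claimed and is not null-terminated.
void defaultHandlerExpand(void* userData, const char* s, int len) {
  auto* state = static_cast<ParserState*>(userData);
  // Only spans shaped like "&name;" are treated as entity references here.
  const bool looksLikeEntity = len >= 3 && s[0] == '&' && s[len - 1] == ';';
  if (!looksLikeEntity) return;  // comments, doctype text, etc. are ignored
  if (len == 6 && std::memcmp(s, "&nbsp;", 6) == 0) {
    state->text += "\xC2\xA0";  // UTF-8 encoding of U+00A0
  } else {
    // Unknown entity: keep it verbatim rather than dropping text.
    state->text.append(s, static_cast<std::size_t>(len));
  }
}
```

In a fuller version the hard-coded comparison would presumably be replaced by a call to the entity lookup helper, with the NBSP result then feeding the word-continuation logic.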

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h | Declares the new Expat default handler callback. |
| lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp | Expands HTML entities via Expat default handler and adds NBSP-aware word continuation logic. |
| lib/Epub/Epub/htmlEntities.h | Declares entity replacement/lookup helpers. |
| lib/Epub/Epub/htmlEntities.cpp | Adds entity lookup table and helper implementations. |
| lib/Epub/Epub/ParsedText.cpp | Special-cases measuring width for a single-space "word" token. |

@osteotek
Member

@jdk2pq I've tested this on the Polish EPUB and it seems to be working well, thank you. Some of the Copilot suggestions seem valid to me, but it's ultimately up to you whether you want to address them.

…king-space

* master:
  fix: chore: make all debug messages uniform (crosspoint-reader#825)
  fix: Show "Back" in file browser if not in root, "Home" otherwise. (crosspoint-reader#822)
  fix: Manually trigger GPIO update in File Browser mode (crosspoint-reader#819)
…p unused code in htmlEntities, and add proper em, en, and thin space instead of empty strings
@jdk2pq jdk2pq requested review from a team, lukestein and osteotek February 12, 2026 03:07
@osteotek osteotek requested a review from a team February 13, 2026 10:46
Comment on lines +69 to +70
const size_t keyLen = strlen(key);
if (static_cast<size_t>(len) == keyLen && memcmp(entity, key, keyLen) == 0) {
Contributor

nits: maybe use strncmp

const char* value;
};

static const EntityPair ENTITY_LOOKUP[] = {
Contributor

possible improvement for future: maybe compile this list as a tree structure, to avoid linear search and save space
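
One simple way to avoid the linear search, sketched here under assumptions: keep the table sorted by key and use binary search. This is not the tree structure suggested above and does not save space; it only reduces the number of comparisons per lookup. `SORTED_ENTITIES`, `compareKey`, and `lookupSorted` are hypothetical names, and the four entries are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <iterator>

struct EntityPair {
  const char* key;
  const char* value;
};

// Must stay sorted by key in byte order for the binary search to be valid.
static const EntityPair SORTED_ENTITIES[] = {
    {"&amp;", "&"},
    {"&copy;", "\xC2\xA9"},
    {"&mdash;", "\xE2\x80\x94"},
    {"&nbsp;", "\xC2\xA0"},
};

// Three-way compare of a null-terminated key against the slice entity[0..len).
static int compareKey(const char* key, const char* entity, std::size_t len) {
  const std::size_t keyLen = std::strlen(key);
  const int c = std::memcmp(key, entity, std::min(keyLen, len));
  if (c != 0) return c;
  if (keyLen == len) return 0;
  return keyLen < len ? -1 : 1;  // the shorter string sorts first
}

// Returns the replacement for entity[0..len), or nullptr if it is unknown.
const char* lookupSorted(const char* entity, int len) {
  if (entity == nullptr || len <= 0) return nullptr;
  const std::size_t n = static_cast<std::size_t>(len);
  const EntityPair* end = std::end(SORTED_ENTITIES);
  const EntityPair* it = std::lower_bound(
      std::begin(SORTED_ENTITIES), end, entity,
      [n](const EntityPair& pair, const char* e) { return compareKey(pair.key, e, n) < 0; });
  if (it != end && compareKey(it->key, entity, n) == 0) return it->value;
  return nullptr;
}
```

The trade-off is that the table has to be kept sorted by hand or checked at build time; for a small table the linear scan may well be fast enough.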

@jonasdiemer jonasdiemer merged commit 6e51afb into crosspoint-reader:master Feb 13, 2026
6 checks passed
@jonasdiemer
Contributor

Merged, as the open topics can be done in future PRs.

jdk2pq added a commit to jdk2pq/crosspoint-reader that referenced this pull request Feb 14, 2026
* master:
  feat: use pre-compressed HTML pages (crosspoint-reader#861)
  docs: Add requirement device be on when flashing (crosspoint-reader#877)
  fix: Account for `nbsp;` character as non-breaking space (crosspoint-reader#757)
  feat: Add central logging pragma (crosspoint-reader#843)
Unintendedsideeffects pushed a commit to Unintendedsideeffects/crosspoint-reader that referenced this pull request Feb 17, 2026
…reader#757)

## Summary

Closes crosspoint-reader#743.

**What is the goal of this PR?**

- Add back handling for HTML entities in expat. This was originally part
of the code that got removed
[here](crosspoint-reader#274)
- Handle `&nbsp;` characters to resolve issue crosspoint-reader#743 

**What changes are included?**

- Brought back HTML entity table from previous commit and refactored it
to use a static const char * table with linear lookup to reduce heap
allocations.
- Used `XML_SetDefaultHandlerExpand` in expat to parse out the entities
correctly, without needing them defined in DOCTYPE
- Added handling for `&nbsp;` so that the text stays together and
doesn't break onto a new line with text separated by an `&nbsp;`

## Additional Context

- This supersedes [this
PR](crosspoint-reader#751)
that simply handled `nbsp;` as whitespace. Instead, we want that
character to serve its true purpose and affect the line-breaking
algorithm.
- Updated my test EPUB [here](https://github.com/jdk2pq/css-test-epub)
with `&nbsp;` characters examples at the end of the book

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? _**YES**_, Claude Code
saslv pushed a commit to saslv/crosspoint-reader that referenced this pull request Feb 19, 2026
el pushed a commit to el/crosspoint-reader that referenced this pull request Feb 19, 2026

Labels

language Related to language or character set reader Related to the core reader experience

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Single-character prepositions merge with following words in some EPUB files

7 participants