fix: Fix hyphenation and rendering of decomposed characters by jpirnay · Pull Request #1037 · crosspoint-reader/crosspoint-reader

jpirnay · 2026-02-20T14:02:43Z

Summary

This PR fixes decomposed diacritic handling end-to-end:
- Hyphenation: normalize common Latin base+combining sequences to precomposed codepoints before Liang pattern matching, so decomposed words hyphenate correctly
- Rendering: correct combining-mark placement logic so non-spacing marks are attached to the preceding base glyph in normal and rotated text rendering paths, with corresponding text-bounds consistency updates.
- Hyphenation around non breaking space variants have been fixed (and extended)
- Hyphenation of terms that already included of hyphens were fixed to include Liang pattern application (eg "US-Satellitensystem" was exclusively broken at the existing hyphen)

Additional Context

Before

After

Note 1: the hyphenation fix is not a 100% bullet proof implementation. It adds composition of common base+combining sequences (e.g. O + U+0308 -> Ö) during codepoint collection. A complete solution would require implementing proper Unicode normalization (at least NFC, possibly NFKC in specific cases) before hyphenation and rendering, instead of hand-mapping a few combining marks. That was beyond the scope of this fix.
Note 2: the render fix should be universal and not limited to the constraints outlined above: it properly x-centers the compund glyph over the previous one, and it uses at least 1pt of visual distance in y.

Before:

After:

This should resolve the issues described in Minor hyphenation and text flow issues #998

AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? PARTIALLY

coderabbitai · 2026-02-20T14:02:58Z

📝 Walkthrough

Walkthrough

Adds base-glyph-aware combining-mark handling across codepoint collection, font bounds calculation, and rendering: common Western diacritics are precomposed when possible; combining marks are positioned relative to the preceding base glyph (raised) and do not advance cursor or base advance metrics.

Changes

Cohort / File(s)	Summary
Font bounds `lib/EpdFont/EpdFont.cpp`	Adds detection of combining marks and tracks last base glyph (lastBaseX/lastBaseAdvance/lastBaseTop/hasBaseGlyph). Computes glyph positions for combining marks relative to base glyph, updates min/max bounds using base-aware coordinates, and prevents cursor advance for combining marks.
Precomposition / Hyphenation `lib/Epub/Epub/hyphenation/HyphenationCommon.cpp`	Performs in-place NFC-like composition for common Western European combining diacritics: if a combining mark follows a base and a precomposed scalar exists, replaces the previous base scalar and skips emitting the combining mark.
Text rendering / advances `lib/GfxRenderer/GfxRenderer.cpp`	Adds combining-mark handling to drawText/drawTextRotated90CW/getTextAdvanceX: tracks last base glyph state, renders combining glyphs at adjusted (raised) coordinates without advancing cursor, and excludes combining marks from advance width.
Unicode utility `lib/Utf8/Utf8.h`	Adds inline `utf8IsCombiningMark(uint32_t)` to identify combining diacritical marks by codepoint ranges for use across font/rendering/path logic.

Sequence Diagram(s)

sequenceDiagram
    participant Input as TextInput
    participant Collector as HyphenationCommon::collectCodepoints
    participant Font as EpdFont::getTextBounds
    participant Renderer as GfxRenderer::drawText
    participant Output as FrameBuffer

    Input->>Collector: stream codepoints
    alt combining follows base and precompose exists
        Collector-->>Collector: replace previous base with precomposed scalar
    else
        Collector->>Collector: emit codepoint as-is
    end
    Collector->>Font: request glyph metrics & bounds (per glyph)
    Font-->>Font: update lastBaseX/advance/top when base glyph seen
    opt combining mark with base present
        Font-->>Font: compute glyphBaseX/glyphBaseY (raiseBy, MIN_COMBINING_GAP_PX)
    end
    Font->>Renderer: glyph positions & metrics
    Renderer->>Renderer: draw base glyphs (advance cursor)
    opt combining mark
        Renderer->>Renderer: draw combining glyph at adjusted coord (no cursor advance)
    end
    Renderer->>Output: rendered pixels

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

fix: Correct word width and space calculations #963 — Changes getTextAdvanceX/space width logic; strongly related to advance/width handling and combining-mark-aware measurements.

Suggested reviewers

daveallie

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main changes: fixing hyphenation and rendering of decomposed characters, which aligns with the core objectives of this pull request.
Description check	✅ Passed	The pull request description is directly related to the changeset, clearly outlining the fixes for decomposed diacritic handling in hyphenation and rendering across multiple files with before/after examples.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

jpirnay · 2026-02-20T14:03:28Z

Testfile:
hyphenation_demo.zip

…nt-reader into fix-hyphenation

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6cdd362 and 32134b5.

📒 Files selected for processing (2)

lib/EpdFont/EpdFont.cpp
lib/Utf8/Utf8.h

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

🔇 Additional comments (3)

lib/Utf8/Utf8.h (1)

13-19: LGTM!

The combining mark ranges are accurate per Unicode standards, and the inline placement is appropriate for this simple predicate. The intentional limitation to common Western diacritics (as noted in the PR) is a reasonable trade-off for the hyphenation use case.

lib/EpdFont/EpdFont.cpp (2)

31-35: LGTM!

Clear naming for the base-glyph tracking state, and constexpr for the gap constant is appropriate.

49-72: LGTM!

The combining mark positioning logic is sound:

Vertical gap enforcement ensures marks don't overlap base glyphs

Horizontal centering over the base glyph's advance width is a reasonable default

Cursor advancement correctly skipped for combining marks

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/EpdFont/EpdFont.cpp`:
- Around line 7-16: Remove the local anonymous-namespace helper bool
isCombiningMark(...) and replace all uses of isCombiningMark in EpdFont.cpp with
the already-included utf8IsCombiningMark from Utf8.h; ensure the file keeps the
`#include` "Utf8.h" and delete the duplicate function definition to avoid
divergence and rely solely on utf8IsCombiningMark wherever combining-mark checks
are performed (e.g., where isCombiningMark was previously called).

lib/EpdFont/EpdFont.cpp

lukestein · 2026-02-20T17:52:23Z

Thank you very much for this! Tagging in our hyphenation expert @osteotek. I've been really hoping we could get the fix that this promises:

Hyphenation of terms that already included of hypehns were fixed to include Liang pattern application (eg "US-Satellitensystem" was exclusively broken at the existing hyphen)

I..e., allowing additional hyphenation of words that also include an explicit hyphen so that the occasional multiply-hyphenated-phrases have more places to get broken. (I use large or extra-large fonts, so I'm particularly sensitive to hyphenation quality.)

osteotek · 2026-02-20T19:32:10Z

Ok, since popular opinion seems to be in favor of breaking already hyphenated words, i'm OK with it too 😃

cbix

I can't comment much on the code (probably could with more time at my hands tho 👀), but this fixes the two issues (#998) from my original epubs:

## Summary * This PR fixes decomposed diacritic handling end-to-end: - Hyphenation: normalize common Latin base+combining sequences to precomposed codepoints before Liang pattern matching, so decomposed words hyphenate correctly - Rendering: correct combining-mark placement logic so non-spacing marks are attached to the preceding base glyph in normal and rotated text rendering paths, with corresponding text-bounds consistency updates. - Hyphenation around non breaking space variants have been fixed (and extended) - Hyphenation of terms that already included of hyphens were fixed to include Liang pattern application (eg "US-Satellitensystem" was *exclusively* broken at the existing hyphen) ## Additional Context * Before <img width="800" height="480" alt="2" src="https://github.com/user-attachments/assets/b9c515c4-ab75-45cc-8b52-f4d86bce519d" /> * After <img width="480" height="800" alt="fix1" src="https://github.com/user-attachments/assets/4999f6a8-f51c-4c0a-b144-f153f77ddb57" /> <img width="800" height="480" alt="fix2" src="https://github.com/user-attachments/assets/7355126b-80c7-441f-b390-4e0897ee3fb6" /> * Note 1: the hyphenation fix is not a 100% bullet proof implementation. It adds composition of *common* base+combining sequences (e.g. O + U+0308 -> Ö) during codepoint collection. A complete solution would require implementing proper Unicode normalization (at least NFC, possibly NFKC in specific cases) before hyphenation and rendering, instead of hand-mapping a few combining marks. That was beyond the scope of this fix. * Note 2: the render fix should be universal and not limited to the constraints outlined above: it properly x-centers the compund glyph over the previous one, and it uses at least 1pt of visual distance in y. Before: <img width="478" height="167" alt="Image" src="https://github.com/user-attachments/assets/f8db60d5-35b1-4477-96d0-5003b4e4a2a1" /> After: <img width="479" height="180" alt="Image" src="https://github.com/user-attachments/assets/1b48ef97-3a77-475a-8522-23f4aca8e904" /> * This should resolve the issues described in #998 --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? _**PARTIALLY**_

jpirnay added 2 commits February 20, 2026 13:26

Combining diacritics

7b3de4e

fix compund character rendering

cd4a24f

jpirnay mentioned this pull request Feb 20, 2026

chore(release): 1.1.0 Release Candidate #992

Merged

jpirnay added 4 commits February 20, 2026 16:32

clang-format

6cdd362

Address build issue

ce7bd3a

Merge branch 'fix-hyphenation' of https://github.com/jpirnay/crosspoi…

9c28f98

…nt-reader into fix-hyphenation

Restore helper

32134b5

coderabbitai bot reviewed Feb 20, 2026

View reviewed changes

lib/EpdFont/EpdFont.cpp Outdated Show resolved Hide resolved

Apply various hyphenation improvements / fixes

7d648de

cbix reviewed Feb 20, 2026

View reviewed changes

daveallie approved these changes Feb 22, 2026

View reviewed changes

daveallie merged commit 5f5561b into crosspoint-reader:master Feb 22, 2026
7 checks passed

This was referenced Feb 23, 2026

refactor: Simplify REPLACEMENT_GLYPH fallback #1119

Merged

fix: Use fixed-point fractional x-advance and kerning for better text layout #1168

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: Fix hyphenation and rendering of decomposed characters#1037

fix: Fix hyphenation and rendering of decomposed characters#1037
daveallie merged 7 commits intocrosspoint-reader:masterfrom
jpirnay:fix-hyphenation

jpirnay commented Feb 20, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 20, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

jpirnay commented Feb 20, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

lukestein commented Feb 20, 2026

Uh oh!

osteotek commented Feb 20, 2026

Uh oh!

cbix left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

jpirnay commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Additional Context

AI Usage

Uh oh!

coderabbitai bot commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

jpirnay commented Feb 20, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lukestein commented Feb 20, 2026

Uh oh!

osteotek commented Feb 20, 2026

Uh oh!

cbix left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

jpirnay commented Feb 20, 2026 •

edited

Loading

coderabbitai bot commented Feb 20, 2026 •

edited

Loading