fix: Fix hyphenation and rendering of decomposed characters#1037
fix: Fix hyphenation and rendering of decomposed characters#1037daveallie merged 7 commits intocrosspoint-reader:masterfrom
Conversation
📝 WalkthroughWalkthroughAdds base-glyph-aware combining-mark handling across codepoint collection, font bounds calculation, and rendering: common Western diacritics are precomposed when possible; combining marks are positioned relative to the preceding base glyph (raised) and do not advance cursor or base advance metrics. Changes
Sequence Diagram(s)sequenceDiagram
participant Input as TextInput
participant Collector as HyphenationCommon::collectCodepoints
participant Font as EpdFont::getTextBounds
participant Renderer as GfxRenderer::drawText
participant Output as FrameBuffer
Input->>Collector: stream codepoints
alt combining follows base and precompose exists
Collector-->>Collector: replace previous base with precomposed scalar
else
Collector->>Collector: emit codepoint as-is
end
Collector->>Font: request glyph metrics & bounds (per glyph)
Font-->>Font: update lastBaseX/advance/top when base glyph seen
opt combining mark with base present
Font-->>Font: compute glyphBaseX/glyphBaseY (raiseBy, MIN_COMBINING_GAP_PX)
end
Font->>Renderer: glyph positions & metrics
Renderer->>Renderer: draw base glyphs (advance cursor)
opt combining mark
Renderer->>Renderer: draw combining glyph at adjusted coord (no cursor advance)
end
Renderer->>Output: rendered pixels
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
Testfile: |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/EpdFont/EpdFont.cpplib/Utf8/Utf8.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
lib/Utf8/Utf8.h (1)
13-19: LGTM!The combining mark ranges are accurate per Unicode standards, and the inline placement is appropriate for this simple predicate. The intentional limitation to common Western diacritics (as noted in the PR) is a reasonable trade-off for the hyphenation use case.
lib/EpdFont/EpdFont.cpp (2)
31-35: LGTM!Clear naming for the base-glyph tracking state, and
constexprfor the gap constant is appropriate.
49-72: LGTM!The combining mark positioning logic is sound:
- Vertical gap enforcement ensures marks don't overlap base glyphs
- Horizontal centering over the base glyph's advance width is a reasonable default
- Cursor advancement correctly skipped for combining marks
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/EpdFont/EpdFont.cpp`:
- Around line 7-16: Remove the local anonymous-namespace helper bool
isCombiningMark(...) and replace all uses of isCombiningMark in EpdFont.cpp with
the already-included utf8IsCombiningMark from Utf8.h; ensure the file keeps the
`#include` "Utf8.h" and delete the duplicate function definition to avoid
divergence and rely solely on utf8IsCombiningMark wherever combining-mark checks
are performed (e.g., where isCombiningMark was previously called).
|
Thank you very much for this! Tagging in our hyphenation expert @osteotek. I've been really hoping we could get the fix that this promises:
I..e., allowing additional hyphenation of words that also include an explicit hyphen so that the occasional multiply-hyphenated-phrases have more places to get broken. (I use large or extra-large fonts, so I'm particularly sensitive to hyphenation quality.) |
|
Ok, since popular opinion seems to be in favor of breaking already hyphenated words, i'm OK with it too 😃 |
## Summary * This PR fixes decomposed diacritic handling end-to-end: - Hyphenation: normalize common Latin base+combining sequences to precomposed codepoints before Liang pattern matching, so decomposed words hyphenate correctly - Rendering: correct combining-mark placement logic so non-spacing marks are attached to the preceding base glyph in normal and rotated text rendering paths, with corresponding text-bounds consistency updates. - Hyphenation around non breaking space variants have been fixed (and extended) - Hyphenation of terms that already included of hyphens were fixed to include Liang pattern application (eg "US-Satellitensystem" was *exclusively* broken at the existing hyphen) ## Additional Context * Before <img width="800" height="480" alt="2" src="https://github.com/user-attachments/assets/b9c515c4-ab75-45cc-8b52-f4d86bce519d" /> * After <img width="480" height="800" alt="fix1" src="https://github.com/user-attachments/assets/4999f6a8-f51c-4c0a-b144-f153f77ddb57" /> <img width="800" height="480" alt="fix2" src="https://github.com/user-attachments/assets/7355126b-80c7-441f-b390-4e0897ee3fb6" /> * Note 1: the hyphenation fix is not a 100% bullet proof implementation. It adds composition of *common* base+combining sequences (e.g. O + U+0308 -> Ö) during codepoint collection. A complete solution would require implementing proper Unicode normalization (at least NFC, possibly NFKC in specific cases) before hyphenation and rendering, instead of hand-mapping a few combining marks. That was beyond the scope of this fix. * Note 2: the render fix should be universal and not limited to the constraints outlined above: it properly x-centers the compund glyph over the previous one, and it uses at least 1pt of visual distance in y. Before: <img width="478" height="167" alt="Image" src="https://github.com/user-attachments/assets/f8db60d5-35b1-4477-96d0-5003b4e4a2a1" /> After: <img width="479" height="180" alt="Image" src="https://github.com/user-attachments/assets/1b48ef97-3a77-475a-8522-23f4aca8e904" /> * This should resolve the issues described in #998 --- ### AI Usage While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it helps set the right context for reviewers. Did you use AI tools to help write this code? _**PARTIALLY**_


Summary
Additional Context
Note 1: the hyphenation fix is not a 100% bullet proof implementation. It adds composition of common base+combining sequences (e.g. O + U+0308 -> Ö) during codepoint collection. A complete solution would require implementing proper Unicode normalization (at least NFC, possibly NFKC in specific cases) before hyphenation and rendering, instead of hand-mapping a few combining marks. That was beyond the scope of this fix.
Note 2: the render fix should be universal and not limited to the constraints outlined above: it properly x-centers the compund glyph over the previous one, and it uses at least 1pt of visual distance in y.
Before:

After:

AI Usage
While CrossPoint doesn't have restrictions on AI tools in contributing, please be transparent about their usage as it
helps set the right context for reviewers.
Did you use AI tools to help write this code? PARTIALLY