fix: add CJK Extension H/I to isCJK and fix pre-wrap fast path#33
Open
xr843 wants to merge 1 commit intochenglou:mainfrom
Open
fix: add CJK Extension H/I to isCJK and fix pre-wrap fast path#33xr843 wants to merge 1 commit intochenglou:mainfrom
xr843 wants to merge 1 commit intochenglou:mainfrom
Conversation
Two small fixes: 1. Add missing CJK Unified Ideographs Extension H (U+31350–U+323AF, Unicode 15.0) and Extension I (U+2EBF0–U+2F7FF, Unicode 15.1) to `isCJK()`. Without these ranges, characters from these blocks do not get per-character CJK line breaking or kinsoku rules. This aligns with the documented requirement that "later extension blocks must still hit the CJK path." 2. Remove a no-op `.replace(/\r\n/g, '\n')` on the fast path of `normalizeWhitespacePreWrap()`. When the guard `!/[\r\f]/` is true, no `\r` is present, so `\r\n` cannot match either—the replace always returns the input unchanged.
abnow-dev
reviewed
Apr 1, 2026
| (c >= 0x2B740 && c <= 0x2B81F) || | ||
| (c >= 0x2B820 && c <= 0x2CEAF) || | ||
| (c >= 0x2CEB0 && c <= 0x2EBEF) || | ||
| (c >= 0x2EBF0 && c <= 0x2F7FF) || |
There was a problem hiding this comment.
Hey, nice catch on the missing extensions! Two issues I spotted in the diff:
- Wrong upper bound for Extension H
The added range (c >= 0x2EBF0 && c <= 0x2F7FF) overshoots. CJK Extension H (Unicode 15.0) ends at 0x2EE5F, not 0x2F7FF. The range 0x2EE60–0x2F7FF contains unassigned blocks and bleeds into CJK Compatibility territory. It should be:
There was a problem hiding this comment.
(c >= 0x2EBF0 && c <= 0x2EE5F) // CJK Extension H
abnow-dev
reviewed
Apr 1, 2026
| }) | ||
|
|
||
| test('treats CJK Extension H and I ideographs as CJK break units', () => { | ||
| const extH = String.fromCodePoint(0x31350) // CJK Extension H (Unicode 15.0) |
There was a problem hiding this comment.
const extH = String.fromCodePoint(0x31350) // comment says "Extension H" — but 0x31350 is Extension I
const extI = String.fromCodePoint(0x2EBF0) // comment says "Extension I" — but 0x2EBF0 is Extension H
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two small, targeted fixes:
Add missing CJK Extension H and I ranges to
isCJK()CJK Unified Ideographs Extension H (U+31350–U+323AF, Unicode 15.0) and Extension I (U+2EBF0–U+2F7FF, Unicode 15.1) are not covered by the current range checks. Characters from these blocks fall through without CJK per-character line breaking or kinsoku attachment rules.
This directly addresses the documented requirement in CLAUDE.md:
The new ranges are placed adjacent to their neighboring extensions (I after F, H after G) to keep the block ordering clear.
Remove no-op
.replace()onnormalizeWhitespacePreWrapfast pathWhen the guard
!/[\r\f]/.test(text)is true, no\rexists in the text, so the subsequent.replace(/\r\n/g, '\n')can never match — it always returns the input unchanged. The fast path now returnstextdirectly.Changes
src/analysis.ts: Add Extension H/I ranges toisCJK(); simplify pre-wrap fast pathsrc/layout.test.ts: Add Extension H/I ranges toisWideCharacter()test helper; add test case verifying Extension H/I characters get CJK break behaviorTest plan
bun test— 61 tests pass (including new Extension H/I test)tsc --noEmit— no type errorstruefromisCJK()\r→ no\r\n→ replace was always a no-op)