fix(converter): handle legal numbering continuation pattern#93
Merged
Fix incorrect multi-level list numbering when items continue a flat sequence at different indentation levels. Documents with items like 1., 2., 3. at level 0 followed by an item at level 1 (with start=4) now render as "4." instead of "3.4".

The fix adds "continuation pattern" detection in ListItemRetriever.cs:
- Detects when a deeper-level item continues a flat list sequence (start value equals parent counter + 1)
- When detected, uses level 0's format string with the current counter
- Tracks continuation state per level with inheritance
- Resets deeper-level tracking when going to shallower levels

Also adds:
- docs/ooxml_corner_cases.md for documenting OOXML edge cases
- CLAUDE.md section on documenting corner cases
…ontinuation patterns

When a list item is detected as a "continuation pattern" (deeper-level item that continues a flat sequence), the converter now uses level 0's properties for:
- Run properties (rPr) - fixes underline appearing incorrectly
- Paragraph properties (pPr) - fixes tab stops and indentation

Changes:
- FormattingAssembler.cs: Use GetEffectiveLevel() in NormalizeListItemsTransform, ParaStyleParaPropsStack, and AnnotateParagraph functions
- ListItemRetriever.cs: Made ContinuationInfo class internal for cross-file access
- Updated CHANGELOG.md and docs/ooxml_corner_cases.md with fix details
Added two tests to verify the continuation pattern fix:
- HC050_ContinuationPattern_RendersCorrectNumber: Verifies items at ilvl=1 with start=4 render as "4." instead of "3.4"
- HC051_ContinuationPattern_UsesLevel0Formatting: Verifies continuation items use level 0's run properties (no underline when level 1 has underline)

These tests create programmatic documents with multi-level numbered lists to verify the fix handles the edge case correctly.
…tting

Clone numberingParaProps before removing non-indent elements to avoid mutating the original numbering definition. This was causing subsequent list items using the same numbering level to lose their tab definitions, resulting in inconsistent wrapper span widths (0.5in instead of 0.25in).

Also updated CalculateSpanWidthTransform to check accumulated properties (pt:pPr) first for indentation and tabs, falling back to direct properties (w:pPr) if not found.
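The lookup order described in this commit can be sketched in a few lines. This is a Python illustration only, not the C# code in CalculateSpanWidthTransform: the helper name find_para_child is hypothetical, the URI bound to the pt prefix is an assumption, and falling back per element (rather than per pPr container) is also an assumption about what "if not found" means.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
PT = "http://powertools.codeplex.com/2011"  # assumed URI for the pt prefix


def find_para_child(paragraph, local_name):
    """Return w:<local_name> from pt:pPr if present, else from w:pPr, else None.

    Hypothetical helper: accumulated properties (pt:pPr) are consulted first,
    direct properties (w:pPr) only when the element is missing there.
    """
    for ppr_tag in (f"{{{PT}}}pPr", f"{{{W}}}pPr"):
        ppr = paragraph.find(ppr_tag)
        if ppr is not None:
            child = ppr.find(f"{{{W}}}{local_name}")
            if child is not None:
                return child
    return None


# A paragraph whose accumulated properties carry w:ind but not w:tabs.
paragraph = ET.fromstring(
    f'<w:p xmlns:w="{W}" xmlns:pt="{PT}">'
    '<pt:pPr><w:ind w:left="360"/></pt:pPr>'
    '<w:pPr><w:ind w:left="720"/><w:tabs/></w:pPr>'
    "</w:p>"
)
indent = find_para_child(paragraph, "ind")   # taken from pt:pPr
tabs = find_para_child(paragraph, "tabs")    # falls back to w:pPr
```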
Summary
The Problem
In the NVCA-Model-COI document, a numbered list had:
- ilvl=0 with format %1. → displayed correctly as "1.", "2.", "3."
- ilvl=1 with format %1.%2 and start=4 → was displaying as "3.4" instead of "4."

Microsoft Word shows "4." because it detects this as a "continuation pattern" - a flat list that happens to use different indentation levels.
Additionally, items 2-4 had different indentation than item 1 (0.5in vs 0.25in wrapper width) due to a mutation bug.
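To make the mutation bug concrete, here is a minimal Python sketch of the same class of problem: stripping children from a shared XML element in place versus working on a copy. It is an analogy, not the project's C# code; the element names are simplified (no namespaces) and the function names and sample values are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one numbering level's paragraph properties.
LEVEL_PPR = "<pPr><tabs><tab pos='360'/></tabs><ind left='360'/></pPr>"


def indent_only_buggy(numbering_ppr):
    # Bug: removes non-indent children from the shared definition itself,
    # so every later list item using this level sees no <tabs>.
    for child in list(numbering_ppr):
        if child.tag != "ind":
            numbering_ppr.remove(child)
    return numbering_ppr


def indent_only_fixed(numbering_ppr):
    # Fix: strip a deep copy and leave the shared definition untouched.
    return indent_only_buggy(copy.deepcopy(numbering_ppr))


shared = ET.fromstring(LEVEL_PPR)
indent_only_buggy(shared)
print(shared.find("tabs") is None)       # True: the definition lost its tabs

shared = ET.fromstring(LEVEL_PPR)
indent_only_fixed(shared)
print(shared.find("tabs") is not None)   # True: the definition is intact
```

With the in-place version, only the first item processed sees the tab definitions, which is consistent with item 1 rendering differently from items 2-4.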
The Fix
1. Continuation Pattern Detection
Added "continuation pattern" detection in ListItemRetriever.cs:
- Detection: A paragraph at ilvl > 0 is in a continuation pattern when:
  - start value equals the parent level's counter + 1
- Handling: When detected, uses level 0's format string with the current level's counter value
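The detection and handling rules above can be sketched as a small counter model. This is a simplified Python illustration under stated assumptions, not the logic in ListItemRetriever.cs: the Level and ListState names are hypothetical, only decimal numbering is modeled, and the per-level inheritance of continuation state mentioned in the commit message is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    fmt: str        # level text such as "%1." or "%1.%2"
    start: int = 1  # first value of this level's counter


@dataclass
class ListState:
    levels: list
    counters: dict = field(default_factory=dict)
    continuing: set = field(default_factory=set)  # levels in a continuation pattern

    def next_label(self, ilvl):
        # Returning to this level resets tracking for all deeper levels.
        for deeper in [k for k in self.counters if k > ilvl]:
            del self.counters[deeper]
            self.continuing.discard(deeper)

        if ilvl in self.counters:
            self.counters[ilvl] += 1
        else:
            start = self.levels[ilvl].start
            parent = self.counters.get(ilvl - 1)
            # Detection: the deeper level starts at parent counter + 1.
            if ilvl > 0 and parent is not None and start == parent + 1:
                self.continuing.add(ilvl)
            self.counters[ilvl] = start

        if ilvl in self.continuing:
            # Handling: level 0's format string, current level's counter.
            return self.levels[0].fmt.replace("%1", str(self.counters[ilvl]))

        # Normal case: fill %1..%n from the counters of levels 0..ilvl.
        label = self.levels[ilvl].fmt
        for k in range(ilvl + 1):
            value = self.counters.get(k, self.levels[k].start)
            label = label.replace(f"%{k + 1}", str(value))
        return label


state = ListState([Level("%1."), Level("%1.%2", start=4)])
print([state.next_label(i) for i in (0, 0, 0, 1)])  # ['1.', '2.', '3.', '4.']
```

A level 1 definition with start=1 does not match the rule, so the same sequence would yield "1.1" for its first nested item rather than a flat number.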
2. Numbering Definition Mutation Fix
Fixed mutation bug in FormattingAssembler.cs:
- The original numbering definition's pPr element was being mutated by calling .Remove() on its child elements
- The code now clones numberingParaProps before modifying it
3. Tab Width Calculation Fix
Updated WmlToHtmlConverter.cs to check accumulated properties (pt:pPr) first for indentation and tabs, falling back to direct properties (w:pPr) if not found.
Changes
- Docxodus/ListItemRetriever.cs - Added ContinuationInfo class and detection/handling logic
- Docxodus/FormattingAssembler.cs - Clone numberingParaProps before mutation; use effective level for properties
- Docxodus/WmlToHtmlConverter.cs - Check pt:pPr first for tabs and indentation
- docs/ooxml_corner_cases.md - New documentation for OOXML edge cases
- CLAUDE.md - Added section on documenting corner cases
- CHANGELOG.md - Added entry for this fix
Test plan