Releases: JSv4/Docxodus
v5.5.3
Bug Fixes
- Fix text clipping and paragraph spacing in paginated rendering (#117, #114)
  - Fixed `lineRule` default handling per the OOXML spec: when `w:lineRule` is absent but `w:line` is present, treat it as `"auto"` (previously the `line` value was ignored)
  - Implemented contextual spacing: suppress inter-paragraph spacing for consecutive same-style paragraphs, matching Word's behavior
  - Fixed bottom margin over-reservation in pagination logic that caused premature page breaks and clipped text at page bottoms
- Fix TypeScript subpath exports for `moduleResolution: "node"` (#116, #113)
  - Reordered export conditions to place `types` before `import` in all export entries
  - Added a `typesVersions` fallback so `docxodus/react` and `docxodus/worker` subpath imports resolve correctly under all TypeScript module resolution modes
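A minimal sketch of the two spacing rules above, assuming a simplified paragraph model. The units follow the OOXML spacing element (`w:line` is in 240ths of a line for `auto`, twentieths of a point otherwise); the types and function names are hypothetical, not the Docxodus API, and contextual spacing is modeled on each paragraph's own `w:contextualSpacing` flag.

```typescript
// Hypothetical types for illustration; not the Docxodus API.
type LineRule = "auto" | "exact" | "atLeast";

interface SpacingProps {
  line?: number; // w:line
  lineRule?: LineRule; // w:lineRule
}

type LineHeight =
  | { kind: "multiple"; value: number } // multiple of single line spacing
  | { kind: "points"; value: number; atLeast: boolean };

function resolveLineHeight(s: SpacingProps): LineHeight | undefined {
  if (s.line === undefined) return undefined; // nothing to resolve here
  // An absent w:lineRule is treated as "auto" instead of discarding w:line.
  const rule: LineRule = s.lineRule ?? "auto";
  if (rule === "auto") {
    // "auto": w:line is in 240ths of a line (240 = single spacing).
    return { kind: "multiple", value: s.line / 240 };
  }
  // "exact" / "atLeast": w:line is in twentieths of a point.
  return { kind: "points", value: s.line / 20, atLeast: rule === "atLeast" };
}

interface Para {
  styleId: string;
  before: number; // space before, in points
  after: number; // space after, in points
  contextualSpacing: boolean; // w:contextualSpacing
}

// Zeroes a paragraph's own before/after spacing on any side where the
// neighboring paragraph has the same style and the flag is set.
function applyContextualSpacing(paras: Para[]): Para[] {
  return paras.map((p, i) => {
    if (!p.contextualSpacing) return p;
    const prev = i > 0 ? paras[i - 1] : undefined;
    const next = i < paras.length - 1 ? paras[i + 1] : undefined;
    return {
      ...p,
      before: prev !== undefined && prev.styleId === p.styleId ? 0 : p.before,
      after: next !== undefined && next.styleId === p.styleId ? 0 : p.after,
    };
  });
}
```

With `w:line="360"` and no `w:lineRule`, the sketch yields 1.5x line spacing instead of ignoring the value.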
Maintenance
- Add explicit permissions to GitHub Actions workflows (#112): resolves all 7 CodeQL `actions/missing-workflow-permissions` security alerts by adding `permissions: contents: read` to workflow jobs
v5.5.2
Bug Fixes
- Fix annotation projection on sanitized HTML fragments (#111, #110): `ProjectAnnotationsOntoHtml()`, `AddAnnotationToHtml()`, and `RemoveAnnotationFromHtml()` now handle HTML with multiple root elements and HTML named entities (e.g., `&nbsp;`, `&ndash;`), which are common in sanitized output from libraries like DOMPurify.
  - Added `ParseHtmlString()`/`SerializeHtmlString()` helpers that replace HTML named entities with their numeric XML equivalents and wrap multi-root HTML in a synthetic element for XML parsing
  - Maintains full backward compatibility with single-root HTML documents
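A minimal sketch of those two preprocessing steps, assuming plain string processing before an XML parser sees the fragment. The entity table is deliberately tiny and the wrapper element name `docxodus-root` is hypothetical; a real implementation needs the full HTML entity list.

```typescript
// Tiny illustrative subset of HTML named entities -> Unicode code points.
const NAMED_ENTITIES: Record<string, number> = {
  nbsp: 160,
  ndash: 8211,
  mdash: 8212,
  copy: 169,
  hellip: 8230,
};

// XML's five predefined entities are already valid and must be left alone.
const XML_PREDEFINED = new Set(["amp", "lt", "gt", "quot", "apos"]);

// Rewrites known named entities as numeric references (&nbsp; -> &#160;).
function namedToNumeric(html: string): string {
  return html.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (match: string, name: string) => {
    if (XML_PREDEFINED.has(name)) return match;
    const code: number | undefined = NAMED_ENTITIES[name];
    return code === undefined ? match : `&#${code};`;
  });
}

const WRAPPER_TAG = "docxodus-root"; // hypothetical synthetic root element

// Gives a multi-root fragment a single root so an XML parser accepts it.
function wrapForXml(html: string): string {
  return `<${WRAPPER_TAG}>${namedToNumeric(html)}</${WRAPPER_TAG}>`;
}

// Strips the synthetic root again after the XML has been processed.
function unwrapFromXml(xml: string): string {
  const open = `<${WRAPPER_TAG}>`;
  const close = `</${WRAPPER_TAG}>`;
  if (xml.startsWith(open) && xml.endsWith(close)) {
    return xml.slice(open.length, xml.length - close.length);
  }
  return xml;
}
```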
v5.5.1
v5.5.0 - Incremental Annotation Overlay API
What's New
Incremental Annotation Overlay API (Issue #106)
Decouples annotation projection from DOCX-to-HTML conversion for dramatically better performance when annotations change. Convert once, then manipulate annotations directly on the cached HTML.
Performance (measured via Playwright on CI)
| Operation | Time | vs Full Re-conversion |
|---|---|---|
| Full DOCX re-conversion | 892ms | baseline |
| Project all annotations | 56ms | 15.9x faster |
| Add single annotation | 0.3ms | 2,972x faster |
| Remove single annotation | 18ms | 49x faster |
New Functions
.NET (ExternalAnnotationProjector):
- `ProjectAnnotationsOntoHtml()` - Project a full annotation set onto pre-converted HTML
- `AddAnnotationToHtml()` - Add a single annotation to existing HTML
- `RemoveAnnotationFromHtml()` - Remove a single annotation by ID, unwrapping spans to plain text
- `GenerateVisibilityCss()` - Generate CSS to hide/show annotations by label (instant toggling)
- `GenerateAnnotationCssString()` - Generate annotation CSS separately
TypeScript/JavaScript:
```typescript
import {
  convertDocxToHtml,
  projectAnnotationsOntoHtml,
  addAnnotationToHtml,
  removeAnnotationFromHtml,
  generateAnnotationVisibilityCss,
} from "docxodus";

// docxFile, annotationSet, newAnnotation and label are supplied by the caller.

// Step 1: Convert once (~892ms)
const baseHtml = await convertDocxToHtml(docxFile);

// Step 2: Project annotations onto cached HTML (~56ms)
const annotatedHtml = await projectAnnotationsOntoHtml(baseHtml, annotationSet);

// Step 3: Incrementally add/remove (~0.3ms / ~18ms)
const updated = await addAnnotationToHtml(annotatedHtml, newAnnotation, label);
const afterRemove = await removeAnnotationFromHtml(updated, "ann-001");

// Step 4: Toggle labels via CSS (instant, no re-render)
const css = await generateAnnotationVisibilityCss(["DRAFT", "INTERNAL"]);
```

Bug Fixes
- Fixed annotation projection offset drift when label text shifts the text map, causing subsequent annotations to wrap the wrong text range
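The note doesn't describe how the fix works. One common way to avoid this class of drift is to apply insertions from the end of the text backwards, so text added for one annotation never moves the offsets of annotations still waiting to be applied. A minimal sketch, with a hypothetical span shape and bracket markers standing in for HTML wrapper elements; it assumes spans do not overlap.

```typescript
// Hypothetical annotation shape for illustration.
interface Span {
  id: string;
  start: number; // inclusive offset into the original text
  end: number; // exclusive offset
  label: string;
}

// Wraps each span as "[label: text]". Sorting by descending start means
// every insertion happens to the right of all spans not yet applied,
// so their original offsets stay valid. Assumes non-overlapping spans.
function projectSpans(text: string, spans: Span[]): string {
  const ordered = [...spans].sort((a, b) => b.start - a.start);
  let out = text;
  for (const s of ordered) {
    out =
      out.slice(0, s.start) +
      `[${s.label}: ` +
      out.slice(s.start, s.end) +
      "]" +
      out.slice(s.end);
  }
  return out;
}
```

Applying the same spans in ascending order would shift the second span right by the length of the first label, wrapping the wrong characters.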
Documentation
- New architecture doc: `docs/architecture/incremental_annotation_overlay.md`
- Updated `docs/npm-package.md` with full API reference for all 5 functions
- Updated `CLAUDE.md` with `ExternalAnnotationProjector` module description
Full Changelog: v5.4.2...v5.5.0
v5.4.2 - ComparisonLog Infrastructure
What's New
ComparisonLog Infrastructure for Graceful Error Handling
This release adds optional logging infrastructure to the document comparison engine, allowing comparisons to continue past recoverable document issues while providing visibility into what was handled.
Features
- New `ComparisonLog` class - Collects warnings and errors during comparison
- Graceful handling of malformed documents - Orphaned footnote/endnote references are now logged and removed instead of causing comparison failures
- Detailed log entries - Each entry includes severity level, machine-readable code, human-readable message, and document location
API Additions
.NET:
```csharp
var log = new ComparisonLog();
var settings = new WmlComparerSettings { Log = log };
var result = WmlComparer.Compare(doc1, doc2, settings);

if (log.HasWarnings)
{
    foreach (var warning in log.Warnings)
        Console.WriteLine($"{warning.Code}: {warning.Message}");
}
```

TypeScript/JavaScript:
```typescript
const result = await compareDocumentsWithLog(original, modified, {
  authorName: "Reviewer"
});

if (result.success && result.hasWarnings) {
  console.log("Warnings:", result.log);
}
```

New Functions
- `compareDocumentsWithLog()` - Returns document bytes + log entries
- `compareDocumentsToHtmlWithLog()` - Returns HTML + log entries
Bug Fixes
- Fixed the `CompareDocumentsToHtmlFull` WASM binding to properly pass the `detailThreshold` and `caseInsensitive` options (previously ignored)
Log Entry Codes
- `ORPHANED_FOOTNOTE_REFERENCE` - Footnote reference with no corresponding definition
- `ORPHANED_ENDNOTE_REFERENCE` - Endnote reference with no corresponding definition
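A caller will often want a per-code summary of what the comparer repaired. A minimal sketch, assuming an entry shape that mirrors the fields listed above (severity, code, message, location); the property names and the three severity values are assumptions, not the published Docxodus typings.

```typescript
// Hypothetical entry shape; not the published Docxodus typings.
type Severity = "info" | "warning" | "error";

interface LogEntry {
  severity: Severity;
  code: string; // e.g. "ORPHANED_FOOTNOTE_REFERENCE"
  message: string;
  location?: string;
}

const SEVERITY_RANK: Record<Severity, number> = { info: 0, warning: 1, error: 2 };

// Counts entries per code, skipping anything below minSeverity.
function countByCode(entries: LogEntry[], minSeverity: Severity = "warning"): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of entries) {
    if (SEVERITY_RANK[e.severity] < SEVERITY_RANK[minSeverity]) continue;
    counts.set(e.code, (counts.get(e.code) ?? 0) + 1);
  }
  return counts;
}
```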
Full Changelog: v5.4.1...v5.4.2
v5.4.1: Move markup Word compatibility fix (Issue #96)
Summary
This patch release fixes a bug where documents with move operations would cause Microsoft Word to display "unreadable content" warnings (Issue #96).
Fixed
Move markup Word compatibility (Issue #96)
- Root cause: `FixUpRevMarkIds()` was overwriting the IDs of `w:del`/`w:ins` after `FixUpRevisionIds()` had assigned unique IDs, causing collisions with move element IDs
- Fix: Removed the redundant `FixUpRevMarkIds()` call; `FixUpRevisionIds()` already handles all revision element IDs correctly
- Added `SimplifyMoveMarkup` setting to convert move markup to simple `w:del`/`w:ins` if desired
- `DetectMoves` now defaults to `true` (move detection is safe to use)
- Added comprehensive ID uniqueness stress tests to prevent regression
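The bug came from a second pass overwriting IDs that an earlier pass had made unique. A minimal sketch of the kind of uniqueness check a stress test can assert, plus a single renumbering pass, over a hypothetical element model. Real move markup also involves range start/end elements, which this sketch leaves out.

```typescript
// Hypothetical, simplified revision element model.
type RevisionTag = "w:ins" | "w:del" | "w:moveFrom" | "w:moveTo";

interface RevisionElement {
  tag: RevisionTag;
  id: number; // w:id
}

// Returns every w:id used by more than one revision element, ascending.
function findDuplicateIds(elements: RevisionElement[]): number[] {
  const seen = new Set<number>();
  const dupes = new Set<number>();
  for (const el of elements) {
    if (seen.has(el.id)) dupes.add(el.id);
    seen.add(el.id);
  }
  return Array.from(dupes).sort((a, b) => a - b);
}

// Assigns sequential IDs across all revision kinds in one pass; running
// no later pass over the same elements keeps the result collision-free.
function assignUniqueIds(elements: RevisionElement[], firstId = 1): RevisionElement[] {
  return elements.map((el, i) => ({ ...el, id: firstId + i }));
}
```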
Dependencies
- Bump DocumentFormat.OpenXml from 3.3.0 to 3.4.1
Full Changelog
v5.4.0
What's New
This release focuses on improving HTML output fidelity to match Microsoft Word and other rendering engines, with significant fixes for list numbering and footnotes.
Legal Numbering Continuation Pattern (PR #93)
Fixed incorrect multi-level list numbering when items continue a flat sequence at different indentation levels - a common pattern in legal documents.
Before: Items like 1., 2., 3. at level 0 followed by an item at level 1 (with start=4) would incorrectly render as "3.4"
After: Now correctly renders as "4." matching Word behavior
- Added "continuation pattern" detection in `ListItemRetriever.cs`
- When detected, uses level 0's format string with the current counter value
- Prevents underline and other formatting from being incorrectly applied
- Fixes tab/indentation spacing to match the effective level
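The notes don't spell out how the continuation pattern is detected. A minimal sketch of the Before/After example, under an assumed heuristic: a level > 0 item whose counter equals its level's `w:start` and also equals the level-0 counter plus one is rendered with level 0's format. The level model and the heuristic are hypothetical, not the `ListItemRetriever.cs` logic.

```typescript
// Hypothetical, simplified numbering level definition.
interface LevelDef {
  start: number; // w:start
  lvlText: string; // w:lvlText, e.g. "%1." or "%1.%2"
}

// Replaces %1, %2, ... with the matching counter values.
function formatNumber(lvlText: string, counters: number[]): string {
  return lvlText.replace(/%(\d)/g, (_m: string, d: string) =>
    String(counters[Number(d) - 1] ?? ""),
  );
}

// counters[k] is the current counter at level k for this item.
function renderLabel(levels: LevelDef[], counters: number[], level: number): string {
  const own = counters[level];
  // Assumed heuristic for the "continues a flat sequence" case.
  const continuesFlat =
    level > 0 && own === levels[level].start && own === counters[0] + 1;
  if (continuesFlat) {
    return formatNumber(levels[0].lvlText, [own]); // e.g. "4."
  }
  return formatNumber(levels[level].lvlText, counters.slice(0, level + 1));
}
```

With items 1. to 3. at level 0 and a level-1 item whose `w:start` is 4, this yields "4." rather than "3.4"; an ordinary nested item still renders as "3.1".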
Footnote/Endnote Sequential Numbering (PR #94)
Fixed footnotes and endnotes displaying raw XML IDs instead of sequential display numbers.
Before: Document with 91 footnotes displayed as 2-92 (raw XML IDs)
After: Now correctly displays as 1-91 (sequential based on document order)
- Per ECMA-376, `w:id` is a reference identifier, not the display number
- Added `FootnoteNumberingTracker` class to build an ID → display number mapping
- Updated both regular and paginated rendering modes
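The ID → display number mapping can be illustrated in a few lines. A minimal sketch, not the `FootnoteNumberingTracker` implementation, assuming the input is the list of footnote reference `w:id` values collected in document order.

```typescript
// Maps each footnote w:id to its display number, in document order.
function buildDisplayNumbers(referenceIds: number[]): Map<number, number> {
  const display = new Map<number, number>();
  for (const id of referenceIds) {
    // The first reference to an ID fixes its number; later repeats reuse it.
    if (!display.has(id)) display.set(id, display.size + 1);
  }
  return display;
}
```

For the 91-footnote document above, references with IDs 2 through 92 map to display numbers 1 through 91.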
HTML Output Improvements (PRs #89, #90, #91, #92)
Multiple improvements to align with other rendering engines:
- Footnote/endnote rendering - Improved anchor structure and backref placement
- Whitespace handling - Normalized inline whitespace between elements to prevent spurious spaces
- Line height - Removed default 108% line-height that was causing layout differences
- Empty span prevention - Skip `footnoteRef`/`endnoteRef` runs that create empty `<span>` elements
Test Coverage (PR #88)
Added feature verification tests for resolved converter gaps.
Full Changelog: v5.3.0...v5.4.0
v5.3.0 - HTML Converter Enhancements
What's New
WmlToHtmlConverter Enhancements
This release brings significant improvements to the HTML converter, addressing 8 items from the converter gaps document.
Theme Color Resolution (PR #87)
- New `ResolveThemeColors` setting (default: `true`) enables theme color resolution
- Reads the color scheme from `theme1.xml` (`a:clrScheme` element)
- Supports all 12 theme colors: dk1, lt1, dk2, lt2, accent1-6, hlink, folHlink
- Applies `w:themeTint` (lighten) and `w:themeShade` (darken) modifiers
- Falls back to the explicit color value if the theme color is not found
@page CSS Rule Generation (PR #87)
- New `GeneratePageCss` setting (default: `false`) enables `@page` rule generation
- Reads page dimensions from `w:sectPr/w:pgSz` and margins from `w:sectPr/w:pgMar`
- Generates CSS `@page { size: Xin Yin; margin: ... }` rules
- Useful for print stylesheets and PDF generation
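A minimal sketch of the twips-to-inches conversion behind such a rule. `w:pgSz` and `w:pgMar` values are in twentieths of a point, so 1440 make an inch. The `PageGeometry` shape is hypothetical, and header, footer and gutter margins are ignored.

```typescript
const TWIPS_PER_INCH = 1440; // twentieths of a point per inch

// Hypothetical flattened view of w:pgSz and w:pgMar (all values in twips).
interface PageGeometry {
  width: number; // w:pgSz/@w:w
  height: number; // w:pgSz/@w:h
  top: number; // w:pgMar/@w:top
  right: number; // w:pgMar/@w:right
  bottom: number; // w:pgMar/@w:bottom
  left: number; // w:pgMar/@w:left
}

// Converts twips to an inch length, rounded to at most 3 decimals.
function inches(twips: number): string {
  return `${Number((twips / TWIPS_PER_INCH).toFixed(3))}in`;
}

function pageCss(g: PageGeometry): string {
  const margin = [g.top, g.right, g.bottom, g.left].map(inches).join(" ");
  return `@page { size: ${inches(g.width)} ${inches(g.height)}; margin: ${margin}; }`;
}
```

A US Letter section (12240 x 15840 twips, 1440-twip margins) becomes `@page { size: 8.5in 11in; margin: 1in 1in 1in 1in; }`.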
Font Fallback Improvements (PR #86)
- Unknown fonts are classified by name patterns and get a proper generic fallback:
  - Fonts with "sans" → `font-family: 'FontName', sans-serif`
  - Fonts with "mono", "code", "courier" → `font-family: 'FontName', monospace`
  - Other fonts default to a serif fallback
- CJK text gets language-specific font fallback chains:
  - Japanese: `'Noto Serif CJK JP', 'Yu Mincho', 'MS Mincho', ...`
  - Simplified Chinese: `'Noto Serif CJK SC', 'Microsoft YaHei', 'SimSun', ...`
  - Traditional Chinese: `'Noto Serif CJK TC', 'Microsoft JhengHei', 'PMingLiU', ...`
  - Korean: `'Noto Serif CJK KR', 'Malgun Gothic', 'Batang', ...`
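A minimal sketch of the name-pattern classification for unknown fonts. Checking the monospace patterns before "sans" (so a name like "Noto Sans Mono" maps to monospace) is a design choice of this sketch, not something the notes specify; the function names are hypothetical and quote escaping in font names is omitted.

```typescript
type GenericFamily = "sans-serif" | "monospace" | "serif";

// Classifies an unknown font by substrings of its name.
function genericFallback(fontName: string): GenericFamily {
  const n = fontName.toLowerCase();
  // Checked first so names containing both "sans" and "mono" stay monospace.
  if (/mono|code|courier/.test(n)) return "monospace";
  if (n.includes("sans")) return "sans-serif";
  return "serif";
}

function fontFamilyCss(fontName: string): string {
  return `font-family: '${fontName}', ${genericFallback(fontName)}`;
}
```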
Document Language Support (PR #85)
- `<html>` element now includes a `lang` attribute (e.g., `<html lang="en-US">`)
- Language auto-detected from document settings or the default paragraph style
- New `DocumentLanguage` setting for manual override
- Foreign text spans get an appropriate `lang` attribute when different from the document default
Unsupported Content Placeholders (PR #84)
- New `RenderUnsupportedContentPlaceholders` setting for visual indicators
- Supports placeholders for WMF/EMF images, SVG, Math equations, form fields, and Ruby annotations
- Placeholders are styled with CSS and include semantic data attributes
Tab Leader Character Support (PR #83)
- Tab stops with leader characters now render correctly
- Supports dot, hyphen, underscore, and middle dot leaders
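A minimal sketch of filling a tab's width with its leader character. The keys are the `w:tab/@w:leader` values for the leaders named above; the fixed character width is an assumption of this sketch, and a real renderer may measure text or use CSS instead of literal characters.

```typescript
// w:tab/@w:leader values covered by this sketch.
type TabLeader = "none" | "dot" | "hyphen" | "underscore" | "middleDot";

const LEADER_CHARS: Record<TabLeader, string> = {
  none: "",
  dot: ".",
  hyphen: "-",
  underscore: "_",
  middleDot: "\u00B7",
};

// Repeats the leader character to fill the tab, assuming every
// character is charWidthPt points wide.
function leaderText(leader: TabLeader, tabWidthPt: number, charWidthPt: number): string {
  const ch = LEADER_CHARS[leader];
  if (ch === "" || charWidthPt <= 0) return "";
  return ch.repeat(Math.floor(tabWidthPt / charWidthPt));
}
```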
Bug Fixes
- Thread-safety for static caches (PR #82) - Fixed potential corruption during concurrent conversions by using `ConcurrentDictionary` for `ShadeCache` and font tracking
- Null `rPr` handling (PR #81) - Fixed crash in `DefineRunStyle` and `GetLangAttribute` when converting runs without explicit run properties
CI Updates
- Updated `actions/upload-artifact` from v5 to v6
- Updated `actions/download-artifact` from v5 to v7
Full Changelog: v5.2.0...v5.3.0
v5.2.0
What's New
Fixed
- WmlComparer legal numbering preservation (Issue #1634) - Fixed comparison losing legal numbering (`w:isLgl`) when comparing documents with different numbering styles. The comparer now properly merges numbering definitions from the revised document into the result:
  - Copies `abstractNum` and `num` elements from the revised document when missing in the original
  - Reuses existing definitions when content matches (regardless of ID)
  - Remaps IDs when conflicts occur to avoid duplicates
  - Null-safe attribute extraction for robustness with malformed documents
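A minimal sketch of the copy/reuse/remap rules above, over a hypothetical model where each definition is reduced to an ID and a canonical content string. Real merging must also rewrite the `num` → `abstractNumId` references, which is omitted here.

```typescript
// Hypothetical model: an ID plus canonical content (excluding the ID).
interface NumDef {
  id: number;
  content: string;
}

interface MergeResult {
  merged: NumDef[];
  idMap: Map<number, number>; // revised-document ID -> ID in the result
}

function mergeNumbering(original: NumDef[], revised: NumDef[]): MergeResult {
  const merged: NumDef[] = original.slice();
  const idByContent = new Map<string, number>();
  const usedIds = new Set<number>();
  for (const d of original) {
    idByContent.set(d.content, d.id);
    usedIds.add(d.id);
  }
  // Always greater than every ID in use, so remapped IDs never collide.
  let nextId = Math.max(0, ...original.map((d) => d.id)) + 1;
  const idMap = new Map<number, number>();

  for (const d of revised) {
    const existing = idByContent.get(d.content);
    if (existing !== undefined) {
      idMap.set(d.id, existing); // same content: reuse, whatever its ID
      continue;
    }
    // Missing from the original: copy it, remapping the ID on conflict.
    const id = usedIds.has(d.id) ? nextId : d.id;
    usedIds.add(id);
    nextId = Math.max(nextId, id + 1);
    merged.push({ id, content: d.content });
    idByContent.set(d.content, id);
    idMap.set(d.id, id);
  }
  return { merged, idMap };
}
```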
Documentation
- Updated all READMEs to reflect current feature set
- Fixed repository URLs throughout (JSv4/Docxodus)
- Simplified NuGet installation instructions
- Added documentation for environment variables (REDLINE_DEBUG, DOCX2HTML_DEBUG)
- Updated npm package documentation with new APIs (metadata, format changes, Web Workers)
Full Changelog: v5.1.2...v5.2.0
v5.1.0 - OpenContracts Export Format
New Features
OpenContracts Export Format
Export DOCX documents to the OpenContracts format for interoperability with the OpenContracts ecosystem for document analysis.
API:
- C#: `OpenContractExporter.Export(WmlDocument)` / `OpenContractExporter.Export(WordprocessingDocument)`
- WASM: `DocumentConverter.ExportToOpenContract()`
- TypeScript: `exportToOpenContract()` with full type definitions
Export includes:
- Complete text extraction (paragraphs, tables, headers, footers, footnotes, endnotes)
- PAWLS-format page layout with token positions
- Structural annotations (SECTION, PARAGRAPH, TABLE)
- Parent-child relationships between annotations
New CLI Tool: docx2oc
Command-line tool for exporting DOCX files to OpenContracts JSON format:
```shell
# Export with default output (contract.oc)
docx2oc contract.docx

# Export with custom output filename
docx2oc contract.docx export.json
```

Install as .NET tool:

```shell
dotnet tool install --global Docx2OC
```

Types Added
- `OpenContractDocExport`, `PawlsPage`, `PawlsToken`, `OpenContractsAnnotation`, `OpenContractsRelationship`, `TextSpan`, `BoundingBox`, `TokenId`
Full Changelog
See CHANGELOG.md for complete details.