Releases · JSv4/Docxodus

v5.5.3

Bug Fixes

Fix text clipping and paragraph spacing in paginated rendering (#117, #114)
- Fixed lineRule default handling per the OOXML spec: when w:lineRule is absent but w:line is present, the rule is now treated as "auto" (previously the w:line value was ignored)
- Implemented contextual spacing (w:contextualSpacing): spacing between consecutive paragraphs of the same style is suppressed, matching Word's behavior
- Fixed bottom margin over-reservation in pagination logic that caused premature page breaks and clipped text at page bottoms
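The two spacing rules above can be sketched as follows. This is a minimal illustration of the behavior as ECMA-376 describes it, not Docxodus code: the `SpacingProps`/`ParaInfo` types and all function names are hypothetical, and "atLeast" line spacing is simplified to a fixed height.

```typescript
// Hypothetical model of the w:spacing attributes involved (not a Docxodus type).
type LineRule = "auto" | "exact" | "atLeast";

interface SpacingProps {
  line?: number;       // w:line
  lineRule?: LineRule; // w:lineRule
  before?: number;     // w:before, in twips (1/20 pt)
  after?: number;      // w:after, in twips
}

// An absent w:lineRule with a present w:line is treated as "auto".
function effectiveLineRule(s: SpacingProps): LineRule | undefined {
  if (s.lineRule !== undefined) return s.lineRule;
  return s.line !== undefined ? "auto" : undefined;
}

// "auto": w:line is in 240ths of a line (240 = single spacing).
// "exact"/"atLeast": w:line is in twips; "atLeast" is simplified to a fixed height here.
function lineHeightCss(s: SpacingProps): string | undefined {
  const rule = effectiveLineRule(s);
  if (rule === undefined || s.line === undefined) return undefined;
  return rule === "auto" ? String(s.line / 240) : `${s.line / 20}pt`;
}

interface ParaInfo {
  styleId: string;
  contextualSpacing: boolean; // true when w:contextualSpacing is present
  spacing: SpacingProps;
}

// A paragraph with contextual spacing ignores its own space before when the
// previous paragraph has the same style, and its space after when the next does.
function effectiveParagraphSpacing(paras: ParaInfo[]): { before: number; after: number }[] {
  return paras.map((p, i) => {
    const prev: ParaInfo | undefined = paras[i - 1];
    const next: ParaInfo | undefined = paras[i + 1];
    let before = p.spacing.before ?? 0;
    let after = p.spacing.after ?? 0;
    if (p.contextualSpacing) {
      if (prev !== undefined && prev.styleId === p.styleId) before = 0;
      if (next !== undefined && next.styleId === p.styleId) after = 0;
    }
    return { before, after };
  });
}
```

For example, `lineHeightCss({ line: 360 })` yields `"1.5"` because the missing rule now defaults to "auto" instead of being dropped.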
Fix TypeScript subpath exports for moduleResolution: "node" (#116, #113)
- Reordered export conditions to place types before import in all export entries
- Added typesVersions fallback so docxodus/react and docxodus/worker subpath imports resolve correctly under all TypeScript module resolution modes
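Conditions in a package.json "exports" entry are tried in key order and the first match wins, which is why "types" needs to come first. The sketch below is a hypothetical lint-style check for that ordering; the exports map and its file paths are made up for illustration. It does not model the typesVersions fallback: under moduleResolution "node" (node10), TypeScript does not read "exports" at all, so subpaths like docxodus/react need typesVersions to resolve.

```typescript
// Shape of a package.json "exports" entry: condition name -> target path.
type ExportConditions = Record<string, string>;

// Hypothetical exports map; the subpaths mirror the release note, the paths are made up.
const sampleExports: Record<string, ExportConditions> = {
  ".": { types: "./dist/index.d.ts", import: "./dist/index.js" },
  "./react": { types: "./dist/react.d.ts", import: "./dist/react.js" },
  "./worker": { import: "./dist/worker.js", types: "./dist/worker.d.ts" }, // "types" too late
};

// Returns the subpaths whose "types" condition is listed after "import"
// and would therefore be shadowed for resolvers that match "import" first.
function subpathsWithTypesAfterImport(exportsMap: Record<string, ExportConditions>): string[] {
  return Object.keys(exportsMap).filter((subpath) => {
    const keys = Object.keys(exportsMap[subpath]);
    const typesIdx = keys.indexOf("types");
    const importIdx = keys.indexOf("import");
    return typesIdx !== -1 && importIdx !== -1 && typesIdx > importIdx;
  });
}
```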

Maintenance

Add explicit permissions to GitHub Actions workflows (#112) — Resolves all 7 CodeQL actions/missing-workflow-permissions security alerts by adding permissions: contents: read to workflow jobs

v5.5.2

Bug Fixes

Fix annotation projection on sanitized HTML fragments (#111, #110). ProjectAnnotationsOntoHtml(), AddAnnotationToHtml(), and RemoveAnnotationFromHtml() now handle HTML with multiple root elements and HTML named entities (e.g., `&nbsp;`, `&ndash;`), which are common in sanitized output from libraries like DOMPurify.
- Added ParseHtmlString() / SerializeHtmlString() helpers that replace HTML named entities with numeric XML equivalents and wrap multi-root HTML in a synthetic element for XML parsing
- Maintains full backward compatibility with single-root HTML documents
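The two steps the helpers perform can be sketched in TypeScript as below. This is an illustration of the technique, not a port of ParseHtmlString()/SerializeHtmlString(): the entity table is a small hypothetical subset (a real implementation needs the full HTML entity list), and the synthetic root element name is made up.

```typescript
// Illustrative subset of HTML named entities -> Unicode code points.
const NAMED_ENTITIES: Record<string, number | undefined> = {
  nbsp: 0xa0, ndash: 0x2013, mdash: 0x2014, copy: 0xa9, hellip: 0x2026,
};

// XML predefines only these five; they must be left as-is.
const XML_PREDEFINED = new Set(["amp", "lt", "gt", "quot", "apos"]);

// Replace known named entities with numeric character references that any XML parser accepts.
function namedToNumericEntities(html: string): string {
  return html.replace(/&([a-zA-Z][a-zA-Z0-9]*);/g, (match: string, entityName: string) => {
    if (XML_PREDEFINED.has(entityName)) return match;
    const codePoint = NAMED_ENTITIES[entityName];
    return codePoint === undefined ? match : `&#${codePoint};`;
  });
}

const SYNTHETIC_ROOT = "docxodus-fragment-root"; // hypothetical element name

// Wrap a possibly multi-root fragment so it forms a single XML document element.
function wrapFragment(html: string): string {
  return `<${SYNTHETIC_ROOT}>${namedToNumericEntities(html)}</${SYNTHETIC_ROOT}>`;
}

// Strip the synthetic root again after processing; single-root input passes through unchanged.
function unwrapFragment(xml: string): string {
  const open = `<${SYNTHETIC_ROOT}>`;
  const close = `</${SYNTHETIC_ROOT}>`;
  return xml.startsWith(open) && xml.endsWith(close)
    ? xml.slice(open.length, xml.length - close.length)
    : xml;
}
```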

v5.5.1

What's Changed

Fix table container missing top margin when preceded by paragraphs (#108, #109)

Full Changelog: v5.5.0...v5.5.1

v5.5.0 - Incremental Annotation Overlay API

What's New

Incremental Annotation Overlay API (Issue #106)

Decouples annotation projection from DOCX-to-HTML conversion, so changing annotations no longer requires re-converting the document. Convert once, then manipulate annotations directly on the cached HTML.

Performance (measured via Playwright on CI)

| Operation | Time | vs. full re-conversion |
|---|---|---|
| Full DOCX re-conversion | 892 ms | baseline |
| Project all annotations | 56 ms | 15.9x faster |
| Add single annotation | 0.3 ms | 2,972x faster |
| Remove single annotation | 18 ms | 49x faster |

New Functions

.NET (ExternalAnnotationProjector):

ProjectAnnotationsOntoHtml() - Project a full annotation set onto pre-converted HTML
AddAnnotationToHtml() - Add a single annotation to existing HTML
RemoveAnnotationFromHtml() - Remove a single annotation by ID, unwrapping spans to plain text
GenerateVisibilityCss() - Generate CSS to hide/show annotations by label (instant toggling)
GenerateAnnotationCssString() - Generate annotation CSS separately

TypeScript/JavaScript:

```typescript
import {
  convertDocxToHtml,
  projectAnnotationsOntoHtml,
  addAnnotationToHtml,
  removeAnnotationFromHtml,
  generateAnnotationVisibilityCss,
} from "docxodus";

// docxFile, annotationSet, newAnnotation, and label are supplied by your application.

// Step 1: Convert once (~892ms)
const baseHtml = await convertDocxToHtml(docxFile);

// Step 2: Project annotations onto the cached HTML (~56ms)
const annotatedHtml = await projectAnnotationsOntoHtml(baseHtml, annotationSet);

// Step 3: Incrementally add/remove (~0.3ms / ~18ms)
const updated = await addAnnotationToHtml(annotatedHtml, newAnnotation, label);
const afterRemove = await removeAnnotationFromHtml(updated, "ann-001");

// Step 4: Toggle labels via CSS (instant, no re-render)
const css = await generateAnnotationVisibilityCss(["DRAFT", "INTERNAL"]);
```

Bug Fixes

Fixed annotation projection offset drift when label text shifts the text map, causing subsequent annotations to wrap the wrong text range
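The release note does not describe how the drift was fixed. One common way to avoid this class of bug is to apply insertions from the highest offset to the lowest, so inserted markup never shifts offsets that are still waiting to be applied. The sketch below shows that technique with hypothetical types; it is not the projector's actual code.

```typescript
// A plain-text range [start, end) to wrap, plus the label to attach (hypothetical shape).
interface SpanInsert {
  start: number;
  end: number;
  label: string;
}

// Wrap each range in a labeled <span>. Edits are applied right-to-left, so each
// insertion only changes text after the ranges still to be processed.
// Ranges are assumed not to overlap.
function wrapRanges(text: string, inserts: SpanInsert[]): string {
  const ordered = [...inserts].sort((a, b) => b.start - a.start);
  let out = text;
  for (const ins of ordered) {
    const open = `<span data-label="${ins.label}">`;
    out = out.slice(0, ins.start) + open + out.slice(ins.start, ins.end) + "</span>" + out.slice(ins.end);
  }
  return out;
}
```

Applying the same inserts left-to-right against the original offsets would land the second span inside the first span's markup, which is the drift described above.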

Documentation

New architecture doc: docs/architecture/incremental_annotation_overlay.md
Updated docs/npm-package.md with full API reference for all 5 functions
Updated CLAUDE.md with ExternalAnnotationProjector module description

Full Changelog: v5.4.2...v5.5.0

v5.4.2 - ComparisonLog Infrastructure

What's New

ComparisonLog Infrastructure for Graceful Error Handling

This release adds optional logging infrastructure to the document comparison engine, allowing comparisons to continue past recoverable document issues while providing visibility into what was handled.

Features

New ComparisonLog class - Collects warnings and errors during comparison
Graceful handling of malformed documents - Orphaned footnote/endnote references are now logged and removed instead of causing comparison failures
Detailed log entries - Each entry includes severity level, machine-readable code, human-readable message, and document location
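The collect-and-continue pattern behind ComparisonLog can be sketched in TypeScript as follows. `IssueCollector`, `LogEntryRecord`, and `removeOrphanedFootnoteRefs` are hypothetical names and the severity values are assumptions; the real API is the .NET ComparisonLog class and the log returned by compareDocumentsWithLog(). Only the entry fields (severity, code, message, location) and the ORPHANED_FOOTNOTE_REFERENCE code come from this release.

```typescript
type Severity = "Info" | "Warning" | "Error"; // assumed severity levels

interface LogEntryRecord {
  severity: Severity;
  code: string;      // machine-readable, e.g. "ORPHANED_FOOTNOTE_REFERENCE"
  message: string;   // human-readable description
  location?: string; // where in the document the issue was found
}

class IssueCollector {
  readonly entries: LogEntryRecord[] = [];
  add(entry: LogEntryRecord): void {
    this.entries.push(entry);
  }
  get hasWarnings(): boolean {
    return this.entries.some((e) => e.severity === "Warning");
  }
}

// Keep footnote references that have a definition; log and drop the rest
// instead of throwing, so the comparison can continue.
function removeOrphanedFootnoteRefs(refIds: string[], definedIds: Set<string>, log: IssueCollector): string[] {
  return refIds.filter((id) => {
    if (definedIds.has(id)) return true;
    log.add({
      severity: "Warning",
      code: "ORPHANED_FOOTNOTE_REFERENCE",
      message: `Footnote reference ${id} has no corresponding definition; removed.`,
      location: `w:footnoteReference w:id="${id}"`,
    });
    return false;
  });
}
```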

API Additions

.NET:

```csharp
// doc1 and doc2 are the two WmlDocument instances being compared.
var log = new ComparisonLog();
var settings = new WmlComparerSettings { Log = log };
var result = WmlComparer.Compare(doc1, doc2, settings);

if (log.HasWarnings)
{
    foreach (var warning in log.Warnings)
        Console.WriteLine($"{warning.Code}: {warning.Message}");
}
```

TypeScript/JavaScript:

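```typescript
import { compareDocumentsWithLog } from "docxodus";

// original and modified hold the two DOCX files being compared.
const result = await compareDocumentsWithLog(original, modified, {
  authorName: "Reviewer",
});

if (result.success && result.hasWarnings) {
  console.log("Warnings:", result.log);
}
```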

New Functions

compareDocumentsWithLog() - Returns document bytes + log entries
compareDocumentsToHtmlWithLog() - Returns HTML + log entries

Bug Fixes

Fixed CompareDocumentsToHtmlFull WASM binding to properly pass detailThreshold and caseInsensitive options (previously ignored)

Log Entry Codes

ORPHANED_FOOTNOTE_REFERENCE - Footnote reference with no corresponding definition
ORPHANED_ENDNOTE_REFERENCE - Endnote reference with no corresponding definition

Full Changelog: v5.4.1...v5.4.2

v5.4.1: Move markup Word compatibility fix (Issue #96)

Summary

This patch release fixes a bug where documents with move operations would cause Microsoft Word to display "unreadable content" warnings (Issue #96).

Fixed

Move markup Word compatibility (Issue #96)

Root cause: FixUpRevMarkIds() was overwriting IDs of w:del/w:ins after FixUpRevisionIds() had assigned unique IDs, causing collisions with move element IDs
Fix: Removed redundant FixUpRevMarkIds() call - FixUpRevisionIds() already handles all revision element IDs correctly
Added SimplifyMoveMarkup setting to convert move markup to simple w:del/w:ins if desired
DetectMoves now defaults to true (move detection is safe to use)
Added comprehensive ID uniqueness stress tests to prevent regression
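The property those stress tests protect is simple to state: no w:id value may be shared between revision elements, including between w:del/w:ins and move elements. A hypothetical TypeScript sketch of such a check, plus a single-pass renumbering that cannot collide, might look like this; the types and names are illustrative, and other id-bearing elements such as w:moveFromRangeStart are omitted for brevity.

```typescript
// A revision element as seen by the check: its tag and its w:id value.
interface RevisionMark {
  tag: "w:ins" | "w:del" | "w:moveFrom" | "w:moveTo";
  id: number;
}

// Return every id used by more than one revision element, in ascending order.
function duplicateRevisionIds(marks: RevisionMark[]): number[] {
  const seen = new Set<number>();
  const dups = new Set<number>();
  for (const m of marks) {
    if (seen.has(m.id)) dups.add(m.id);
    seen.add(m.id);
  }
  return Array.from(dups).sort((a, b) => a - b);
}

// Assign fresh sequential ids in document order, across all revision kinds at once,
// so a later pass cannot reintroduce collisions between kinds.
function renumberRevisionIds(marks: RevisionMark[], firstId = 1): RevisionMark[] {
  return marks.map((m, i) => ({ ...m, id: firstId + i }));
}
```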

Dependencies

Bump DocumentFormat.OpenXml from 3.3.0 to 3.4.1

Full Changelog: v5.4.0...v5.4.1

v5.4.0

What's New

This release focuses on improving HTML output fidelity to match Microsoft Word and other rendering engines, with significant fixes for list numbering and footnotes.

Legal Numbering Continuation Pattern (PR #93)

Fixed incorrect multi-level list numbering when items continue a flat sequence at different indentation levels - a common pattern in legal documents.

Before: Items like 1., 2., 3. at level 0 followed by an item at level 1 (with start=4) would incorrectly render as "3.4"
After: Now correctly renders as "4." matching Word behavior

Added "continuation pattern" detection in ListItemRetriever.cs
When detected, uses level 0's format string with the current counter value
Prevents underline and other formatting from being incorrectly applied
Fixes tab/indentation spacing to match the effective level
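The rendering half of the continuation fix can be sketched as below, using the "3.4" vs "4." example from above. This is an illustration only: `isContinuation` stands in for the detection logic in ListItemRetriever.cs, which is not reproduced, and only decimal numbering is handled (real w:lvlText rendering also depends on w:numFmt and w:isLgl).

```typescript
// Substitute %1..%9 placeholders in a w:lvlText-style format string with the
// counters for levels 0..8 (decimal formatting only).
function formatLevelText(lvlText: string, counters: number[]): string {
  return lvlText.replace(/%([1-9])/g, (_match: string, digit: string) =>
    String(counters[Number(digit) - 1] ?? ""));
}

// Normal rendering uses the item's own level format. In the continuation case,
// level 0's format is used, fed with the current level's counter value.
function renderListNumber(lvlTexts: string[], counters: number[], level: number, isContinuation: boolean): string {
  if (isContinuation) {
    return formatLevelText(lvlTexts[0], [counters[level]]);
  }
  return formatLevelText(lvlTexts[level], counters);
}
```

With level formats `["%1.", "%1.%2"]` and counters `[3, 4]`, the normal path for the level-1 item gives "3.4" while the continuation path gives "4.".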

Footnote/Endnote Sequential Numbering (PR #94)

Fixed footnotes and endnotes displaying raw XML IDs instead of sequential display numbers.

Before: Document with 91 footnotes displayed as 2-92 (raw XML IDs)
After: Now correctly displays as 1-91 (sequential based on document order)

Per ECMA-376, w:id is a reference identifier, not the display number
Added FootnoteNumberingTracker class to build ID → display number mapping
Updated both regular and paginated rendering modes
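The ID-to-display-number mapping can be sketched in a few lines of TypeScript. This is not the FootnoteNumberingTracker source; the function name is hypothetical, and it ignores w:numStart, w:numRestart, and custom footnote marks.

```typescript
// Map footnote w:id values to display numbers (1, 2, 3, ...) in order of first
// reference in the document. Because w:id is only an identifier, an offset
// start (e.g. ids 2..92) or gaps must not leak into the displayed numbers.
function buildFootnoteNumbering(refIdsInDocumentOrder: number[], firstNumber = 1): Map<number, number> {
  const displayNumbers = new Map<number, number>();
  let next = firstNumber;
  for (const id of refIdsInDocumentOrder) {
    if (!displayNumbers.has(id)) {
      displayNumbers.set(id, next);
      next += 1;
    }
  }
  return displayNumbers;
}
```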

HTML Output Improvements (PRs #89, #90, #91, #92)

Multiple improvements to align with other rendering engines:

Footnote/endnote rendering - Improved anchor structure and backref placement
Whitespace handling - Normalized inline whitespace between elements to prevent spurious spaces
Line height - Removed default 108% line-height that was causing layout differences
Empty span prevention - Skip footnoteRef/endnoteRef runs that create empty <span> elements

Test Coverage (PR #88)

Added feature verification tests for resolved converter gaps.

Full Changelog: v5.3.0...v5.4.0

v5.3.0 - HTML Converter Enhancements

What's New

WmlToHtmlConverter Enhancements

This release brings significant improvements to the HTML converter, addressing 8 items from the converter gaps document.

Theme Color Resolution (PR #87)

New ResolveThemeColors setting (default: true) enables theme color resolution
Reads color scheme from theme1.xml (a:clrScheme element)
Supports all 12 theme colors: dk1, lt1, dk2, lt2, accent1-6, hlink, folHlink
Applies w:themeTint (lighten) and w:themeShade (darken) modifiers
Falls back to explicit color value if theme color not found
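Once a theme color such as accent1 has been looked up in a:clrScheme, the w:themeTint and w:themeShade modifiers still have to be applied. The sketch below uses a simple per-channel RGB approximation and is an assumption for illustration: Word (and Docxodus) may compute these modifiers differently, for example in HSL space, and `applyTintShade` is a hypothetical name.

```typescript
// Apply w:themeTint / w:themeShade (hex bytes "00".."FF") to an RRGGBB color.
// Simple per-channel RGB approximation; not necessarily Word's exact math.
function applyTintShade(rgbHex: string, tintHex?: string, shadeHex?: string): string {
  const toByte = (v: number) => ("0" + Math.round(v).toString(16)).slice(-2).toUpperCase();
  return [0, 2, 4]
    .map((i) => parseInt(rgbHex.slice(i, i + 2), 16))
    .map((c) => {
      let v = c;
      // Shade darkens toward black: scale the channel by shade/255.
      if (shadeHex !== undefined) v = v * (parseInt(shadeHex, 16) / 255);
      // Tint lightens toward white: keep tint/255 of the color, fill the rest with white.
      if (tintHex !== undefined) v = v + (255 - v) * (1 - parseInt(tintHex, 16) / 255);
      return toByte(v);
    })
    .join("");
}
```

For instance, a shade of "80" roughly halves each channel, so "FF0000" becomes "800000".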

@page CSS Rule Generation (PR #87)

New GeneratePageCss setting (default: false) enables @page rule generation
Reads page dimensions from w:sectPr/w:pgSz and margins from w:sectPr/w:pgMar
Generates CSS @page { size: Xin Yin; margin: ... } rules
Useful for print stylesheets and PDF generation
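w:pgSz and w:pgMar store dimensions in twentieths of a point (twips), and 1440 twips make one inch, so the conversion to an @page rule is straightforward. The sketch below illustrates it; the interface, function names, and exact output formatting are assumptions, not the converter's actual output, and the header, footer, and gutter margins are ignored.

```typescript
// Page size (w:pgSz w:w / w:h) and margins (w:pgMar), all in twips.
interface PageSetupTwips {
  width: number;
  height: number;
  top: number;
  right: number;
  bottom: number;
  left: number;
}

const TWIPS_PER_INCH = 1440;

// Round to 4 decimals, then drop trailing zeros (e.g. 8.5000 -> "8.5in").
function twipsToInches(twips: number): string {
  return `${parseFloat((twips / TWIPS_PER_INCH).toFixed(4))}in`;
}

function pageCssRule(p: PageSetupTwips): string {
  const margin = [p.top, p.right, p.bottom, p.left].map((t) => twipsToInches(t)).join(" ");
  return `@page { size: ${twipsToInches(p.width)} ${twipsToInches(p.height)}; margin: ${margin}; }`;
}
```

A US Letter section (12240 x 15840 twips with 1440-twip margins) yields `@page { size: 8.5in 11in; margin: 1in 1in 1in 1in; }`.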

Font Fallback Improvements (PR #86)

Unknown fonts are classified by name pattern and get an appropriate generic fallback:
- Fonts with "sans" → font-family: 'FontName', sans-serif
- Fonts with "mono", "code", "courier" → font-family: 'FontName', monospace
- Other fonts default to serif fallback
CJK text gets language-specific font fallback chains:
- Japanese: 'Noto Serif CJK JP', 'Yu Mincho', 'MS Mincho', ...
- Simplified Chinese: 'Noto Serif CJK SC', 'Microsoft YaHei', 'SimSun', ...
- Traditional Chinese: 'Noto Serif CJK TC', 'Microsoft JhengHei', 'PMingLiU', ...
- Korean: 'Noto Serif CJK KR', 'Malgun Gothic', 'Batang', ...
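The name-pattern rule for unknown fonts can be sketched as follows. Two details are assumptions not stated in the release note: matching is case-insensitive, and the monospace patterns are checked before "sans" so that a name like "Noto Sans Mono" maps to monospace. The function names are hypothetical.

```typescript
type GenericFamily = "monospace" | "sans-serif" | "serif";

// Classify an unknown font by its name, following the patterns listed above.
function genericFallback(fontName: string): GenericFamily {
  const n = fontName.toLowerCase();
  if (["mono", "code", "courier"].some((p) => n.includes(p))) return "monospace";
  if (n.includes("sans")) return "sans-serif";
  return "serif"; // everything else defaults to serif
}

function fontFamilyCss(fontName: string): string {
  return `font-family: '${fontName}', ${genericFallback(fontName)}`;
}
```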

Document Language Support (PR #85)

<html> element now includes lang attribute (e.g., <html lang="en-US">)
Language auto-detected from document settings or default paragraph style
New DocumentLanguage setting for manual override
Foreign text spans get appropriate lang attribute when different from document default

Unsupported Content Placeholders (PR #84)

New RenderUnsupportedContentPlaceholders setting for visual indicators
Supports WMF/EMF images, SVG, Math equations, form fields, and Ruby annotations
Placeholders are styled with CSS and include semantic data attributes

Tab Leader Character Support (PR #83)

Tab stops with leader characters now render correctly
Supports dot, hyphen, underscore, and middle dot leaders

Bug Fixes

Thread-safety for static caches (PR #82) - Fixed potential corruption during concurrent conversions by using ConcurrentDictionary for ShadeCache and font tracking
Null rPr handling (PR #81) - Fixed crash in DefineRunStyle and GetLangAttribute when converting runs without explicit run properties

CI Updates

Updated actions/upload-artifact from v5 to v6
Updated actions/download-artifact from v5 to v7

Full Changelog: v5.2.0...v5.3.0

v5.2.0

What's New

Fixed

WmlComparer legal numbering preservation (Issue #1634) - Fixed comparison losing legal numbering (w:isLgl) when comparing documents with different numbering styles. The comparer now properly merges numbering definitions from the revised document into the result:
- Copies abstractNum and num elements from revised document when missing in original
- Reuses existing definitions when content matches (regardless of ID)
- Remaps IDs when conflicts occur to avoid duplicates
- Null-safe attribute extraction for robustness with malformed documents
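The three merge rules above (reuse on matching content, copy when missing, remap on conflict) can be sketched over a simple id-to-content table. This is a hypothetical illustration, not WmlComparer code: definitions are compared as canonical strings rather than XML, and rewriting the w:abstractNumId and w:numId references that point at remapped ids is left out.

```typescript
// Definition id -> canonical serialized content (real definitions are XML).
type DefinitionTable = Map<number, string>;

interface MergeResult {
  definitions: DefinitionTable; // merged table; the inputs are not modified
  idMap: Map<number, number>;   // revised id -> id to use in the result
}

function mergeNumberingDefinitions(original: DefinitionTable, revised: DefinitionTable): MergeResult {
  const definitions: DefinitionTable = new Map(original);
  const idMap = new Map<number, number>();
  // Fresh ids start above every id in either table, so they can never collide.
  let nextFreeId = Math.max(0, ...Array.from(original.keys()), ...Array.from(revised.keys())) + 1;

  revised.forEach((body, revisedId) => {
    // 1. Reuse an existing definition with identical content, regardless of its id.
    const sameContent = Array.from(definitions.entries()).find(([, existing]) => existing === body);
    if (sameContent !== undefined) {
      idMap.set(revisedId, sameContent[0]);
      return;
    }
    // 2. Copy it under its own id when that id is still free.
    if (!definitions.has(revisedId)) {
      definitions.set(revisedId, body);
      idMap.set(revisedId, revisedId);
      return;
    }
    // 3. Id conflict with different content: remap to a fresh id.
    const freshId = nextFreeId++;
    definitions.set(freshId, body);
    idMap.set(revisedId, freshId);
  });

  return { definitions, idMap };
}
```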

Documentation

Updated all READMEs to reflect current feature set
Fixed repository URLs throughout (JSv4/Docxodus)
Simplified NuGet installation instructions
Added documentation for environment variables (REDLINE_DEBUG, DOCX2HTML_DEBUG)
Updated npm package documentation with new APIs (metadata, format changes, Web Workers)

Full Changelog: v5.1.2...v5.2.0

v5.1.0 - OpenContracts Export Format

New Features

OpenContracts Export Format

Export DOCX documents to the OpenContracts format for interoperability with the OpenContracts ecosystem for document analysis.

API:

C#: OpenContractExporter.Export(WmlDocument) / OpenContractExporter.Export(WordprocessingDocument)
WASM: DocumentConverter.ExportToOpenContract()
TypeScript: exportToOpenContract() with full type definitions

Export includes:

Complete text extraction (paragraphs, tables, headers, footers, footnotes, endnotes)
PAWLS-format page layout with token positions
Structural annotations (SECTION, PARAGRAPH, TABLE)
Parent-child relationships between annotations

New CLI Tool: `docx2oc`

Command-line tool for exporting DOCX files to OpenContracts JSON format:

```shell
# Export with default output (contract.oc)
docx2oc contract.docx

# Export with custom output filename
docx2oc contract.docx export.json
```

Install as .NET tool:

```shell
dotnet tool install --global Docx2OC
```

Types Added

OpenContractDocExport, PawlsPage, PawlsToken
OpenContractsAnnotation, OpenContractsRelationship
TextSpan, BoundingBox, TokenId

Full Changelog

See CHANGELOG.md for complete details.

Releases: JSv4/Docxodus

v5.5.3

Bug Fixes

Maintenance

v5.5.2

Bug Fixes

v5.5.1

What's Changed

v5.5.0 - Incremental Annotation Overlay API

What's New

Incremental Annotation Overlay API (Issue #106)

Performance (measured via Playwright on CI)

New Functions

Bug Fixes

Documentation

v5.4.2 - ComparisonLog Infrastructure

What's New

ComparisonLog Infrastructure for Graceful Error Handling

Features

API Additions

New Functions

Bug Fixes

Log Entry Codes

v5.4.1: Move markup Word compatibility fix (Issue #96)

Summary

Fixed

Move markup Word compatibility (Issue #96)

Dependencies

Full Changelog

v5.4.0

What's New

Legal Numbering Continuation Pattern (PR #93)

Footnote/Endnote Sequential Numbering (PR #94)

HTML Output Improvements (PRs #89, #90, #91, #92)

Test Coverage (PR #88)

v5.3.0 - HTML Converter Enhancements

What's New

WmlToHtmlConverter Enhancements

Theme Color Resolution (PR #87)

@page CSS Rule Generation (PR #87)

Font Fallback Improvements (PR #86)

Document Language Support (PR #85)

Unsupported Content Placeholders (PR #84)

Tab Leader Character Support (PR #83)

Bug Fixes

CI Updates

Contributors

v5.2.0

What's New

Fixed

Documentation

v5.1.0 - OpenContracts Export Format

New Features

OpenContracts Export Format

New CLI Tool: docx2oc

Types Added

Full Changelog
