Releases: JSv4/Docxodus

v5.5.3

21 Mar 02:50
d5c0684

Bug Fixes

  • Fix text clipping and paragraph spacing in paginated rendering (#117, #114)

    • Fixed lineRule default handling per OOXML spec — when w:lineRule is absent but w:line is present, treat as "auto" (previously the line value was ignored)
    • Implemented contextual spacing: suppress inter-paragraph spacing for consecutive same-style paragraphs, matching Word's behavior
    • Fixed bottom margin over-reservation in pagination logic that caused premature page breaks and clipped text at page bottoms
  • Fix TypeScript subpath exports for moduleResolution: "node" (#116, #113)

    • Reordered export conditions to place types before import in all export entries
    • Added typesVersions fallback so docxodus/react and docxodus/worker subpath imports resolve correctly under all TypeScript module resolution modes
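
The lineRule default from the first fix above can be sketched as follows. This is an illustration, not Docxodus code: the helper name resolveLineHeight and the CSS strings it returns are assumptions. It relies on the OOXML convention that a w:line value is measured in 240ths of a line under "auto" and in twentieths of a point under "exact" or "atLeast".

```typescript
type LineRule = "auto" | "exact" | "atLeast";

// Hypothetical helper: maps w:line / w:lineRule to a CSS line-height value.
function resolveLineHeight(line?: number, lineRule?: LineRule): string | undefined {
  if (line === undefined) return undefined; // no w:line: nothing to apply
  // An absent w:lineRule is treated as "auto" instead of discarding w:line.
  const rule: LineRule = lineRule === undefined ? "auto" : lineRule;
  if (rule === "auto") {
    // 240 means single spacing, so 360 becomes a unitless multiplier of 1.5.
    return String(line / 240);
  }
  // "exact" and "atLeast" carry twentieths of a point. This sketch renders both
  // as a fixed height; a real renderer would treat "atLeast" as a minimum.
  return `${line / 20}pt`;
}

console.log(resolveLineHeight(360));          // "1.5"
console.log(resolveLineHeight(240, "exact")); // "12pt"
```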

Maintenance

  • Add explicit permissions to GitHub Actions workflows (#112) — Resolves all 7 CodeQL actions/missing-workflow-permissions security alerts by adding permissions: contents: read to workflow jobs

v5.5.2

21 Mar 00:24
ecc8771

Bug Fixes

  • Fix annotation projection on sanitized HTML fragments (#111, #110): ProjectAnnotationsOntoHtml(), AddAnnotationToHtml(), and RemoveAnnotationFromHtml() now handle HTML with multiple root elements and HTML named entities (e.g., &nbsp;, &ndash;), which are common in sanitized output from libraries like DOMPurify.
    • Added ParseHtmlString() / SerializeHtmlString() helpers that replace HTML named entities with numeric XML equivalents and wrap multi-root HTML in a synthetic element for XML parsing
    • Maintains full backward compatibility with single-root HTML documents
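
The entity replacement and synthetic wrapper described above can be sketched like this. It is not the library's ParseHtmlString() implementation: the function name toXmlParsable, the wrapper tag, and the four-entry entity table are assumptions chosen for illustration, and the sketch always wraps the input, whereas the release describes wrapping multi-root HTML.

```typescript
// Small illustrative subset of named entities; a real table would be far larger.
const NAMED_ENTITIES = new Map<string, number>([
  ["nbsp", 160],
  ["ndash", 8211],
  ["mdash", 8212],
  ["copy", 169],
]);

// The five entities that XML predefines are already valid and are left alone.
const XML_PREDEFINED = new Set<string>(["amp", "lt", "gt", "quot", "apos"]);

// Hypothetical helper: rewrites known named entities as numeric references and
// wraps the fragment in one synthetic root so an XML parser sees a single root.
function toXmlParsable(html: string, wrapperTag: string = "fragment-root"): string {
  const numeric = html.replace(
    /&([A-Za-z][A-Za-z0-9]*);/g,
    (match: string, name: string): string => {
      if (XML_PREDEFINED.has(name)) return match;
      const code = NAMED_ENTITIES.get(name);
      // Unknown names are passed through unchanged in this sketch.
      return code === undefined ? match : `&#${code};`;
    }
  );
  return `<${wrapperTag}>${numeric}</${wrapperTag}>`;
}

console.log(toXmlParsable("<p>A&nbsp;B</p><p>1&ndash;2 &amp; 3</p>"));
// <fragment-root><p>A&#160;B</p><p>1&#8211;2 &amp; 3</p></fragment-root>
```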

v5.5.1

21 Mar 00:04
034e3ff

What's Changed

  • Fix table container missing top margin when preceded by paragraphs (#108, #109)

Full Changelog: v5.5.0...v5.5.1

v5.5.0 - Incremental Annotation Overlay API

16 Mar 04:00
a402c74

What's New

Incremental Annotation Overlay API (Issue #106)

Decouples annotation projection from DOCX-to-HTML conversion, so annotation changes no longer require a full re-conversion. Convert once, then manipulate annotations directly on the cached HTML.

Performance (measured via Playwright on CI)

Operation                   Time     vs Full Re-conversion
Full DOCX re-conversion     892ms    baseline
Project all annotations     56ms     15.9x faster
Add single annotation       0.3ms    2,972x faster
Remove single annotation    18ms     49x faster

New Functions

.NET (ExternalAnnotationProjector):

  • ProjectAnnotationsOntoHtml() - Project a full annotation set onto pre-converted HTML
  • AddAnnotationToHtml() - Add a single annotation to existing HTML
  • RemoveAnnotationFromHtml() - Remove a single annotation by ID, unwrapping spans to plain text
  • GenerateVisibilityCss() - Generate CSS to hide/show annotations by label (instant toggling)
  • GenerateAnnotationCssString() - Generate annotation CSS separately

TypeScript/JavaScript:

import {
  convertDocxToHtml,
  projectAnnotationsOntoHtml,
  addAnnotationToHtml,
  removeAnnotationFromHtml,
  generateAnnotationVisibilityCss,
} from "docxodus";

// Step 1: Convert once (~892ms)
const baseHtml = await convertDocxToHtml(docxFile);

// Step 2: Project annotations onto cached HTML (~56ms)
const annotatedHtml = await projectAnnotationsOntoHtml(baseHtml, annotationSet);

// Step 3: Incrementally add/remove (~0.3ms / ~18ms)
const updated = await addAnnotationToHtml(annotatedHtml, newAnnotation, label);
const afterRemove = await removeAnnotationFromHtml(updated, "ann-001");

// Step 4: Toggle labels via CSS (instant, no re-render)
const css = await generateAnnotationVisibilityCss(["DRAFT", "INTERNAL"]);

Bug Fixes

  • Fixed annotation projection offset drift when label text shifts the text map, causing subsequent annotations to wrap the wrong text range
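
The release does not say how the drift was fixed. As a general illustration of this class of bug, the sketch below inserts label text while wrapping ranges and avoids drift by applying edits from the end of the text backwards. The types, the bracket markup, and the function name are all hypothetical, and the sketch assumes non-overlapping ranges.

```typescript
interface TextAnnotation {
  id: string;
  start: number; // inclusive offset into the original text
  end: number;   // exclusive offset into the original text
  label: string;
}

// Wraps each range as "[LABEL:text]". Inserting label text changes the length
// of the string, so processing ranges in descending start order guarantees
// that offsets still to be applied refer to text that has not moved.
function projectOntoText(text: string, annotations: TextAnnotation[]): string {
  const ordered = annotations.slice().sort((a, b) => b.start - a.start);
  let out = text;
  for (const ann of ordered) {
    const wrapped = `[${ann.label}:${out.slice(ann.start, ann.end)}]`;
    out = out.slice(0, ann.start) + wrapped + out.slice(ann.end);
  }
  return out;
}

console.log(
  projectOntoText("alpha beta gamma", [
    { id: "ann-001", start: 0, end: 5, label: "A" },
    { id: "ann-002", start: 11, end: 16, label: "B" },
  ])
);
// [A:alpha] beta [B:gamma]
```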

Documentation

  • New architecture doc: docs/architecture/incremental_annotation_overlay.md
  • Updated docs/npm-package.md with full API reference for all 5 functions
  • Updated CLAUDE.md with ExternalAnnotationProjector module description

Full Changelog: v5.4.2...v5.5.0

v5.4.2 - ComparisonLog Infrastructure

26 Jan 17:13
0a0b8c6

What's New

ComparisonLog Infrastructure for Graceful Error Handling

This release adds optional logging infrastructure to the document comparison engine, allowing comparisons to continue past recoverable document issues while providing visibility into what was handled.

Features

  • New ComparisonLog class - Collects warnings and errors during comparison
  • Graceful handling of malformed documents - Orphaned footnote/endnote references are now logged and removed instead of causing comparison failures
  • Detailed log entries - Each entry includes severity level, machine-readable code, human-readable message, and document location

API Additions

.NET:

var log = new ComparisonLog();
var settings = new WmlComparerSettings { Log = log };
var result = WmlComparer.Compare(doc1, doc2, settings);

if (log.HasWarnings)
{
    foreach (var warning in log.Warnings)
        Console.WriteLine($"{warning.Code}: {warning.Message}");
}

TypeScript/JavaScript:

const result = await compareDocumentsWithLog(original, modified, {
  authorName: "Reviewer"
});

if (result.success && result.hasWarnings) {
  console.log("Warnings:", result.log);
}

New Functions

  • compareDocumentsWithLog() - Returns document bytes + log entries
  • compareDocumentsToHtmlWithLog() - Returns HTML + log entries

Bug Fixes

  • Fixed CompareDocumentsToHtmlFull WASM binding to properly pass detailThreshold and caseInsensitive options (previously ignored)

Log Entry Codes

  • ORPHANED_FOOTNOTE_REFERENCE - Footnote reference with no corresponding definition
  • ORPHANED_ENDNOTE_REFERENCE - Endnote reference with no corresponding definition

Full Changelog: v5.4.1...v5.4.2

v5.4.1: Move markup Word compatibility fix (Issue #96)

21 Jan 13:39
0fa5c85

Summary

This patch release fixes a bug where documents with move operations would cause Microsoft Word to display "unreadable content" warnings (Issue #96).

Fixed

Move markup Word compatibility (Issue #96)

  • Root cause: FixUpRevMarkIds() was overwriting IDs of w:del/w:ins after FixUpRevisionIds() had assigned unique IDs, causing collisions with move element IDs
  • Fix: Removed redundant FixUpRevMarkIds() call - FixUpRevisionIds() already handles all revision element IDs correctly
  • Added SimplifyMoveMarkup setting to convert move markup to simple w:del/w:ins if desired
  • DetectMoves now defaults to true (move detection is safe to use)
  • Added comprehensive ID uniqueness stress tests to prevent regression
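
The invariant behind the fix can be expressed as a small check. This is a sketch only, not the project's stress test: it assumes revision elements (w:ins, w:del, and the move elements) share one ID space, as the root cause above implies, and the RevisionMark type and findCollidingIds name are hypothetical.

```typescript
interface RevisionMark {
  tag: string; // e.g. "w:ins", "w:del", "w:moveFrom", "w:moveTo"
  id: number;  // value of the element's w:id attribute
}

// Returns each w:id value that is used by more than one revision element.
function findCollidingIds(marks: RevisionMark[]): number[] {
  const seen = new Set<number>();
  const colliding: number[] = [];
  for (const mark of marks) {
    if (seen.has(mark.id)) {
      if (colliding.indexOf(mark.id) === -1) colliding.push(mark.id);
    } else {
      seen.add(mark.id);
    }
  }
  return colliding;
}

console.log(
  findCollidingIds([
    { tag: "w:ins", id: 1 },
    { tag: "w:moveFrom", id: 2 },
    { tag: "w:del", id: 2 }, // collides with the move element
  ])
);
// [ 2 ]
```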

Dependencies

  • Bump DocumentFormat.OpenXml from 3.3.0 to 3.4.1

Full Changelog: v5.4.0...v5.4.1

v5.4.0

24 Dec 12:56
8942fc3

What's New

This release focuses on improving HTML output fidelity to match Microsoft Word and other rendering engines, with significant fixes for list numbering and footnotes.

Legal Numbering Continuation Pattern (PR #93)

Fixed incorrect multi-level list numbering when items continue a flat sequence at different indentation levels - a common pattern in legal documents.

Before: Items like 1., 2., 3. at level 0 followed by an item at level 1 (with start=4) would incorrectly render as "3.4"
After: Now correctly renders as "4." matching Word behavior

  • Added "continuation pattern" detection in ListItemRetriever.cs
  • When detected, uses level 0's format string with the current counter value
  • Prevents underline and other formatting from being incorrectly applied
  • Fixes tab/indentation spacing to match the effective level
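
A rough sketch of the continuation pattern, limited to two levels. The real detection lives in ListItemRetriever.cs and is not shown in these notes, so the heuristic here (a nested item whose start value is exactly one more than the current level 0 counter) and all names are assumptions.

```typescript
interface ListItem {
  level: number; // 0 = top level, 1 = nested
  start: number; // start value configured for this item's level
}

// Renders list numbers, treating a nested run as a continuation of the flat
// level 0 sequence when its start value picks up where level 0 left off.
function renderNumbers(items: ListItem[]): string[] {
  let top = 0;            // level 0 counter
  let sub = 0;            // level 1 counter for ordinary nesting
  let continuing = false; // true while a nested run continues the flat sequence
  const out: string[] = [];
  for (const item of items) {
    if (item.level === 0) {
      top += 1;
      sub = 0;
      continuing = false;
      out.push(`${top}.`);
      continue;
    }
    if (sub === 0 && !continuing && item.start === top + 1) continuing = true;
    if (continuing) {
      top += 1;
      out.push(`${top}.`); // level 0 format with the current counter value
    } else {
      sub = sub === 0 ? item.start : sub + 1;
      out.push(`${top}.${sub}`);
    }
  }
  return out;
}

console.log(renderNumbers([
  { level: 0, start: 1 },
  { level: 0, start: 1 },
  { level: 0, start: 1 },
  { level: 1, start: 4 },
]));
// [ '1.', '2.', '3.', '4.' ]
```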

Footnote/Endnote Sequential Numbering (PR #94)

Fixed footnotes and endnotes displaying raw XML IDs instead of sequential display numbers.

Before: A document with 91 footnotes displayed them as 2-92 (raw XML IDs)
After: Now correctly displays as 1-91 (sequential based on document order)

  • Per ECMA-376, w:id is a reference identifier, not the display number
  • Added FootnoteNumberingTracker class to build ID → display number mapping
  • Updated both regular and paginated rendering modes
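
The ID-to-display-number mapping can be illustrated in a few lines. This is not the FootnoteNumberingTracker class itself, only a sketch of the idea; the function name and its input (the w:id values of footnote references in document order) are assumptions.

```typescript
// Builds a map from w:id values to sequential display numbers, given the IDs
// of footnote references in the order they appear in the document.
function buildDisplayNumbers(referenceIdsInOrder: number[]): Map<number, number> {
  const displayById = new Map<number, number>();
  for (const id of referenceIdsInOrder) {
    // The first time an ID is seen, it receives the next display number.
    if (!displayById.has(id)) displayById.set(id, displayById.size + 1);
  }
  return displayById;
}

const displayNumbers = buildDisplayNumbers([2, 3, 7, 5]);
console.log(displayNumbers.get(2)); // 1
console.log(displayNumbers.get(7)); // 3
```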

HTML Output Improvements (PRs #89, #90, #91, #92)

Multiple improvements to align with other rendering engines:

  • Footnote/endnote rendering - Improved anchor structure and backref placement
  • Whitespace handling - Normalized inline whitespace between elements to prevent spurious spaces
  • Line height - Removed default 108% line-height that was causing layout differences
  • Empty span prevention - Skip footnoteRef/endnoteRef runs that create empty <span> elements

Test Coverage (PR #88)

Added feature verification tests for resolved converter gaps.


Full Changelog: v5.3.0...v5.4.0

v5.3.0 - HTML Converter Enhancements

21 Dec 14:51
9ef9664

What's New

WmlToHtmlConverter Enhancements

This release brings significant improvements to the HTML converter, addressing 8 items from the converter gaps document.

Theme Color Resolution (PR #87)

  • New ResolveThemeColors setting (default: true) enables theme color resolution
  • Reads color scheme from theme1.xml (a:clrScheme element)
  • Supports all 12 theme colors: dk1, lt1, dk2, lt2, accent1-6, hlink, folHlink
  • Applies w:themeTint (lighten) and w:themeShade (darken) modifiers
  • Falls back to explicit color value if theme color not found
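
To illustrate the tint and shade modifiers, here is a simplified sketch that blends each RGB channel toward white (tint) or black (shade). The formula Docxodus actually uses is not stated in these notes, and Word's own results can differ from a plain RGB blend, so treat the function name and the arithmetic as assumptions.

```typescript
// hex: six-digit RRGGBB color; tint/shade: two-digit hex strings as found in
// w:themeTint / w:themeShade (FF leaves the color unchanged).
function applyThemeModifier(hex: string, tint?: string, shade?: string): string {
  const t = tint === undefined ? undefined : parseInt(tint, 16) / 255;
  const s = shade === undefined ? undefined : parseInt(shade, 16) / 255;
  return [0, 2, 4]
    .map((i) => parseInt(hex.slice(i, i + 2), 16))
    .map((channel) => {
      let value = channel;
      if (t !== undefined) value = value * t + 255 * (1 - t); // blend toward white
      if (s !== undefined) value = value * s;                 // blend toward black
      return ("0" + Math.round(value).toString(16)).slice(-2);
    })
    .join("")
    .toUpperCase();
}

console.log(applyThemeModifier("4472C4", "99")); // 8FAADC
```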

@page CSS Rule Generation (PR #87)

  • New GeneratePageCss setting (default: false) enables @page rule generation
  • Reads page dimensions from w:sectPr/w:pgSz and margins from w:sectPr/w:pgMar
  • Generates CSS @page { size: Xin Yin; margin: ... } rules
  • Useful for print stylesheets and PDF generation
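
A sketch of the unit conversion involved: w:pgSz and w:pgMar values are in twentieths of a point (1440 per inch). The PageSetup type and buildPageCss name are hypothetical, and the exact CSS text the converter emits may differ.

```typescript
interface PageSetup {
  widthTwips: number;  // w:pgSz w:w
  heightTwips: number; // w:pgSz w:h
  marginsTwips: { top: number; right: number; bottom: number; left: number }; // w:pgMar
}

const TWIPS_PER_INCH = 1440;

// Converts twips to a CSS inch length, trimming trailing zeros.
function toInches(twips: number): string {
  return `${parseFloat((twips / TWIPS_PER_INCH).toFixed(3))}in`;
}

function buildPageCss(page: PageSetup): string {
  const m = page.marginsTwips;
  const size = `${toInches(page.widthTwips)} ${toInches(page.heightTwips)}`;
  const margin = [m.top, m.right, m.bottom, m.left].map(toInches).join(" ");
  return `@page { size: ${size}; margin: ${margin}; }`;
}

// US Letter with one-inch margins.
console.log(buildPageCss({
  widthTwips: 12240,
  heightTwips: 15840,
  marginsTwips: { top: 1440, right: 1440, bottom: 1440, left: 1440 },
}));
// @page { size: 8.5in 11in; margin: 1in 1in 1in 1in; }
```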

Font Fallback Improvements (PR #86)

  • Unknown fonts are classified by name patterns and get proper generic fallback:
    • Fonts with "sans" → font-family: 'FontName', sans-serif
    • Fonts with "mono", "code", "courier" → font-family: 'FontName', monospace
    • Other fonts default to serif fallback
  • CJK text gets language-specific font fallback chains:
    • Japanese: 'Noto Serif CJK JP', 'Yu Mincho', 'MS Mincho', ...
    • Simplified Chinese: 'Noto Serif CJK SC', 'Microsoft YaHei', 'SimSun', ...
    • Traditional Chinese: 'Noto Serif CJK TC', 'Microsoft JhengHei', 'PMingLiU', ...
    • Korean: 'Noto Serif CJK KR', 'Malgun Gothic', 'Batang', ...
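
The name-pattern classification for unknown fonts might look like the sketch below. The function names are hypothetical, and the order of the checks (monospace patterns before "sans") is an assumption; the notes do not say which pattern wins when a name matches both.

```typescript
type GenericFamily = "sans-serif" | "monospace" | "serif";

// Classifies an unknown font by substrings of its name.
function genericFallback(fontName: string): GenericFamily {
  const name = fontName.toLowerCase();
  if (/mono|code|courier/.test(name)) return "monospace";
  if (/sans/.test(name)) return "sans-serif";
  return "serif"; // default for everything else
}

function fontFamilyCss(fontName: string): string {
  return `font-family: '${fontName}', ${genericFallback(fontName)}`;
}

console.log(fontFamilyCss("Open Sans")); // font-family: 'Open Sans', sans-serif
console.log(fontFamilyCss("Fira Code")); // font-family: 'Fira Code', monospace
console.log(fontFamilyCss("Garamond"));  // font-family: 'Garamond', serif
```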

Document Language Support (PR #85)

  • <html> element now includes lang attribute (e.g., <html lang="en-US">)
  • Language auto-detected from document settings or default paragraph style
  • New DocumentLanguage setting for manual override
  • Foreign text spans get appropriate lang attribute when different from document default

Unsupported Content Placeholders (PR #84)

  • New RenderUnsupportedContentPlaceholders setting for visual indicators
  • Supports WMF/EMF images, SVG, Math equations, form fields, and Ruby annotations
  • Placeholders are styled with CSS and include semantic data attributes

Tab Leader Character Support (PR #83)

  • Tab stops with leader characters now render correctly
  • Supports dot, hyphen, underscore, and middle dot leaders

Bug Fixes

  • Thread-safety for static caches (PR #82) - Fixed potential corruption during concurrent conversions by using ConcurrentDictionary for ShadeCache and font tracking
  • Null rPr handling (PR #81) - Fixed crash in DefineRunStyle and GetLangAttribute when converting runs without explicit run properties

CI Updates

  • Updated actions/upload-artifact from v5 to v6
  • Updated actions/download-artifact from v5 to v7

Full Changelog: v5.2.0...v5.3.0

v5.2.0

06 Dec 06:10
8058c20

What's New

Fixed

  • WmlComparer legal numbering preservation (Issue #1634) - Fixed comparison losing legal numbering (w:isLgl) when comparing documents with different numbering styles. The comparer now properly merges numbering definitions from the revised document into the result:
    • Copies abstractNum and num elements from revised document when missing in original
    • Reuses existing definitions when content matches (regardless of ID)
    • Remaps IDs when conflicts occur to avoid duplicates
    • Null-safe attribute extraction for robustness with malformed documents
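
The merge rules above (copy when missing, reuse when content matches, remap on ID conflict) can be sketched on a simplified model. The AbstractNum type, the content string standing in for a serialized definition, and the function name are assumptions; the actual comparer operates on abstractNum and num XML elements.

```typescript
interface AbstractNum {
  id: number;
  content: string; // stands in for the definition serialized without its ID
}

interface MergeResult {
  merged: AbstractNum[];
  idMap: Map<number, number>; // revised ID -> ID to use in the result
}

function mergeAbstractNums(original: AbstractNum[], revised: AbstractNum[]): MergeResult {
  const merged = original.slice();
  const idMap = new Map<number, number>();
  let nextId = merged.reduce((max, def) => Math.max(max, def.id), -1) + 1;
  for (const rev of revised) {
    // Reuse an existing definition when the content matches, regardless of ID.
    const match = merged.find((def) => def.content === rev.content);
    if (match !== undefined) {
      idMap.set(rev.id, match.id);
      continue;
    }
    // Otherwise copy it, remapping the ID if it is already taken.
    const taken = merged.some((def) => def.id === rev.id);
    const id = taken ? nextId : rev.id;
    nextId = Math.max(nextId, id + 1);
    merged.push({ id, content: rev.content });
    idMap.set(rev.id, id);
  }
  return { merged, idMap };
}

const result = mergeAbstractNums(
  [{ id: 0, content: "decimal" }],
  [{ id: 0, content: "legal" }, { id: 1, content: "decimal" }]
);
console.log(result.idMap.get(0)); // 1 (ID conflict, remapped)
console.log(result.idMap.get(1)); // 0 (content matched, reused)
```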

Documentation

  • Updated all READMEs to reflect current feature set
  • Fixed repository URLs throughout (JSv4/Docxodus)
  • Simplified NuGet installation instructions
  • Added documentation for environment variables (REDLINE_DEBUG, DOCX2HTML_DEBUG)
  • Updated npm package documentation with new APIs (metadata, format changes, Web Workers)

Full Changelog: v5.1.2...v5.2.0

v5.1.0 - OpenContracts Export Format

04 Dec 03:52
8b49651

New Features

OpenContracts Export Format

Export DOCX documents to the OpenContracts format for interoperability with the OpenContracts ecosystem for document analysis.

API:

  • C#: OpenContractExporter.Export(WmlDocument) / OpenContractExporter.Export(WordprocessingDocument)
  • WASM: DocumentConverter.ExportToOpenContract()
  • TypeScript: exportToOpenContract() with full type definitions

Export includes:

  • Complete text extraction (paragraphs, tables, headers, footers, footnotes, endnotes)
  • PAWLS-format page layout with token positions
  • Structural annotations (SECTION, PARAGRAPH, TABLE)
  • Parent-child relationships between annotations

New CLI Tool: docx2oc

Command-line tool for exporting DOCX files to OpenContracts JSON format:

# Export with default output (contract.oc)
docx2oc contract.docx

# Export with custom output filename
docx2oc contract.docx export.json

Install as .NET tool:

dotnet tool install --global Docx2OC

Types Added

  • OpenContractDocExport, PawlsPage, PawlsToken
  • OpenContractsAnnotation, OpenContractsRelationship
  • TextSpan, BoundingBox, TokenId

Full Changelog

See CHANGELOG.md for complete details.