Add docx2html CLI tool for DOCX to HTML conversion#11

Merged
JSv4 merged 5 commits into main from JSv4/upgrade-docx-rendering on Nov 27, 2025

JSv4 commented on Nov 27, 2025

Summary

  • Add new docx2html CLI tool for converting DOCX files to HTML
  • Clean up unused System.Drawing imports from test files (14 files)
  • Update publish workflow to build and publish docx2html alongside redline
  • Update README with docx2html documentation and binary download links

Features

The docx2html tool supports:

  • Basic DOCX to HTML conversion with CSS styling
  • Custom page titles (--title)
  • CSS class prefix customization (--css-prefix)
  • Inline styles option (--inline-styles)
  • Image extraction to separate files (--extract-images) or base64 embedding (default)
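As a rough illustration of how the options above combine, here is a minimal Python sketch that assembles an argument list for a docx2html call, following the usage line from the commit message (`docx2html input.docx [output.html] [options]`). The helper name `build_docx2html_args` is hypothetical, and the value shapes are assumptions: `--title` and `--css-prefix` are assumed to take a value as the next argument, while `--inline-styles` and `--extract-images` are treated as bare switches. Check the tool's help output before relying on this.

```python
from typing import List, Optional


def build_docx2html_args(
    input_path: str,
    output_path: Optional[str] = None,
    title: Optional[str] = None,
    css_prefix: Optional[str] = None,
    inline_styles: bool = False,
    extract_images: bool = False,
) -> List[str]:
    """Assemble an argv list for docx2html (hypothetical helper).

    Assumption: --title and --css-prefix take their value as the next
    argument; --inline-styles and --extract-images are bare switches.
    """
    args = ["docx2html", input_path]
    if output_path is not None:
        args.append(output_path)
    if title is not None:
        args += ["--title", title]
    if css_prefix is not None:
        args += ["--css-prefix", css_prefix]
    if inline_styles:
        args.append("--inline-styles")
    if extract_images:
        # Without this switch, images are embedded as base64 (the default).
        args.append("--extract-images")
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)` once the docx2html binary is on the PATH.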

Test plan

  • All 978 tests pass
  • docx2html builds successfully
  • Manual testing of docx2html with sample documents
  • Verify publish workflow on tag push

JSv4 added 5 commits on November 26, 2025 17:03

The test files had legacy System.Drawing and System.Drawing.Imaging using
statements that were no longer being used after the SkiaSharp migration.

New standalone command-line tool that converts Word documents to HTML:
- Supports CSS classes or inline styles
- Can embed images as base64 data URIs or extract to files
- Configurable page title and CSS prefix
- Handles tables, formatting, hyperlinks, and bidirectional text

Usage: docx2html input.docx [output.html] [options]
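
To make the two image modes concrete, the following Python sketch shows the general difference between embedding an image as a base64 data URI and referencing an extracted file. It is not the tool's implementation; the function names `to_data_uri` and `img_tag`, the default MIME type, and the example path are all assumptions made for illustration.

```python
import base64
import html
from typing import Optional


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(
    image_bytes: bytes,
    mime_type: str = "image/png",
    extracted_src: Optional[str] = None,
) -> str:
    """Return an <img> tag that points at an extracted file when a path is
    given, and otherwise embeds the bytes inline (hypothetical helper)."""
    src = extracted_src if extracted_src is not None else to_data_uri(image_bytes, mime_type)
    return f'<img src="{html.escape(src, quote=True)}">'
```

Embedding keeps the HTML self-contained at the cost of a larger file (base64 adds roughly a third to the image size), while extraction keeps the HTML small but requires shipping the image files alongside it.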

Updates the publish workflow to:
- Build and restore docx2html alongside redline
- Pack docx2html NuGet package
- Build self-contained binaries for all platforms (linux-x64, win-x64, osx-x64, osx-arm64)
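
The per-platform build step can be pictured as one `dotnet publish` call per runtime identifier. The sketch below only generates the command lines and is not taken from the actual workflow file; the helper name `publish_commands`, the project path passed in, and the output directory layout are hypothetical.

```python
from typing import List

# Runtime identifiers listed in the commit message.
RUNTIME_IDS = ["linux-x64", "win-x64", "osx-x64", "osx-arm64"]


def publish_commands(project: str, out_root: str = "publish") -> List[List[str]]:
    """Build one self-contained `dotnet publish` command per runtime.

    `project` and `out_root` are placeholders; the real workflow may use
    different paths and additional options.
    """
    return [
        [
            "dotnet", "publish", project,
            "-c", "Release",
            "-r", rid,
            "--self-contained", "true",
            "-o", f"{out_root}/{rid}",
        ]
        for rid in RUNTIME_IDS
    ]
```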

Updates the README to:
- Add docx2html installation and usage instructions
- Add docx2html binary downloads to releases section
- Update 'What's Included' to list both CLI tools
- Remove outdated 'What's Planned' section

JSv4 merged commit 6ae6a58 into main on Nov 27, 2025
4 checks passed
JSv4 deleted the JSv4/upgrade-docx-rendering branch on November 27, 2025 15:33
JSv4 added a commit that referenced this pull request Dec 20, 2025
- Add DocumentLanguage setting to override document language
- Add lang attribute to <html> element (auto-detected from document)
- Update GetLangAttribute() to use actual document default language
- Language detected from w:themeFontLang, default style, or fallback to "en-US"
- Foreign text spans get lang attribute when different from doc default
- Add WASM/npm support for documentLanguage option
- Add 4 new tests for language attribute functionality

Addresses converter gaps #10 (Document Language Attribute) and #11 (Foreign Text Spans)
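
The language handling described in this commit can be sketched as a fallback chain plus a per-run comparison. This Python sketch is an illustration only, not the converter's code: the function names, the assumption that the override is checked first and the remaining sources in the order listed above, and the case-insensitive comparison of language tags are all assumptions.

```python
import html
from typing import Optional

FALLBACK_LANG = "en-US"


def resolve_document_language(
    override: Optional[str],
    theme_font_lang: Optional[str],
    default_style_lang: Optional[str],
) -> str:
    """Pick the document language: explicit override first, then
    w:themeFontLang, then the default style, then the fallback (assumed order)."""
    for candidate in (override, theme_font_lang, default_style_lang):
        if candidate:
            return candidate
    return FALLBACK_LANG


def render_run(text: str, run_lang: Optional[str], doc_lang: str) -> str:
    """Wrap text in a span with a lang attribute only when its language
    differs from the document default (comparison rule is an assumption)."""
    escaped = html.escape(text)
    if run_lang and run_lang.lower() != doc_lang.lower():
        return f'<span lang="{html.escape(run_lang, quote=True)}">{escaped}</span>'
    return escaped
```

Tagging only the runs that differ from the document default keeps the markup small while still letting browsers and screen readers switch language for foreign text.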