Add incremental annotation overlay API to avoid DOCX re-conversion #107
Merged
Decouple HTML conversion from annotation projection to avoid full WASM re-conversion when annotations change. New API enables:
- ProjectAnnotationsOntoHtml: overlay annotations on cached HTML
- AddAnnotationToHtml: add single annotation without re-conversion
- RemoveAnnotationFromHtml: remove annotation by ID, preserving text
- GenerateVisibilityCss: CSS-based label toggling without re-rendering
- GenerateAnnotationCssString: independent CSS generation
All methods available across .NET, WASM (JSExport), and npm/TS layers.
https://claude.ai/code/session_01EQQ8N9xQoSSogqhsXWn3sF
The offset-based annotation creation was fragile when document content text differed from the raw input string. Using CreateAnnotationFromSearch and separate paragraphs ensures reliable text matching. https://claude.ai/code/session_01EQQ8N9xQoSSogqhsXWn3sF
After projecting the first annotation (e.g. wrapping "Alpha" with a label
span containing "A"), the text map was rebuilt but htmlText was not. The
label text "A" shifted all subsequent offsets, causing the second annotation
("Beta") to wrap the wrong character range.
Fix: GetTextNodes now skips already-projected annotation wrappers (elements
with data-annotation-id), and both textMap and htmlText are rebuilt each
iteration. This prevents label text from polluting the offset calculation
for subsequent annotations.
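The fix described above can be pictured with a small sketch. It assumes a hypothetical in-memory node model (`HtmlNode`, `getTextNodes`, `findInUnannotatedText` are illustrative names, not the project's code); the real implementation walks an HTML DOM in C#. The point is only that text inside an element carrying data-annotation-id, including its label text, is left out of the searchable text, and that the text is rebuilt on every lookup.

```typescript
// Hypothetical minimal node model; the real code operates on parsed HTML.
type TextNode = { kind: "text"; value: string };
type ElementNode = {
  kind: "element";
  attrs: Record<string, string>;
  children: HtmlNode[];
};
type HtmlNode = TextNode | ElementNode;

// Collect text nodes, skipping any subtree that is already an annotation wrapper.
function getTextNodes(node: HtmlNode, out: TextNode[] = []): TextNode[] {
  if (node.kind === "text") {
    out.push(node);
    return out;
  }
  if ("data-annotation-id" in node.attrs) {
    return out; // label text such as "A" must not shift later offsets
  }
  for (const child of node.children) {
    getTextNodes(child, out);
  }
  return out;
}

// Rebuild the searchable text from scratch each time it is needed.
function findInUnannotatedText(root: HtmlNode, needle: string): number {
  const text = getTextNodes(root).map((n) => n.value).join("");
  return text.indexOf(needle);
}

// "Alpha" has already been wrapped, with a label span containing "A".
const root: HtmlNode = {
  kind: "element",
  attrs: {},
  children: [
    {
      kind: "element",
      attrs: { "data-annotation-id": "a1" },
      children: [
        { kind: "element", attrs: { class: "label" }, children: [{ kind: "text", value: "A" }] },
        { kind: "text", value: "Alpha" },
      ],
    },
    { kind: "text", value: " Beta" },
  ],
};

const betaOffset = findInUnannotatedText(root, "Beta"); // 1, unaffected by the label "A"
```

Without the skip, the concatenated text would be "AAlpha Beta" and every offset after the first wrapper would be off by the length of the label.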
…rsion
Measures and reports wall-clock time for:
- Full DOCX → HTML with external annotations (re-parses DOCX every time)
- Incremental projection (convert DOCX once, project annotations on cached HTML)
- Single annotation add/remove on existing HTML
Reports honest results: no assertion that one is faster than the other. The numbers tell the truth.
The benchmark test now asserts that incremental projection, single add, and single remove are all faster than full DOCX re-conversion. The exact speedup doesn't matter — just that it's faster.
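A relative-timing assertion of this kind might be structured roughly as follows. This is a sketch only: `timeMs` and `assertFaster` are hypothetical helpers, the real benchmark lives in the project's test suite, and the workloads passed in would be the actual conversion and projection calls.

```typescript
// Average wall-clock milliseconds per call over a fixed number of iterations.
function timeMs(work: () => void, iterations: number): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    work();
  }
  return (Date.now() - start) / iterations;
}

// Assert only the ordering, not a specific speedup factor.
function assertFaster(candidateMs: number, baselineMs: number, label: string): void {
  if (!(candidateMs < baselineMs)) {
    throw new Error(
      `${label}: expected ${candidateMs.toFixed(3)} ms to be below baseline ${baselineMs.toFixed(3)} ms`,
    );
  }
}
```

Comparing averages over several iterations, rather than a single run, keeps the ordering check less sensitive to one-off pauses.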
C# JSON serializer produces PascalCase (Content, LabelledText, TextLabels). Handle both casings so the test works regardless of serialization convention.
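One way to accept both casings on the consuming side is a small lookup helper. The helper name `getField` is an illustration, not part of the package; the property names Content and LabelledText come from the commit message above.

```typescript
// Look up a property by its camelCase name, falling back to the PascalCase form
// that a C# serializer without a naming policy would emit.
function getField<T>(obj: Record<string, unknown>, camelName: string): T | undefined {
  const pascalName = camelName.charAt(0).toUpperCase() + camelName.slice(1);
  const value = camelName in obj ? obj[camelName] : obj[pascalName];
  return value as T | undefined;
}

// Both payload shapes resolve to the same value.
const fromDotNet = getField<string>({ Content: "Alpha Beta" }, "content");
const fromJs = getField<string>({ content: "Alpha Beta" }, "content");
```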
- New architecture doc: docs/architecture/incremental_annotation_overlay.md covering problem statement, architecture, all API surfaces (.NET/WASM/TS), text-search-based projection, offset-drift fix, performance benchmarks, usage examples, and limitations
- Updated docs/npm-package.md with External Annotations section covering all five TypeScript functions, parameter tables, code examples, and performance comparison callout
- Updated CLAUDE.md with ExternalAnnotationProjector module description in the Core Modules section
Summary
Implements an incremental annotation overlay API that decouples HTML conversion from annotation projection. This allows annotations to be added, removed, or modified on already-converted HTML without requiring full DOCX re-conversion, significantly improving performance for annotation-heavy workflows.
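The decoupling can be sketched as a convert-once, project-many pattern. Everything named here is an assumption made for illustration: `createAnnotatedView`, the `Annotation` shape, and `naiveProject` are stand-ins, and the injected `convert` and `project` callbacks stand in for the real DOCX conversion and projection calls, whose signatures are not shown in this PR description.

```typescript
// Hypothetical annotation shape for this sketch.
type Annotation = { id: string; searchText: string };

// Cache the base HTML so the expensive conversion runs only once;
// every annotation change re-runs only the cheap projection step.
function createAnnotatedView(
  convert: () => string,
  project: (html: string, annotations: Annotation[]) => string,
) {
  let baseHtml: string | undefined;
  let conversions = 0;
  return {
    render(annotations: Annotation[]): string {
      if (baseHtml === undefined) {
        baseHtml = convert();
        conversions++;
      }
      return project(baseHtml, annotations);
    },
    conversionCount: (): number => conversions,
  };
}

// Stand-in projector: wraps the first occurrence of each search text in a span.
const naiveProject = (html: string, annotations: Annotation[]): string =>
  annotations.reduce(
    (acc, a) =>
      acc.replace(a.searchText, `<span data-annotation-id="${a.id}">${a.searchText}</span>`),
    html,
  );

const view = createAnnotatedView(() => "<p>Alpha Beta</p>", naiveProject);
const first = view.render([{ id: "a1", searchText: "Alpha" }]);
const second = view.render([]); // annotation removed, no re-conversion needed
```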
Key Changes
Core API Methods (C#)
- ProjectAnnotationsOntoHtml() - Projects a complete annotation set onto pre-converted HTML
- AddAnnotationToHtml() - Adds a single annotation to existing HTML with optional label styling
- RemoveAnnotationFromHtml() - Removes annotations by ID, unwrapping spans back to plain text
- GenerateVisibilityCss() - Generates CSS to hide/show annotations by label ID without re-rendering
- GenerateAnnotationCss() - Generates annotation styling CSS independently from HTML content
TypeScript/JavaScript Bindings
- ExternalAnnotationProjectionSettings and AnnotationLabel types
Implementation Details
- BuildAnnotationCssString() to enable CSS-only generation
- AddSingleAnnotationCss() helper for per-annotation styling in incremental workflows
- SerializeError()
Testing
Type System Updates
- CssResponse type for CSS generation endpoints
- DocxodusJsonContext to serialize new response types
- DocxodusWasmExports interface with new method signatures
Workflow Benefits
Users can now:
This is particularly valuable for collaborative annotation scenarios where multiple users modify annotations on the same document.
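CSS-based label toggling of the kind GenerateVisibilityCss() provides can be illustrated with a rough sketch. The attribute name `data-label-id`, the class `annotation-label`, and the function `visibilityCss` are all assumptions made for this example; the selectors the library actually emits may differ. The sketch also assumes label IDs contain no quote characters.

```typescript
// Build CSS that visually switches off annotations for the given label IDs.
// The wrapped document text stays visible; only highlight and label are suppressed.
function visibilityCss(hiddenLabelIds: string[]): string {
  return hiddenLabelIds
    .map((id) =>
      [
        `[data-label-id="${id}"] { background: none !important; }`,
        `[data-label-id="${id}"] > .annotation-label { display: none; }`,
      ].join("\n"),
    )
    .join("\n");
}

// Swapping this string in a <style> element toggles labels without touching the HTML.
const css = visibilityCss(["person", "date"]);
```

Because the toggle lives in a stylesheet, the projected HTML can stay cached while users switch labels on and off.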
https://claude.ai/code/session_01EQQ8N9xQoSSogqhsXWn3sF