
fix(web): vendor cdnjs assets so the SPA works offline (#693 part 1) #706

Merged: memtomem merged 3 commits into main from fix/web-vendor-cdnjs on May 2, 2026

memtomem (Owner) commented on May 2, 2026

Summary

The mm web SPA pulled DOMPurify, marked, and Prism (core + 5 language plugins + the tomorrow theme) directly from cdnjs.cloudflare.com on every page load. Three problems land here at once:

  • Offline / firewalled / air-gapped deployments — a normal home for a local memory tool — silently lost markdown rendering, HTML sanitization, and code highlighting because the SPA loads but the script tags fail.
  • Privacy — every mm web page load was a beacon to cdnjs.cloudflare.com carrying the visitor's IP, User-Agent, and request time, even though the SPA is meant to be a 100% local tool.
  • Trust-on-first-use — there were no integrity= hashes, so a cdnjs compromise would have served arbitrary code into the same origin as the local /api/... surface.

This PR vendors all nine assets under web/static/vendor/, tightens the CSP, and adds regression guards. The StaticFiles mount at / already serves them at /vendor/... so no new route was needed.

Closes part 1 of #693. Swagger UI vendoring (/api/docs → jsdelivr) is intentionally split off to a follow-up PR — different audience (dev-facing), different CDN, different fix shape (get_swagger_ui_html override).

What changed

  • Vendored 9 files at pinned versions (DOMPurify 3.1.6, marked 9.1.6, Prism 1.29.0):
    • purify.min.js, marked.min.js, prism.min.js
    • prism-{python,typescript,json,bash,yaml}.min.js
    • prism-tomorrow.min.css
  • Rewrote 9 refs in index.html from https://cdnjs.cloudflare.com/... to /vendor/...?v=1.
  • Tightened CSP in web/app.py SecurityHeadersMiddleware — dropped https://cdnjs.cloudflare.com from script-src and style-src. The browser can no longer be talked into loading code from the CDN even if the HTML were poisoned. Final policy:
    default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline';
    img-src 'self' data:; connect-src 'self'; frame-ancestors 'none'
    
  • Attribution & licensing:
    • vendor/THIRD_PARTY_LICENSES.md — version pin table + source URLs + SHA-256 per file.
    • vendor/dompurify-LICENSE.txt, vendor/marked-LICENSE.md, vendor/prism-LICENSE.txt — verbatim upstream LICENSE files (DOMPurify Apache-2.0/MPL-2.0, marked MIT, Prism MIT).
    • vendor/README.md — update process: curl URL list → shasum (with explicit supply-chain check on same-version re-fetch) → license refresh → ?v=N cache-bust → smoke-test checklist.
  • Regression guards (tests/test_web_csp_vendor.py):
    • Paired positive + negative CSP assertion (per feedback_pin_invert_symmetric_assertion.md) — pins script-src 'self'; so a header-drop fails loudly, and blocks cdnjs, jsdelivr, unpkg, plus the generic script-src 'self' https shape that would smuggle in any external host.
    • Parametrized /vendor/<name> 200-and-non-empty check across all nine assets.
    • index.html grep guard: no external CDN strings reintroduced; every vendored asset is referenced.
  • Footprint: ~89 KB minified across the nine assets.
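
The paired positive and negative CSP assertion described in the regression guards can be sketched roughly as below. This is an illustration only, not the contents of tests/test_web_csp_vendor.py: the function name `csp_problems` and the `BLOCKED_HOSTS` tuple are hypothetical, and the real test asserts on the response header of the running app rather than on a bare string.

```python
import re

# Hypothetical list: the hosts the negative half refuses to see in the policy.
BLOCKED_HOSTS = ("cdnjs.cloudflare.com", "jsdelivr", "unpkg")


def csp_problems(csp: str) -> list[str]:
    """Return the problems found in a CSP header value (empty list = pass)."""
    problems = []
    # Positive pin: a dropped or emptied header fails here, which a
    # negative-only check would silently let through.
    if "script-src 'self';" not in csp:
        problems.append("script-src is not pinned to 'self'")
    # Negative half: no known CDN host may be allow-listed anywhere.
    for host in BLOCKED_HOSTS:
        if host in csp:
            problems.append(f"external host allowed: {host}")
    # Generic shape: 'self' followed by any https source.
    if re.search(r"script-src 'self' https", csp):
        problems.append("script-src allows an https source")
    return problems
```

With the final policy quoted above the list comes back empty, while an empty header trips the positive pin, which is the case the pairing exists to catch.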

Why vendor and not npm

The package has no JavaScript build pipeline today (raw .js/.css ship in the wheel). Adding an npm install step purely to pin nine browser libraries would force every contributor, and every user who runs uv tool install from an sdist, to have Node available as well. Direct vendoring of cdnjs builds keeps the install surface Python-only; see the rationale section of vendor/README.md.

Test plan

  • uv run ruff check packages/memtomem/src && uv run ruff format --check packages/memtomem/src → clean
  • uv run pytest packages/memtomem/tests/test_web_csp_vendor.py → 12/12 passed
  • uv run pytest packages/memtomem/tests -m "not ollama" -k "web or app or fastapi or static" → 646 passed
  • Browser smoke via Playwright MCP — mm web on isolated port, navigate, check:
    • 0 console errors / warnings
    • All 9 /vendor/...?v=1 requests return 200
    • No request to cdnjs.cloudflare.com or any external CDN
    • DOMPurify.version === '3.1.6', DOMPurify.sanitize('<img src=x onerror=alert(1)>') strips onerror
    • marked.parse('# Hi\n\n```python\nprint(1)\n```') renders with language-python class
    • Prism.languages includes python, typescript, json, bash, yaml
    • CSP response header matches the tightened policy

🤖 Generated with Claude Code

pandas-studio and others added 2 commits May 2, 2026 13:49
The Web UI shipped with mm web pulled DOMPurify, marked, and Prism
(core + 5 language components + the tomorrow theme) directly from
cdnjs.cloudflare.com on every page load. Offline / air-gapped /
firewalled deployments — a normal home for a local memory tool —
silently lost markdown rendering, HTML sanitization, and code
highlighting because the SPA loads but the script tags fail. There were
no SRI hashes either, so the assets were trust-on-first-use: a cdnjs
compromise would have run unchecked.

Vendor pinned copies of all nine files under web/static/vendor/. The
StaticFiles mount at "/" already serves them at /vendor/... so no new
route is needed. Each file is recorded in vendor/THIRD_PARTY_LICENSES.md
with version, source URL, license, and SHA-256, and each upstream
LICENSE is reproduced verbatim alongside the assets (DOMPurify
Apache-2.0/MPL-2.0, marked MIT, Prism MIT). vendor/README.md documents
the update procedure (curl + sha + ?v= bump).

The Content-Security-Policy in SecurityHeadersMiddleware no longer needs
to allow-list cdnjs.cloudflare.com — script-src and style-src tighten
back to 'self' (style-src keeps 'unsafe-inline' for existing inline
styles). This is a security side-benefit: the browser can no longer be
talked into loading code from the CDN even if the HTML were poisoned.
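
For illustration, a header-setting middleware of this kind can be sketched as a dependency-free ASGI wrapper. This is not the project's SecurityHeadersMiddleware: the class name `CSPHeaderMiddleware`, the helper `response_headers`, and the demo `plain_app` are all hypothetical, and only the policy string is taken from this PR.

```python
import asyncio

# Policy string as quoted in this PR.
CSP = (
    "default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; "
    "img-src 'self' data:; connect-src 'self'; frame-ancestors 'none'"
)


class CSPHeaderMiddleware:
    """Hypothetical pure-ASGI stand-in that stamps the CSP on HTTP responses."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_with_csp(message):
            if message["type"] == "http.response.start":
                # Drop any CSP set downstream, then append the tightened one.
                headers = [
                    (k, v) for k, v in message.get("headers", [])
                    if k.lower() != b"content-security-policy"
                ]
                headers.append((b"content-security-policy", CSP.encode("latin-1")))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_csp)


async def plain_app(scope, receive, send):
    """Demo app (assumption): returns 200 with a single content-type header."""
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/html")]})
    await send({"type": "http.response.body", "body": b"ok"})


def response_headers(app):
    """Push one fake GET through `app` and return its response headers."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app({"type": "http", "method": "GET", "path": "/"}, receive, send))
    return dict(sent[0]["headers"])
```

Calling `response_headers(CSPHeaderMiddleware(plain_app))` yields the original content-type header plus the policy, with no CDN host in it.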

Smoke-tested in a real browser via Playwright MCP: 0 console
errors/warnings, all nine /vendor/ requests 200, marked.parse() and
DOMPurify.sanitize() and Prism.highlight() all functional, CSP header
emitted as default-src 'self'; script-src 'self'; style-src 'self'
'unsafe-inline'; img-src 'self' data:; connect-src 'self';
frame-ancestors 'none'.

Closes part 1 of #693. Vendoring of the FastAPI Swagger UI bundle
(jsdelivr) at /api/docs is deliberately split off to a follow-up — that
asset is dev-facing and is loaded from a different CDN with a different
fix shape (FastAPI's get_swagger_ui_html override).

Co-Authored-By: Claude <[email protected]>

Follow-up to the previous commit per review on #706. The vendor swap and
CSP tightening were verified manually via Playwright MCP, but nothing in
the CI suite would catch a future regression — for example, a well-meaning
"let's just allow-list the CDN again" CSP edit, or a vendor-rename that
forgets to update index.html.

Adds tests/test_web_csp_vendor.py:

* ``test_csp_locks_script_src_to_self_no_external_cdn`` — paired
  positive/negative assertions on the response CSP header. Positive marker
  pins ``script-src 'self';`` so a future header-drop fails loudly; the
  negative half blocks ``cdnjs.cloudflare.com``, ``jsdelivr``, ``unpkg``,
  and the generic ``script-src 'self' https`` shape that smuggles in any
  external host. Pattern reference:
  ``feedback_pin_invert_symmetric_assertion.md`` — a negative-only check
  would false-pass if the header were dropped entirely.
* ``test_vendor_asset_served_locally`` — parametrized over all nine
  vendored files, asserts ``/vendor/<name>`` returns 200 with non-empty
  body. Pairs with the index.html grep guard.
* ``test_index_html_has_no_external_cdn_refs`` — bare-string scan of the
  shipped index.html for ``cdnjs.cloudflare.com``, ``cdn.jsdelivr.net``,
  ``unpkg.com``.
* ``test_index_html_references_every_vendor_asset`` — inverse of the
  deletion case: every entry in the asset list must appear as
  ``/vendor/<name>`` in index.html.
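
The two index.html guards amount to a pair of string scans, roughly as sketched here. The asset names are the nine listed in this PR and the host strings are the three named above. The function names are hypothetical, and the real tests read the shipped index.html rather than taking the HTML as an argument.

```python
# The nine vendored assets named in this PR.
VENDOR_ASSETS = [
    "purify.min.js", "marked.min.js", "prism.min.js",
    "prism-python.min.js", "prism-typescript.min.js", "prism-json.min.js",
    "prism-bash.min.js", "prism-yaml.min.js", "prism-tomorrow.min.css",
]
# External CDN hosts that must not reappear in the page.
CDN_HOSTS = ["cdnjs.cloudflare.com", "cdn.jsdelivr.net", "unpkg.com"]


def external_cdn_refs(html: str) -> list[str]:
    """CDN hosts that appear anywhere in the page (should be empty)."""
    return [host for host in CDN_HOSTS if host in html]


def missing_vendor_refs(html: str) -> list[str]:
    """Vendored assets the page fails to reference (should be empty)."""
    return [name for name in VENDOR_ASSETS if f"/vendor/{name}" not in html]
```

Keeping the two scans separate mirrors the two failure modes: a CDN URL creeping back in, and a vendored file being renamed without its reference being updated.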

Also clarifies vendor/README.md step 1: a same-version curl re-fetch must
produce the SHA-256 already pinned in THIRD_PARTY_LICENSES.md byte-for-byte.
A mismatch on a non-bump is a supply-chain red flag (cdnjs serving a
different build under the same version path) — investigate upstream
before any commit.
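
The same-version re-fetch check can be sketched in a few lines of Python. This assumes the pinned digests have already been copied out of THIRD_PARTY_LICENSES.md into a dict: the `pins` mapping and both function names are hypothetical, and the README itself describes the check in terms of curl and shasum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file's exact bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(vendor_dir: Path, pins: dict[str, str]) -> list[str]:
    """Names whose on-disk digest differs from the pinned digest.

    `pins` maps file name -> expected hex SHA-256 (hypothetical structure).
    On a same-version re-fetch, any name returned here is a red flag.
    """
    return [
        name for name, expected in pins.items()
        if sha256_of(vendor_dir / name) != expected.lower()
    ]
```

An empty result on a non-bump re-fetch means the bytes match the pins. Any other result is the case to investigate upstream before committing.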

Co-Authored-By: Claude <[email protected]>

The two new index.html grep guards in tests/test_web_csp_vendor.py used
Path.read_text() with no encoding argument, so they relied on
locale.getpreferredencoding(). On Windows runners that resolves to
cp1252, which has no mapping for byte 0x8f. The file ships non-ASCII UTF-8
text (the dev-mode banner copy, for one, reads "Maintainer mode — all pages..."),
so the read raised UnicodeDecodeError before the assertion ever ran. macOS
and Ubuntu were unaffected because their default is utf-8.

Pin encoding="utf-8" on both reads. The test file itself has no other
locale dependency.
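
The failure mode can be reproduced without a Windows runner by decoding the same bytes both ways. In this sketch the sample text is an assumption: U+FE0F is used only because its UTF-8 encoding (EF B8 8F) contains byte 0x8f, and it stands in for whatever index.html actually holds. The helper name `read_both_ways` is likewise hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: U+FE0F is a stand-in character whose UTF-8 bytes include 0x8f.
SAMPLE = "banner \ufe0f text"


def read_both_ways(text: str) -> tuple[str, bool]:
    """Write `text` as UTF-8, then read it pinned to utf-8 and as cp1252.

    Returns (text read with encoding="utf-8", whether the cp1252 read raised).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "index.html"
        path.write_bytes(text.encode("utf-8"))
        pinned = path.read_text(encoding="utf-8")
        try:
            # Simulates a locale default of cp1252, as on the Windows runners.
            path.read_text(encoding="cp1252")
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True
    return pinned, cp1252_failed
```

Pinning encoding="utf-8" makes the read independent of the runner's locale, which is all the fix changes.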

Note for reviewers: the same Windows job has ~119 other pre-existing
failures unrelated to this PR (path-separator, runneradmin home dir,
config signature, wiki override paths, etc.). Those reproduce on
``main`` HEAD as well and are out of scope here — only the two
UnicodeDecodeError lines were added by the previous commit on this PR.

Co-Authored-By: Claude <[email protected]>
memtomem merged commit 49505f4 into main on May 2, 2026
8 of 9 checks passed
github-actions (bot) locked and limited conversation to collaborators on May 2, 2026
memtomem deleted the fix/web-vendor-cdnjs branch on May 2, 2026 05:34