fix(web): vendor cdnjs assets so the SPA works offline (#693 part 1)#706
Merged
The Web UI shipped with mm web pulled DOMPurify, marked, and Prism (core + 5 language components + the tomorrow theme) directly from cdnjs.cloudflare.com on every page load. Offline, air-gapped, and firewalled deployments (a normal home for a local memory tool) silently lost markdown rendering, HTML sanitization, and code highlighting, because the SPA itself loaded but the script tags failed. There were no SRI hashes either, so the page trusted whatever cdnjs served on each load and a cdnjs compromise would have gone undetected.

Vendor pinned copies of all nine files under web/static/vendor/. The StaticFiles mount at "/" already serves them at /vendor/... so no new route is needed. Each file is recorded in vendor/THIRD_PARTY_LICENSES.md with version, source URL, license, and SHA-256, and each upstream LICENSE is reproduced verbatim alongside the assets (DOMPurify Apache-2.0/MPL-2.0, marked MIT, Prism MIT). vendor/README.md documents the update procedure (curl + sha + ?v= bump).

The Content-Security-Policy in SecurityHeadersMiddleware no longer needs to allow-list cdnjs.cloudflare.com: script-src and style-src tighten back to 'self' (style-src keeps 'unsafe-inline' for existing inline styles). This is a security side-benefit: the browser can no longer be talked into loading code from the CDN even if the HTML were poisoned.

Smoke-tested in a real browser via Playwright MCP: 0 console errors/warnings, all nine /vendor/ requests 200, marked.parse(), DOMPurify.sanitize(), and Prism.highlight() all functional, CSP header emitted as default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; connect-src 'self'; frame-ancestors 'none'.

Closes part 1 of #693. Vendoring of the FastAPI Swagger UI bundle (jsdelivr) at /api/docs is deliberately split off to a follow-up: that asset is dev-facing and is loaded from a different CDN with a different fix shape (FastAPI's get_swagger_ui_html override).

Co-Authored-By: Claude <[email protected]>
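To make the tightened policy concrete, here is a minimal sketch of how such a header value could be assembled. The names `CSP_DIRECTIVES` and `build_csp` are hypothetical; the commit does not show how SecurityHeadersMiddleware actually builds its header, only the resulting string.

```python
# Hypothetical helper, not the project's code: it only reproduces the policy
# string quoted in the commit message from a directive table.
CSP_DIRECTIVES = {
    "default-src": ["'self'"],
    "script-src": ["'self'"],  # no CDN host allow-listed
    "style-src": ["'self'", "'unsafe-inline'"],  # inline styles still allowed
    "img-src": ["'self'", "data:"],
    "connect-src": ["'self'"],
    "frame-ancestors": ["'none'"],
}


def build_csp(directives: dict[str, list[str]]) -> str:
    """Join each directive and its sources into a single CSP header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


CSP_HEADER = build_csp(CSP_DIRECTIVES)
```

Keeping the directives in a table like this is one way to make a later "allow-list the CDN again" edit show up as an obvious one-line diff.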
Follow-up to the previous commit per review on #706. The vendor swap and CSP tightening were verified manually via Playwright MCP, but nothing in the CI suite would catch a future regression: for example, a well-meaning "let's just allow-list the CDN again" CSP edit, or a vendor rename that forgets to update index.html.

Adds tests/test_web_csp_vendor.py:

* ``test_csp_locks_script_src_to_self_no_external_cdn``: paired positive/negative assertions on the response CSP header. The positive marker pins ``script-src 'self';`` so a future header-drop fails loudly; the negative half blocks ``cdnjs.cloudflare.com``, ``jsdelivr``, ``unpkg``, and the generic ``script-src 'self' https`` shape that smuggles in any external host. Pattern reference: ``feedback_pin_invert_symmetric_assertion.md``. A negative-only check would false-pass if the header were dropped entirely.
* ``test_vendor_asset_served_locally``: parametrized over all nine vendored files, asserts ``/vendor/<name>`` returns 200 with a non-empty body. Pairs with the index.html grep guard.
* ``test_index_html_has_no_external_cdn_refs``: bare-string scan of the shipped index.html for ``cdnjs.cloudflare.com``, ``cdn.jsdelivr.net``, ``unpkg.com``.
* ``test_index_html_references_every_vendor_asset``: inverse of the deletion case. Every entry in the asset list must appear as ``/vendor/<name>`` in index.html.

Also clarifies vendor/README.md step 1: a same-version curl re-fetch must produce the SHA-256 already pinned in THIRD_PARTY_LICENSES.md byte-for-byte. A mismatch on a non-bump is a supply-chain red flag (cdnjs serving a different build under the same version path); investigate upstream before any commit.

Co-Authored-By: Claude <[email protected]>
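The paired positive/negative pattern described for the CSP test can be sketched as a plain function. This is an illustration under assumptions, not the contents of tests/test_web_csp_vendor.py: `csp_violations` and `FORBIDDEN_MARKERS` are hypothetical names, and the real test asserts on an HTTP response rather than on a bare string.

```python
from typing import Optional

# Substrings that must never appear in the policy (taken from the commit text).
FORBIDDEN_MARKERS = (
    "cdnjs.cloudflare.com",
    "jsdelivr",
    "unpkg",
    "script-src 'self' https",  # generic shape that admits any external host
)


def csp_violations(csp: Optional[str]) -> list[str]:
    """Return every problem found in a CSP header value; empty means pass."""
    if csp is None:
        # A negative-only check would silently pass here.
        return ["Content-Security-Policy header is missing"]
    problems = []
    # Positive half: the exact pinned directive must be present.
    if "script-src 'self';" not in csp:
        problems.append("script-src is not pinned to 'self'")
    # Negative half: no external CDN marker may be present.
    for marker in FORBIDDEN_MARKERS:
        if marker in csp:
            problems.append(f"forbidden marker present: {marker}")
    return problems
```

In a pytest test this would reduce to something like `assert csp_violations(header) == []`, so a dropped header and a re-added CDN both fail with a readable message.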
The two new index.html grep guards in tests/test_web_csp_vendor.py used
Path.read_text() with no encoding argument, so they relied on
locale.getpreferredencoding(). On Windows runners that resolves to
cp1252, which has no mapping for byte 0x8f — the file ships UTF-8
em-dashes in the dev-mode banner copy ("Maintainer mode — all pages..."),
so the read raised UnicodeDecodeError before the assertion ever ran. Mac
and Ubuntu were unaffected because their default is utf-8.
Pin encoding="utf-8" on both reads. The test file itself has no other
locale dependency.
Note for reviewers: the same Windows job has ~119 other pre-existing
failures unrelated to this PR (path-separator, runneradmin home dir,
config signature, wiki override paths, etc.). Those reproduce on
``main`` HEAD as well and are out of scope here — only the two
UnicodeDecodeError lines were added by the previous commit on this PR.
Co-Authored-By: Claude <[email protected]>
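The encoding fix in the commit message above amounts to passing an explicit encoding on every read. A minimal sketch, assuming a hypothetical helper `read_index_html` and a throwaway sample file in place of the shipped index.html:

```python
import tempfile
from pathlib import Path


def read_index_html(path: Path) -> str:
    """Read HTML as UTF-8 regardless of the platform's locale encoding."""
    # Without encoding=, read_text() falls back to the locale's preferred
    # encoding, which is what differed between the Windows and Unix runners.
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "index.html"
    # \u2014 is an em dash, similar to the banner copy quoted in the commit.
    sample.write_bytes("<p>Maintainer mode \u2014 all pages</p>".encode("utf-8"))
    text = read_index_html(sample)
```

Writing the sample with `write_bytes` keeps the fixture itself independent of the locale, so the read is the only step under test.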
Summary
The `mm web` SPA pulled DOMPurify, marked, and Prism (core + 5 language plugins + the tomorrow theme) directly from `cdnjs.cloudflare.com` on every page load. Three problems land here at once:

- Offline, air-gapped, and firewalled deployments silently lost markdown rendering, HTML sanitization, and code highlighting, because the SPA loaded but the script tags failed.
- Every `mm web` page load was a beacon to `cdnjs.cloudflare.com` carrying the visitor's IP, User-Agent, and request time, even though the SPA is meant to be a 100% local tool.
- There were no `integrity=` hashes, so a cdnjs compromise would have served arbitrary code into the same origin as the local `/api/...` surface.

This PR vendors all nine assets under `web/static/vendor/`, tightens the CSP, and adds regression guards. The `StaticFiles` mount at `/` already serves them at `/vendor/...` so no new route was needed.

Closes part 1 of #693. Swagger UI vendoring (`/api/docs` → jsdelivr) is intentionally split off to a follow-up PR: different audience (dev-facing), different CDN, different fix shape (`get_swagger_ui_html` override).

What changed
- Vendored files: `purify.min.js`, `marked.min.js`, `prism.min.js`, `prism-{python,typescript,json,bash,yaml}.min.js`, `prism-tomorrow.min.css`.
- `index.html`: asset references changed from `https://cdnjs.cloudflare.com/...` to `/vendor/...?v=1`.
- `web/app.py` `SecurityHeadersMiddleware`: dropped `https://cdnjs.cloudflare.com` from `script-src` and `style-src`. The browser can no longer be talked into loading code from the CDN even if the HTML were poisoned. Final policy: `default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data:; connect-src 'self'; frame-ancestors 'none'`.
- `vendor/THIRD_PARTY_LICENSES.md`: version pin table + source URLs + SHA-256 per file.
- `vendor/dompurify-LICENSE.txt`, `vendor/marked-LICENSE.md`, `vendor/prism-LICENSE.txt`: verbatim upstream LICENSE files (DOMPurify Apache-2.0/MPL-2.0, marked MIT, Prism MIT).
- `vendor/README.md`: update process: `curl` URL list → `shasum` (with explicit supply-chain check on same-version re-fetch) → license refresh → `?v=N` cache-bust → smoke-test checklist.
- Regression tests (`tests/test_web_csp_vendor.py`):
  - Paired positive/negative CSP assertion (pattern: `feedback_pin_invert_symmetric_assertion.md`): pins `script-src 'self';` so a header-drop fails loudly, and blocks `cdnjs`, `jsdelivr`, `unpkg`, plus the generic `script-src 'self' https` shape that would smuggle in any external host.
  - `/vendor/<name>` 200-and-non-empty check across all nine assets.
  - `index.html` grep guard: no external CDN strings reintroduced; every vendored asset is referenced.

Why vendor and not npm

The package has no JavaScript build pipeline today (raw `.js`/`.css` ship in the wheel). Adding `npm install` purely to pin nine browser libraries would force every contributor and `uv tool install` user via sdist to also have Node available. Direct vendoring of cdnjs builds keeps the install surface Python-only; see the `vendor/README.md` rationale section.

Test plan
- `uv run ruff check packages/memtomem/src && uv run ruff format --check packages/memtomem/src` → clean
- `uv run pytest packages/memtomem/tests/test_web_csp_vendor.py` → 12/12 passed
- `uv run pytest packages/memtomem/tests -m "not ollama" -k "web or app or fastapi or static"` → 646 passed
- Browser smoke test: `mm web` on isolated port, navigate, check:
  - `/vendor/...?v=1` requests return 200
  - no requests to `cdnjs.cloudflare.com` or any external CDN
  - `DOMPurify.version === '3.1.6'`, `DOMPurify.sanitize('<img src=x onerror=alert(1)>')` strips `onerror`
  - `` marked.parse('# Hi\n\n```python\nprint(1)\n```') `` renders with `language-python` class
  - `Prism.languages` includes `python`, `typescript`, `json`, `bash`, `yaml`

🤖 Generated with Claude Code
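The same-version re-fetch check listed for `vendor/README.md` under "What changed" could be automated roughly as follows. This is a sketch under assumptions: `sha256_of` and `mismatched_pins` are hypothetical names, and the `pinned` mapping stands in for the table in `vendor/THIRD_PARTY_LICENSES.md`, whose exact layout is not shown in this PR.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the lowercase hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_pins(vendor_dir: Path, pinned: dict[str, str]) -> list[str]:
    """Return the names of files whose on-disk hash differs from the pin.

    `pinned` maps file name -> expected hex digest (assumed to be copied
    by hand from the license table; parsing that table is out of scope).
    """
    return [
        name
        for name, expected in pinned.items()
        if sha256_of(vendor_dir / name) != expected.lower()
    ]
```

A non-empty result after re-fetching an unchanged version would be the supply-chain red flag the README describes; after a deliberate version bump it simply lists the pins that need updating.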