Skip to content

sec(privacy): apply trust-boundary redaction guard to every user-driven write surface#784

Merged
memtomem merged 4 commits intomainfrom
fix/security-write-redaction-unification
May 5, 2026
Merged

sec(privacy): apply trust-boundary redaction guard to every user-driven write surface#784
memtomem merged 4 commits intomainfrom
fix/security-write-redaction-unification

Conversation

@memtomem
Copy link
Copy Markdown
Owner

@memtomem memtomem commented May 4, 2026

Summary

  • Centralise the trust-boundary redaction guard in privacy.enforce_write_guard(content, surface=…, force_unsafe=False, audit_context=…) and apply it at every user-driven write surface that previously bypassed the mem_add-only check.
  • Surfaces unified: MCP mem_edit; Web POST /api/add, POST /api/upload, PATCH /api/chunks/{id}, POST /api/scratch/{key}/promote; CLI mm add, mm shell add, mm agent share; LangGraph MemtomemStore.add(). The existing mem_add / mem_batch_add paths refactor to call the helper while keeping their per-tool counter shape.
  • Web mutating routes return HTTP 403 with {detail: "redaction_blocked", hits: N, surface: …} so the SPA can render a confirm-and-retry dialog before re-submitting with force_unsafe=true. CLI surfaces add a --force-unsafe flag with the same audit-logged semantics.

Why

docs/reports/security-hardening-plan-2026-05-05.md (High #1) and an independent multi-agent scan both flagged that the same SQLite database could end up holding cleartext provider tokens depending on which route landed the write — mem_add is gated, but req.content / new_content / entry["value"] flowed straight into append_entry() / replace_chunk_body() everywhere else. This breaks the "STM-bypass is not safety-bypass" invariant the privacy module's docstring spells out.

What this PR does NOT change

  • mem_batch_add keeps its inline scan + per-entry counter loop. The helper's single-content shape would obscure the transactional "one hit rejects the whole batch" rule that test_memory_crud_redaction.py already pins; the existing tests stay authoritative there.
  • indexing/importers.py and indexing/url_fetcher.py write external file / URL content, not user-driven content. Their trust model is different (operator-authorised indexing pipeline, not direct user submission) — they get their own follow-up rather than being silently swept under this PR's scope.
  • PII patterns are still excluded from DEFAULT_PATTERNS by design (see the privacy.py module docstring); the guard's coverage stays secret-class only.

Audit log shape

WARNING memtomem.privacy: redaction bypass via force_unsafe=True
  (surface=mem_edit, chunk_id='…', content_chars=…, hits=…)

Matched bytes are intentionally never rendered. audit_context is appended as , key=value pairs after surface=; counters keep working the same way (mem_add_redaction_stats snapshot includes the new by_tool keys).

Tests

  • test_privacy.py::TestEnforceWriteGuard — helper unit contract (pass / blocked / bypassed-with-audit / clean+force_unsafe still records pass / matched bytes never reach the log / audit-context renders without trailing-comma artefacts).
  • test_redaction_write_surfaces.py — wire-in for mem_edit, CLI mm add, and LangGraph MemtomemStore.add(). Each surface gets (block / bypass / clean) and asserts the counter routes to the right by_tool key.
  • test_web_routes.py — new TestAddMemoryRedaction, TestEditChunkRedaction, TestUploadRedaction, TestScratchPromoteRedaction end-to-end via ASGI client.

Existing 36-test redaction floor (test_memory_crud_redaction.py + test_privacy.py core) stays green; full pytest -m "not ollama" reports 4048 passed, 11 skipped, 0 failed. ruff check + ruff format --check clean.

Test plan

  • uv run pytest packages/memtomem/tests/test_privacy.py packages/memtomem/tests/test_memory_crud_redaction.py packages/memtomem/tests/test_redaction_write_surfaces.py packages/memtomem/tests/test_web_routes.py -q — focused redaction surface
  • uv run pytest -m "not ollama" — full CI floor
  • uv run ruff check packages/memtomem/src packages/memtomem/tests
  • uv run ruff format --check packages/memtomem/src packages/memtomem/tests
  • Manual smoke via mm web — POST /api/add with a sk-… payload returns 403; same payload with force_unsafe=true succeeds and shows up in mem_add_redaction_stats.

🤖 Generated with Claude Code

pandas-studio and others added 2 commits May 5, 2026 08:46
…en write surface

The MCP `mem_add` path enforces a secret-class redaction guard before any
filesystem write, but the surrounding ingress paths (Web `POST /api/add`,
`POST /api/upload`, `PATCH /api/chunks/{id}`, scratch promotion, MCP
`mem_edit`, CLI `mm add` / shell `add` / agent share, LangGraph adapter)
were writing `req.content` / `new_content` / `entry["value"]` directly.
The same SQLite database could end up holding cleartext provider tokens
depending on which route the agent or user came in through.

Centralise the scan-decide-log shape in
`privacy.enforce_write_guard(content, surface=…, force_unsafe=False,
audit_context=…)` returning a `WriteGuardResult` (`pass` /
`blocked` / `bypassed`) and apply it at every gap surface. Each surface
records under its own `by_tool` key so attribution stays observable;
audit-log lines render `audit_context` as `key=value` pairs after
`surface=` and never echo matched bytes. Web mutating routes return
HTTP 403 with `{detail: "redaction_blocked", hits: N, surface: …}` so
the SPA can surface a confirm-and-retry UX before re-sending with
`force_unsafe=true`.

`mem_batch_add` keeps its inline scan + per-entry counter pattern —
the helper's single-content shape would obscure the transactional
"one hit rejects the whole batch" rule that test_memory_crud_redaction
already pins.

Out of scope (follow-up):
- `indexing/importers.py` and `indexing/url_fetcher.py` write external
  file/URL content rather than user-driven content; the trust model is
  different (operator-authorised indexing, not direct user submission)
  and merits its own decision pass.

Tests:
- `test_privacy.py::TestEnforceWriteGuard` — helper contract
  (pass / blocked / bypassed-with-audit / clean+force-unsafe still
  records pass / matched bytes never reach the log).
- `test_redaction_write_surfaces.py` — `mem_edit`, CLI `mm add`, and
  the LangGraph adapter all run the guard and route the counter to
  the right `by_tool` key.
- `test_web_routes.py` — new `TestAddMemoryRedaction`,
  `TestEditChunkRedaction`, `TestUploadRedaction`,
  `TestScratchPromoteRedaction` cover all four web mutating routes
  (block / bypass / clean-pass) end-to-end via ASGI client.

Existing 36-test redaction floor + 4048-test full suite stay green;
ruff check + format clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Codex review of #784 flagged that ``enforce_write_guard``'s audit log
renders ``audit_context`` values verbatim with ``repr()``. Several
callers pass user-controllable strings — ``file`` (mem_add /
web_api_add), ``filename`` (web_api_upload), ``key`` (scratch promote),
``chunk_id`` — and a bypass on a request whose path / filename / key
embeds the same secret as the body would surface that secret in
``server.log`` even though the matched bytes from ``content`` never do.
The "matched bytes must never reach logs" goal must hold for context
fields too, not just the body.

Add ``_sanitize_audit_value`` between caller and log emit:

- non-string values (None, int, bool) pass through untouched
- string values are re-scanned with ``DEFAULT_PATTERNS``; any hit
  replaces the whole value with the fixed marker
  ``<redacted: secret-shape>`` so operators see the field was scrubbed
  rather than missing
- clean strings longer than 200 chars are truncated to keep an abusive
  caller from ballooning the audit line

Two regression pins:

- ``test_secret_in_audit_context_value_is_redacted`` — embeds the same
  ``sk-…`` token in the body, in ``file=``, and in ``filename=``;
  asserts the matched bytes never reach the log message and the
  redaction marker takes the secret-shape value's place. Non-secret
  context fields (``namespace``, ``item_idx``) still pass through.
- ``test_long_audit_context_string_is_truncated`` — caps line length
  with the ``...(truncated)`` marker.

Codex Major #2 (SPA app.js / settings-harness.js cannot pass
``force_unsafe`` on the structured 403 path) is intentionally a
follow-up: backend currently fails closed (HTTP 403, no write) which is
the secure direction; the SPA confirm-and-retry UX is a separate
frontend change with its own scope.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
pandas-studio and others added 2 commits May 5, 2026 09:29
…itizer

Codex re-review of #784 showed the audit-context sanitizer landed
in cd27c99 only covered ``enforce_write_guard``'s single-content
path. ``mem_batch_add`` keeps its own transactional decision shape
and its own ``logger.warning`` line (``memory_crud.py:472``), so a
secret-shaped ``file=`` argument still leaked verbatim into the
warning log on a bypass — exactly the leak class cd27c99 was meant
to close.

Extract the audit emit into a public ``privacy.emit_bypass_audit``
seam:

- Same line shape as before
  (``redaction bypass via force_unsafe=True (surface=…, …,
  content_chars=…, hits=…)``).
- Each ``audit_context`` value runs through ``_sanitize_audit_value``
  before ``%r`` formatting, so a secret embedded in a user-controlled
  string field (``namespace`` / ``file`` / ``filename`` / ``key`` /
  ``chunk_id``) gets replaced with ``<redacted: secret-shape>``
  regardless of which entry point logged it.
- Counter (``record(...)``) is intentionally NOT bumped by the
  emitter; ``mem_batch_add`` still owns its per-item bookkeeping
  contract that ``test_memory_crud_redaction.py`` already pins.

``enforce_write_guard`` now delegates to ``emit_bypass_audit``, and
``mem_batch_add`` calls it directly with the same audit_context
shape (``namespace`` / ``file`` / ``item_idx``).

New regression
``TestMemBatchAddRedactionGuard.test_secret_shaped_file_argument_is_redacted_in_audit_log``
reproduces Codex's manual repro: feed a secret-matching value plus a
``file=…sk-…`` filename into a force_unsafe batch and assert the
warning line never contains the matched-bytes prefix while still
carrying the redaction marker.

Stale-comment Minor (``app.js:3730`` / ``:3763``) lands separately as
#786 on main; out of scope here.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The shared sanitizer wired up in 18208bd carried ``hits=1`` for every
batch bypass, because the pre-scan loop only stored ``hit_indices``
without keeping the per-item ``len(privacy.scan(value))``. The single-
content path (``enforce_write_guard``) reports the real count, so
operators reading the audit log saw inconsistent ``hits=`` values
across surfaces (Codex review of PR #784, Minor #1).

Switch the pre-scan accumulator to ``hit_counts: dict[int, int]`` so
each entry that matches records its true hit count, then pass that
through ``emit_bypass_audit(..., hits=hit_counts[idx])``. The
whole-batch reject error message keeps the same shape — the indices
list is still rendered (now via ``sorted(hit_counts)``) so the user-
facing message and the existing
``test_full_reject_lists_hit_indices`` pin both still hold.

New regression
``TestMemBatchAddRedactionGuard.test_audit_hits_count_matches_scan_result``
feeds an entry that matches two patterns (``api_key:`` + ``sk-…``) and
asserts the audit line carries ``hits=<real count>``, not the
placeholder ``hits=1`` it used to.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit 3ebbc48 into main May 5, 2026
11 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 5, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants