Background
PR #802 wired the SPA confirm-and-retry UX for the trust-boundary redaction
guard introduced in PR #784. For /api/upload the helper
uploadFilesWithRedactionRetry() (packages/memtomem/src/memtomem/web/static/app.js)
re-issues the original FormData with ?force_unsafe=true after the
user confirms the bypass.
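The retry path can be reconstructed roughly as follows. This is a sketch, not the real app.js code: the response shape and the confirmBypass callback are assumptions made for illustration.

```javascript
// Illustrative reconstruction of the current retry flow in
// uploadFilesWithRedactionRetry() (the real helper lives in app.js).
// The per-file response rows and confirmBypass are assumed shapes.
async function uploadFilesWithRedactionRetry(formData, confirmBypass) {
  const first = await fetch("/api/upload", { method: "POST", body: formData });
  const results = await first.json();

  // Collect the per-file rows the server refused on the first pass.
  const blocked = (results.files || []).filter((f) =>
    (f.error || "").startsWith("redaction_blocked")
  );
  if (blocked.length === 0) return results;

  if (await confirmBypass(blocked)) {
    // The reported bug: the ENTIRE original FormData is re-posted,
    // including files already written and indexed on the first pass.
    const retry = await fetch("/api/upload?force_unsafe=true", {
      method: "POST",
      body: formData,
    });
    return retry.json();
  }
  return results;
}
```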
The PR comment notes the trade-off:
> Re-uploading already-OK files is idempotent thanks to the server's
> _{mtime_ns} collision suffix (system.py:1121). Acceptable trade-off
> given the rarity of mixed batches.
That framing is too generous: in every mixed batch (≥1 clean + ≥1
blocked file) the clean files are silently indexed twice — once on the
first pass, once on the retry under a different filename — not just in
rare cases.
Reproduction
- mm web → Index → Upload mode.
- Select two files in the same batch:
  - clean.md — plain prose, no secret-like patterns.
  - secret.md — contains a token matching the redaction patterns
    (e.g. sk-ant-…).
- Click Upload.
- First response:
clean.md indexed and written to disk; secret.md
returns error="redaction_blocked (hits=N)".
- Confirm the bypass dialog.
- Inspect the upload directory:
clean.md exists twice — once under its
original name, once under clean_{mtime_ns}.md. Both are indexed.
Root cause
uploadFilesWithRedactionRetry() posts the unmodified original
FormData on retry; the helper has no awareness of which entries were
blocked vs which were already persisted. The server's _{mtime_ns}
suffix in system.py:1121 was designed for genuine name collisions, not
as a deduplication mechanism for retry batches.
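To make the duplication mechanics concrete, here is a toy model of the collision rename. The real logic is Python in system.py:1121; this JavaScript version, its name, and its signature are purely illustrative.

```javascript
// Toy model of the server's collision-suffix rename (the real code is
// Python, system.py:1121). Invented names; shown only to illustrate why
// a blanket retry duplicates clean files.
function destinationName(name, existingNames, mtimeNs) {
  if (!existingNames.has(name)) return name;
  const dot = name.lastIndexOf(".");
  const stem = dot > 0 ? name.slice(0, dot) : name;
  const ext = dot > 0 ? name.slice(dot) : "";
  // A retried clean.md lands next to the original under a new name,
  // so both copies end up indexed.
  return `${stem}_${mtimeNs}${ext}`;
}
```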
Fix options
- Client-side scope reduction (preferred):
In uploadFilesWithRedactionRetry(), after parsing per-file errors,
rebuild a new FormData containing only the blocked files (matched by
filename against the per-file error rows) and POST that to the
?force_unsafe=true URL. Clean files remain written from the first
pass; the retry only adds the bypassed ones.
- Server-side skip-if-written:
Add a check in the upload handler that, when force_unsafe=true and a
file with the same content hash / mtime is already indexed, treats it
as a no-op rather than rewriting under _{mtime_ns}. Wider blast
radius and changes existing collision semantics; only worth it if
another caller hits the same problem.
Option 1 is local to the SPA and matches how a thoughtful user would
interpret "retry the bypass" (retry the blocked part, not the whole
batch).
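A minimal sketch of option 1, assuming the per-file error rows expose name and error fields (match these to /api/upload's actual response shape):

```javascript
// Sketch of the client-side scope reduction (option 1). The error-row
// field names ("name", "error") are assumptions for this sketch.
function buildRetryFormData(originalFiles, errorRows) {
  const blockedNames = new Set(
    errorRows
      .filter((row) => (row.error || "").startsWith("redaction_blocked"))
      .map((row) => row.name)
  );
  const retry = new FormData();
  for (const file of originalFiles) {
    // Clean files were already written and indexed on the first pass;
    // only the refused ones need the force_unsafe retry.
    if (blockedNames.has(file.name)) retry.append("files", file, file.name);
  }
  return retry;
}
```

In the helper, this rebuilt FormData would replace the original payload in the ?force_unsafe=true POST.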
Test plan
- New Playwright spec in
tests/web/test_redaction_blocked_retry.py:
mixed batch (1 clean + 1 blocked) → first pass → confirm → assert the
retry FormData contains only the blocked filename.
- Manual mm web smoke: confirm the retry writes only the blocked file to
  the upload directory and the clean file is not duplicated.
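One way to write the retry-scope assertion is to capture the ?force_unsafe=true request and inspect which filenames its multipart body carries. A small helper for that check (a sketch only; the planned Playwright spec is Python, so this would be translated):

```javascript
// Extract the filename="..." entries from a captured multipart/form-data
// request body, e.g. one recorded via Playwright route interception.
function multipartFilenames(body) {
  const names = [];
  const re = /filename="([^"]*)"/g;
  let match;
  while ((match = re.exec(body)) !== null) names.push(match[1]);
  return names;
}
```

The spec would then assert the result equals exactly the blocked filename list.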
Out of scope
- Other bypass call sites (/api/add, /api/chunks/{id}, scratch promote) —
  these return a structured 403 before any write happens, so the duplicate
  pattern doesn't apply.

Refs PR #802, PR #784, issue #785.