web/upload: redaction-blocked retry duplicates clean files in mixed batches #803

@memtomem

Description

Background

PR #802 wired the SPA confirm-and-retry UX for the trust-boundary redaction
guard introduced in PR #784. For /api/upload the helper
uploadFilesWithRedactionRetry() (packages/memtomem/src/memtomem/web/static/app.js)
re-issues the original FormData with ?force_unsafe=true after the
user confirms the bypass.

The PR comment notes the trade-off:

  Re-uploading already-OK files is idempotent thanks to the server's
  _{mtime_ns} collision suffix (system.py:1121). Acceptable trade-off
  given the rarity of mixed batches.

That framing is too generous: in every mixed batch (≥1 clean + ≥1
blocked file) the clean files are silently indexed twice — once on the
first pass, once on the retry under a different filename — not just in
rare cases.

Reproduction

  1. mm web → Index → Upload mode.
  2. Select two files in the same batch:
    • clean.md — plain prose, no secret-like patterns.
    • secret.md — contains a token matching the redaction patterns
      (e.g. sk-ant-…).
  3. Click Upload.
  4. First response: clean.md indexed and written to disk; secret.md
    returns error="redaction_blocked (hits=N)".
  5. Confirm the bypass dialog.
  6. Inspect the upload directory: clean.md exists twice — once under its
    original name, once under clean_{mtime_ns}.md. Both are indexed.

Root cause

uploadFilesWithRedactionRetry() posts the unmodified original
FormData on retry; the helper has no awareness of which entries were
blocked vs which were already persisted. The server's _{mtime_ns}
suffix in system.py:1121 was designed for genuine name collisions, not
as a deduplication mechanism for retry batches.
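The reported flow can be pictured as a minimal sketch. Names here are hypothetical (the real helper is uploadFilesWithRedactionRetry() in app.js), and `post(url, formData)` stands in for the fetch call:

```javascript
// Sketch of the retry flow as reported (hypothetical names).
// `post` stands in for the fetch to /api/upload; `confirmBypass`
// stands in for the SPA's confirmation dialog.
async function uploadWithRetry(formData, post, confirmBypass) {
  const first = await post("/api/upload", formData);
  const blocked = first.results.some(
    (r) => r.error && r.error.startsWith("redaction_blocked")
  );
  if (blocked && (await confirmBypass())) {
    // Bug: the retry re-posts the ORIGINAL FormData, so clean files
    // already written on the first pass are uploaded again and land
    // under a _{mtime_ns} collision suffix.
    return post("/api/upload?force_unsafe=true", formData);
  }
  return first;
}
```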

Fix options

  1. Client-side scope reduction (preferred):
    In uploadFilesWithRedactionRetry(), after parsing per-file errors,
    rebuild a new FormData containing only the blocked files (matched by
    filename against the per-file error rows) and POST that to the
    ?force_unsafe=true URL. Clean files remain written from the first
    pass; the retry only adds the bypassed ones.

  2. Server-side skip-if-written:
    Add a check in the upload handler that, when force_unsafe=true and a
    file with the same content hash / mtime is already indexed, treats it
    as a no-op rather than rewriting under _{mtime_ns}. This has a wider
    blast radius and changes existing collision semantics; it is only
    worth it if another caller hits the same problem.

Option 1 is local to the SPA and matches how a thoughtful user would
interpret "retry the bypass" (retry the blocked part, not the whole
batch).
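
Option 1 can be sketched roughly as follows. The names are hypothetical (the real code lives in uploadFilesWithRedactionRetry(), and the exact shape of the per-file error rows is an assumption based on the error string quoted above):

```javascript
// Pure helper: given the originally selected files and the per-file
// error rows from the first response, return only the blocked files.
// Assumes rows look like { filename, error } (hypothetical shape).
function blockedSubset(files, errorRows) {
  const blockedNames = new Set(
    errorRows
      .filter((row) => row.error && row.error.startsWith("redaction_blocked"))
      .map((row) => row.filename)
  );
  return files.filter((file) => blockedNames.has(file.name));
}

// Retry path: rebuild a fresh FormData from the blocked subset only,
// then POST it to the ?force_unsafe=true URL (fetch call elided).
function buildRetryFormData(files, errorRows) {
  const fd = new FormData();
  for (const file of blockedSubset(files, errorRows)) {
    fd.append("files", file, file.name);
  }
  return fd;
}
```

With this shape, clean files stay written from the first pass and never travel twice, so the _{mtime_ns} suffix is never exercised by the retry.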

Test plan

  • New Playwright spec in tests/web/test_redaction_blocked_retry.py:
    mixed batch (1 clean + 1 blocked) → first pass → confirm → assert the
    retry FormData contains only the blocked filename.
  • Manual mm web smoke: confirm only the blocked file appears in the
    upload directory after retry; clean file is not duplicated.

Out of scope

Refs PR #802, PR #784, issue #785.
