Skip to content

refactor(stm): decouple surfacing via remote-only MCP#2

Merged
tsdata merged 3 commits intomainfrom
feat/stm-decouple-via-mcp
Apr 9, 2026
Merged

refactor(stm): decouple surfacing via remote-only MCP#2
tsdata merged 3 commits intomainfrom
feat/stm-decouple-via-mcp

Conversation

@tsdata
Copy link
Copy Markdown
Collaborator

@tsdata tsdata commented Apr 9, 2026

Summary

  • STM의 LTM 접근 path를 두 가지(in-process import / mcp_client)에서 remote-only via MCP 프로토콜 단일 path로 통합
  • STM이 core 내부 심볼(SearchPipeline, SqliteBackend, extract_entities, create_components, Mem2MemConfig)에 직접 의존하던 부분을 모두 제거 — 이제 McpClientSearchAdapter만이 LTM 통신 통로
  • langchain + langgraph 패턴으로 향후 별도 저장소 분리 (memtomem-stm) 사전 작업

Why

STM은 본질적으로 MCP 프록시 게이트웨이입니다 (ProxyManager가 임의 개수 upstream MCP server를 받음 — brave-search/github/memtomem 등). 이전엔 memtomem만 in-process import로 특별 취급했는데, 이는 STM이 memtomem-server 역할까지 겸하는 architectural mismatch였습니다. remote-only로 단일화하면:

  • STM의 core 의존이 MCP 프로토콜 한 점으로 좁혀져 core 내부 리팩토링이 STM을 깨뜨리지 않음
  • memtomem 크래시 시에도 STM의 다른 upstream(brave-search 등)은 정상 동작 (fault isolation)
  • 향후 git filter-repo로 별도 저장소 분리할 때 cross-repo dependency가 깔끔

Changes

Removed:

  • SurfacingEngine.search_pipeline / storage 생성자 인자 (이제 mcp_adapter keyword-only required)
  • SurfacingConfig.ltm_mode 필드 (remote가 유일한 모드)
  • server.py의 in-process bootstrap 분기 + try/except fallback 로직
  • _extract_heuristicmemtomem.tools.entity_extraction import (빈 list stub)
  • TestFeedbackBoost 테스트 클래스 3개 (SqliteBackend.increment_access 직접 호출에 의존)
  • _boosted_surfacings 추적 set (access boost 분기와 함께)

Added:

  • tests/_fake_memtomem_server.py — 가벼운 stdio MCP fake server (32줄)
  • tests/test_remote_ltm_integration.py — end-to-end integration test, fake server에 child process로 연결해서 SurfacingEngine 동작 검증 (0.36s)

Updated:

  • README.md "LTM Connection Modes" → "LTM Connection" (단일 모드 설명)
  • 6개 테스트 파일에서 _make_search_pipeline_make_mcp_adapter rename, search_pipeline=mcp_adapter= 일괄 전환
  • ruff --fix로 정리된 unused imports

Follow-ups (별도 PR)

이 PR은 in-process 전용 부수 기능 3개를 임시 비활성화합니다. 0.1.0 알파에서 사용자 거의 없으므로 안전하지만 후속 작업 필요:

  1. Access boost on "helpful" feedback — core에 mem_increment_access mem_do action을 추가한 후, McpClientSearchAdapter.increment_access(ids) 메서드로 복원
  2. Session context (working memory) injectionMcpClientSearchAdapter.scratch_list() 메서드 추가 (mem_scratch_get(key=None) 호출 + 파싱)
  3. Heuristic fact extraction fallback — STM 자체 정규식 휴리스틱(URL/식별자/명사구) 작성 (현재는 빈 list stub)

Test plan

  • STM 단독 테스트 731 passing (uv run pytest packages/memtomem-stm/tests/ -m \"not ollama\")
  • core 단독 테스트 857 passing (regression 0)
  • 합쳐서 1588 passing
  • 새 integration test 통과 (0.36s)
  • STM source ruff clean (uv run ruff check packages/memtomem-stm/src)
  • 빈 venv에서 pip install memtomem-stmmemtomem-stm 기동 (Phase 4 PyPI 배포 시)

Phase roadmap

이 PR은 4단계 분리 작업의 Phase 1입니다:

  1. Phase 1 (이 PR): STM 코드 리팩토링 — in-process 분기 제거, mcp_adapter 단일 path
  2. Phase 2: core 잔여물 정리 — cli/stm_cmd.py, StmProxyConfig, web/routes/proxy.py, lifespan.py STM bootstrapping 제거
  3. Phase 3: git filter-repopackages/memtomem-stm/ → 별도 저장소(https://github.com/memtomem/memtomem-stm.git) 추출, 모노레포에서 디렉터리 제거
  4. Phase 4: 양쪽 PyPI 배포, lockstep release, compat range

🤖 Generated with Claude Code

tsdata and others added 3 commits April 9, 2026 19:23
STM previously supported two LTM access paths — in-process (direct
SearchPipeline/SqliteBackend import) and mcp_client (stdio MCP). The
dual path tightly coupled STM to core internals and conflicted with
STM's role as an MCP proxy gateway, where memtomem is just another
upstream alongside brave-search/github/etc.

Collapse to a single remote-only path so STM depends only on the MCP
protocol contract, not core internals. This sets up the eventual repo
split (langgraph pattern: memtomem-stm becomes a separate package that
spawns memtomem-server as a child stdio process).

Removed:
- SurfacingEngine.search_pipeline / storage constructor args
- ltm_mode config field (remote is the only mode now)
- in-process bootstrap branch in server.py
- _extract_heuristic's import of memtomem.tools.entity_extraction
- TestFeedbackBoost (used direct SqliteBackend.increment_access)

Added:
- _fake_memtomem_server.py + test_remote_ltm_integration.py for an
  end-to-end check that exercises stdio MCP without needing core
  installed

Follow-ups (separate PRs):
- Restore access boost via a new mem_increment_access MCP action
- Restore include_session_context via McpClientSearchAdapter.scratch_list
- Replace the heuristic stub with a native STM regex extractor

Tests: 1588 passing (core 857, STM 731), 0 regressions, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Pre-existing main-branch CI breakage that surfaced when this PR ran the
full ruff + pytest pipeline (the last 5 commits on main were all red):

- memory_crud.py:113 had an unused `chunk = await ...` assignment — the
  index_entry return value isn't read, so just await it.
- test_indexing_engine.py::TestIndexEntry's four tests call
  index_engine.index_entry, which triggers a real embedding HTTP call.
  They need the `ollama` marker so CI's `-m "not ollama"` filter skips
  them when Ollama isn't running.

These two fixes unblock CI for both this PR and main.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
CI's `ruff format --check` was failing because eight existing core
files weren't formatted (none of these were touched by this PR's
refactor; all were pre-existing main-branch dust). Running
`uv run ruff format packages/memtomem/src packages/memtomem-stm/src`
reformats them in place. Pure formatting changes only.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@tsdata tsdata merged commit 6c72b94 into main Apr 9, 2026
@tsdata tsdata deleted the feat/stm-decouple-via-mcp branch April 9, 2026 10:36
tsdata added a commit that referenced this pull request Apr 9, 2026
After Phase 1 (#2) made STM communicate with memtomem only over the MCP
protocol, the in-process glue that lived inside memtomem core became
dead code. Phase 2 removes it so the two packages can ship from
separate repositories without leftover cross-references.

Backend deletions:
- packages/memtomem/src/memtomem/cli/stm_cmd.py — `mm stm init`/`reset`
  wizard that imported memtomem_stm. Belongs to the STM package now.
- packages/memtomem/src/memtomem/web/routes/proxy.py — STM dashboard
  REST routes. Will be re-added inside the STM repo if needed.
- packages/memtomem/src/memtomem/cli/__init__.py — drop the `stm`
  subcommand registration.
- packages/memtomem/src/memtomem/config.py — drop StmProxyConfig and
  the `stm_proxy` field on Mem2MemConfig.
- packages/memtomem/src/memtomem/server/context.py — drop the
  `stm_proxy_manager` slot on AppContext.
- packages/memtomem/src/memtomem/server/lifespan.py — drop the entire
  `if config.stm_proxy.enabled` bootstrap branch (~80 lines) plus its
  cleanup section. memtomem core no longer imports memtomem_stm at all.
- packages/memtomem/src/memtomem/web/app.py — drop `proxy.router`
  registration and the matching import.

Frontend deletions:
- packages/memtomem/src/memtomem/web/static/index.html — STM tab button
  and the entire STM panel (~70 lines).
- packages/memtomem/src/memtomem/web/static/app.js — `_stmPollId` early
  declaration, the `if (tabName === 'stm')` switch arm, the STM PROXY
  DASHBOARD section (~320 lines), and the `stm-filter`/`stm-hist-prev`/
  `stm-hist-next` data-action handlers from the global delegator.

Test updates:
- test_cli.py: drop `stm` from registered subcommand list and remove
  `test_stm_help`.
- test_context_window.py: remove `TestSTMFormatter` (it imported
  memtomem_stm.surfacing.formatter — a cross-package boundary
  violation that should never have been in core tests).
- test_indexing_engine.py: ruff --fix removed unused imports surfaced
  by the indexing test changes.
- assorted other test files: ruff --fix cleaned unused imports.

Verification:
- 850 core tests pass (`uv run pytest packages/memtomem/tests -m "not ollama"`)
- ruff check + format clean for both src trees
- final grep across packages/memtomem confirms zero references to
  stm_proxy / StmProxy / stm_cmd / memtomem_stm in source or tests

Follow-up (not blocking): style.css still contains `.stm-*` rules; they
are unused but harmless. Drop them in a small later cleanup PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
tsdata added a commit that referenced this pull request Apr 9, 2026
After Phase 1 (#2) made STM communicate with memtomem only over the MCP
protocol, the in-process glue that lived inside memtomem core became
dead code. Phase 2 removes it so the two packages can ship from
separate repositories without leftover cross-references.

Backend deletions:
- packages/memtomem/src/memtomem/cli/stm_cmd.py — `mm stm init`/`reset`
  wizard that imported memtomem_stm. Belongs to the STM package now.
- packages/memtomem/src/memtomem/web/routes/proxy.py — STM dashboard
  REST routes. Will be re-added inside the STM repo if needed.
- packages/memtomem/src/memtomem/cli/__init__.py — drop the `stm`
  subcommand registration.
- packages/memtomem/src/memtomem/config.py — drop StmProxyConfig and
  the `stm_proxy` field on Mem2MemConfig.
- packages/memtomem/src/memtomem/server/context.py — drop the
  `stm_proxy_manager` slot on AppContext.
- packages/memtomem/src/memtomem/server/lifespan.py — drop the entire
  `if config.stm_proxy.enabled` bootstrap branch (~80 lines) plus its
  cleanup section. memtomem core no longer imports memtomem_stm at all.
- packages/memtomem/src/memtomem/web/app.py — drop `proxy.router`
  registration and the matching import.

Frontend deletions:
- packages/memtomem/src/memtomem/web/static/index.html — STM tab button
  and the entire STM panel (~70 lines).
- packages/memtomem/src/memtomem/web/static/app.js — `_stmPollId` early
  declaration, the `if (tabName === 'stm')` switch arm, the STM PROXY
  DASHBOARD section (~320 lines), and the `stm-filter`/`stm-hist-prev`/
  `stm-hist-next` data-action handlers from the global delegator.

Test updates:
- test_cli.py: drop `stm` from registered subcommand list and remove
  `test_stm_help`.
- test_context_window.py: remove `TestSTMFormatter` (it imported
  memtomem_stm.surfacing.formatter — a cross-package boundary
  violation that should never have been in core tests).
- test_indexing_engine.py: ruff --fix removed unused imports surfaced
  by the indexing test changes.
- assorted other test files: ruff --fix cleaned unused imports.

Verification:
- 850 core tests pass (`uv run pytest packages/memtomem/tests -m "not ollama"`)
- ruff check + format clean for both src trees
- final grep across packages/memtomem confirms zero references to
  stm_proxy / StmProxy / stm_cmd / memtomem_stm in source or tests

Follow-up (not blocking): style.css still contains `.stm-*` rules; they
are unused but harmless. Drop them in a small later cleanup PR.

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
memtomem pushed a commit that referenced this pull request Apr 18, 2026
Reflect the Gemini-regenerated security curation (commit 5fbb47f) in
the handoff doc so a fresh session knows which Phase 2c steps remain.
Updates the phase-progress table, adds a "Security Phase 3a
measurements" subsection (21.9% drift -> H1 supported; intra-vocab
reclassifications to auth/mtls, auth/rbac, networking/tls; incident
subtopic skew), marks next-actions #2-#3 as complete, and points #4
(IDF pre-measure) at the authoritative Gemini batch directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
memtomem pushed a commit that referenced this pull request Apr 18, 2026
11 chunks affected, 13 events / 32 = 40.6% event-count — **Upper-
outlier band (32-45%)** realized per pre-registration, above
observability's 28.1% by 12.5 pp. Pre-registered cell; not a novel
failure mode.

Two systematic Gemini patterns drove 6/13 events (47%):
- Pattern 1: `kubectl logs` diagnostic conflated with
  `observability/logging` subtopic (B3 #2, B5 #1, B6 #2)
- Pattern 2: postmortem genre conflated with
  `incident_response/postmortem` subtopic (B7 #1, B7 #2, B8 #2)

Both are Phase 3b drift-validator "forbidden pair" candidates
(trigger condition k >= 5 topics met at k8s).

Category mix: absent-topic 11, missed secondary 2 (paired with
cross-topic drops on B1 #4 cost_opt→k8s/scaling and B8 #4
observability/metrics→k8s/scheduling), out-of-vocab 0, intra-vocab
0, over-correction 0.

Band-realization sensitivity check: chunk-count drift would be
11/32 = 34.4%, still Upper-outlier. Event-count convention does
not flip the band.

Baseline observation (n=2 event-count at observability + k8s):
drifts span 28.1% to 40.6%. "Prompt-structure drift as constant
~25-30%" hypothesis weakened — topic-level variance appears real.
H1/H2/H3 formal reformulation still deferred to kafka per (A)-
path; no promotion to working hypothesis at n=2.

Post-k8s decision pending divergence (Step 6); handoff Step 3 ✅.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
memtomem pushed a commit that referenced this pull request Apr 18, 2026
Trigger condition ("≥ 5 topics ledgered") met at k8s — implement the
three-tier drift validator against the accumulated 6-topic curation
ledger (caching baseline + postgres + cost_opt + security +
observability + k8s). Two tiers locked; greenlist deferred until a
false-positive case appears.

**Forbidden** (auto-reject, CLI exit 1):
- Closed-vocab enforcement: unknown topic, unknown subtopic, malformed
  tag. Guards against silent closed-vocab drift after the 2026-04-17
  freeze.
- `genre-postmortem-vs-ir-postmortem-subtopic`: postmortem-genre chunk
  with non-IR primary cannot tag `incident_response/postmortem` as
  secondary. k8s Pattern 2 (3 events: B7 #1, B7 #2, B8 #2).

**Manual-review** (warn, no block — curator verifies by body context):
- `kubectl-logs-diagnostic-vs-observability-logging`: k8s/* primary +
  obs/logging secondary + body uses `kubectl logs`. k8s Pattern 1
  (3 events: B3 #2, B5 #1, B6 #2).
- `security-access-control-primary-with-rbac-body`: security/access_control
  primary + body mentions RBAC-specific resources (Role, RoleBinding,
  ClusterRoleBinding). Security ledger trouble KO #1, postmortem KO #2.
- `security-encryption-primary-with-transport-body`: security/encryption
  primary + transport-layer body (TLS, mTLS, cert-manager, certbot,
  PeerAuthentication) AND secondary does not already name
  networking/tls or auth/mtls. Security ledger adr EN #1, runbook KO
  #2, trouble KO #2. Suppression clause matches "Borderline cases
  preserved" precedent (postmortem EN #3, runbook EN #1) where the
  curator already captured the split functionally.

Deliverables:
- tools/retrieval-eval/drift_validator.py (~280 lines): parser,
  rules, CLI. `uv run python tools/retrieval-eval/drift_validator.py
  <corpus_root|fixture.md>`. Corpus walker filters to genre fixtures
  only (skips README and other .md).
- packages/memtomem/tests/test_drift_validator.py: 23 tests — parser,
  each rule positive + negative, CLI exit codes, corpus-sanity
  assertions (zero forbidden, zero manual-review on current corpus).
- tools/retrieval-eval/README.md: tool entry added.
- docs/planning/b2-v2-design.md § "Drift validator": TODO converted
  to implemented, inline rule summary with ledger citations.
- docs/planning/b2-v2-phase2b-ledger.md § "Deferred decisions":
  checkbox flipped, full rule list recorded.

Validation against current corpus: zero forbidden, zero manual-review.
`ruff check` + `ruff format --check` clean, `mypy` advisory clean,
`pytest packages/memtomem/tests/test_drift_validator.py`: 23 passed.

CI wiring to run this against corpus_v2 on every PR lands at Phase 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
memtomem added a commit that referenced this pull request Apr 25, 2026
…on (#464)

* feat(mem_add,mem_edit): canonical writer + mem_edit header preservation

Stacks on PR #463 (chunker reader). Two coupled writer-side changes:

1. append_entry emits canonical blockquote tags

   > tags: ["a", "b"]   # explicit `> ` prefix, JSON-quoted

   instead of the legacy lazy-continuation form

   tags: ['a', 'b']     # Python repr(), no `> ` prefix

   Lazy continuation is well-defined CommonMark but easy to break
   (a single blank line ends the blockquote), and Python repr() is
   not valid YAML. The chunker's section-leading parser already
   accepts both shapes, so existing files still parse — the writer
   just stops producing the fragile form on new entries.

   New test_memory_writer_tag_format.py pins the canonical output so
   a future regression that reverts to lazy / Python-repr fails fast.

2. mem_edit preserves the per-entry header

   Pre-PR, mem_edit called replace_lines(start_line, end_line,
   new_content), which replaced the entire chunk range — heading,
   blockquote header, body — with whatever the caller supplied.
   That was already wide pre-RFC, but post-PR-A the chunker strips
   the blockquote from chunk content, so a caller working from
   chunk.content would supply body-only and silently erase the
   metadata header on disk.

   New replace_chunk_body helper detects the heading + section-
   leading blockquote group at the top of the chunk's line range and
   keeps them intact while replacing only the body. If new_content
   itself starts with `## `, the call reverts to a full replacement
   (preserving the pre-RFC mem_edit semantic for users who want to
   override the heading explicitly).

   New test_mem_edit_header_preservation.py covers six cases
   (canonical, legacy lazy, no header, full-replace via `## `,
   non-first sub-chunk of an oversized section, trailing-newline
   round-trip).

Round-trip and follow-ups:

- test_multi_agent_integration test_share_copies_chunk_with_audit_tag
  now also asserts the forward direction:
  mem_search(tag_filter="shared-from=<src>") returns the share copy
  and only the share copy. Closes the e2e gap discovered during PR
  #462 that drove RFC mem-add-tags-blockquote-promote-rfc.md.
- test_chunking_blockquote_tags adds a block-list-shape case
  (`> tags:\n> - a\n> - b`). The current writer never emits this
  shape, but the shared _parse_tags_value helper supports it and
  this locks the support in.

Implements RFC §Migration & tests #2 + #3 and the writer-PR scope
addition (mem_edit header preservation, ③-1) noted in PR #463
review.

Co-Authored-By: Claude <[email protected]>

* fix(web): preserve per-entry header in PATCH /api/chunks edit endpoint

Reviewer caught that web/routes/chunks.py:edit_chunk still called
replace_lines directly, which bypasses the header-preservation logic
mem_edit gained in this PR. Same footgun, different surface — the Web
UI editor surfaces chunk.content (already header-stripped by the
chunker), so a Save from the browser would feed body-only content into
replace_lines and silently erase the heading + > created: + > tags:
on disk.

Swap the call to replace_chunk_body so mem_edit and the Web UI route
share the same preservation contract. Add a regression test
(test_edit_chunk_preserves_blockquote_header) that drives the route
end-to-end with a real tmp-file source and asserts the metadata
header survives a body-only PATCH.

Also folded in two reviewer nits:

- mem_edit MCP-tool docstring now mentions the body-only / `## `
  bypass policy (was only documented on the underlying helper).
- replace_chunk_body has a one-line comment noting that ``## `` is the
  full-replace trigger because append_entry always emits H2; other
  heading levels in user input are body content.

grep across src/ confirms replace_lines now has zero non-test callers
in the codebase. Whether to deprecate it or keep it as a low-level
escape hatch is tracked as a follow-up; not closing that question
here.

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>
memtomem pushed a commit that referenced this pull request Apr 25, 2026
…ote header

When chunk_file strips a section-leading ``> created:`` / ``> tags:``
blockquote before passing text to _split_section, base_line still
points at the heading. line_offset in _split_section started at 0 and
counted only body lines, so sub-chunks 2..N reported start_line 3-5
lines earlier than the body they actually covered.

Practical impact: mem_edit on a non-first sub-chunk reads file lines
[start_line..end_line] via replace_chunk_body. With drift, those
lines included real body content from the previous sub-chunk —
which _find_body_start_index does not detect as a header (no
heading, no leading blockquote at the start of the range), so
preserved == [] and the replacement silently dropped that content
on save.

Fix: _extract_section_blockquote_tags returns a third value,
lines_consumed, counting how many input lines the strip consumed.
chunk_file adds it to leading_strip_lines (lines stripped by .strip()
on the raw section text) and passes the total + 1 (for the heading
line itself) as a body_offset kwarg to _split_section, which seeds
line_offset with body_offset. Sub-chunks 1..N now align with file
lines as well as the pre-strip code did. Sub-chunk 1's start_line
still anchors at the heading (line_offset is only consulted on the
first emit boundary, not at init), so mem_edit's header-preservation
contract for sub-chunk 1 is preserved.

Single-chunk sections and sections without any blockquote header are
unchanged: body_offset stays 0 in those paths.

Two new tests in test_chunking_blockquote_tags.py:
- test_blockquote_header_does_not_drag_sub_chunk_start_lines: a
  controlled oversize section with an explicit blockquote header,
  sub-chunks 2..N must report start_line >= body-start file line.
  Fails pre-fix.
- test_oversize_section_without_blockquote_unchanged: pre-PR-A path
  is preserved when there's no header to strip.

Closes RFC ``planning/mem-add-tags-blockquote-promote-rfc.md``
§Follow-ups #2.

Co-Authored-By: Claude <[email protected]>
memtomem added a commit that referenced this pull request Apr 25, 2026
…ote header (#466)

* fix(chunking): seed _split_section line counter past stripped blockquote header

When chunk_file strips a section-leading ``> created:`` / ``> tags:``
blockquote before passing text to _split_section, base_line still
points at the heading. line_offset in _split_section started at 0 and
counted only body lines, so sub-chunks 2..N reported start_line 3-5
lines earlier than the body they actually covered.

Practical impact: mem_edit on a non-first sub-chunk reads file lines
[start_line..end_line] via replace_chunk_body. With drift, those
lines included real body content from the previous sub-chunk —
which _find_body_start_index does not detect as a header (no
heading, no leading blockquote at the start of the range), so
preserved == [] and the replacement silently dropped that content
on save.

Fix: _extract_section_blockquote_tags returns a third value,
lines_consumed, counting how many input lines the strip consumed.
chunk_file adds it to leading_strip_lines (lines stripped by .strip()
on the raw section text) and passes the total + 1 (for the heading
line itself) as a body_offset kwarg to _split_section, which seeds
line_offset with body_offset. Sub-chunks 1..N now align with file
lines as well as the pre-strip code did. Sub-chunk 1's start_line
still anchors at the heading (line_offset is only consulted on the
first emit boundary, not at init), so mem_edit's header-preservation
contract for sub-chunk 1 is preserved.

Single-chunk sections and sections without any blockquote header are
unchanged: body_offset stays 0 in those paths.

Two new tests in test_chunking_blockquote_tags.py:
- test_blockquote_header_does_not_drag_sub_chunk_start_lines: a
  controlled oversize section with an explicit blockquote header,
  sub-chunks 2..N must report start_line >= body-start file line.
  Fails pre-fix.
- test_oversize_section_without_blockquote_unchanged: pre-PR-A path
  is preserved when there's no header to strip.

Closes RFC ``planning/mem-add-tags-blockquote-promote-rfc.md``
§Follow-ups #2.

Co-Authored-By: Claude <[email protected]>

* test: add no-tags blockquote oversize regression for line-drift fix

Reviewer N3: gap in coverage — `_extract_section_blockquote_tags`
short-circuits to ([], text, 0) when no `tags:` key is found, so the
blockquote stays in content and `blockquote_strip_lines` is 0. The
existing tests covered (a) tags present + strip and (b) no blockquote
at all, but not (c) blockquote present without tags.

The new case verifies for an oversize section with `> created:`
only:
  - sub-chunks inherit no tags (no promotion happened)
  - first sub-chunk's content still carries `> created:` (mem_edit
    can preserve it on body-only edits)
  - sub-chunks 2..N start_line still aligns with the body — no drift
    from the leading_strip path even when the blockquote isn't
    consumed by the strip helper

Also tracks reviewer N1 (paragraph-gap +2 imprecision in
_split_section) in the RFC follow-ups: separate from this PR; cosmetic
end_line overshoot, not data loss.

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>
memtomem pushed a commit that referenced this pull request Apr 29, 2026
… in toast, align canonical_root style

Follow-up to PR #549 review:

- **#1 (low-medium)**: drop the hardcoded `_CTX_EMPTY_HINT_META` JS map.
  GET `/api/context/{skills,commands,agents}` now also returns
  `canonical_root: str` and `scanned_dirs: list[str]` (computed once at
  module import from `SKILL_DIRS` / `AGENT_DIRS` / `COMMAND_DIRS`), so the
  empty-state hint pulls from the wire instead of duplicating the
  detector layout client-side. The JS still has a one-line fallback
  (`.memtomem/${type}` + empty list) for older backends, but it never
  fires when the cache-bust takes effect.
- **#2 (cosmetic)**: skills sync response now uses the
  `CANONICAL_SKILL_ROOT = ".memtomem/skills"` constant directly,
  matching the `CANONICAL_AGENT_ROOT` / `CANONICAL_COMMAND_ROOT` pattern
  in agents/commands routes. Drops the `_safe_rel(canonical_skills_root(
  project_root), project_root)` round-trip — both forms resolve to the
  same string today, but a constant won't drift if a future PR adds env
  / scope resolution to `canonical_skills_root()`.
- **#3 (low)**: import-no-runtimes toast renders
  `basename(project_root)` instead of the absolute path. The wire still
  carries the absolute string (consistent with `/api/system/*` and
  useful for debugging / reverse-proxy contexts) but the user-facing
  toast keeps it short — `scanned_dirs` already gives full orientation.
  New `_ctxBasename()` helper handles the POSIX path split.

Cache-bust bumped to `context-gateway.js?v=3`.

Tests:
- New assertions on `TestListSkills.test_empty`,
  `TestListCommands.test_empty`, `TestListAgents.test_empty` verify
  GET response now includes `canonical_root` + `scanned_dirs`.
- `pytest -m "not ollama"` 3152 passed; `ruff check` + `ruff format
  --check` clean.

Skipped per reviewer guidance:
- #4 `Final[str]` → `Literal` / `StrEnum` (mypy is advisory).
- #5 additional positive route assertions for `invalid_name` /
  `already_imported` / `toml_parse_error` (route layer is a trivial
  dict comprehension; covered at the core layer).
- #6 Korean `<이름>` literal angle-bracket (intentional placeholder
  guide, not an unsubstituted variable).

Co-Authored-By: Claude <[email protected]>
memtomem added a commit that referenced this pull request Apr 29, 2026
* fix(web): surface "why nothing happened" on Skills/Commands/Agents Sync+Import

Users reported "Sync 도 Import 도 동작이 이상함" — both buttons in
mm web → Settings → Skills/Commands/Agents looked successful (a green
toast popped) but nothing actually moved on disk. The root cause was a
shape mismatch in the response handling, not a wiring bug:

- For skills, the sync response carries `skipped` (e.g. "no canonical
  skills") but never `dropped`. The client only read `dropped`, so the
  skipped reason was silently dropped on the floor.
- The import response carried no metadata at all, so a 0-imported,
  0-skipped result (the common case when no `.claude/skills/` exists in
  the project) just rendered "Import completed" with no clue that the
  scanner found nothing because the directories didn't exist.

This change keeps cwd-bound single-project behavior unchanged and adds
machine-readable plumbing the UI can match against:

- All three context types' `Sync` and `Import` core layers
  (`memtomem.context.{skills,commands,agents}`) now record skipped items
  as `(name, reason, reason_code)` triples. Reason codes come from a new
  `memtomem.context._skip_reasons` module: `no_canonical_root`,
  `unknown_runtime`, `parse_error`, `invalid_name`, `already_imported`,
  `canonical_exists`, `toml_parse_error`. Human `reason` strings stay
  for CLI/log output.
- Web route handlers (`web/routes/context_{skills,commands,agents}.py`)
  surface `reason_code` per skipped item, plus `canonical_root` on sync
  responses and `project_root` + `scanned_dirs` on import responses.
- The web client (`context-gateway.js`) reads `data.skipped` (in
  addition to `data.dropped` for commands/agents field-level drops),
  matches on `reason_code === "no_canonical_root"` to show
  "No canonical {type} under {canonical}. Create one first." instead of
  a generic success, and on a 0/0 import shows "No runtime {type} found
  in {project_root}. Scanned: {scan_dirs}." so users see exactly which
  paths the detector inspected.
- The empty list state hint now points at the canonical and runtime
  paths instead of "Create one or import from existing runtimes." — a
  fresh user can drop a `SKILL.md` into the right directory without
  guessing.
- Three new placeholder-form i18n keys (`empty_hint`,
  `import_no_runtimes`, `sync_empty_canonical`) cover all three types
  via `{type}` / `{canonical}` / `{root}` / `{scan_dirs}` substitution
  rather than 9 type-specific keys.
- `context-gateway.js?v=2` cache-bust.

All response changes are additive — existing clients that only read
`imported`/`generated`/`skipped[].name|runtime|reason` keep working
unchanged. The CLI (`mm context skills/agents/commands ...`) and the
MCP `context` tool also unpack the new triple shape via `_code` discard
so their human-readable output is byte-identical.

Tests:
- 3 core test suites (`test_context_{skills,commands,agents}.py`)
  updated to the 3-tuple shape.
- New web route assertions verify `reason_code`, `canonical_root`,
  `project_root`, and `scanned_dirs` are present in sync/import
  responses for the empty-canonical and empty-runtime paths.
- Full suite green: `pytest -m "not ollama"` 3152 passed; `ruff check`
  + `ruff format --check` clean.

Co-Authored-By: Claude <[email protected]>

* fix(web/context): address review — surface scan dirs on GET, basename in toast, align canonical_root style

Follow-up to PR #549 review:

- **#1 (low-medium)**: drop the hardcoded `_CTX_EMPTY_HINT_META` JS map.
  GET `/api/context/{skills,commands,agents}` now also returns
  `canonical_root: str` and `scanned_dirs: list[str]` (computed once at
  module import from `SKILL_DIRS` / `AGENT_DIRS` / `COMMAND_DIRS`), so the
  empty-state hint pulls from the wire instead of duplicating the
  detector layout client-side. The JS still has a one-line fallback
  (`.memtomem/${type}` + empty list) for older backends, but it never
  fires when the cache-bust takes effect.
- **#2 (cosmetic)**: skills sync response now uses the
  `CANONICAL_SKILL_ROOT = ".memtomem/skills"` constant directly,
  matching the `CANONICAL_AGENT_ROOT` / `CANONICAL_COMMAND_ROOT` pattern
  in agents/commands routes. Drops the `_safe_rel(canonical_skills_root(
  project_root), project_root)` round-trip — both forms resolve to the
  same string today, but a constant won't drift if a future PR adds env
  / scope resolution to `canonical_skills_root()`.
- **#3 (low)**: import-no-runtimes toast renders
  `basename(project_root)` instead of the absolute path. The wire still
  carries the absolute string (consistent with `/api/system/*` and
  useful for debugging / reverse-proxy contexts) but the user-facing
  toast keeps it short — `scanned_dirs` already gives full orientation.
  New `_ctxBasename()` helper handles the POSIX path split.

Cache-bust bumped to `context-gateway.js?v=3`.

Tests:
- New assertions on `TestListSkills.test_empty`,
  `TestListCommands.test_empty`, `TestListAgents.test_empty` verify
  GET response now includes `canonical_root` + `scanned_dirs`.
- `pytest -m "not ollama"` 3152 passed; `ruff check` + `ruff format
  --check` clean.

Skipped per reviewer guidance:
- #4 `Final[str]` → `Literal` / `StrEnum` (mypy is advisory).
- #5 additional positive route assertions for `invalid_name` /
  `already_imported` / `toml_parse_error` (route layer is a trivial
  dict comprehension; covered at the core layer).
- #6 Korean `<이름>` literal angle-bracket (intentional placeholder
  guide, not an unsubstituted variable).

Co-Authored-By: Claude <[email protected]>

* refactor(context): narrow reason_code to Literal SkipCode

Tightens the typing on the third element of `(name, reason, reason_code)`
triples produced by `SkillSyncResult.skipped`, `CommandSyncResult.skipped`,
`AgentSyncResult.skipped`, and `ExtractResult.skipped`. The new
`memtomem.context._skip_reasons.SkipCode` is a `Literal` of the seven
codes already enumerated in that module, so a typo at the construction
site fails type-check instead of slipping through to the wire.

Also strengthens the import-empty web-route tests for skills, commands,
and agents to assert `data["project_root"] == str(tmp_path)` instead of
just key presence — confirms the response actually echoes the
fixture's project root rather than some unrelated path.

No runtime behavior change; addresses follow-up polish flagged in the
review of this PR.

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>
memtomem pushed a commit that referenced this pull request Apr 29, 2026
Self-review of PR #553 surfaced five real issues — fixing them keeps
the surface honest about what's actually wired up.

- Drop dead helpers in ``context/projects.py`` (#1):
  - ``_is_eaccess`` was never called.
  - ``is_project_scope_id`` claimed to support route error formatting
    but no route used it.
  - ``_new_temp_known_projects_path`` claimed test convenience but no
    test imported it.
  - ``errno`` and ``tempfile`` imports follow.
- Drop ``ProjectScope.counts`` dataclass field (#2). Discovery never
  populated it; the route layer (``_scope_to_dict``) always overrides
  via ``_counts_for(root)``. Forward-design pollution.
- ``POST /api/context/known-projects`` now returns ``warning_code:
  "no_runtime_marker"`` alongside the prose ``warning`` (#4). Matches
  PR1's (#549) machine-readable ``reason_code`` pattern so client
  matching is i18n-stable; the prose stays for back-compat.
- ``_counts_for`` gains a cost comment (#3): 3 × (canonical scan +
  N runtime scans) per scope on every projects GET. Acceptable at
  <30 scopes; cache when discovery growth pushes that ceiling.
- JS scope badge no longer carries a redundant ``data-i18n`` attribute
  (#5). The inline ``t()`` already renders text at construction; the
  attribute would let the i18n DOM walker re-translate and clobber.
- ``ctx-scope-summary`` now carries a ``title="${scope.root}"`` (#8) so
  same-name scopes (``Edu/inflearn`` vs ``Work/inflearn``) disambiguate
  on hover without bloating the visible label.

Cache-bust: ``context-gateway.js?v=4`` → ``?v=5``.

Tests updated: ``test_post_warns_on_missing_marker`` asserts the new
``warning_code``; ``test_post_no_warning_when_marker_present`` asserts
both fields are absent. 110/110 green; ruff + format clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
memtomem added a commit that referenced this pull request Apr 29, 2026
…#553)

* feat(web): multi-project read-only discovery for context-gateway tabs

PR2 of the multi-project context UI series — see
``memtomem-docs/memtomem/planning/multi-project-context-ui-rfc.md``.

A user running ``mm web`` from the ``memtomem`` repo could previously
only see skills / commands / agents under that one cwd; ``~/Edu/inflearn/
.claude/skills/`` was invisible without restarting the server. PR2 adds
discovery for additional project roots (server cwd, user-registered
``known_projects.json``, opt-in ``~/.claude/projects/`` reverse-decode)
and renders one collapsible ``<details>`` group per scope inside each
tab. Mutating routes stay cwd-only — multi-scope writes ship in PR3.

Backend
-------

- ``memtomem.context.projects`` is the new discovery module:
  - ``ProjectScope`` carries ``{scope_id, label, root, tier, sources,
    missing, experimental, counts}``.
  - ``compute_scope_id`` derives ``"p-" + sha256(case-normalized
    Path.resolve())[:12]`` so refresh / restart preserve the id (RFC
    §Decision 4). On macOS the input is also lowercased: APFS is
    case-insensitive but ``Path.resolve()`` does not canonicalize case.
  - ``KnownProjectsStore`` writes ``~/.memtomem/known_projects.json``
    via ``tempfile + os.replace`` plus a ``.<name>.lock`` sidecar
    fcntl lock — locking the data file directly does not survive
    ``os.replace`` (PR #548 issue).
  - ``discover_project_scopes`` unions cwd + known-projects + (opt-in)
    claude-projects, dedupes by resolved path, and clears
    ``experimental`` on union with a trusted source.
- ``ContextGatewayConfig`` (new) carries ``known_projects_path``,
  ``experimental_claude_projects_scan: bool = False``, and
  ``user_tier_enabled: bool = False`` (forward-compat for PR3).

Routes
------

New ``web/routes/context_projects.py``:

- ``GET /api/context/projects`` — full scope list with per-type counts.
- ``POST /api/context/known-projects`` — register; absolute + ``is_dir()``
  validation; warning (HTTP 200) when no ``.claude``/``.gemini``/``.agents``/
  ``.memtomem`` marker is present so users can pre-register an empty
  checkout.
- ``DELETE /api/context/known-projects/{scope_id}`` — drop, including
  stale entries (matching is path-derived).
- ``resolve_scope_root`` is the dependency the existing
  ``/api/context/{skills,commands,agents}`` GETs now use; without
  ``?scope_id=`` it falls back to the server cwd so PR1's mutating
  cwd flow keeps working unchanged. Unknown / stale scope_id → 404.

UI
--

Each tab gains an "Add Project" button; the list area renders one
``<details class="ctx-scope-group">`` per scope, lazy-fetching items
on expand. Server CWD opens by default; non-cwd scope items render as
read-only cards (PR2 keeps mutating buttons targeting cwd only). Each
non-cwd group has an ✕ button that calls DELETE
``/api/context/known-projects/{scope_id}``.

Three new i18n keys (placeholder-form, en + ko parity).

Tests
-----

PR2 minimum-bar from RFC §Test obligations:

- ``tests/test_context_projects.py`` — scope_id stability + collision
  sanity, trailing-slash invariance, macOS case-insensitivity,
  symlink dedup, ``experimental`` flag clears on union, both
  ``experimental_claude_projects_scan`` defaults, ``known_projects.json``
  corruption / unknown-version recovery, stale-entry removal,
  multiprocessing.Process atomic-write race.
- ``tests/test_web_routes_context_projects.py`` — HTTP shape, unknown
  scope_id 404, POST 400 paths (relative / nonexistent / file),
  marker warning, idempotent registration, DELETE round-trip + 404.

Manual smoke
------------

``HOME=/tmp/.../home uv run mm web --port 8088``: GET projects shows
cwd; POST /tmp returns 200 + warning; GET projects shows two scopes
(``Server CWD`` + ``tmp``); skills?scope_id=p-bogus → 404; DELETE
round-trip works; second DELETE → 404.

Refs RFC ``multi-project-context-ui``; gates PR3 (multi-project
mutating + Claude user-tier).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* review fixes: drop dead helpers, add warning_code, summary tooltip

Self-review of PR #553 surfaced five real issues — fixing them keeps
the surface honest about what's actually wired up.

- Drop dead helpers in ``context/projects.py`` (#1):
  - ``_is_eaccess`` was never called.
  - ``is_project_scope_id`` claimed to support route error formatting
    but no route used it.
  - ``_new_temp_known_projects_path`` claimed test convenience but no
    test imported it.
  - ``errno`` and ``tempfile`` imports follow.
- Drop ``ProjectScope.counts`` dataclass field (#2). Discovery never
  populated it; the route layer (``_scope_to_dict``) always overrides
  via ``_counts_for(root)``. Forward-design pollution.
- ``POST /api/context/known-projects`` now returns ``warning_code:
  "no_runtime_marker"`` alongside the prose ``warning`` (#4). Matches
  PR1's (#549) machine-readable ``reason_code`` pattern so client
  matching is i18n-stable; the prose stays for back-compat.
- ``_counts_for`` gains a cost comment (#3): 3 × (canonical scan +
  N runtime scans) per scope on every projects GET. Acceptable at
  <30 scopes; cache when discovery growth pushes that ceiling.
- JS scope badge no longer carries a redundant ``data-i18n`` attribute
  (#5). The inline ``t()`` already renders text at construction; the
  attribute would let the i18n DOM walker re-translate and clobber.
- ``ctx-scope-summary`` now carries a ``title="${scope.root}"`` (#8) so
  same-name scopes (``Edu/inflearn`` vs ``Work/inflearn``) disambiguate
  on hover without bloating the visible label.

Cache-bust: ``context-gateway.js?v=4`` → ``?v=5``.

Tests updated: ``test_post_warns_on_missing_marker`` asserts the new
``warning_code``; ``test_post_no_warning_when_marker_present`` asserts
both fields are absent. 110/110 green; ruff + format clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
memtomem added a commit that referenced this pull request Apr 30, 2026
The upload dropzone in the Index tab spanned the full panel width (~750px
at typical card sizes) with only an icon and two short lines of content,
making it look disproportionately empty. Cap `.upload-drop` at
`max-width: 520px` with `margin: 0 auto` so the dropzone itself centers
inside the panel while sibling controls (file list, upload button, result
rows) keep their natural full-width layout.

Refs #582 (Prev #2).

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>
memtomem pushed a commit that referenced this pull request May 4, 2026
Codex review of #784 flagged that ``enforce_write_guard``'s audit log
renders ``audit_context`` values verbatim with ``repr()``. Several
callers pass user-controllable strings — ``file`` (mem_add /
web_api_add), ``filename`` (web_api_upload), ``key`` (scratch promote),
``chunk_id`` — and a bypass on a request whose path / filename / key
embeds the same secret as the body would surface that secret in
``server.log`` even though the matched bytes from ``content`` never do.
The "matched bytes must never reach logs" goal must hold for context
fields too, not just the body.

Add ``_sanitize_audit_value`` between caller and log emit:

- non-string values (None, int, bool) pass through untouched
- string values are re-scanned with ``DEFAULT_PATTERNS``; any hit
  replaces the whole value with the fixed marker
  ``<redacted: secret-shape>`` so operators see the field was scrubbed
  rather than missing
- clean strings longer than 200 chars are truncated to keep an abusive
  caller from ballooning the audit line

Two regression pins:

- ``test_secret_in_audit_context_value_is_redacted`` — embeds the same
  ``sk-…`` token in the body, in ``file=``, and in ``filename=``;
  asserts the matched bytes never reach the log message and the
  redaction marker takes the secret-shape value's place. Non-secret
  context fields (``namespace``, ``item_idx``) still pass through.
- ``test_long_audit_context_string_is_truncated`` — caps line length
  with the ``...(truncated)`` marker.

Codex Major #2 (SPA app.js / settings-harness.js cannot pass
``force_unsafe`` on the structured 403 path) is intentionally a
follow-up: backend currently fails closed (HTTP 403, no write) which is
the secure direction; the SPA confirm-and-retry UX is a separate
frontend change with its own scope.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
memtomem added a commit that referenced this pull request May 5, 2026
…en write surface (#784)

* sec(privacy): apply trust-boundary redaction guard to every user-driven write surface

The MCP `mem_add` path enforces a secret-class redaction guard before any
filesystem write, but the surrounding ingress paths (Web `POST /api/add`,
`POST /api/upload`, `PATCH /api/chunks/{id}`, scratch promotion, MCP
`mem_edit`, CLI `mm add` / shell `add` / agent share, LangGraph adapter)
were writing `req.content` / `new_content` / `entry["value"]` directly.
The same SQLite database could end up holding cleartext provider tokens
depending on which route the agent or user came in through.

Centralise the scan-decide-log shape in
`privacy.enforce_write_guard(content, surface=…, force_unsafe=False,
audit_context=…)` returning a `WriteGuardResult` (`pass` /
`blocked` / `bypassed`) and apply it at every gap surface. Each surface
records under its own `by_tool` key so attribution stays observable;
audit-log lines render `audit_context` as `key=value` pairs after
`surface=` and never echo matched bytes. Web mutating routes return
HTTP 403 with `{detail: "redaction_blocked", hits: N, surface: …}` so
the SPA can surface a confirm-and-retry UX before re-sending with
`force_unsafe=true`.

`mem_batch_add` keeps its inline scan + per-entry counter pattern —
the helper's single-content shape would obscure the transactional
"one hit rejects the whole batch" rule that test_memory_crud_redaction
already pins.

Out of scope (follow-up):
- `indexing/importers.py` and `indexing/url_fetcher.py` write external
  file/URL content rather than user-driven content; the trust model is
  different (operator-authorised indexing, not direct user submission)
  and merits its own decision pass.

Tests:
- `test_privacy.py::TestEnforceWriteGuard` — helper contract
  (pass / blocked / bypassed-with-audit / clean+force-unsafe still
  records pass / matched bytes never reach the log).
- `test_redaction_write_surfaces.py` — `mem_edit`, CLI `mm add`, and
  the LangGraph adapter all run the guard and route the counter to
  the right `by_tool` key.
- `test_web_routes.py` — new `TestAddMemoryRedaction`,
  `TestEditChunkRedaction`, `TestUploadRedaction`,
  `TestScratchPromoteRedaction` cover all four web mutating routes
  (block / bypass / clean-pass) end-to-end via ASGI client.

Existing 36-test redaction floor + 4048-test full suite stay green;
ruff check + format clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(privacy): redact secret-shaped audit-context values before logging

Codex review of #784 flagged that ``enforce_write_guard``'s audit log
renders ``audit_context`` values verbatim with ``repr()``. Several
callers pass user-controllable strings — ``file`` (mem_add /
web_api_add), ``filename`` (web_api_upload), ``key`` (scratch promote),
``chunk_id`` — and a bypass on a request whose path / filename / key
embeds the same secret as the body would surface that secret in
``server.log`` even though the matched bytes from ``content`` never do.
The "matched bytes must never reach logs" goal must hold for context
fields too, not just the body.

Add ``_sanitize_audit_value`` between caller and log emit:

- non-string values (None, int, bool) pass through untouched
- string values are re-scanned with ``DEFAULT_PATTERNS``; any hit
  replaces the whole value with the fixed marker
  ``<redacted: secret-shape>`` so operators see the field was scrubbed
  rather than missing
- clean strings longer than 200 chars are truncated to keep an abusive
  caller from ballooning the audit line

Two regression pins:

- ``test_secret_in_audit_context_value_is_redacted`` — embeds the same
  ``sk-…`` token in the body, in ``file=``, and in ``filename=``;
  asserts the matched bytes never reach the log message and the
  redaction marker takes the secret-shape value's place. Non-secret
  context fields (``namespace``, ``item_idx``) still pass through.
- ``test_long_audit_context_string_is_truncated`` — caps line length
  with the ``...(truncated)`` marker.

Codex Major #2 (SPA app.js / settings-harness.js cannot pass
``force_unsafe`` on the structured 403 path) is intentionally a
follow-up: backend currently fails closed (HTTP 403, no write) which is
the secure direction; the SPA confirm-and-retry UX is a separate
frontend change with its own scope.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(privacy): route mem_batch_add bypass audit through the shared sanitizer

Codex re-review of #784 showed the audit-context sanitizer landed
in cd27c99 only covered ``enforce_write_guard``'s single-content
path. ``mem_batch_add`` keeps its own transactional decision shape
and its own ``logger.warning`` line (``memory_crud.py:472``), so a
secret-shaped ``file=`` argument still leaked verbatim into the
warning log on a bypass — exactly the leak class cd27c99 was meant
to close.

Extract the audit emit into a public ``privacy.emit_bypass_audit``
seam:

- Same line shape as before
  (``redaction bypass via force_unsafe=True (surface=…, …,
  content_chars=…, hits=…)``).
- Each ``audit_context`` value runs through ``_sanitize_audit_value``
  before ``%r`` formatting, so a secret embedded in a user-controlled
  string field (``namespace`` / ``file`` / ``filename`` / ``key`` /
  ``chunk_id``) gets replaced with ``<redacted: secret-shape>``
  regardless of which entry point logged it.
- Counter (``record(...)``) is intentionally NOT bumped by the
  emitter; ``mem_batch_add`` still owns its per-item bookkeeping
  contract that ``test_memory_crud_redaction.py`` already pins.

``enforce_write_guard`` now delegates to ``emit_bypass_audit``, and
``mem_batch_add`` calls it directly with the same audit_context
shape (``namespace`` / ``file`` / ``item_idx``).

New regression
``TestMemBatchAddRedactionGuard.test_secret_shaped_file_argument_is_redacted_in_audit_log``
reproduces Codex's manual repro: feed a secret-matching value plus a
``file=…sk-…`` filename into a force_unsafe batch and assert the
warning line never contains the matched-bytes prefix while still
carrying the redaction marker.

Stale-comment Minor (``app.js:3730`` / ``:3763``) lands separately as
#786 on main; out of scope here.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* fix(privacy): emit real per-item hit count in mem_batch_add bypass audit

The shared sanitizer wired up in 18208bd carried ``hits=1`` for every
batch bypass, because the pre-scan loop only stored ``hit_indices``
without keeping the per-item ``len(privacy.scan(value))``. The single-
content path (``enforce_write_guard``) reports the real count, so
operators reading the audit log saw inconsistent ``hits=`` values
across surfaces (Codex review of PR #784, Minor #1).

Switch the pre-scan accumulator to ``hit_counts: dict[int, int]`` so
each entry that matches records its true hit count, then pass that
through ``emit_bypass_audit(..., hits=hit_counts[idx])``. The
whole-batch reject error message keeps the same shape — the indices
list is still rendered (now via ``sorted(hit_counts)``) so the user-
facing message and the existing
``test_full_reject_lists_hit_indices`` pin both still hold.

New regression
``TestMemBatchAddRedactionGuard.test_audit_hits_count_matches_scan_result``
feeds an entry that matches two patterns (``api_key:`` + ``sk-…``) and
asserts the audit line carries ``hits=<real count>``, not the
placeholder ``hits=1`` it used to.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>
memtomem pushed a commit that referenced this pull request May 5, 2026
Addresses ultrareview P2 follow-ups on PR-1:

P2 #1 — services.tag_management.rename_tag/delete_tag/merge_tags now
strip inputs and reject if empty after strip. Web routes passed
``body.new_name`` raw, while MCP pre-stripped — the asymmetry let
whitespace-only names persist via PUT /api/tags/{name}. The service is
the single source of truth, so the gate moves there; MCP's pre-strip
stays as harmless defense-in-depth.

P2 #2 — services.tag_management.merge_tags dry-run no longer caps
``affected_chunks`` at the per-tag scan limit. A new
``count_chunks_by_any_tag`` storage helper does a single
``COUNT(*) WHERE EXISTS (… IN (?, …))`` over the source-tag union, so a
tag attached to more than 10,000 chunks reports the true count instead
of an undercount in the confirmation modal. The sample loop still caps
at ``_DRY_RUN_SAMPLE_CAP`` since the UI only renders ten previews.

Co-Authored-By: Claude <[email protected]>
memtomem added a commit that referenced this pull request May 5, 2026
…795)

* feat(storage): scaffold tag-management helpers + service skeleton (#688 PR1 prep)

Lays the policy-independent groundwork for the tag rename/delete/merge
ops requested in #688. No callers wired yet — bodies fill in after the
confirm thread on the RFC resolves the updated_at policy and MCP-migration
scope.

- ``storage.list_chunks_by_tag`` + ``count_chunks_by_tag`` for the
  dry-run sample / count path the confirmation modal needs.
- ``storage.merge_tags`` as the third primitive next to the existing
  ``rename_tag`` / ``delete_tag``; dedupes the resulting per-chunk tag
  list and treats the target appearing in ``sources`` as a no-op source.
- ``SqliteBackend._tag_write_lock`` (``asyncio.Lock``) so the upcoming
  service layer and ``auto_tag_storage`` can serialize their
  read-modify-write of ``chunks.tags`` within a single process.
  Cross-process safety still falls back to SQLite's WAL file lock —
  this is an in-process invariant only.
- ``services/tag_management.py`` skeleton (``TagOpResult`` /
  ``TagOpSample`` + ``NotImplementedError`` stubs) so the Web (PR1),
  MCP (PR1 if migration is approved), and CLI (PR3) surfaces share a
  single entry point. Service exposes ``bump_updated_at`` as an
  explicit parameter so the policy choice lives at the route layer
  rather than buried in the service body.

22 storage tests pass (7 new); lint + format clean.

Co-Authored-By: Claude <[email protected]>

* fix(auto_tag): preserve full chunk metadata + simplify list helper (#688)

Two corrections to the prep commit (8507b54), driven by an audit
cross-check on the #688 RFC:

1. ``auto_tag_storage`` rebuilt ``ChunkMetadata`` from an explicit
   8-field list, silently dropping ``overlap_before/after``,
   ``parent_context``, ``file_context``, and the temporal-validity
   columns (``valid_from_unix`` / ``valid_to_unix``). Same shape bug
   ``web/routes/chunks.py:update_chunk_tags`` already fixed with a
   spread pattern; consolidate. Pinned by a regression test that
   fails on the pre-fix code (mutation-validated:
   ``valid_from_unix`` reads as ``None`` after the broken rewrite).

2. ``list_chunks_by_tag`` now thin-wraps
   ``recall_chunks(tag_filter=...)``. The latter already implements
   the ``json_each`` EXISTS match and ORDER BY shape this helper
   needs (``sqlite_backend.py:996``); the parallel SQL added in the
   prep commit was dead duplication.

22 tag/auto-tag tests pass; lint/format clean.

Co-Authored-By: Claude <[email protected]>

* feat(storage): bump updated_at on tag rename/delete/merge (#688)

Per the #688 confirm thread (option (a)): tag-only mutations now bump
``chunks.updated_at`` so downstream consumers see the rewrite as a
write event. ``decay.py`` reads ``updated_at`` for chunk age, which
means a rename or delete deliberately resets the decay timer for the
affected chunks. The trade-off is documented in each helper's
docstring; v1 treats tag fix-ups as "this chunk is being curated."

Symmetric pin tests cover both halves of the invariant: the mutated
chunk's ``updated_at`` moves forward AND an untouched chunk's
``updated_at`` stays anchored. Negative-only would false-pass against
a no-op implementation. Mutation-validated (3/3 fail on the unhardened
code).

Co-Authored-By: Claude <[email protected]>

* feat(services): implement tag-management service bodies (#688)

Replaces the prep skeleton's NotImplementedError stubs with the real
rename / delete / merge implementations. Each entry point:

- Acquires ``storage._tag_write_lock`` for the read-modify-write
  window (in-process serialization with ``auto_tag_storage``).
- On dry-run: returns ``count_chunks_by_tag`` plus a sample of up to
  10 chunk previews via ``list_chunks_by_tag``; storage stays
  untouched.
- On apply: delegates to the hardened storage helper (which now
  bumps ``updated_at``) and calls ``search_pipeline.invalidate_cache()``
  on a non-zero affected count, mirroring
  ``web/routes/system.py:418``.

The ``bump_updated_at`` parameter from the skeleton is dropped: the
storage helpers now bump ``updated_at`` unconditionally per the #688
confirm thread, so the parameter would have been dead.

Merge dry-run counts the union of candidate chunks across ``sources``
(excluding ``target``); this set equals what the storage helper will
actually mutate because every source-bearing chunk's tag list changes
when a source is rewritten to target. Target-only chunks are filtered
out of the source set up front, so they are not counted.

13 service tests pass — input validation, dry-run no-write, apply +
invalidate, no-match no-invalidate, merge union counting, lock
acquisition observable from outside.

Co-Authored-By: Claude <[email protected]>

* feat(auto_tag): serialize write window via _tag_write_lock + re-read (#688)

Closes the race window between ``auto_tag_storage`` and the new
tag-management service: previously an auto-tag run could spend N seconds
in keyword/LLM extraction, then upsert with the pre-extraction tag
snapshot — silently overwriting any rename / delete / merge that
landed during that window.

The lock is held narrowly. Slow extraction stays outside, only the
read-modify-write step (re-read latest chunk → spread metadata →
upsert) runs under ``storage._tag_write_lock``. The re-read is what
actually preserves the concurrent mutation: spreading on top of the
just-loaded ``ChunkMetadata`` carries the latest tag list (already
mutated by the service) so auto_tag layers ``new_tags`` on top
instead of clobbering it. Lock-narrow + invariant via mutation
ordering, per the pattern in ``feedback_lock_narrow_invariant_ordering``.

Lock-acquisition test holds the lock externally and verifies
auto_tag waits for it (mutation-validated: removing the lock
makes the test fail). Existing field-preservation regression
still passes against the re-read shape.

Co-Authored-By: Claude <[email protected]>

* feat(web): add /api/tags rename/delete/merge routes via shared service (#688)

Closes the Web-side asymmetry the audit on #688 surfaced: until now the
MCP surface had ``tag_management`` tools but the Web tab had no rename /
delete / merge UX surface, so any tag fix-up required either editing
chunks one by one or re-running ``auto_tag overwrite=true``.

- ``PUT  /api/tags/{name}`` — rename to ``body.new_name``
- ``DELETE /api/tags/{name}`` — drop the tag from every chunk
- ``POST /api/tags/merge`` — fold ``body.sources`` into ``body.target``

All three honour ``?dry_run=true`` and route through
``services.tag_management``, so they pick up the lock acquisition,
``updated_at`` bump, and ``search_pipeline.invalidate_cache()`` calls
without re-implementing them. ``ValueError`` from the service (empty
new_name / target) maps to a 400.

8 route tests cover apply + dry-run + 400 paths and assert that
``invalidate_cache`` fires on apply but not on dry-run or no-op
(``sources`` collapsing to just ``target``).

Co-Authored-By: Claude <[email protected]>

* feat(mcp): route tag management through shared service + add mem_tag_merge (#688)

Closes the MCP/Web asymmetry the audit on #688 surfaced. The MCP tools
now route through ``services.tag_management`` so both surfaces share
one lock, one ``updated_at`` policy, and one ``invalidate_cache``
contract. ``mem_tag_merge`` is added so MCP picks up the same merge
shape Web exposed in the previous commit.

- ``mem_tag_rename`` / ``mem_tag_delete`` / ``mem_tag_merge`` all gain
  ``dry_run: bool = False`` for the preview path. Defaults to False so
  existing MCP callers (no migration noise) keep applying as before.
- Same input validation as the Web routes (empty old/new/target/sources
  rejected with a clear error string the chat agent can show the user).
- Result text formatter renders the dry-run sample chunks below the
  "would affect N chunks" line so the user sees what the apply would
  touch.

Parity test discriminator: the cache invalidation. ``storage.rename_tag``
/ ``delete_tag`` / ``merge_tags`` don't know about ``SearchPipeline``, so
only the service-layer path can fire ``invalidate_cache``. The test
swaps ``invalidate_cache`` for a counter and asserts it fires exactly
once on apply, never on dry-run, and never on no-op
(``sources`` collapsing to just ``target``). Mutation-validated:
swapping ``mem_tag_rename`` body to call storage directly makes the
counter assertion fail with ``0 == 1``.

Co-Authored-By: Claude <[email protected]>

* test(guards): register new tag-management surfaces in AST invariants (#688)

The CSRF / redaction / namespace AST guards keep the security
classification registries honest by failing whenever a new mutating
surface lands without an explicit row in the registry. PR1 added
three Web routes (``tags.rename_tag``, ``tags.delete_tag``,
``tags.merge_tags``) and one MCP tool (``mem_tag_merge``); register
them so the suite stays green and the next contributor sees the right
classification template.

- ``_CSRF_PROTECTED``: all three routes (default for unsafe-method
  surfaces; the middleware already covers them).
- ``_REDACTION_EXEMPT``: all three routes — tags are short labels and
  these handlers rewrite existing chunk metadata only, so the LTM
  trust-boundary write guard does not apply (mirrors the existing
  ``chunks.update_chunk_tags`` and ``tags.run_auto_tag`` rows).
- ``DEFERRED_NS_SURFACES``: ``mem_tag_merge.target`` — the
  parameter-name heuristic catches ``target=`` but the value is a
  tag name, not a namespace, so no namespace validator applies.

Co-Authored-By: Claude <[email protected]>

* fix(services): strip tag inputs + accurate merge dry-run count (#688)

Addresses ultrareview P2 follow-ups on PR-1:

P2 #1 — services.tag_management.rename_tag/delete_tag/merge_tags now
strip inputs and reject if empty after strip. Web routes passed
``body.new_name`` raw, while MCP pre-stripped — the asymmetry let
whitespace-only names persist via PUT /api/tags/{name}. The service is
the single source of truth, so the gate moves there; MCP's pre-strip
stays as harmless defense-in-depth.

P2 #2 — services.tag_management.merge_tags dry-run no longer caps
``affected_chunks`` at the per-tag scan limit. A new
``count_chunks_by_any_tag`` storage helper does a single
``COUNT(*) WHERE EXISTS (… IN (?, …))`` over the source-tag union, so a
tag attached to more than 10,000 chunks reports the true count instead
of an undercount in the confirmation modal. The sample loop still caps
at ``_DRY_RUN_SAMPLE_CAP`` since the UI only renders ten previews.

Co-Authored-By: Claude <[email protected]>

* fix(services): absorb replace_chunk_tags + tighten rename same-name gate (#688)

Two follow-ups on the cross-surface symmetry pass:

- ``services.tag_management.replace_chunk_tags`` now backs
  ``PUT /api/chunks/{id}/tags``. The route used to ``upsert_chunks`` directly,
  which (a) skipped the ``_tag_write_lock`` and could race with an in-flight
  bulk rename/delete/merge, and (b) skipped the search TTL cache flush so a
  follow-up tag-filtered search could serve stale results. Routing through
  the service closes both gaps; the per-chunk path now obeys the same
  invariants as the global ones. ``Chunk`` no-op (deduped tags equal current)
  short-circuits without an upsert so ``updated_at`` and decay scoring are
  not disturbed.

- ``services.tag_management.rename_tag`` rejects ``old == new`` after strip.
  The MCP wrapper's raw ``==`` gate was both wrong (``"foo"`` vs ``" foo "``
  slipped through) and not honoured by Web — a same-name rename would still
  rewrite ``chunks.tags``, bumping ``updated_at`` and flushing the result
  cache as if data had changed. Service-layer is now the single gate; the
  MCP pre-check is removed.

Tests: 30 service tests (incl. 4 ``replace_chunk_tags`` + 4 preserve
invariants for content/embedding/hash/created_at), 1 MCP parity test for
the same-name reject path, 4 web-route tests for ``update_chunk_tags`` lock
+ cache invariants. Pin-mutation validated.

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant