refactor(stm): decouple surfacing via remote-only MCP by tsdata · Pull Request #2 · memtomem/memtomem

tsdata · 2026-04-09T10:27:37Z

Summary

STM의 LTM 접근 path를 두 가지(in-process import / mcp_client)에서 remote-only via MCP 프로토콜 단일 path로 통합
STM이 core 내부 심볼(SearchPipeline, SqliteBackend, extract_entities, create_components, Mem2MemConfig)에 직접 의존하던 부분을 모두 제거 — 이제 McpClientSearchAdapter만이 LTM 통신 통로
langchain + langgraph 패턴으로 향후 별도 저장소 분리 (memtomem-stm) 사전 작업

Why

STM은 본질적으로 MCP 프록시 게이트웨이입니다 (ProxyManager가 임의 개수 upstream MCP server를 받음 — brave-search/github/memtomem 등). 이전엔 memtomem만 in-process import로 특별 취급했는데, 이는 STM이 memtomem-server 역할까지 겸하는 architectural mismatch였습니다. remote-only로 단일화하면:

STM의 core 의존이 MCP 프로토콜 한 점으로 좁혀져 core 내부 리팩토링이 STM을 깨뜨리지 않음
memtomem 크래시 시에도 STM의 다른 upstream(brave-search 등)은 정상 동작 (fault isolation)
향후 git filter-repo로 별도 저장소 분리할 때 cross-repo dependency가 깔끔

Changes

Removed:

SurfacingEngine.search_pipeline / storage 생성자 인자 (이제 mcp_adapter keyword-only required)
SurfacingConfig.ltm_mode 필드 (remote가 유일한 모드)
server.py의 in-process bootstrap 분기 + try/except fallback 로직
_extract_heuristic의 memtomem.tools.entity_extraction import (빈 list stub)
TestFeedbackBoost 테스트 클래스 3개 (SqliteBackend.increment_access 직접 호출에 의존)
_boosted_surfacings 추적 set (access boost 분기와 함께)

Added:

tests/_fake_memtomem_server.py — 가벼운 stdio MCP fake server (32줄)
tests/test_remote_ltm_integration.py — end-to-end integration test, fake server에 child process로 연결해서 SurfacingEngine 동작 검증 (0.36s)

Updated:

README.md "LTM Connection Modes" → "LTM Connection" (단일 모드 설명)
6개 테스트 파일에서 _make_search_pipeline → _make_mcp_adapter rename, search_pipeline= → mcp_adapter= 일괄 전환
ruff --fix로 정리된 unused imports

Follow-ups (별도 PR)

이 PR은 in-process 전용 부수 기능 3개를 임시 비활성화합니다. 0.1.0 알파에서 사용자 거의 없으므로 안전하지만 후속 작업 필요:

Access boost on "helpful" feedback — core에 mem_increment_access mem_do action을 추가한 후, McpClientSearchAdapter.increment_access(ids) 메서드로 복원
Session context (working memory) injection — McpClientSearchAdapter.scratch_list() 메서드 추가 (mem_scratch_get(key=None) 호출 + 파싱)
Heuristic fact extraction fallback — STM 자체 정규식 휴리스틱(URL/식별자/명사구) 작성 (현재는 빈 list stub)

Test plan

STM 단독 테스트 731 passing (uv run pytest packages/memtomem-stm/tests/ -m \"not ollama\")
core 단독 테스트 857 passing (regression 0)
합쳐서 1588 passing
새 integration test 통과 (0.36s)
STM source ruff clean (uv run ruff check packages/memtomem-stm/src)
빈 venv에서 pip install memtomem-stm 후 memtomem-stm 기동 (Phase 4 PyPI 배포 시)

Phase roadmap

이 PR은 4단계 분리 작업의 Phase 1입니다:

Phase 1 (이 PR): STM 코드 리팩토링 — in-process 분기 제거, mcp_adapter 단일 path
Phase 2: core 잔여물 정리 — cli/stm_cmd.py, StmProxyConfig, web/routes/proxy.py, lifespan.py STM bootstrapping 제거
Phase 3: git filter-repo로 packages/memtomem-stm/ → 별도 저장소(https://github.com/memtomem/memtomem-stm.git) 추출, 모노레포에서 디렉터리 제거
Phase 4: 양쪽 PyPI 배포, lockstep release, compat range

🤖 Generated with Claude Code

STM previously supported two LTM access paths — in-process (direct SearchPipeline/SqliteBackend import) and mcp_client (stdio MCP). The dual path tightly coupled STM to core internals and conflicted with STM's role as an MCP proxy gateway, where memtomem is just another upstream alongside brave-search/github/etc. Collapse to a single remote-only path so STM depends only on the MCP protocol contract, not core internals. This sets up the eventual repo split (langgraph pattern: memtomem-stm becomes a separate package that spawns memtomem-server as a child stdio process). Removed: - SurfacingEngine.search_pipeline / storage constructor args - ltm_mode config field (remote is the only mode now) - in-process bootstrap branch in server.py - _extract_heuristic's import of memtomem.tools.entity_extraction - TestFeedbackBoost (used direct SqliteBackend.increment_access) Added: - _fake_memtomem_server.py + test_remote_ltm_integration.py for an end-to-end check that exercises stdio MCP without needing core installed Follow-ups (separate PRs): - Restore access boost via a new mem_increment_access MCP action - Restore include_session_context via McpClientSearchAdapter.scratch_list - Replace the heuristic stub with a native STM regex extractor Tests: 1588 passing (core 857, STM 731), 0 regressions, ruff clean. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Pre-existing main-branch CI breakage that surfaced when this PR ran the full ruff + pytest pipeline (the last 5 commits on main were all red): - memory_crud.py:113 had an unused `chunk = await ...` assignment — the index_entry return value isn't read, so just await it. - test_indexing_engine.py::TestIndexEntry's four tests call index_engine.index_entry, which triggers a real embedding HTTP call. They need the `ollama` marker so CI's `-m "not ollama"` filter skips them when Ollama isn't running. These two fixes unblock CI for both this PR and main. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

CI's `ruff format --check` was failing because eight existing core files weren't formatted (none of these were touched by this PR's refactor; all were pre-existing main-branch dust). Running `uv run ruff format packages/memtomem/src packages/memtomem-stm/src` reformats them in place. Pure formatting changes only. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

After Phase 1 (#2) made STM communicate with memtomem only over the MCP protocol, the in-process glue that lived inside memtomem core became dead code. Phase 2 removes it so the two packages can ship from separate repositories without leftover cross-references. Backend deletions: - packages/memtomem/src/memtomem/cli/stm_cmd.py — `mm stm init`/`reset` wizard that imported memtomem_stm. Belongs to the STM package now. - packages/memtomem/src/memtomem/web/routes/proxy.py — STM dashboard REST routes. Will be re-added inside the STM repo if needed. - packages/memtomem/src/memtomem/cli/__init__.py — drop the `stm` subcommand registration. - packages/memtomem/src/memtomem/config.py — drop StmProxyConfig and the `stm_proxy` field on Mem2MemConfig. - packages/memtomem/src/memtomem/server/context.py — drop the `stm_proxy_manager` slot on AppContext. - packages/memtomem/src/memtomem/server/lifespan.py — drop the entire `if config.stm_proxy.enabled` bootstrap branch (~80 lines) plus its cleanup section. memtomem core no longer imports memtomem_stm at all. - packages/memtomem/src/memtomem/web/app.py — drop `proxy.router` registration and the matching import. Frontend deletions: - packages/memtomem/src/memtomem/web/static/index.html — STM tab button and the entire STM panel (~70 lines). - packages/memtomem/src/memtomem/web/static/app.js — `_stmPollId` early declaration, the `if (tabName === 'stm')` switch arm, the STM PROXY DASHBOARD section (~320 lines), and the `stm-filter`/`stm-hist-prev`/ `stm-hist-next` data-action handlers from the global delegator. Test updates: - test_cli.py: drop `stm` from registered subcommand list and remove `test_stm_help`. - test_context_window.py: remove `TestSTMFormatter` (it imported memtomem_stm.surfacing.formatter — a cross-package boundary violation that should never have been in core tests). - test_indexing_engine.py: ruff --fix removed unused imports surfaced by the indexing test changes. - assorted other test files: ruff --fix cleaned unused imports. Verification: - 850 core tests pass (`uv run pytest packages/memtomem/tests -m "not ollama"`) - ruff check + format clean for both src trees - final grep across packages/memtomem confirms zero references to stm_proxy / StmProxy / stm_cmd / memtomem_stm in source or tests Follow-up (not blocking): style.css still contains `.stm-*` rules; they are unused but harmless. Drop them in a small later cleanup PR. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

After Phase 1 (#2) made STM communicate with memtomem only over the MCP protocol, the in-process glue that lived inside memtomem core became dead code. Phase 2 removes it so the two packages can ship from separate repositories without leftover cross-references. Backend deletions: - packages/memtomem/src/memtomem/cli/stm_cmd.py — `mm stm init`/`reset` wizard that imported memtomem_stm. Belongs to the STM package now. - packages/memtomem/src/memtomem/web/routes/proxy.py — STM dashboard REST routes. Will be re-added inside the STM repo if needed. - packages/memtomem/src/memtomem/cli/__init__.py — drop the `stm` subcommand registration. - packages/memtomem/src/memtomem/config.py — drop StmProxyConfig and the `stm_proxy` field on Mem2MemConfig. - packages/memtomem/src/memtomem/server/context.py — drop the `stm_proxy_manager` slot on AppContext. - packages/memtomem/src/memtomem/server/lifespan.py — drop the entire `if config.stm_proxy.enabled` bootstrap branch (~80 lines) plus its cleanup section. memtomem core no longer imports memtomem_stm at all. - packages/memtomem/src/memtomem/web/app.py — drop `proxy.router` registration and the matching import. Frontend deletions: - packages/memtomem/src/memtomem/web/static/index.html — STM tab button and the entire STM panel (~70 lines). - packages/memtomem/src/memtomem/web/static/app.js — `_stmPollId` early declaration, the `if (tabName === 'stm')` switch arm, the STM PROXY DASHBOARD section (~320 lines), and the `stm-filter`/`stm-hist-prev`/ `stm-hist-next` data-action handlers from the global delegator. Test updates: - test_cli.py: drop `stm` from registered subcommand list and remove `test_stm_help`. - test_context_window.py: remove `TestSTMFormatter` (it imported memtomem_stm.surfacing.formatter — a cross-package boundary violation that should never have been in core tests). - test_indexing_engine.py: ruff --fix removed unused imports surfaced by the indexing test changes. - assorted other test files: ruff --fix cleaned unused imports. Verification: - 850 core tests pass (`uv run pytest packages/memtomem/tests -m "not ollama"`) - ruff check + format clean for both src trees - final grep across packages/memtomem confirms zero references to stm_proxy / StmProxy / stm_cmd / memtomem_stm in source or tests Follow-up (not blocking): style.css still contains `.stm-*` rules; they are unused but harmless. Drop them in a small later cleanup PR. Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>

Reflect the Gemini-regenerated security curation (commit 5fbb47f) in the handoff doc so a fresh session knows which Phase 2c steps remain. Updates the phase-progress table, adds a "Security Phase 3a measurements" subsection (21.9% drift -> H1 supported; intra-vocab reclassifications to auth/mtls, auth/rbac, networking/tls; incident subtopic skew), marks next-actions #2-#3 as complete, and points #4 (IDF pre-measure) at the authoritative Gemini batch directory. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

11 chunks affected, 13 events / 32 = 40.6% event-count — **Upper- outlier band (32-45%)** realized per pre-registration, above observability's 28.1% by 12.5 pp. Pre-registered cell; not a novel failure mode. Two systematic Gemini patterns drove 6/13 events (47%): - Pattern 1: `kubectl logs` diagnostic conflated with `observability/logging` subtopic (B3 #2, B5 #1, B6 #2) - Pattern 2: postmortem genre conflated with `incident_response/postmortem` subtopic (B7 #1, B7 #2, B8 #2) Both are Phase 3b drift-validator "forbidden pair" candidates (trigger condition k >= 5 topics met at k8s). Category mix: absent-topic 11, missed secondary 2 (paired with cross-topic drops on B1 #4 cost_opt→k8s/scaling and B8 #4 observability/metrics→k8s/scheduling), out-of-vocab 0, intra-vocab 0, over-correction 0. Band-realization sensitivity check: chunk-count drift would be 11/32 = 34.4%, still Upper-outlier. Event-count convention does not flip the band. Baseline observation (n=2 event-count at observability + k8s): drifts span 28.1% to 40.6%. "Prompt-structure drift as constant ~25-30%" hypothesis weakened — topic-level variance appears real. H1/H2/H3 formal reformulation still deferred to kafka per (A)- path; no promotion to working hypothesis at n=2. Post-k8s decision pending divergence (Step 6); handoff Step 3 ✅. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Trigger condition ("≥ 5 topics ledgered") met at k8s — implement the three-tier drift validator against the accumulated 6-topic curation ledger (caching baseline + postgres + cost_opt + security + observability + k8s). Two tiers locked; greenlist deferred until a false-positive case appears. **Forbidden** (auto-reject, CLI exit 1): - Closed-vocab enforcement: unknown topic, unknown subtopic, malformed tag. Guards against silent closed-vocab drift after the 2026-04-17 freeze. - `genre-postmortem-vs-ir-postmortem-subtopic`: postmortem-genre chunk with non-IR primary cannot tag `incident_response/postmortem` as secondary. k8s Pattern 2 (3 events: B7 #1, B7 #2, B8 #2). **Manual-review** (warn, no block — curator verifies by body context): - `kubectl-logs-diagnostic-vs-observability-logging`: k8s/* primary + obs/logging secondary + body uses `kubectl logs`. k8s Pattern 1 (3 events: B3 #2, B5 #1, B6 #2). - `security-access-control-primary-with-rbac-body`: security/access_control primary + body mentions RBAC-specific resources (Role, RoleBinding, ClusterRoleBinding). Security ledger trouble KO #1, postmortem KO #2. - `security-encryption-primary-with-transport-body`: security/encryption primary + transport-layer body (TLS, mTLS, cert-manager, certbot, PeerAuthentication) AND secondary does not already name networking/tls or auth/mtls. Security ledger adr EN #1, runbook KO #2, trouble KO #2. Suppression clause matches "Borderline cases preserved" precedent (postmortem EN #3, runbook EN #1) where the curator already captured the split functionally. Deliverables: - tools/retrieval-eval/drift_validator.py (~280 lines): parser, rules, CLI. `uv run python tools/retrieval-eval/drift_validator.py <corpus_root|fixture.md>`. Corpus walker filters to genre fixtures only (skips README and other .md). - packages/memtomem/tests/test_drift_validator.py: 23 tests — parser, each rule positive + negative, CLI exit codes, corpus-sanity assertions (zero forbidden, zero manual-review on current corpus). - tools/retrieval-eval/README.md: tool entry added. - docs/planning/b2-v2-design.md § "Drift validator": TODO converted to implemented, inline rule summary with ledger citations. - docs/planning/b2-v2-phase2b-ledger.md § "Deferred decisions": checkbox flipped, full rule list recorded. Validation against current corpus: zero forbidden, zero manual-review. `ruff check` + `ruff format --check` clean, `mypy` advisory clean, `pytest packages/memtomem/tests/test_drift_validator.py`: 23 passed. CI wiring to run this against corpus_v2 on every PR lands at Phase 6. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…on (#464) * feat(mem_add,mem_edit): canonical writer + mem_edit header preservation Stacks on PR #463 (chunker reader). Two coupled writer-side changes: 1. append_entry emits canonical blockquote tags > tags: ["a", "b"] # explicit `> ` prefix, JSON-quoted instead of the legacy lazy-continuation form tags: ['a', 'b'] # Python repr(), no `> ` prefix Lazy continuation is well-defined CommonMark but easy to break (a single blank line ends the blockquote), and Python repr() is not valid YAML. The chunker's section-leading parser already accepts both shapes, so existing files still parse — the writer just stops producing the fragile form on new entries. New test_memory_writer_tag_format.py pins the canonical output so a future regression that reverts to lazy / Python-repr fails fast. 2. mem_edit preserves the per-entry header Pre-PR, mem_edit called replace_lines(start_line, end_line, new_content), which replaced the entire chunk range — heading, blockquote header, body — with whatever the caller supplied. That was already wide pre-RFC, but post-PR-A the chunker strips the blockquote from chunk content, so a caller working from chunk.content would supply body-only and silently erase the metadata header on disk. New replace_chunk_body helper detects the heading + section- leading blockquote group at the top of the chunk's line range and keeps them intact while replacing only the body. If new_content itself starts with `## `, the call reverts to a full replacement (preserving the pre-RFC mem_edit semantic for users who want to override the heading explicitly). New test_mem_edit_header_preservation.py covers six cases (canonical, legacy lazy, no header, full-replace via `## `, non-first sub-chunk of an oversized section, trailing-newline round-trip). Round-trip and follow-ups: - test_multi_agent_integration test_share_copies_chunk_with_audit_tag now also asserts the forward direction: mem_search(tag_filter="shared-from=<src>") returns the share copy and only the share copy. Closes the e2e gap discovered during PR #462 that drove RFC mem-add-tags-blockquote-promote-rfc.md. - test_chunking_blockquote_tags adds a block-list-shape case (`> tags:\n> - a\n> - b`). The current writer never emits this shape, but the shared _parse_tags_value helper supports it and this locks the support in. Implements RFC §Migration & tests #2 + #3 and the writer-PR scope addition (mem_edit header preservation, ③-1) noted in PR #463 review. Co-Authored-By: Claude <[email protected]> * fix(web): preserve per-entry header in PATCH /api/chunks edit endpoint Reviewer caught that web/routes/chunks.py:edit_chunk still called replace_lines directly, which bypasses the header-preservation logic mem_edit gained in this PR. Same footgun, different surface — the Web UI editor surfaces chunk.content (already header-stripped by the chunker), so a Save from the browser would feed body-only content into replace_lines and silently erase the heading + > created: + > tags: on disk. Swap the call to replace_chunk_body so mem_edit and the Web UI route share the same preservation contract. Add a regression test (test_edit_chunk_preserves_blockquote_header) that drives the route end-to-end with a real tmp-file source and asserts the metadata header survives a body-only PATCH. Also folded in two reviewer nits: - mem_edit MCP-tool docstring now mentions the body-only / `## ` bypass policy (was only documented on the underlying helper). - replace_chunk_body has a one-line comment noting that ``## `` is the full-replace trigger because append_entry always emits H2; other heading levels in user input are body content. grep across src/ confirms replace_lines now has zero non-test callers in the codebase. Whether to deprecate it or keep it as a low-level escape hatch is tracked as a follow-up; not closing that question here. Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: pandas-studio <[email protected]> Co-authored-by: Claude <[email protected]>

…ote header When chunk_file strips a section-leading ``> created:`` / ``> tags:`` blockquote before passing text to _split_section, base_line still points at the heading. line_offset in _split_section started at 0 and counted only body lines, so sub-chunks 2..N reported start_line 3-5 lines earlier than the body they actually covered. Practical impact: mem_edit on a non-first sub-chunk reads file lines [start_line..end_line] via replace_chunk_body. With drift, those lines included real body content from the previous sub-chunk — which _find_body_start_index does not detect as a header (no heading, no leading blockquote at the start of the range), so preserved == [] and the replacement silently dropped that content on save. Fix: _extract_section_blockquote_tags returns a third value, lines_consumed, counting how many input lines the strip consumed. chunk_file adds it to leading_strip_lines (lines stripped by .strip() on the raw section text) and passes the total + 1 (for the heading line itself) as a body_offset kwarg to _split_section, which seeds line_offset with body_offset. Sub-chunks 1..N now align with file lines as well as the pre-strip code did. Sub-chunk 1's start_line still anchors at the heading (line_offset is only consulted on the first emit boundary, not at init), so mem_edit's header-preservation contract for sub-chunk 1 is preserved. Single-chunk sections and sections without any blockquote header are unchanged: body_offset stays 0 in those paths. Two new tests in test_chunking_blockquote_tags.py: - test_blockquote_header_does_not_drag_sub_chunk_start_lines: a controlled oversize section with an explicit blockquote header, sub-chunks 2..N must report start_line >= body-start file line. Fails pre-fix. - test_oversize_section_without_blockquote_unchanged: pre-PR-A path is preserved when there's no header to strip. Closes RFC ``planning/mem-add-tags-blockquote-promote-rfc.md`` §Follow-ups #2. Co-Authored-By: Claude <[email protected]>

…ote header (#466) * fix(chunking): seed _split_section line counter past stripped blockquote header When chunk_file strips a section-leading ``> created:`` / ``> tags:`` blockquote before passing text to _split_section, base_line still points at the heading. line_offset in _split_section started at 0 and counted only body lines, so sub-chunks 2..N reported start_line 3-5 lines earlier than the body they actually covered. Practical impact: mem_edit on a non-first sub-chunk reads file lines [start_line..end_line] via replace_chunk_body. With drift, those lines included real body content from the previous sub-chunk — which _find_body_start_index does not detect as a header (no heading, no leading blockquote at the start of the range), so preserved == [] and the replacement silently dropped that content on save. Fix: _extract_section_blockquote_tags returns a third value, lines_consumed, counting how many input lines the strip consumed. chunk_file adds it to leading_strip_lines (lines stripped by .strip() on the raw section text) and passes the total + 1 (for the heading line itself) as a body_offset kwarg to _split_section, which seeds line_offset with body_offset. Sub-chunks 1..N now align with file lines as well as the pre-strip code did. Sub-chunk 1's start_line still anchors at the heading (line_offset is only consulted on the first emit boundary, not at init), so mem_edit's header-preservation contract for sub-chunk 1 is preserved. Single-chunk sections and sections without any blockquote header are unchanged: body_offset stays 0 in those paths. Two new tests in test_chunking_blockquote_tags.py: - test_blockquote_header_does_not_drag_sub_chunk_start_lines: a controlled oversize section with an explicit blockquote header, sub-chunks 2..N must report start_line >= body-start file line. Fails pre-fix. - test_oversize_section_without_blockquote_unchanged: pre-PR-A path is preserved when there's no header to strip. Closes RFC ``planning/mem-add-tags-blockquote-promote-rfc.md`` §Follow-ups #2. Co-Authored-By: Claude <[email protected]> * test: add no-tags blockquote oversize regression for line-drift fix Reviewer N3: gap in coverage — `_extract_section_blockquote_tags` short-circuits to ([], text, 0) when no `tags:` key is found, so the blockquote stays in content and `blockquote_strip_lines` is 0. The existing tests covered (a) tags present + strip and (b) no blockquote at all, but not (c) blockquote present without tags. The new case verifies for an oversize section with `> created:` only: - sub-chunks inherit no tags (no promotion happened) - first sub-chunk's content still carries `> created:` (mem_edit can preserve it on body-only edits) - sub-chunks 2..N start_line still aligns with the body — no drift from the leading_strip path even when the blockquote isn't consumed by the strip helper Also tracks reviewer N1 (paragraph-gap +2 imprecision in _split_section) in the RFC follow-ups: separate from this PR; cosmetic end_line overshoot, not data loss. Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: pandas-studio <[email protected]> Co-authored-by: Claude <[email protected]>

… in toast, align canonical_root style Follow-up to PR #549 review: - **#1 (low-medium)**: drop the hardcoded `_CTX_EMPTY_HINT_META` JS map. GET `/api/context/{skills,commands,agents}` now also returns `canonical_root: str` and `scanned_dirs: list[str]` (computed once at module import from `SKILL_DIRS` / `AGENT_DIRS` / `COMMAND_DIRS`), so the empty-state hint pulls from the wire instead of duplicating the detector layout client-side. The JS still has a one-line fallback (`.memtomem/${type}` + empty list) for older backends, but it never fires when the cache-bust takes effect. - **#2 (cosmetic)**: skills sync response now uses the `CANONICAL_SKILL_ROOT = ".memtomem/skills"` constant directly, matching the `CANONICAL_AGENT_ROOT` / `CANONICAL_COMMAND_ROOT` pattern in agents/commands routes. Drops the `_safe_rel(canonical_skills_root( project_root), project_root)` round-trip — both forms resolve to the same string today, but a constant won't drift if a future PR adds env / scope resolution to `canonical_skills_root()`. - **#3 (low)**: import-no-runtimes toast renders `basename(project_root)` instead of the absolute path. The wire still carries the absolute string (consistent with `/api/system/*` and useful for debugging / reverse-proxy contexts) but the user-facing toast keeps it short — `scanned_dirs` already gives full orientation. New `_ctxBasename()` helper handles the POSIX path split. Cache-bust bumped to `context-gateway.js?v=3`. Tests: - New assertions on `TestListSkills.test_empty`, `TestListCommands.test_empty`, `TestListAgents.test_empty` verify GET response now includes `canonical_root` + `scanned_dirs`. - `pytest -m "not ollama"` 3152 passed; `ruff check` + `ruff format --check` clean. Skipped per reviewer guidance: - #4 `Final[str]` → `Literal` / `StrEnum` (mypy is advisory). - #5 additional positive route assertions for `invalid_name` / `already_imported` / `toml_parse_error` (route layer is a trivial dict comprehension; covered at the core layer). - #6 Korean `<이름>` literal angle-bracket (intentional placeholder guide, not an unsubstituted variable). Co-Authored-By: Claude <[email protected]>

* fix(web): surface "why nothing happened" on Skills/Commands/Agents Sync+Import Users reported "Sync 도 Import 도 동작이 이상함" — both buttons in mm web → Settings → Skills/Commands/Agents looked successful (a green toast popped) but nothing actually moved on disk. The root cause was a shape mismatch in the response handling, not a wiring bug: - For skills, the sync response carries `skipped` (e.g. "no canonical skills") but never `dropped`. The client only read `dropped`, so the skipped reason was silently dropped on the floor. - The import response carried no metadata at all, so a 0-imported, 0-skipped result (the common case when no `.claude/skills/` exists in the project) just rendered "Import completed" with no clue that the scanner found nothing because the directories didn't exist. This change keeps cwd-bound single-project behavior unchanged and adds machine-readable plumbing the UI can match against: - All three context types' `Sync` and `Import` core layers (`memtomem.context.{skills,commands,agents}`) now record skipped items as `(name, reason, reason_code)` triples. Reason codes come from a new `memtomem.context._skip_reasons` module: `no_canonical_root`, `unknown_runtime`, `parse_error`, `invalid_name`, `already_imported`, `canonical_exists`, `toml_parse_error`. Human `reason` strings stay for CLI/log output. - Web route handlers (`web/routes/context_{skills,commands,agents}.py`) surface `reason_code` per skipped item, plus `canonical_root` on sync responses and `project_root` + `scanned_dirs` on import responses. - The web client (`context-gateway.js`) reads `data.skipped` (in addition to `data.dropped` for commands/agents field-level drops), matches on `reason_code === "no_canonical_root"` to show "No canonical {type} under {canonical}. Create one first." instead of a generic success, and on a 0/0 import shows "No runtime {type} found in {project_root}. Scanned: {scan_dirs}." so users see exactly which paths the detector inspected. - The empty list state hint now points at the canonical and runtime paths instead of "Create one or import from existing runtimes." — a fresh user can drop a `SKILL.md` into the right directory without guessing. - Three new placeholder-form i18n keys (`empty_hint`, `import_no_runtimes`, `sync_empty_canonical`) cover all three types via `{type}` / `{canonical}` / `{root}` / `{scan_dirs}` substitution rather than 9 type-specific keys. - `context-gateway.js?v=2` cache-bust. All response changes are additive — existing clients that only read `imported`/`generated`/`skipped[].name|runtime|reason` keep working unchanged. The CLI (`mm context skills/agents/commands ...`) and the MCP `context` tool also unpack the new triple shape via `_code` discard so their human-readable output is byte-identical. Tests: - 3 core test suites (`test_context_{skills,commands,agents}.py`) updated to the 3-tuple shape. - New web route assertions verify `reason_code`, `canonical_root`, `project_root`, and `scanned_dirs` are present in sync/import responses for the empty-canonical and empty-runtime paths. - Full suite green: `pytest -m "not ollama"` 3152 passed; `ruff check` + `ruff format --check` clean. Co-Authored-By: Claude <[email protected]> * fix(web/context): address review — surface scan dirs on GET, basename in toast, align canonical_root style Follow-up to PR #549 review: - **#1 (low-medium)**: drop the hardcoded `_CTX_EMPTY_HINT_META` JS map. GET `/api/context/{skills,commands,agents}` now also returns `canonical_root: str` and `scanned_dirs: list[str]` (computed once at module import from `SKILL_DIRS` / `AGENT_DIRS` / `COMMAND_DIRS`), so the empty-state hint pulls from the wire instead of duplicating the detector layout client-side. The JS still has a one-line fallback (`.memtomem/${type}` + empty list) for older backends, but it never fires when the cache-bust takes effect. - **#2 (cosmetic)**: skills sync response now uses the `CANONICAL_SKILL_ROOT = ".memtomem/skills"` constant directly, matching the `CANONICAL_AGENT_ROOT` / `CANONICAL_COMMAND_ROOT` pattern in agents/commands routes. Drops the `_safe_rel(canonical_skills_root( project_root), project_root)` round-trip — both forms resolve to the same string today, but a constant won't drift if a future PR adds env / scope resolution to `canonical_skills_root()`. - **#3 (low)**: import-no-runtimes toast renders `basename(project_root)` instead of the absolute path. The wire still carries the absolute string (consistent with `/api/system/*` and useful for debugging / reverse-proxy contexts) but the user-facing toast keeps it short — `scanned_dirs` already gives full orientation. New `_ctxBasename()` helper handles the POSIX path split. Cache-bust bumped to `context-gateway.js?v=3`. Tests: - New assertions on `TestListSkills.test_empty`, `TestListCommands.test_empty`, `TestListAgents.test_empty` verify GET response now includes `canonical_root` + `scanned_dirs`. - `pytest -m "not ollama"` 3152 passed; `ruff check` + `ruff format --check` clean. Skipped per reviewer guidance: - #4 `Final[str]` → `Literal` / `StrEnum` (mypy is advisory). - #5 additional positive route assertions for `invalid_name` / `already_imported` / `toml_parse_error` (route layer is a trivial dict comprehension; covered at the core layer). - #6 Korean `<이름>` literal angle-bracket (intentional placeholder guide, not an unsubstituted variable). Co-Authored-By: Claude <[email protected]> * refactor(context): narrow reason_code to Literal SkipCode Tightens the typing on the third element of `(name, reason, reason_code)` triples produced by `SkillSyncResult.skipped`, `CommandSyncResult.skipped`, `AgentSyncResult.skipped`, and `ExtractResult.skipped`. The new `memtomem.context._skip_reasons.SkipCode` is a `Literal` of the seven codes already enumerated in that module, so a typo at the construction site fails type-check instead of slipping through to the wire. Also strengthens the import-empty web-route tests for skills, commands, and agents to assert `data["project_root"] == str(tmp_path)` instead of just key presence — confirms the response actually echoes the fixture's project root rather than some unrelated path. No runtime behavior change; addresses follow-up polish flagged in the review of this PR. Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: pandas-studio <[email protected]> Co-authored-by: Claude <[email protected]>

Self-review of PR #553 surfaced five real issues — fixing them keeps the surface honest about what's actually wired up. - Drop dead helpers in ``context/projects.py`` (#1): - ``_is_eaccess`` was never called. - ``is_project_scope_id`` claimed to support route error formatting but no route used it. - ``_new_temp_known_projects_path`` claimed test convenience but no test imported it. - ``errno`` and ``tempfile`` imports follow. - Drop ``ProjectScope.counts`` dataclass field (#2). Discovery never populated it; the route layer (``_scope_to_dict``) always overrides via ``_counts_for(root)``. Forward-design pollution. - ``POST /api/context/known-projects`` now returns ``warning_code: "no_runtime_marker"`` alongside the prose ``warning`` (#4). Matches PR1's (#549) machine-readable ``reason_code`` pattern so client matching is i18n-stable; the prose stays for back-compat. - ``_counts_for`` gains a cost comment (#3): 3 × (canonical scan + N runtime scans) per scope on every projects GET. Acceptable at <30 scopes; cache when discovery growth pushes that ceiling. - JS scope badge no longer carries a redundant ``data-i18n`` attribute (#5). The inline ``t()`` already renders text at construction; the attribute would let the i18n DOM walker re-translate and clobber. - ``ctx-scope-summary`` now carries a ``title="${scope.root}"`` (#8) so same-name scopes (``Edu/inflearn`` vs ``Work/inflearn``) disambiguate on hover without bloating the visible label. Cache-bust: ``context-gateway.js?v=4`` → ``?v=5``. Tests updated: ``test_post_warns_on_missing_marker`` asserts the new ``warning_code``; ``test_post_no_warning_when_marker_present`` asserts both fields are absent. 110/110 green; ruff + format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…#553) * feat(web): multi-project read-only discovery for context-gateway tabs PR2 of the multi-project context UI series — see ``memtomem-docs/memtomem/planning/multi-project-context-ui-rfc.md``. A user running ``mm web`` from the ``memtomem`` repo could previously only see skills / commands / agents under that one cwd; ``~/Edu/inflearn/ .claude/skills/`` was invisible without restarting the server. PR2 adds discovery for additional project roots (server cwd, user-registered ``known_projects.json``, opt-in ``~/.claude/projects/`` reverse-decode) and renders one collapsible ``<details>`` group per scope inside each tab. Mutating routes stay cwd-only — multi-scope writes ship in PR3. Backend ------- - ``memtomem.context.projects`` is the new discovery module: - ``ProjectScope`` carries ``{scope_id, label, root, tier, sources, missing, experimental, counts}``. - ``compute_scope_id`` derives ``"p-" + sha256(case-normalized Path.resolve())[:12]`` so refresh / restart preserve the id (RFC §Decision 4). On macOS the input is also lowercased: APFS is case-insensitive but ``Path.resolve()`` does not canonicalize case. - ``KnownProjectsStore`` writes ``~/.memtomem/known_projects.json`` via ``tempfile + os.replace`` plus a ``.<name>.lock`` sidecar fcntl lock — locking the data file directly does not survive ``os.replace`` (PR #548 issue). - ``discover_project_scopes`` unions cwd + known-projects + (opt-in) claude-projects, dedupes by resolved path, and clears ``experimental`` on union with a trusted source. - ``ContextGatewayConfig`` (new) carries ``known_projects_path``, ``experimental_claude_projects_scan: bool = False``, and ``user_tier_enabled: bool = False`` (forward-compat for PR3). Routes ------ New ``web/routes/context_projects.py``: - ``GET /api/context/projects`` — full scope list with per-type counts. - ``POST /api/context/known-projects`` — register; absolute + ``is_dir()`` validation; warning (HTTP 200) when no ``.claude``/``.gemini``/``.agents``/ ``.memtomem`` marker is present so users can pre-register an empty checkout. - ``DELETE /api/context/known-projects/{scope_id}`` — drop, including stale entries (matching is path-derived). - ``resolve_scope_root`` is the dependency the existing ``/api/context/{skills,commands,agents}`` GETs now use; without ``?scope_id=`` it falls back to the server cwd so PR1's mutating cwd flow keeps working unchanged. Unknown / stale scope_id → 404. UI -- Each tab gains an "Add Project" button; the list area renders one ``<details class="ctx-scope-group">`` per scope, lazy-fetching items on expand. Server CWD opens by default; non-cwd scope items render as read-only cards (PR2 keeps mutating buttons targeting cwd only). Each non-cwd group has an ✕ button that calls DELETE ``/api/context/known-projects/{scope_id}``. Three new i18n keys (placeholder-form, en + ko parity). Tests ----- PR2 minimum-bar from RFC §Test obligations: - ``tests/test_context_projects.py`` — scope_id stability + collision sanity, trailing-slash invariance, macOS case-insensitivity, symlink dedup, ``experimental`` flag clears on union, both ``experimental_claude_projects_scan`` defaults, ``known_projects.json`` corruption / unknown-version recovery, stale-entry removal, multiprocessing.Process atomic-write race. - ``tests/test_web_routes_context_projects.py`` — HTTP shape, unknown scope_id 404, POST 400 paths (relative / nonexistent / file), marker warning, idempotent registration, DELETE round-trip + 404. Manual smoke ------------ ``HOME=/tmp/.../home uv run mm web --port 8088``: GET projects shows cwd; POST /tmp returns 200 + warning; GET projects shows two scopes (``Server CWD`` + ``tmp``); skills?scope_id=p-bogus → 404; DELETE round-trip works; second DELETE → 404. Refs RFC ``multi-project-context-ui``; gates PR3 (multi-project mutating + Claude user-tier). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> * review fixes: drop dead helpers, add warning_code, summary tooltip Self-review of PR #553 surfaced five real issues — fixing them keeps the surface honest about what's actually wired up. - Drop dead helpers in ``context/projects.py`` (#1): - ``_is_eaccess`` was never called. - ``is_project_scope_id`` claimed to support route error formatting but no route used it. - ``_new_temp_known_projects_path`` claimed test convenience but no test imported it. - ``errno`` and ``tempfile`` imports follow. - Drop ``ProjectScope.counts`` dataclass field (#2). Discovery never populated it; the route layer (``_scope_to_dict``) always overrides via ``_counts_for(root)``. Forward-design pollution. - ``POST /api/context/known-projects`` now returns ``warning_code: "no_runtime_marker"`` alongside the prose ``warning`` (#4). Matches PR1's (#549) machine-readable ``reason_code`` pattern so client matching is i18n-stable; the prose stays for back-compat. - ``_counts_for`` gains a cost comment (#3): 3 × (canonical scan + N runtime scans) per scope on every projects GET. Acceptable at <30 scopes; cache when discovery growth pushes that ceiling. - JS scope badge no longer carries a redundant ``data-i18n`` attribute (#5). The inline ``t()`` already renders text at construction; the attribute would let the i18n DOM walker re-translate and clobber. - ``ctx-scope-summary`` now carries a ``title="${scope.root}"`` (#8) so same-name scopes (``Edu/inflearn`` vs ``Work/inflearn``) disambiguate on hover without bloating the visible label. Cache-bust: ``context-gateway.js?v=4`` → ``?v=5``. Tests updated: ``test_post_warns_on_missing_marker`` asserts the new ``warning_code``; ``test_post_no_warning_when_marker_present`` asserts both fields are absent. 110/110 green; ruff + format clean. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --------- Co-authored-by: pandas-studio <[email protected]> Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>

The upload dropzone in the Index tab spanned the full panel width (~750px at typical card sizes) with only an icon and two short lines of content, making it look disproportionately empty. Cap `.upload-drop` at `max-width: 520px` with `margin: 0 auto` so the dropzone itself centers inside the panel while sibling controls (file list, upload button, result rows) keep their natural full-width layout. Refs #582 (Prev #2). Co-authored-by: pandas-studio <[email protected]> Co-authored-by: Claude <[email protected]>

Codex review of #784 flagged that ``enforce_write_guard``'s audit log renders ``audit_context`` values verbatim with ``repr()``. Several callers pass user-controllable strings — ``file`` (mem_add / web_api_add), ``filename`` (web_api_upload), ``key`` (scratch promote), ``chunk_id`` — and a bypass on a request whose path / filename / key embeds the same secret as the body would surface that secret in ``server.log`` even though the matched bytes from ``content`` never do. The "matched bytes must never reach logs" goal must hold for context fields too, not just the body. Add ``_sanitize_audit_value`` between caller and log emit: - non-string values (None, int, bool) pass through untouched - string values are re-scanned with ``DEFAULT_PATTERNS``; any hit replaces the whole value with the fixed marker ``<redacted: secret-shape>`` so operators see the field was scrubbed rather than missing - clean strings longer than 200 chars are truncated to keep an abusive caller from ballooning the audit line Two regression pins: - ``test_secret_in_audit_context_value_is_redacted`` — embeds the same ``sk-…`` token in the body, in ``file=``, and in ``filename=``; asserts the matched bytes never reach the log message and the redaction marker takes the secret-shape value's place. Non-secret context fields (``namespace``, ``item_idx``) still pass through. - ``test_long_audit_context_string_is_truncated`` — caps line length with the ``...(truncated)`` marker. Codex Major #2 (SPA app.js / settings-harness.js cannot pass ``force_unsafe`` on the structured 403 path) is intentionally a follow-up: backend currently fails closed (HTTP 403, no write) which is the secure direction; the SPA confirm-and-retry UX is a separate frontend change with its own scope. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>

…en write surface (#784) * sec(privacy): apply trust-boundary redaction guard to every user-driven write surface The MCP `mem_add` path enforces a secret-class redaction guard before any filesystem write, but the surrounding ingress paths (Web `POST /api/add`, `POST /api/upload`, `PATCH /api/chunks/{id}`, scratch promotion, MCP `mem_edit`, CLI `mm add` / shell `add` / agent share, LangGraph adapter) were writing `req.content` / `new_content` / `entry["value"]` directly. The same SQLite database could end up holding cleartext provider tokens depending on which route the agent or user came in through. Centralise the scan-decide-log shape in `privacy.enforce_write_guard(content, surface=…, force_unsafe=False, audit_context=…)` returning a `WriteGuardResult` (`pass` / `blocked` / `bypassed`) and apply it at every gap surface. Each surface records under its own `by_tool` key so attribution stays observable; audit-log lines render `audit_context` as `key=value` pairs after `surface=` and never echo matched bytes. Web mutating routes return HTTP 403 with `{detail: "redaction_blocked", hits: N, surface: …}` so the SPA can surface a confirm-and-retry UX before re-sending with `force_unsafe=true`. `mem_batch_add` keeps its inline scan + per-entry counter pattern — the helper's single-content shape would obscure the transactional "one hit rejects the whole batch" rule that test_memory_crud_redaction already pins. Out of scope (follow-up): - `indexing/importers.py` and `indexing/url_fetcher.py` write external file/URL content rather than user-driven content; the trust model is different (operator-authorised indexing, not direct user submission) and merits its own decision pass. Tests: - `test_privacy.py::TestEnforceWriteGuard` — helper contract (pass / blocked / bypassed-with-audit / clean+force-unsafe still records pass / matched bytes never reach the log). - `test_redaction_write_surfaces.py` — `mem_edit`, CLI `mm add`, and the LangGraph adapter all run the guard and route the counter to the right `by_tool` key. - `test_web_routes.py` — new `TestAddMemoryRedaction`, `TestEditChunkRedaction`, `TestUploadRedaction`, `TestScratchPromoteRedaction` cover all four web mutating routes (block / bypass / clean-pass) end-to-end via ASGI client. Existing 36-test redaction floor + 4048-test full suite stay green; ruff check + format clean. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(privacy): redact secret-shaped audit-context values before logging Codex review of #784 flagged that ``enforce_write_guard``'s audit log renders ``audit_context`` values verbatim with ``repr()``. Several callers pass user-controllable strings — ``file`` (mem_add / web_api_add), ``filename`` (web_api_upload), ``key`` (scratch promote), ``chunk_id`` — and a bypass on a request whose path / filename / key embeds the same secret as the body would surface that secret in ``server.log`` even though the matched bytes from ``content`` never do. The "matched bytes must never reach logs" goal must hold for context fields too, not just the body. Add ``_sanitize_audit_value`` between caller and log emit: - non-string values (None, int, bool) pass through untouched - string values are re-scanned with ``DEFAULT_PATTERNS``; any hit replaces the whole value with the fixed marker ``<redacted: secret-shape>`` so operators see the field was scrubbed rather than missing - clean strings longer than 200 chars are truncated to keep an abusive caller from ballooning the audit line Two regression pins: - ``test_secret_in_audit_context_value_is_redacted`` — embeds the same ``sk-…`` token in the body, in ``file=``, and in ``filename=``; asserts the matched bytes never reach the log message and the redaction marker takes the secret-shape value's place. Non-secret context fields (``namespace``, ``item_idx``) still pass through. - ``test_long_audit_context_string_is_truncated`` — caps line length with the ``...(truncated)`` marker. Codex Major #2 (SPA app.js / settings-harness.js cannot pass ``force_unsafe`` on the structured 403 path) is intentionally a follow-up: backend currently fails closed (HTTP 403, no write) which is the secure direction; the SPA confirm-and-retry UX is a separate frontend change with its own scope. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(privacy): route mem_batch_add bypass audit through the shared sanitizer Codex re-review of #784 showed the audit-context sanitizer landed in cd27c99 only covered ``enforce_write_guard``'s single-content path. ``mem_batch_add`` keeps its own transactional decision shape and its own ``logger.warning`` line (``memory_crud.py:472``), so a secret-shaped ``file=`` argument still leaked verbatim into the warning log on a bypass — exactly the leak class cd27c99 was meant to close. Extract the audit emit into a public ``privacy.emit_bypass_audit`` seam: - Same line shape as before (``redaction bypass via force_unsafe=True (surface=…, …, content_chars=…, hits=…)``). - Each ``audit_context`` value runs through ``_sanitize_audit_value`` before ``%r`` formatting, so a secret embedded in a user-controlled string field (``namespace`` / ``file`` / ``filename`` / ``key`` / ``chunk_id``) gets replaced with ``<redacted: secret-shape>`` regardless of which entry point logged it. - Counter (``record(...)``) is intentionally NOT bumped by the emitter; ``mem_batch_add`` still owns its per-item bookkeeping contract that ``test_memory_crud_redaction.py`` already pins. ``enforce_write_guard`` now delegates to ``emit_bypass_audit``, and ``mem_batch_add`` calls it directly with the same audit_context shape (``namespace`` / ``file`` / ``item_idx``). New regression ``TestMemBatchAddRedactionGuard.test_secret_shaped_file_argument_is_redacted_in_audit_log`` reproduces Codex's manual repro: feed a secret-matching value plus a ``file=…sk-…`` filename into a force_unsafe batch and assert the warning line never contains the matched-bytes prefix while still carrying the redaction marker. Stale-comment Minor (``app.js:3730`` / ``:3763``) lands separately as #786 on main; out of scope here. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(privacy): emit real per-item hit count in mem_batch_add bypass audit The shared sanitizer wired up in 18208bd carried ``hits=1`` for every batch bypass, because the pre-scan loop only stored ``hit_indices`` without keeping the per-item ``len(privacy.scan(value))``. The single- content path (``enforce_write_guard``) reports the real count, so operators reading the audit log saw inconsistent ``hits=`` values across surfaces (Codex review of PR #784, Minor #1). Switch the pre-scan accumulator to ``hit_counts: dict[int, int]`` so each entry that matches records its true hit count, then pass that through ``emit_bypass_audit(..., hits=hit_counts[idx])``. The whole-batch reject error message keeps the same shape — the indices list is still rendered (now via ``sorted(hit_counts)``) so the user- facing message and the existing ``test_full_reject_lists_hit_indices`` pin both still hold. New regression ``TestMemBatchAddRedactionGuard.test_audit_hits_count_matches_scan_result`` feeds an entry that matches two patterns (``api_key:`` + ``sk-…``) and asserts the audit line carries ``hits=<real count>``, not the placeholder ``hits=1`` it used to. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: pandas-studio <[email protected]> Co-authored-by: Claude <[email protected]>

Addresses ultrareview P2 follow-ups on PR-1: P2 #1 — services.tag_management.rename_tag/delete_tag/merge_tags now strip inputs and reject if empty after strip. Web routes passed ``body.new_name`` raw, while MCP pre-stripped — the asymmetry let whitespace-only names persist via PUT /api/tags/{name}. The service is the single source of truth, so the gate moves there; MCP's pre-strip stays as harmless defense-in-depth. P2 #2 — services.tag_management.merge_tags dry-run no longer caps ``affected_chunks`` at the per-tag scan limit. A new ``count_chunks_by_any_tag`` storage helper does a single ``COUNT(*) WHERE EXISTS (… IN (?, …))`` over the source-tag union, so a tag attached to more than 10,000 chunks reports the true count instead of an undercount in the confirmation modal. The sample loop still caps at ``_DRY_RUN_SAMPLE_CAP`` since the UI only renders ten previews. Co-Authored-By: Claude <[email protected]>

…795) * feat(storage): scaffold tag-management helpers + service skeleton (#688 PR1 prep) Lays the policy-independent groundwork for the tag rename/delete/merge ops requested in #688. No callers wired yet — bodies fill in after the confirm thread on the RFC resolves the updated_at policy and MCP-migration scope. - ``storage.list_chunks_by_tag`` + ``count_chunks_by_tag`` for the dry-run sample / count path the confirmation modal needs. - ``storage.merge_tags`` as the third primitive next to the existing ``rename_tag`` / ``delete_tag``; dedupes the resulting per-chunk tag list and treats the target appearing in ``sources`` as a no-op source. - ``SqliteBackend._tag_write_lock`` (``asyncio.Lock``) so the upcoming service layer and ``auto_tag_storage`` can serialize their read-modify-write of ``chunks.tags`` within a single process. Cross-process safety still falls back to SQLite's WAL file lock — this is an in-process invariant only. - ``services/tag_management.py`` skeleton (``TagOpResult`` / ``TagOpSample`` + ``NotImplementedError`` stubs) so the Web (PR1), MCP (PR1 if migration is approved), and CLI (PR3) surfaces share a single entry point. Service exposes ``bump_updated_at`` as an explicit parameter so the policy choice lives at the route layer rather than buried in the service body. 22 storage tests pass (7 new); lint + format clean. Co-Authored-By: Claude <[email protected]> * fix(auto_tag): preserve full chunk metadata + simplify list helper (#688) Two corrections to the prep commit (8507b54), driven by an audit cross-check on the #688 RFC: 1. ``auto_tag_storage`` rebuilt ``ChunkMetadata`` from an explicit 8-field list, silently dropping ``overlap_before/after``, ``parent_context``, ``file_context``, and the temporal-validity columns (``valid_from_unix`` / ``valid_to_unix``). Same shape bug ``web/routes/chunks.py:update_chunk_tags`` already fixed with a spread pattern; consolidate. Pinned by a regression test that fails on the pre-fix code (mutation-validated: ``valid_from_unix`` reads as ``None`` after the broken rewrite). 2. ``list_chunks_by_tag`` now thin-wraps ``recall_chunks(tag_filter=...)``. The latter already implements the ``json_each`` EXISTS match and ORDER BY shape this helper needs (``sqlite_backend.py:996``); the parallel SQL added in the prep commit was dead duplication. 22 tag/auto-tag tests pass; lint/format clean. Co-Authored-By: Claude <[email protected]> * feat(storage): bump updated_at on tag rename/delete/merge (#688) Per the #688 confirm thread (option (a)): tag-only mutations now bump ``chunks.updated_at`` so downstream consumers see the rewrite as a write event. ``decay.py`` reads ``updated_at`` for chunk age, which means a rename or delete deliberately resets the decay timer for the affected chunks. The trade-off is documented in each helper's docstring; v1 treats tag fix-ups as "this chunk is being curated." Symmetric pin tests cover both halves of the invariant: the mutated chunk's ``updated_at`` moves forward AND an untouched chunk's ``updated_at`` stays anchored. Negative-only would false-pass against a no-op implementation. Mutation-validated (3/3 fail on the unhardened code). Co-Authored-By: Claude <[email protected]> * feat(services): implement tag-management service bodies (#688) Replaces the prep skeleton's NotImplementedError stubs with the real rename / delete / merge implementations. Each entry point: - Acquires ``storage._tag_write_lock`` for the read-modify-write window (in-process serialization with ``auto_tag_storage``). - On dry-run: returns ``count_chunks_by_tag`` plus a sample of up to 10 chunk previews via ``list_chunks_by_tag``; storage stays untouched. - On apply: delegates to the hardened storage helper (which now bumps ``updated_at``) and calls ``search_pipeline.invalidate_cache()`` on a non-zero affected count, mirroring ``web/routes/system.py:418``. The ``bump_updated_at`` parameter from the skeleton is dropped: the storage helpers now bump ``updated_at`` unconditionally per the #688 confirm thread, so the parameter would have been dead. Merge dry-run counts the union of candidate chunks across ``sources`` (excluding ``target``); this set equals what the storage helper will actually mutate because every source-bearing chunk's tag list changes when a source is rewritten to target. Target-only chunks are filtered out of the source set up front, so they are not counted. 13 service tests pass — input validation, dry-run no-write, apply + invalidate, no-match no-invalidate, merge union counting, lock acquisition observable from outside. Co-Authored-By: Claude <[email protected]> * feat(auto_tag): serialize write window via _tag_write_lock + re-read (#688) Closes the race window between ``auto_tag_storage`` and the new tag-management service: previously an auto-tag run could spend N seconds in keyword/LLM extraction, then upsert with the pre-extraction tag snapshot — silently overwriting any rename / delete / merge that landed during that window. The lock is held narrowly. Slow extraction stays outside, only the read-modify-write step (re-read latest chunk → spread metadata → upsert) runs under ``storage._tag_write_lock``. The re-read is what actually preserves the concurrent mutation: spreading on top of the just-loaded ``ChunkMetadata`` carries the latest tag list (already mutated by the service) so auto_tag layers ``new_tags`` on top instead of clobbering it. Lock-narrow + invariant via mutation ordering, per the pattern in ``feedback_lock_narrow_invariant_ordering``. Lock-acquisition test holds the lock externally and verifies auto_tag waits for it (mutation-validated: removing the lock makes the test fail). Existing field-preservation regression still passes against the re-read shape. Co-Authored-By: Claude <[email protected]> * feat(web): add /api/tags rename/delete/merge routes via shared service (#688) Closes the Web-side asymmetry the audit on #688 surfaced: until now the MCP surface had ``tag_management`` tools but the Web tab had no rename / delete / merge UX surface, so any tag fix-up required either editing chunks one by one or re-running ``auto_tag overwrite=true``. - ``PUT /api/tags/{name}`` — rename to ``body.new_name`` - ``DELETE /api/tags/{name}`` — drop the tag from every chunk - ``POST /api/tags/merge`` — fold ``body.sources`` into ``body.target`` All three honour ``?dry_run=true`` and route through ``services.tag_management``, so they pick up the lock acquisition, ``updated_at`` bump, and ``search_pipeline.invalidate_cache()`` calls without re-implementing them. ``ValueError`` from the service (empty new_name / target) maps to a 400. 8 route tests cover apply + dry-run + 400 paths and assert that ``invalidate_cache`` fires on apply but not on dry-run or no-op (``sources`` collapsing to just ``target``). Co-Authored-By: Claude <[email protected]> * feat(mcp): route tag management through shared service + add mem_tag_merge (#688) Closes the MCP/Web asymmetry the audit on #688 surfaced. The MCP tools now route through ``services.tag_management`` so both surfaces share one lock, one ``updated_at`` policy, and one ``invalidate_cache`` contract. ``mem_tag_merge`` is added so MCP picks up the same merge shape Web exposed in the previous commit. - ``mem_tag_rename`` / ``mem_tag_delete`` / ``mem_tag_merge`` all gain ``dry_run: bool = False`` for the preview path. Defaults to False so existing MCP callers (no migration noise) keep applying as before. - Same input validation as the Web routes (empty old/new/target/sources rejected with a clear error string the chat agent can show the user). - Result text formatter renders the dry-run sample chunks below the "would affect N chunks" line so the user sees what the apply would touch. Parity test discriminator: the cache invalidation. ``storage.rename_tag`` / ``delete_tag`` / ``merge_tags`` don't know about ``SearchPipeline``, so only the service-layer path can fire ``invalidate_cache``. The test swaps ``invalidate_cache`` for a counter and asserts it fires exactly once on apply, never on dry-run, and never on no-op (``sources`` collapsing to just ``target``). Mutation-validated: swapping ``mem_tag_rename`` body to call storage directly makes the counter assertion fail with ``0 == 1``. Co-Authored-By: Claude <[email protected]> * test(guards): register new tag-management surfaces in AST invariants (#688) The CSRF / redaction / namespace AST guards keep the security classification registries honest by failing whenever a new mutating surface lands without an explicit row in the registry. PR1 added three Web routes (``tags.rename_tag``, ``tags.delete_tag``, ``tags.merge_tags``) and one MCP tool (``mem_tag_merge``); register them so the suite stays green and the next contributor sees the right classification template. - ``_CSRF_PROTECTED``: all three routes (default for unsafe-method surfaces; the middleware already covers them). - ``_REDACTION_EXEMPT``: all three routes — tags are short labels and these handlers rewrite existing chunk metadata only, so the LTM trust-boundary write guard does not apply (mirrors the existing ``chunks.update_chunk_tags`` and ``tags.run_auto_tag`` rows). - ``DEFERRED_NS_SURFACES``: ``mem_tag_merge.target`` — the parameter-name heuristic catches ``target=`` but the value is a tag name, not a namespace, so no namespace validator applies. Co-Authored-By: Claude <[email protected]> * fix(services): strip tag inputs + accurate merge dry-run count (#688) Addresses ultrareview P2 follow-ups on PR-1: P2 #1 — services.tag_management.rename_tag/delete_tag/merge_tags now strip inputs and reject if empty after strip. Web routes passed ``body.new_name`` raw, while MCP pre-stripped — the asymmetry let whitespace-only names persist via PUT /api/tags/{name}. The service is the single source of truth, so the gate moves there; MCP's pre-strip stays as harmless defense-in-depth. P2 #2 — services.tag_management.merge_tags dry-run no longer caps ``affected_chunks`` at the per-tag scan limit. A new ``count_chunks_by_any_tag`` storage helper does a single ``COUNT(*) WHERE EXISTS (… IN (?, …))`` over the source-tag union, so a tag attached to more than 10,000 chunks reports the true count instead of an undercount in the confirmation modal. The sample loop still caps at ``_DRY_RUN_SAMPLE_CAP`` since the UI only renders ten previews. Co-Authored-By: Claude <[email protected]> * fix(services): absorb replace_chunk_tags + tighten rename same-name gate (#688) Two follow-ups on the cross-surface symmetry pass: - ``services.tag_management.replace_chunk_tags`` now backs ``PUT /api/chunks/{id}/tags``. The route used to ``upsert_chunks`` directly, which (a) skipped the ``_tag_write_lock`` and could race with an in-flight bulk rename/delete/merge, and (b) skipped the search TTL cache flush so a follow-up tag-filtered search could serve stale results. Routing through the service closes both gaps; the per-chunk path now obeys the same invariants as the global ones. ``Chunk`` no-op (deduped tags equal current) short-circuits without an upsert so ``updated_at`` and decay scoring are not disturbed. - ``services.tag_management.rename_tag`` rejects ``old == new`` after strip. The MCP wrapper's raw ``==`` gate was both wrong (``"foo"`` vs ``" foo "`` slipped through) and not honoured by Web — a same-name rename would still rewrite ``chunks.tags``, bumping ``updated_at`` and flushing the result cache as if data had changed. Service-layer is now the single gate; the MCP pre-check is removed. Tests: 30 service tests (incl. 4 ``replace_chunk_tags`` + 4 preserve invariants for content/embedding/hash/created_at), 1 MCP parity test for the same-name reject path, 4 web-route tests for ``update_chunk_tags`` lock + cache invariants. Pin-mutation validated. Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: pandas-studio <[email protected]> Co-authored-by: Claude <[email protected]>

tsdata and others added 3 commits April 9, 2026 19:23

tsdata merged commit 6c72b94 into main Apr 9, 2026

tsdata deleted the feat/stm-decouple-via-mcp branch April 9, 2026 10:36

tsdata mentioned this pull request Apr 9, 2026

chore(core): strip STM integration residue #3

Merged

4 tasks

tsdata mentioned this pull request Apr 9, 2026

chore: drop packages/memtomem-stm from monorepo (Phase 3b) #4

Merged

4 tasks

memtomem mentioned this pull request Apr 25, 2026

fix(chunking): seed _split_section line counter past stripped blockquote header #466

Merged

4 tasks

pandas-studio mentioned this pull request Apr 26, 2026

fix: validate agent_id at mem_agent_register / search and mm agent register (#493) #494

Merged

4 tasks

This was referenced Apr 28, 2026

feat(scheduler): mm schedule CLI + schedule_* mem_do actions (P2 Phase A.4) #522

Merged

docs: clarify scheduler dispatch needs MCP server (not mm web) #524

Merged

docs(readme): no-hooks row + predictable-surface bullet #532

Closed

memtomem mentioned this pull request Apr 29, 2026

fix(web): surface "why nothing happened" on context Sync/Import #549

Merged

4 tasks

memtomem mentioned this pull request May 5, 2026

RFC: CSRF and browser-origin guard for the local Web UI #787

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(stm): decouple surfacing via remote-only MCP#2

refactor(stm): decouple surfacing via remote-only MCP#2
tsdata merged 3 commits intomainfrom
feat/stm-decouple-via-mcp

tsdata commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tsdata commented Apr 9, 2026

Summary

Why

Changes

Follow-ups (별도 PR)

Test plan

Phase roadmap

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant