feat(tags): rename/delete/merge service + Web/MCP routes (#688 PR1)#795
Merged
feat(tags): rename/delete/merge service + Web/MCP routes (#688 PR1)#795
Conversation
… PR1 prep) Lays the policy-independent groundwork for the tag rename/delete/merge ops requested in #688. No callers wired yet — bodies fill in after the confirm thread on the RFC resolves the updated_at policy and MCP-migration scope. - ``storage.list_chunks_by_tag`` + ``count_chunks_by_tag`` for the dry-run sample / count path the confirmation modal needs. - ``storage.merge_tags`` as the third primitive next to the existing ``rename_tag`` / ``delete_tag``; dedupes the resulting per-chunk tag list and treats the target appearing in ``sources`` as a no-op source. - ``SqliteBackend._tag_write_lock`` (``asyncio.Lock``) so the upcoming service layer and ``auto_tag_storage`` can serialize their read-modify-write of ``chunks.tags`` within a single process. Cross-process safety still falls back to SQLite's WAL file lock — this is an in-process invariant only. - ``services/tag_management.py`` skeleton (``TagOpResult`` / ``TagOpSample`` + ``NotImplementedError`` stubs) so the Web (PR1), MCP (PR1 if migration is approved), and CLI (PR3) surfaces share a single entry point. Service exposes ``bump_updated_at`` as an explicit parameter so the policy choice lives at the route layer rather than buried in the service body. 22 storage tests pass (7 new); lint + format clean. Co-Authored-By: Claude <[email protected]>
) Two corrections to the prep commit (8507b54), driven by an audit cross-check on the #688 RFC: 1. ``auto_tag_storage`` rebuilt ``ChunkMetadata`` from an explicit 8-field list, silently dropping ``overlap_before/after``, ``parent_context``, ``file_context``, and the temporal-validity columns (``valid_from_unix`` / ``valid_to_unix``). Same shape bug ``web/routes/chunks.py:update_chunk_tags`` already fixed with a spread pattern; consolidate. Pinned by a regression test that fails on the pre-fix code (mutation-validated: ``valid_from_unix`` reads as ``None`` after the broken rewrite). 2. ``list_chunks_by_tag`` now thin-wraps ``recall_chunks(tag_filter=...)``. The latter already implements the ``json_each`` EXISTS match and ORDER BY shape this helper needs (``sqlite_backend.py:996``); the parallel SQL added in the prep commit was dead duplication. 22 tag/auto-tag tests pass; lint/format clean. Co-Authored-By: Claude <[email protected]>
Per the #688 confirm thread (option (a)): tag-only mutations now bump ``chunks.updated_at`` so downstream consumers see the rewrite as a write event. ``decay.py`` reads ``updated_at`` for chunk age, which means a rename or delete deliberately resets the decay timer for the affected chunks. The trade-off is documented in each helper's docstring; v1 treats tag fix-ups as "this chunk is being curated." Symmetric pin tests cover both halves of the invariant: the mutated chunk's ``updated_at`` moves forward AND an untouched chunk's ``updated_at`` stays anchored. Negative-only would false-pass against a no-op implementation. Mutation-validated (3/3 fail on the unhardened code). Co-Authored-By: Claude <[email protected]>
Replaces the prep skeleton's NotImplementedError stubs with the real rename / delete / merge implementations. Each entry point: - Acquires ``storage._tag_write_lock`` for the read-modify-write window (in-process serialization with ``auto_tag_storage``). - On dry-run: returns ``count_chunks_by_tag`` plus a sample of up to 10 chunk previews via ``list_chunks_by_tag``; storage stays untouched. - On apply: delegates to the hardened storage helper (which now bumps ``updated_at``) and calls ``search_pipeline.invalidate_cache()`` on a non-zero affected count, mirroring ``web/routes/system.py:418``. The ``bump_updated_at`` parameter from the skeleton is dropped: the storage helpers now bump ``updated_at`` unconditionally per the #688 confirm thread, so the parameter would have been dead. Merge dry-run counts the union of candidate chunks across ``sources`` (excluding ``target``); this set equals what the storage helper will actually mutate because every source-bearing chunk's tag list changes when a source is rewritten to target. Target-only chunks are filtered out of the source set up front, so they are not counted. 13 service tests pass — input validation, dry-run no-write, apply + invalidate, no-match no-invalidate, merge union counting, lock acquisition observable from outside. Co-Authored-By: Claude <[email protected]>
…688) Closes the race window between ``auto_tag_storage`` and the new tag-management service: previously an auto-tag run could spend N seconds in keyword/LLM extraction, then upsert with the pre-extraction tag snapshot — silently overwriting any rename / delete / merge that landed during that window. The lock is held narrowly. Slow extraction stays outside, only the read-modify-write step (re-read latest chunk → spread metadata → upsert) runs under ``storage._tag_write_lock``. The re-read is what actually preserves the concurrent mutation: spreading on top of the just-loaded ``ChunkMetadata`` carries the latest tag list (already mutated by the service) so auto_tag layers ``new_tags`` on top instead of clobbering it. Lock-narrow + invariant via mutation ordering, per the pattern in ``feedback_lock_narrow_invariant_ordering``. Lock-acquisition test holds the lock externally and verifies auto_tag waits for it (mutation-validated: removing the lock makes the test fail). Existing field-preservation regression still passes against the re-read shape. Co-Authored-By: Claude <[email protected]>
#688) Closes the Web-side asymmetry the audit on #688 surfaced: until now the MCP surface had ``tag_management`` tools but the Web tab had no rename / delete / merge UX surface, so any tag fix-up required either editing chunks one by one or re-running ``auto_tag overwrite=true``. - ``PUT /api/tags/{name}`` — rename to ``body.new_name`` - ``DELETE /api/tags/{name}`` — drop the tag from every chunk - ``POST /api/tags/merge`` — fold ``body.sources`` into ``body.target`` All three honour ``?dry_run=true`` and route through ``services.tag_management``, so they pick up the lock acquisition, ``updated_at`` bump, and ``search_pipeline.invalidate_cache()`` calls without re-implementing them. ``ValueError`` from the service (empty new_name / target) maps to a 400. 8 route tests cover apply + dry-run + 400 paths and assert that ``invalidate_cache`` fires on apply but not on dry-run or no-op (``sources`` collapsing to just ``target``). Co-Authored-By: Claude <[email protected]>
…merge (#688) Closes the MCP/Web asymmetry the audit on #688 surfaced. The MCP tools now route through ``services.tag_management`` so both surfaces share one lock, one ``updated_at`` policy, and one ``invalidate_cache`` contract. ``mem_tag_merge`` is added so MCP picks up the same merge shape Web exposed in the previous commit. - ``mem_tag_rename`` / ``mem_tag_delete`` / ``mem_tag_merge`` all gain ``dry_run: bool = False`` for the preview path. Defaults to False so existing MCP callers (no migration noise) keep applying as before. - Same input validation as the Web routes (empty old/new/target/sources rejected with a clear error string the chat agent can show the user). - Result text formatter renders the dry-run sample chunks below the "would affect N chunks" line so the user sees what the apply would touch. Parity test discriminator: the cache invalidation. ``storage.rename_tag`` / ``delete_tag`` / ``merge_tags`` don't know about ``SearchPipeline``, so only the service-layer path can fire ``invalidate_cache``. The test swaps ``invalidate_cache`` for a counter and asserts it fires exactly once on apply, never on dry-run, and never on no-op (``sources`` collapsing to just ``target``). Mutation-validated: swapping ``mem_tag_rename`` body to call storage directly makes the counter assertion fail with ``0 == 1``. Co-Authored-By: Claude <[email protected]>
…688) The CSRF / redaction / namespace AST guards keep the security classification registries honest by failing whenever a new mutating surface lands without an explicit row in the registry. PR1 added three Web routes (``tags.rename_tag``, ``tags.delete_tag``, ``tags.merge_tags``) and one MCP tool (``mem_tag_merge``); register them so the suite stays green and the next contributor sees the right classification template. - ``_CSRF_PROTECTED``: all three routes (default for unsafe-method surfaces; the middleware already covers them). - ``_REDACTION_EXEMPT``: all three routes — tags are short labels and these handlers rewrite existing chunk metadata only, so the LTM trust-boundary write guard does not apply (mirrors the existing ``chunks.update_chunk_tags`` and ``tags.run_auto_tag`` rows). - ``DEFERRED_NS_SURFACES``: ``mem_tag_merge.target`` — the parameter-name heuristic catches ``target=`` but the value is a tag name, not a namespace, so no namespace validator applies. Co-Authored-By: Claude <[email protected]>
Addresses ultrareview P2 follow-ups on PR-1: P2 #1 — services.tag_management.rename_tag/delete_tag/merge_tags now strip inputs and reject if empty after strip. Web routes passed ``body.new_name`` raw, while MCP pre-stripped — the asymmetry let whitespace-only names persist via PUT /api/tags/{name}. The service is the single source of truth, so the gate moves there; MCP's pre-strip stays as harmless defense-in-depth. P2 #2 — services.tag_management.merge_tags dry-run no longer caps ``affected_chunks`` at the per-tag scan limit. A new ``count_chunks_by_any_tag`` storage helper does a single ``COUNT(*) WHERE EXISTS (… IN (?, …))`` over the source-tag union, so a tag attached to more than 10,000 chunks reports the true count instead of an undercount in the confirmation modal. The sample loop still caps at ``_DRY_RUN_SAMPLE_CAP`` since the UI only renders ten previews. Co-Authored-By: Claude <[email protected]>
…ate (#688) Two follow-ups on the cross-surface symmetry pass: - ``services.tag_management.replace_chunk_tags`` now backs ``PUT /api/chunks/{id}/tags``. The route used to ``upsert_chunks`` directly, which (a) skipped the ``_tag_write_lock`` and could race with an in-flight bulk rename/delete/merge, and (b) skipped the search TTL cache flush so a follow-up tag-filtered search could serve stale results. Routing through the service closes both gaps; the per-chunk path now obeys the same invariants as the global ones. ``Chunk`` no-op (deduped tags equal current) short-circuits without an upsert so ``updated_at`` and decay scoring are not disturbed. - ``services.tag_management.rename_tag`` rejects ``old == new`` after strip. The MCP wrapper's raw ``==`` gate was both wrong (``"foo"`` vs ``" foo "`` slipped through) and not honoured by Web — a same-name rename would still rewrite ``chunks.tags``, bumping ``updated_at`` and flushing the result cache as if data had changed. Service-layer is now the single gate; the MCP pre-check is removed. Tests: 30 service tests (incl. 4 ``replace_chunk_tags`` + 4 preserve invariants for content/embedding/hash/created_at), 1 MCP parity test for the same-name reject path, 4 web-route tests for ``update_chunk_tags`` lock + cache invariants. Pin-mutation validated. Co-Authored-By: Claude <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Backend for tag management (rename / delete / merge) requested in #688, plus the prerequisite hardening the audit on that RFC surfaced. PR1 of the 3-way split — UI (PR2) and CLI (PR3) follow.
services/tag_management.pyas the single source of truth for tag mutations; Web and MCP both route through it.PUT /api/tags/{name}(rename),DELETE /api/tags/{name},POST /api/tags/merge— all honour?dry_run=true.mem_tag_rename/mem_tag_deleteto the shared service and addsmem_tag_mergeso MCP and Web stay symmetric (closes the asymmetry the audit caught — MCP could rename/delete, Web could not).chunks.updated_atis now bumped on every mutated row so the search result TTL cache and decay scoring see tag fix-ups as write events. Cache invalidation is wired throughsearch_pipeline.invalidate_cache()after a successful apply.auto_tag_storagenow acquires the same per-storage_tag_write_lockaround its read-modify-write window and re-reads the chunk inside the lock, so a concurrent rename / delete / merge can't be silently overwritten by stale auto-tag output.auto_tag_storagewas rebuildingChunkMetadatafrom an explicit 8-field list and silently droppingoverlap_before/after,valid_from_unix,valid_to_unix, etc. Now uses the same spread patternweb/routes/chunks.py:update_chunk_tagsadopted earlier.Confirm-thread answers (resolved)
Per the audit on #688:
auto_tag— in-processasyncio.Lockshared byauto_tag_storageand the new tag-management ops. Cross-process consistency stays with SQLite WAL — concurrentmm webandmmCLI processes can still race on the same chunks; documented as a known limitation in the service docstring.updated_atpolicy — option (a) bump.decay.pyreadsupdated_atfor chunk age, so a tag mutation deliberately resets the decay timer for affected chunks. Treated as "this chunk is being curated."Test plan
storage(12 tests) —list_chunks_by_tag/count_chunks_by_tag/merge_tagshappy path + edge cases + symmetric pin forupdated_atbump (renamed chunk moves forward, untouched chunk stays anchored). Mutation-validated: 3/3 pin tests fail against unhardened code.services.tag_management(13 tests) — dry-run no-write + sample shape, apply + cache invalidation, no-match no-invalidate, merge union counting, lock acquisition observable from outside.SearchPipeline, so only the service can callinvalidate_cache(). Mutation-validated: swapping the MCP rename body toapp.storage.rename_tag(...)direct makes the parity test fail with0 == 1.invalidate_cachefires on apply but not on dry-run or no-op.auto_tag(14 tests, +1 lock acquisition + 1 field preservation regression). Mutation-validated.Out of scope
mm tags rename/delete/mergeCLI commands.🤖 Generated with Claude Code