
RFC: Tag management — rename / delete / merge in the Tags tab #688

@memtomem

Background

packages/memtomem/src/memtomem/web/routes/tags.py only has two
routes today: GET /tags (list with counts) and POST /tags/auto
(extract tags). There is no way to rename, delete, or merge tags
through the API or the UI — every fix-up requires either editing
chunks one by one in the Search-tab detail panel or re-running
auto_tag with overwrite=true, both of which are error-prone for
even small taxonomy clean-ups.

A reviewer flagged this in a recent UX walk-through (Tags tab → "no
manage UI"). The Tier-1/Tier-2 polish PRs (#682–#687) intentionally
deferred this because it's a backend-shaped change with retrieval-correctness
implications, not a paint job. Filing as an RFC so we can align on
the shape before any code lands.

Goals

  1. Rename one tag globally: old → new, applied to every chunk
    that carries old. If a chunk already carries new, old is simply
    dropped (no duplicate), so re-running the rename is a no-op.
  2. Delete one tag globally: drop it from every chunk that carries
    it. Chunks that end up tag-less stay indexed; we don't re-tag.
  3. Merge N tags → 1: same as rename but for a set, with the
    resulting chunk-tag list deduplicated.
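The three goals reduce to pure transformations on a single chunk's tag list. A minimal sketch, assuming tags are stored as an ordered list of strings (as in ChunkMetadata.tags); the function names are hypothetical:

```python
def _dedupe(tags: list[str]) -> list[str]:
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(tags))


def rename_tag(tags: list[str], old: str, new: str) -> list[str]:
    """Replace `old` with `new`; if `new` is already present, `old` just disappears."""
    if old not in tags:
        return list(tags)
    return _dedupe([new if t == old else t for t in tags])


def delete_tag(tags: list[str], name: str) -> list[str]:
    """Remove `name`; the chunk may end up with an empty list and stays indexed."""
    return [t for t in tags if t != name]


def merge_tags(tags: list[str], sources: list[str], target: str) -> list[str]:
    """Map every tag in `sources` onto `target`, then deduplicate."""
    src = set(sources)
    if not src.intersection(tags):
        return list(tags)
    return _dedupe([target if t in src else t for t in tags])
```

For example, rename_tag(["py", "python", "web"], "py", "python") yields ["python", "web"], and applying it a second time changes nothing.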

Out of scope for this RFC: tag taxonomies / hierarchies, synonyms,
per-namespace tag scopes, undo history.

Proposed surface

Backend

PUT    /api/tags/{name}           # rename body { new_name: str }
DELETE /api/tags/{name}           # delete tag from all chunks
POST   /api/tags/merge            # body { sources: [str], target: str }

All three return:

{
  "tag": "<resolved name>",
  "affected_chunks": <int>,
  "dry_run": <bool>
}
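The request bodies and the shared response could be modelled as plain dataclasses. This sketch sticks to the standard library (the real routes would more likely use the web framework's own validation models); the class names and the sample_chunk_ids field are hypothetical, since the response shape above doesn't name a field for the dry-run sample.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class RenameBody:
    """Body of PUT /api/tags/{name}."""
    new_name: str

    def __post_init__(self) -> None:
        self.new_name = self.new_name.strip()
        if not self.new_name:
            raise ValueError("new_name must be non-empty")


@dataclass
class MergeBody:
    """Body of POST /api/tags/merge."""
    sources: list[str]
    target: str

    def __post_init__(self) -> None:
        self.target = self.target.strip()
        # Strip blanks and repeats, keeping the caller's order.
        self.sources = list(dict.fromkeys(s.strip() for s in self.sources if s.strip()))
        if not self.sources or not self.target:
            raise ValueError("sources and target must be non-empty")


@dataclass
class TagOpResult:
    """Shared response for rename / delete / merge."""
    tag: str
    affected_chunks: int
    dry_run: bool
    # Hypothetical field carrying the dry-run sample ids.
    sample_chunk_ids: list[str] = field(default_factory=list)

    def to_json(self) -> dict:
        return asdict(self)
```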

dry_run=true (query parameter) → no writes; the route just returns the
count plus a sample of affected chunk ids (capped at 10) so the UI can
show a confirmation modal.
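A dry run is a read-only pass over the matching chunks. A minimal sketch, assuming an in-memory list of chunks stands in for storage; the Chunk class, the dry_run_preview name and the sample_chunk_ids key are hypothetical:

```python
from dataclasses import dataclass

SAMPLE_CAP = 10  # the RFC caps the sample at 10 chunk ids


@dataclass
class Chunk:
    id: str
    tags: list[str]


def dry_run_preview(chunks: list[Chunk], match: set[str], resolved: str) -> dict:
    """Count chunks carrying any tag in `match`; reads only, never writes.

    `match` is {name} for rename/delete and set(sources) for merge;
    `resolved` is the tag reported back (new_name, name, or target).
    """
    hits = [c.id for c in chunks if match.intersection(c.tags)]
    return {
        "tag": resolved,
        "affected_chunks": len(hits),
        "dry_run": True,
        "sample_chunk_ids": hits[:SAMPLE_CAP],  # hypothetical key name
    }
```

The count reflects every match, while only the sample is truncated, so the modal can say "15 chunks" and still list just 10 ids.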

Implementation notes:

  • Iterate matching chunks via storage.list_chunks_by_tag(tag) (add
    the helper if it doesn't exist yet) and rebuild ChunkMetadata.tags
    per chunk.
  • Single transaction per request (atomic across the
    upsert_chunks batch) so partial failure can't strand a tag in
    a half-renamed state.
  • Embeddings stay valid: tag changes touch metadata.tags only,
    not content, so BM25 / dense indexes don't need a rebuild.
    (Worth pinning with a test that asserts embedding,
    content_hash, created_at survive a rename.)
  • Reject reserved / system prefixes (validity:, system:, plus
    whatever the canonical list turns out to be) at the route layer
    with a 400.
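The implementation notes above can be sketched end to end for rename. This assumes a hypothetical InMemoryStorage whose list_chunks_by_tag / upsert_chunks only mirror the names in the notes (the real signatures may differ), a placeholder reserved-prefix tuple, and hypothetical Chunk / ChunkMetadata shapes. The batch is staged and swapped in one step to stand in for the single transaction.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional

# Placeholder list: the canonical set of reserved prefixes is still an open question.
RESERVED_PREFIXES = ("validity:", "system:")


class ReservedTagError(ValueError):
    """Raised for reserved tags; a route handler would turn this into an HTTP 400."""


@dataclass
class ChunkMetadata:
    tags: list[str]
    updated_at: datetime


@dataclass
class Chunk:
    id: str
    content: str
    content_hash: str
    embedding: list[float]
    created_at: datetime
    metadata: ChunkMetadata


class InMemoryStorage:
    """Hypothetical stand-in for the real storage backend."""

    def __init__(self, chunks: list[Chunk]) -> None:
        self._chunks = {c.id: c for c in chunks}

    def list_chunks_by_tag(self, tag: str) -> list[Chunk]:
        return [c for c in self._chunks.values() if tag in c.metadata.tags]

    def upsert_chunks(self, batch: list[Chunk]) -> None:
        # Stage into a copy and swap at the end, so an exception mid-loop
        # leaves the stored state untouched (one "transaction" per call).
        staged = dict(self._chunks)
        for chunk in batch:
            staged[chunk.id] = chunk
        self._chunks = staged


def check_tag(name: str) -> None:
    if name.startswith(RESERVED_PREFIXES):
        raise ReservedTagError(f"reserved tag: {name!r}")


def rename_everywhere(
    storage: InMemoryStorage, old: str, new: str, now: Optional[datetime] = None
) -> int:
    """Rename `old` to `new` on every chunk in one batch; returns affected_chunks."""
    check_tag(old)
    check_tag(new)
    if now is None:
        now = datetime.now(timezone.utc)
    batch = []
    for chunk in storage.list_chunks_by_tag(old):
        tags = list(dict.fromkeys(new if t == old else t for t in chunk.metadata.tags))
        # Only metadata changes; content, content_hash, embedding and created_at
        # are carried over as-is, so nothing needs re-embedding or re-indexing.
        meta = replace(chunk.metadata, tags=tags, updated_at=now)
        batch.append(replace(chunk, metadata=meta))
    storage.upsert_chunks(batch)
    return len(batch)
```

The updated_at bump doubles as the v1 audit trail discussed under Risks. Delete and merge would follow the same list / rebuild / single-upsert pattern.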

UI (Tags tab)

Each row in the tag list / cloud gets a hover-revealed action menu
with Rename, Merge into…, Delete. All three open a confirm
modal that:

  1. Calls the route with dry_run=true first
  2. Shows "This will affect N chunks (sample: …)"
  3. Re-calls without dry_run only after the user confirms
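The real modal lives in the web UI, but the call ordering is small enough to pin down independently. A Python sketch of just that ordering, assuming hypothetical callables: api(dry_run) performs the rename / delete / merge request and returns the parsed JSON, and confirm(message) shows the modal and returns the user's choice.

```python
from typing import Callable, Optional

# Hypothetical: `api(dry_run)` sends the request and returns the parsed JSON body.
Api = Callable[[bool], dict]


def confirm_then_apply(api: Api, confirm: Callable[[str], bool]) -> Optional[dict]:
    preview = api(True)  # 1. dry run: no writes
    sample = ", ".join(preview.get("sample_chunk_ids", []))
    message = f"This will affect {preview['affected_chunks']} chunks (sample: {sample})"
    if not confirm(message):  # 2. show the count and wait for the user
        return None
    return api(False)  # 3. the real write, only after confirmation
```

A cancelled modal therefore never reaches the write path: the server sees exactly one dry-run request and nothing else.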

The active-filter chip pattern from PR #684 is the closest visual
cousin — same accent / muted colour split, same "you are about to
do something irreversible" weight.

Risks / open questions

  • Concurrent auto_tag: if the Auto-Tag form is mid-run when a
    rename fires, the rename can race against newly written tags.
    Cheapest answer: hold the same per-storage write lock the
    auto-tag path already uses, and let the second caller block.
    Worth a test that pins the lock invariant.
  • Reserved tags: we don't have a canonical list. Surveying
    validity:*, system:*, archive:* callsites is part of the
    RFC, not the implementation PR.
  • Empty-tag chunks after delete: behaviour is "stay indexed,
    no re-tag." Is that the right call? The alternative is to queue an
    auto_tag pass with sample_limit=0 for the affected chunks,
    but that mixes two features and breaks the "this op is fast and
    reversible at the metadata layer" property.
  • Audit / history: do we want a write-ahead log of
    rename/delete/merge ops in storage, or is the chunk
    updated_at bump enough? Default to the latter for v1.
  • CLI parity: the mm CLI doesn't have mm tags rename
    either. Probably worth shipping CLI + Web in the same release so
    scripts and the UI stay symmetric (see
    feedback_mcp_cli_sibling_gate_parity.md-style invariant).
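The lock invariant from the first risk can be sketched with one threading.Lock per storage instance. Everything here is hypothetical (Storage, auto_tag, rename, the started event); if the server is async, an asyncio.Lock would play the same role. The started event exists only so a test can deterministically start the rename while auto_tag holds the lock.

```python
import threading
import time
from typing import Optional


class Storage:
    """Hypothetical stand-in: chunk id -> tag list, plus one write lock per storage."""

    def __init__(self) -> None:
        self.write_lock = threading.Lock()
        self.tags: dict[str, list[str]] = {}


def auto_tag(storage: Storage, ids: list[str], tag: str,
             started: Optional[threading.Event] = None) -> None:
    # Hold the lock for the whole run, so a rename cannot interleave with it.
    with storage.write_lock:
        if started is not None:
            started.set()  # lets a caller know the lock is now held
        for cid in ids:
            storage.tags.setdefault(cid, []).append(tag)
            time.sleep(0.01)  # simulate slow extraction between writes


def rename(storage: Storage, old: str, new: str) -> int:
    # Same lock: blocks until any in-flight auto_tag run has finished.
    with storage.write_lock:
        changed = 0
        for cid, tags in storage.tags.items():
            if old in tags:
                storage.tags[cid] = list(dict.fromkeys(new if t == old else t for t in tags))
                changed += 1
        return changed
```

Because the rename waits for the whole auto-tag run, it also sees the tags that run just wrote, so no freshly written old tag survives the rename.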

Suggested split

If this gets a green light, three small PRs feel cleaner than one:

  1. Backend routes + storage helpers + tests (no UI).
  2. UI confirm modal + hover actions + i18n.
  3. CLI commands (mm tags rename, mm tags delete,
    mm tags merge) with shared service-layer code from PR 1.
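PR 3's command surface could be a thin wrapper over the PR 1 service layer. A sketch using argparse only to stay dependency-free; the real mm CLI may use a different framework, and the --into / --dry-run flags and the service object's rename / delete / merge methods are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mm")
    groups = parser.add_subparsers(dest="group", required=True)
    tags = groups.add_parser("tags", help="manage tags")
    ops = tags.add_subparsers(dest="op", required=True)

    rename = ops.add_parser("rename", help="rename a tag on every chunk")
    rename.add_argument("old")
    rename.add_argument("new")

    delete = ops.add_parser("delete", help="remove a tag from every chunk")
    delete.add_argument("name")

    merge = ops.add_parser("merge", help="merge several tags into one")
    merge.add_argument("sources", nargs="+")
    merge.add_argument("--into", dest="target", required=True)

    # Same dry-run semantics as the web routes.
    for sub in (rename, delete, merge):
        sub.add_argument("--dry-run", action="store_true",
                         help="report affected chunks without writing")
    return parser


def dispatch(args: argparse.Namespace, service):
    """Forward to the same (hypothetical) service layer the web routes use."""
    if args.op == "rename":
        return service.rename(args.old, args.new, dry_run=args.dry_run)
    if args.op == "delete":
        return service.delete(args.name, dry_run=args.dry_run)
    return service.merge(args.sources, args.target, dry_run=args.dry_run)
```

Routing both front ends through one service object keeps CLI and Web symmetric by construction rather than by review.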

Happy to take the first PR if there's appetite. Comment to agree on
the surface, or push back on any of the five risks / open questions
above.

🤖 Generated with Claude Code

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
