fix(config): lock provider category vocabulary (RFC #304 prep)#313

Merged: memtomem merged 1 commit into main from fix/config-lock-provider-category-vocab on Apr 20, 2026

@memtomem (Owner)

Summary

  • Add _VALID_PROVIDER_CATEGORIES frozenset + import-time assert in config.py so new rows in _PROVIDER_CATEGORY_PATTERNS cannot silently expand the category vocabulary while RFC #304 is still open.
  • Extract the assertion message to _VOCABULARY_LOCK_MESSAGE and pin-test that the constant still contains #304, so a future rewrite that drops the RFC reference fails a test rather than surviving on an unrelated #304 mention elsewhere in the file.
  • Mirror the existing _VALID_PRESET_PLACEHOLDERS lock in cli/init_cmd.py:280-285 — both axes (placeholder vocabulary and category vocabulary) are now guarded the same way.

Why this is "prep", not "resolve"

RFC #304 has 7 unresolved design questions (vendor vs source axis, wire format, user-bucket placement, collapse semantics, group-reindex scope, Codex rename, Gemini exclusion) and is staying OPEN. This PR does not answer any of them. It closes a separate defensive gap that would otherwise let an unrelated contributor silently expand the category vocabulary — pre-empting the RFC — by adding a fourth regex row.

Non-goals / explicit follow-ups

These are not in this PR; each is a separate decision:

  • provider: str response field (RFC Phase 1) — blocked on the RFC deciding the wire format.
  • UI byCategory parallel literal in sources-memory-dirs.js:249-259 — couples to Phase 2 tree design.
  • Tighten categorize_memory_dir return type to Literal[...] — independent mypy-advisory sweep.
  • Windows backslash-normalised path support — independent issue.

Smoke evidence

The pin tests verify the pass condition. To prove the assertion actually fails loudly when violated, I manually added a stray row to _PROVIDER_CATEGORY_PATTERNS and re-imported:

$ uv run python -c "import memtomem.config"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import memtomem.config
  File ".../config.py", line 1065, in <module>
    assert ({cat for cat, _ in _PROVIDER_CATEGORY_PATTERNS} | {"user"}) == _VALID_PROVIDER_CATEGORIES, (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Provider category vocabulary changed without updating _VALID_PROVIDER_CATEGORIES. See RFC #304 before adding categories.

Confirms: (1) fires at import time (not at a later call site), and (2) the message points at RFC #304 so anyone hitting this traceback lands in the right issue. Stray row reverted.
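The same negative check can be reproduced without editing config.py. The sketch below is an assumption-laden stand-in: `check_vocabulary`, `VALID`, `MESSAGE`, and the three approved regexes are hypothetical names and patterns for illustration, and only the gemini row and the message text are taken from this PR.

```python
import re

VALID = frozenset({"user", "claude-memory", "claude-plans", "codex"})
MESSAGE = (
    "Provider category vocabulary changed without updating "
    "_VALID_PROVIDER_CATEGORIES. See RFC #304 before adding categories."
)


def check_vocabulary(patterns):
    """Hypothetical helper mirroring the import-time comparison."""
    derived = {cat for cat, _ in patterns} | {"user"}
    if derived != VALID:
        raise AssertionError(MESSAGE)


# Placeholder regexes (assumptions) for the three approved pattern rows.
approved = [
    ("claude-memory", re.compile(r"/\.claude/memory/?$")),
    ("claude-plans", re.compile(r"/\.claude/plans/?$")),
    ("codex", re.compile(r"/\.codex/memories/?$")),
]

# The stray row used in the manual smoke test.
stray = approved + [("gemini", re.compile(r"/\.gemini/memories/?$"))]

check_vocabulary(approved)  # approved rows pass silently

try:
    check_vocabulary(stray)
except AssertionError as exc:
    caught = str(exc)
else:
    caught = None
```

Here `caught` holds the lock message, which is what a pin test on the RFC reference would inspect.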

Test plan

  • uv run ruff check packages/memtomem/src — pass
  • uv run ruff format --check packages/memtomem/src packages/memtomem/tests — pass
  • uv run pytest packages/memtomem/tests/test_init_cmd.py -k "vocabulary_locked or references_rfc or TestCategorizeMemoryDir or TestDetectProviderDirsRoundtrip" — 11 pass
  • uv run pytest -m "not ollama": 1902 passed, 46 deselected, 0 failures
  • Negative smoke (shown above)

Co-Authored-By: Claude [email protected]

Until RFC #304 settles the vendor/product hierarchy for memory_dir
categorization, extending ``_PROVIDER_CATEGORY_PATTERNS`` silently
expands the returned vocabulary and pre-empts the RFC. Mirror the
existing ``_VALID_PRESET_PLACEHOLDERS`` lock in ``cli/init_cmd.py`` on
the category side so both axes are enforced, not just one:

- ``_VALID_PROVIDER_CATEGORIES`` frozenset enumerating the approved
  four categories (``user``, ``claude-memory``, ``claude-plans``,
  ``codex``)
- ``_VOCABULARY_LOCK_MESSAGE`` constant holding the assertion string so
  the RFC-reference pin test can verify the exact message instead of
  ``inspect.getsource`` scanning, which would false-positive on
  unrelated ``#304`` mentions elsewhere in ``config.py``
- Import-time ``assert`` that the derived category set (``cat for cat,
  _ in _PROVIDER_CATEGORY_PATTERNS`` unioned with the ``"user"``
  fallback) matches the declared vocabulary; placed before the
  ``PROVIDER_DIR_CATEGORIES`` derivation so the failure surfaces at
  import rather than at a later call site
- Two pin tests in ``test_init_cmd.py`` next to the existing
  ``TestCategorizeMemoryDir`` / ``TestDetectProviderDirsRoundtrip``
  suites (no ``test_config.py`` in this repo; follows PR #301
  precedent)

Non-destructive: no API shape change, no new response field, no UI
work. RFC #304 itself stays OPEN; its Phase 1 (``provider`` response
field) remains blocked on the RFC deciding the wire format.

Smoke (manual, reverted): adding
``("gemini", re.compile(r"/\.gemini/memories/?$"))`` to
``_PROVIDER_CATEGORY_PATTERNS`` without updating
``_VALID_PROVIDER_CATEGORIES`` raises at import:

    AssertionError: Provider category vocabulary changed without
    updating _VALID_PROVIDER_CATEGORIES. See RFC #304 before adding
    categories.

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit 8fa4419 into main Apr 20, 2026
7 checks passed
@memtomem memtomem deleted the fix/config-lock-provider-category-vocab branch April 20, 2026 07:50
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 20, 2026