feat(namespace): path-based policy rules for namespace auto-assignment #253
Merged
Adds a rule-based layer to NamespaceConfig so users can declare
path-glob -> namespace mappings instead of passing `namespace=` on
every `mem_index` call. It also works around the limitation of
`enable_auto_ns`, which uses the immediate parent folder name and so
produces opaque namespaces like `<UUID>` or `subagents` under
auto-discovered roots.
The resolution priority is now:
explicit param -> rules (first valid match) -> enable_auto_ns -> default
`path_glob` uses gitignore syntax via `pathspec.GitIgnoreSpec` (the same
engine as `IndexingConfig.exclude_patterns`). A leading `~/` is expanded
at load time, and matching is case-insensitive against the absolute
resolved file path. The namespace template supports the `{parent}`
placeholder, which expands to the matched file's immediate parent folder
name.
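To make the matching semantics concrete, here is a stdlib-only approximation. It is not `pathspec.GitIgnoreSpec`: it handles only `*`, `?` and `**` (no negation, character classes, or unanchored patterns), and `expand_home`, `glob_to_regex` and `path_matches` are hypothetical helpers. It does model the behaviour described above: `~/` expanded up front, and case-insensitive matching against the absolute path with its leading `/` stripped.

```python
from __future__ import annotations

import os
import re


def expand_home(glob: str) -> str:
    # Only a leading "~/" is expanded, mirroring the load-time behaviour.
    return os.path.expanduser(glob) if glob.startswith("~/") else glob


def glob_to_regex(glob: str) -> re.Pattern[str]:
    """Translate a small gitignore-like subset (*, ?, **) into a regex."""
    pattern = glob.lstrip("/")
    out: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, across "/" boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # exactly one non-"/" character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out), re.IGNORECASE)


def path_matches(glob: str, resolved_path: str) -> bool:
    # resolved_path is assumed to be absolute and already resolved
    # (e.g. via Path.resolve()); the leading "/" is stripped before matching.
    regex = glob_to_regex(expand_home(glob))
    return regex.fullmatch(resolved_path.lstrip("/")) is not None
```

Because the sketch uses `fullmatch`, a pattern has to cover the file itself; `dir/**` is the usual way to say "everything under dir".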
Validation decisions:
- Unknown placeholders (e.g. `{unknown}`) are rejected at load time so
typos fail loudly at startup rather than silently skipping rules at
index time.
- A rule whose `{parent}` would expand to an empty string is skipped at
runtime (fall through to the next rule / auto_ns / default) with a
one-time warning log per rule index.
- Namespaces must be non-empty, at most 128 chars, and contain no ASCII control characters.
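A minimal sketch of the load-time checks, assuming they run on the namespace template (the string that may contain `{parent}`) rather than on the expanded result. `validate_namespace_template` is a hypothetical helper; in the PR these checks would live in a model validator. Using `string.Formatter().parse` to find placeholders is one possible approach, not necessarily the one the PR takes.

```python
from string import Formatter

ALLOWED_PLACEHOLDERS = {"parent"}
MAX_NAMESPACE_LEN = 128


def validate_namespace_template(template: str) -> str:
    """Raise ValueError for templates that should be rejected at load time."""
    if not template:
        raise ValueError("namespace must be non-empty")
    if len(template) > MAX_NAMESPACE_LEN:
        raise ValueError(f"namespace longer than {MAX_NAMESPACE_LEN} chars")
    # ASCII control characters: 0x00-0x1F and DEL (0x7F).
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in template):
        raise ValueError("namespace contains ASCII control characters")
    # Formatter.parse yields (literal, field_name, format_spec, conversion);
    # field_name is None for pure literal chunks. Unbalanced braces raise
    # ValueError during iteration as well.
    for _literal, field, _spec, _conv in Formatter().parse(template):
        if field is not None and field not in ALLOWED_PLACEHOLDERS:
            raise ValueError(f"unknown placeholder {{{field}}} in {template!r}")
    return template
```

Failing here, rather than when a file is indexed, is what turns a typo like `{parnet}` into a startup error instead of a rule that silently never fires.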
`rules` is an `Annotated[list[NamespacePolicyRule], APPEND]` field so
config.d/*.json fragments can contribute rules independently. Fragments
concatenate in alphabetical filename order -- use numeric prefixes
(`10-claude.json`, `20-gdrive.json`) for cross-fragment precedence.
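The APPEND-in-filename-order contract can be sketched as follows. The fragment shape (`{"namespace": {"rules": [...]}}` with `path_glob` / `namespace` keys) is an assumption inferred from the `namespace.rules` key name, and `load_fragment_rules` is hypothetical; the real loader also merges the base `config.json` and deduplicates.

```python
import json
import tempfile
from pathlib import Path


def load_fragment_rules(config_d: Path) -> list[dict]:
    """Concatenate namespace.rules from every *.json fragment, by filename."""
    rules: list[dict] = []
    # sorted() orders paths within one directory by filename, so
    # "10-claude.json" is read before "20-gdrive.json".
    for fragment in sorted(config_d.glob("*.json")):
        data = json.loads(fragment.read_text(encoding="utf-8"))
        rules.extend(data.get("namespace", {}).get("rules", []))
    return rules


with tempfile.TemporaryDirectory() as tmp:
    config_d = Path(tmp)
    (config_d / "20-gdrive.json").write_text(json.dumps(
        {"namespace": {"rules": [{"path_glob": "~/gdrive/**", "namespace": "gdrive"}]}}))
    (config_d / "10-claude.json").write_text(json.dumps(
        {"namespace": {"rules": [{"path_glob": "~/.claude/projects/**",
                                  "namespace": "claude-{parent}"}]}}))
    merged = load_fragment_rules(config_d)
    print([r["namespace"] for r in merged])  # ['claude-{parent}', 'gdrive']
```

Since the order is plain string order, `9-x.json` sorts after `10-y.json`; fixed-width prefixes (`10-`, `20-`, ...) avoid that surprise.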
To make APPEND merge work for a list-of-BaseSettings field, `_dedup_key`
now handles dict / BaseSettings / nested list, and the fragment loader
coerces JSON dicts into the declared item type before dedup so the
resulting list contains fully-validated model instances.
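A stdlib-only sketch of that two-step flow (coerce, then dedup). `RuleStandIn` plays the role of `NamespacePolicyRule` and exposes the two Pydantic method names the text mentions (`model_validate`, `model_dump`); `dedup_key`, `coerce_items` and `append_dedup` are hypothetical names, and keeping the first occurrence on a key collision is an assumption.

```python
from __future__ import annotations

import logging
from dataclasses import asdict, dataclass
from pathlib import PurePath
from typing import Annotated, Any, get_args, get_origin

log = logging.getLogger("config_merge_sketch")


@dataclass(frozen=True)
class RuleStandIn:
    # Minimal stand-in for a BaseSettings rule model.
    path_glob: str
    namespace: str

    @classmethod
    def model_validate(cls, data: dict) -> RuleStandIn:
        try:
            return cls(path_glob=data["path_glob"], namespace=data["namespace"])
        except KeyError as exc:
            raise ValueError(f"missing field {exc}") from exc

    def model_dump(self, mode: str = "python") -> dict:
        return asdict(self)


def dedup_key(value: Any) -> Any:
    """Turn a config value into something hashable, recursively."""
    if hasattr(value, "model_dump"):  # model instance -> plain dict
        value = value.model_dump(mode="json")
    if isinstance(value, PurePath):
        return str(value)
    if isinstance(value, dict):
        return tuple(sorted((k, dedup_key(v)) for k, v in value.items()))
    if isinstance(value, list):
        return tuple(dedup_key(v) for v in value)
    return value


def coerce_items(annotation: Any, items: list) -> list:
    """Validate raw dicts into the declared item type; skip invalid ones."""
    if get_origin(annotation) is Annotated:  # strip the merge-strategy marker
        annotation = get_args(annotation)[0]
    (item_type,) = get_args(annotation)  # list[X] -> X
    if not hasattr(item_type, "model_validate"):
        return list(items)
    coerced = []
    for item in items:
        if isinstance(item, dict):
            try:
                item = item_type.model_validate(item)
            except ValueError as exc:  # Pydantic's ValidationError is a ValueError
                log.warning("skipping invalid list item %r: %s", item, exc)
                continue
        coerced.append(item)
    return coerced


def append_dedup(existing: list, incoming: list) -> list:
    """APPEND merge: keep existing order, add unseen items from incoming."""
    seen = {dedup_key(v) for v in existing}
    merged = list(existing)
    for item in incoming:
        key = dedup_key(item)
        if key not in seen:
            seen.add(key)
            merged.append(item)
    return merged
```

Converting a model to a plain dict before building the key is what makes a model instance and the raw JSON dict that produced it collide.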
The default is `[]`, so the change is backwards compatible for existing users.
Web UI editor and `config_set` CLI support are intentionally deferred
-- list-of-objects editing is a separate architectural concern that
doesn't need to block the data-model change.
Tests: 12 TestNamespacePolicyRules cases (core path + edge cases
including unknown-placeholder rejection, empty-parent fall-through,
case-insensitive matching, literal namespace without placeholder,
length-limit rejection) plus 2 config.d fragment tests covering APPEND
merge and the alphabetical-order contract.
Adds a "Namespace rules (path-based auto-tagging)" subsection to the
configuration guide covering semantics (gitignore syntax, case
sensitivity, first-match-wins, `{parent}` placeholder, APPEND merge in
alphabetical filename order) and a "Verifying your rules" block so
users know how to confirm a rule fired (`mm config show`, force
re-index, `mm session list` or Web UI `/#sources`, namespace labels in
search output).
Cross-references the new section from the user guide's Auto-namespace
section so readers looking at `enable_auto_ns` discover rules as the
preferred path when the immediate parent folder name is opaque.
… path_glob
Pins the contract that ``_dedup_key`` uses ``BaseSettings.model_dump``
post-validator values: a rule written as ``~/foo/**`` and the same rule
written in its expanded absolute form collide after construction and
collapse to one. A future refactor that moved the ``~/`` expansion out
of the validator (or changed the dedup key off ``model_dump``) would
start emitting two rules and fail here -- making the behavior change
visible at PR time rather than via a surprised bug report from a user
who split their rules across two config.d fragments.
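The contract this test pins can be illustrated with a stdlib stand-in: a dataclass whose `__post_init__` plays the role of the Pydantic validator that expands `~/`, and a dedup key built from the post-validation field values (via `dataclasses.asdict`, standing in for `model_dump`). `ExpandedRule` and `rule_key` are hypothetical names.

```python
import os
from dataclasses import asdict, dataclass


@dataclass
class ExpandedRule:
    path_glob: str
    namespace: str

    def __post_init__(self) -> None:
        # Stand-in for the field validator: expand a leading "~/" at
        # construction time, so the stored value is already absolute.
        if self.path_glob.startswith("~/"):
            self.path_glob = os.path.expanduser(self.path_glob)


def rule_key(rule: ExpandedRule) -> tuple:
    # Keyed on post-validator values, like _dedup_key on model_dump().
    return tuple(sorted(asdict(rule).items()))


short = ExpandedRule("~/foo/**", "foo")
expanded = ExpandedRule(os.path.expanduser("~/foo/**"), "foo")
unique = {rule_key(r): r for r in (short, expanded)}
print(len(unique))  # 1
```

Moving the expansion out of `__post_init__`, or keying on the raw input strings, would leave two entries in `unique`: the same regression the test is meant to catch.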
Summary
Adds `NamespacePolicyRule` to `NamespaceConfig` so users can declare path-glob → namespace mappings instead of passing `namespace=` on every `mem_index` call. Closes the practical gap in `enable_auto_ns`, which uses the immediate parent folder name and so produces opaque namespaces like `<UUID>` or `subagents` under auto-discovered roots (`~/.claude/projects/<UUID>/...`).
Resolution priority
Rules are evaluated between the explicit-argument check and the `enable_auto_ns` fallback. Default is `[]`, so existing users see no behavior change until they opt in via `config.json` or `config.d/*.json`.
Design decisions
- `pathspec.GitIgnoreSpec`, case-insensitive, matched against the absolute resolved file path with any leading `/` stripped. Same semantics as `IndexingConfig.exclude_patterns`, so users learn one pattern language.
- `{parent}` placeholder: expands to the matched file's immediate parent folder name. Only `{parent}` is supported in this release; unknown placeholders are rejected at load time so typos fail loudly at startup rather than silently skipping rules at index time.
- If `{parent}` would expand to an empty string at runtime, the rule is skipped (fall through to the next rule / `auto_ns` / `default_namespace`) with a one-time warning log per rule index.
- `rules` is `Annotated[list[NamespacePolicyRule], APPEND]`, so `config.d/*.json` fragments can contribute rules independently. Fragments concatenate in alphabetical filename order; use numeric prefixes (`10-claude.json`, `20-gdrive.json`) for cross-fragment precedence.
- A leading `~/` in `path_glob` is expanded at load time.
Incidental fixes to config merge machinery
This PR adds the first `list[BaseSettings]` field in the codebase, which surfaced two gaps in `load_config_d` that are fixed inline. Both are scoped to load time (`~/.memtomem/config.json` + `~/.memtomem/config.d/*.json` merging) and do not touch the mutation paths (`config_set` CLI, `PATCH /api/config`, future Web UI editors).
- `_dedup_key` now handles `dict` / `BaseSettings` / nested `list`. Previously it only normalised `Path` → `str` and let everything else fall through to raw Python hashing, which raised `TypeError: unhashable type: 'dict'` the first time a list-of-objects field went through APPEND dedup. The new version recursively converts `BaseSettings` via `model_dump(mode="json")` and dicts into sorted tuples, so a native model instance and the raw JSON dict that produced it share a dedup key. Existing `list[str]` / `list[Path]` / `list[float]` fields are untouched.
- A `BaseSettings` without `validate_assignment` lets `setattr(section_obj, key, [<dict>])` succeed without coercing, so the merged list would contain raw dicts. The loop now introspects the field annotation, and if the item type is a `BaseSettings` subclass it calls `item_type.model_validate(item)` on each incoming dict before the dedup step. Invalid entries are logged and skipped rather than crashing the whole fragment.
Dedup semantics freeze (commit 3):
`_dedup_key` intentionally hashes post-validator field values, so a rule written as `~/foo/**` and the same rule written as the expanded absolute path `/Users/X/foo/**` collide after construction and dedupe to one entry. A new test (`test_config_d_namespace_rules_dedup_after_home_expansion`) pins this contract so a future refactor that moved `~/` expansion out of the validator, or moved the dedup key off `model_dump`, would fail at PR time rather than surfacing as a confused bug report from a user who split their rules across two `config.d` fragments.
Out of scope: the mutation path still lacks nested-`BaseModel` support. `coerce_and_validate` in `config.py` can't consume a `list[BaseSettings]` today, so `config_set namespace.rules ...` is rejected and a Web UI editor can't persist rules via `PATCH /api/config` yet. Fixing that is a separate PR (see Intentionally deferred below).
Intentionally deferred
- Web UI editor: blocked by the `coerce_and_validate` mutation-path gap above; attempting the editor before that fix would re-discover the same `list[BaseSettings]` problem on the mutate side.
- `config_set namespace.rules` CLI: same root cause as the Web UI editor.
- `{match:N}` capture group placeholder: `{parent}` covers the common cases; add if evidence accrues.
- `priority: int` field: alphabetical fragment ordering + in-fragment declaration order cover 99% of cases.
- `mm namespace test <path>`: `mm config show` is enough for now.
Each is listed in the plan file's "Out of scope" block so the bounds are explicit.
Test plan
- `TestNamespacePolicyRules` cases (core 7 + edge 5): explicit-wins, rule-match, first-match-wins, rules-beat-auto_ns, `{parent}` substitution, `~/` expansion, empty-parent fall-through (with log-once assertion), case-insensitive matching, literal template, unknown-placeholder rejection, length-limit rejection.
- `test_config_overrides.py` cases covering APPEND merge, the alphabetical-order contract, and the home-expansion dedup freeze.
- `TestResolveNamespace` (5 tests): regression-free.
- `test_every_list_field_declares_merge_strategy`: passes for the new field.
- `uv run pytest -m "not ollama"`: 1651 passed (including the 15 new tests), 46 deselected.
- `uv run ruff check` + `uv run ruff format --check`.
- `uv run mypy packages/memtomem/src/memtomem/config.py packages/memtomem/src/memtomem/indexing/engine.py`: no issues.
Docs in the same PR
- `docs/guides/configuration.md`: new "Namespace rules (path-based auto-tagging)" subsection under Namespace, with a JSON example, full semantics, and a "Verifying your rules" block.
- `docs/guides/user-guide.md`: cross-reference from the Auto-namespace section so readers land on the rules doc when the parent folder heuristic isn't enough.