RFC: make auto-discover of well-known memory dirs overridable (currently unconditional) #260

@memtomem


Problem

ensure_auto_discovered_dirs (packages/memtomem/src/memtomem/config.py:876) is called unconditionally at packages/memtomem/src/memtomem/server/component_factory.py:48, after load_config_overrides. It augments memory_dirs with three hardcoded paths whenever they exist on disk:

  • ~/.claude/projects — Claude Code
  • ~/.gemini — Gemini CLI
  • ~/.codex/memories — Codex CLI

No env var, flag, or config key disables this. The docstring frames it as intentional ("user overrides don't suppress auto-discovery").

For workflows where these directories contain sensitive content (Gemini OAuth tokens, Claude Code session/conversation data, Codex memories), users currently cannot opt out of indexing them without removing the directory itself — which breaks the owning CLI. Removing entries from memory_dirs in config.json is silently reverted on every server start.
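To make the failure mode concrete, here is a minimal sketch of the behavior described above. It is not the real ensure_auto_discovered_dirs: the function name `augment_with_discovered`, its signature, and the `candidates` parameter are hypothetical, and only the three paths come from this issue.

```python
from pathlib import Path

# The three well-known directories listed in this issue.
WELL_KNOWN_DIRS = ("~/.claude/projects", "~/.gemini", "~/.codex/memories")


def augment_with_discovered(memory_dirs, candidates=WELL_KNOWN_DIRS):
    """Hypothetical stand-in for the auto-discover step (not the real code)."""
    result = list(memory_dirs)
    seen = {Path(d).expanduser() for d in result}
    for raw in candidates:
        path = Path(raw).expanduser()
        # Appended whenever the directory exists on disk. Nothing the user
        # wrote in config.json is consulted, so a removed entry comes back.
        if path.is_dir() and path not in seen:
            result.append(str(path))
            seen.add(path)
    return result
```

Under this model, a user who deletes an entry from memory_dirs sees it re-added on the next start, because the only condition checked is whether the directory exists.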

Current defense-in-depth status

The upcoming 0.1.10 release closes the specific leak path that combined auto-discover with an unguarded IndexEngine.index_file. PRs #225, #226, and #252 are already on main; tag + PyPI publish pending.

Once 0.1.10 ships, the engine refuses to index files matching the exclude filter (built-in denylist + user indexing.exclude_patterns, against both absolute paths and memory-dir-relative paths) at the IndexEngine.index_file entry point.
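A rough sketch of what an entry-point guard of this shape could look like, assuming glob-style patterns. The function names, the placeholder denylist entries, and the file names are all hypothetical; the real denylist contents and matching rules are not reproduced here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Placeholder patterns for illustration only, not the real built-in denylist.
BUILTIN_DENYLIST = ("*oauth*", "*.pem")


def is_excluded(file_path, memory_dir, user_patterns=()):
    """Match patterns against both the absolute and the dir-relative path."""
    path = PurePosixPath(file_path)
    candidates = [path.as_posix()]
    try:
        candidates.append(path.relative_to(memory_dir).as_posix())
    except ValueError:
        pass  # file is not under memory_dir; only the absolute form applies
    patterns = (*BUILTIN_DENYLIST, *user_patterns)
    return any(fnmatchcase(c, p) for c in candidates for p in patterns)


def index_file(file_path, memory_dir, user_patterns=()):
    """Hypothetical entry point: refuse excluded files before doing any work."""
    if is_excluded(file_path, memory_dir, user_patterns):
        return False
    # Real indexing would happen here.
    return True
```

Checking the relative form matters for user patterns such as `sessions/*`, which would never match an absolute path but should still exclude `sessions/a.jsonl` inside a memory dir.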

Auto-discover coupling remains a defense-in-depth concern independent of that fix: any future engine-layer regression immediately re-opens the leak, and the exclude filter only covers what its patterns already enumerate. An explicit opt-out would give users agency over which directories are indexed at all, independent of filename heuristics.

Three candidate directions

1. Allowlist default

memory_dirs starts empty; users add directories explicitly. Auto-discover is removed entirely.

  • ✅ Cleanest semantics. No magic.
  • ❌ Biggest UX break. Every user must configure once before indexing does anything useful.

2. JIT consent

First encounter of a well-known directory prompts for approval; the decision is stored in config.

  • ✅ Preserves the current first-launch UX for interactive setups.
  • ❌ Adds friction for headless / CI environments. Prompt UX needs a non-interactive fallback policy.
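One possible shape for the consent check with a non-interactive fallback, as a sketch only. The function `resolve_consent`, the `stored_decisions` mapping, and the `headless_default` policy knob are hypothetical names, and defaulting to "do not index" when headless is an assumption rather than a decided policy.

```python
import sys


def resolve_consent(directory, stored_decisions, ask=input,
                    interactive=None, headless_default=False):
    """Return True if `directory` may be indexed (hypothetical helper)."""
    # A previously stored decision always wins, so the prompt appears once.
    if directory in stored_decisions:
        return stored_decisions[directory]
    if interactive is None:
        interactive = sys.stdin.isatty()
    if not interactive:
        # Headless / CI: never block on a prompt. Apply the fallback policy
        # without persisting it, so an interactive run can still ask later.
        return headless_default
    answer = ask(f"Index {directory}? [y/N] ").strip().lower()
    decision = answer in ("y", "yes")
    stored_decisions[directory] = decision
    return decision
```

Not persisting the headless fallback is a deliberate choice in this sketch: a CI run should not silently record a decision on the user's behalf.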

3. Memory vs. watch split

Split memory_dirs (intentional, user-curated) from watch_dirs (auto-discovered, narrower permissions). Auto-discover populates only the latter.

  • ✅ Cleanest conceptual boundary between "things the user wants indexed" vs. "things the system was told to notice".
  • ⚠️ Moderate UX change. Requires rethinking the indexing-permissions model (e.g., should watch_dirs entries have a denylist-by-default?).
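A sketch of how the split might look in config terms, assuming auto-discover only ever writes to watch_dirs. The `DirConfig` class and `discover_into_watch` function are hypothetical, and the open question about narrower permissions for watch_dirs is deliberately left unanswered here.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class DirConfig:
    """Hypothetical config shape for the memory/watch split."""
    memory_dirs: list = field(default_factory=list)  # user-curated only
    watch_dirs: list = field(default_factory=list)   # auto-discovered only


def discover_into_watch(config, candidates):
    """Record existing well-known dirs in watch_dirs; never touch memory_dirs."""
    for raw in candidates:
        path = str(Path(raw).expanduser())
        already_known = path in config.watch_dirs or path in config.memory_dirs
        if Path(path).is_dir() and not already_known:
            config.watch_dirs.append(path)
    return config
```

With this shape, whatever the user puts in memory_dirs is exactly what is treated as intentional, and discovery can no longer revert an edit to it.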

Trade-off summary

All three options require migration paths. Option 1 offers the cleanest semantics at the cost of first-launch UX. Option 2 preserves first-launch UX but needs a headless/CI prompt policy to ship. Option 3 restructures the permissions model for a cleaner conceptual boundary, with moderate UX change. No fixed maintainer leaning yet — community input on these trade-offs welcome.

Out of scope for this issue

  • The security fix itself (merged on main for 0.1.10; tag and PyPI publish pending)
  • Namespace / tagging changes for auto-discovered dirs (separate ingest-pipeline concern)
  • Changes to the built-in denylist default contents

Cross-refs
