feat: opt-in provider memory dirs + scope narrowing per official docs #292

Merged
memtomem merged 2 commits into main from feat/opt-in-provider-dirs
Apr 19, 2026

@memtomem
Owner

Summary

Replace silent ensure_auto_discovered_dirs runtime indexing with an explicit opt-in wizard step, narrow scope to canonical memory surfaces per each provider's official docs, and migrate legacy auto_discover=True installs transparently.

Motivation

The previous default silently indexed three provider home directories on every startup:

  • ~/.claude/projects/ — but this contains huge per-session .jsonl transcripts (1MB+ each, 200+ files) + staging/, not just memory
  • ~/.gemini/ — contains oauth_creds.json, browser profile dirs, history/, tmp/ — most of it isn't memory and some is sensitive
  • ~/.codex/memories/ — this one was correctly scoped

Users were never asked. The scope was much wider than intended even though supported_extensions filtered most transcript/JSON noise — the directory walks were still happening on every index/watch.

Changes

Wizard (user-facing)

New Step 4 "Provider memory folders" — detects canonical paths on disk and prompts per-category. Non-existent categories are skipped silently.

Step 4: Provider Memory Folders
  Make memory from other AI tools searchable through memtomem?
  Each option is opt-in; declined folders stay out of search.

  Claude Code per-project memory (3 projects with .md content)? [y/N]
  Claude Code plans (~/.claude/plans/)?                          [y/N]
  Codex CLI memories (~/.codex/memories/)?                       [y/N]

Non-interactive: mm init -y --include-provider claude-memory --include-provider codex (repeatable flag).
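
A repeatable flag of this kind could be declared roughly as follows. This is a minimal sketch using the standard-library `argparse`; the category names and the `-y` flag come from this PR, while `build_parser` and `selected_categories` are hypothetical names, and the real `mm init` command may be built on a different CLI framework.

```python
import argparse

# Category names as listed in this PR; everything else here is illustrative.
PROVIDER_CATEGORIES = ("claude-memory", "claude-plans", "codex")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the `mm init` option declarations."""
    parser = argparse.ArgumentParser(prog="mm init")
    parser.add_argument("-y", "--yes", action="store_true", help="run without prompts")
    parser.add_argument(
        "--include-provider",
        action="append",  # every occurrence of the flag appends one category
        choices=PROVIDER_CATEGORIES,
        default=None,
        metavar="CATEGORY",
    )
    return parser


def selected_categories(argv: list[str]) -> list[str]:
    """Return the requested categories in order, without duplicates."""
    args = build_parser().parse_args(argv)
    # No flag at all means no provider dirs, matching the opt-in default.
    return list(dict.fromkeys(args.include_provider or []))
```

Passing the flag twice yields both categories, and omitting it yields an empty list, so nothing is indexed unless the user asks for it.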

Narrowed scope (verified against official docs)

| Provider | Path | Source |
| --- | --- | --- |
| Claude Code auto-memory | ~/.claude/projects/<project>/memory/ (with *.md content) | https://code.claude.com/docs/en/memory |
| Claude Code plans | ~/.claude/plans/ | local convention |
| Codex CLI | ~/.codex/memories/ | https://developers.openai.com/codex/memories |

Gemini CLI removed. Its memory is a single file ~/.gemini/GEMINI.md that doesn't fit the memory_dirs directory abstraction; the parent dir contains oauth_creds.json. mm ingest gemini-memory (one-shot import) is unchanged for Gemini users.
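
The narrowed scope can be pictured with a short sketch. The function name `detect_provider_dirs` and its return shape are assumptions made for illustration, not the PR's actual `_detect_provider_dirs`; only the paths, the category names, and the `*.md` filter are taken from this conversation.

```python
from pathlib import Path


def detect_provider_dirs(home: Path) -> dict[str, list[Path]]:
    """Return existing canonical provider dirs grouped by category (illustrative)."""
    found: dict[str, list[Path]] = {}

    # Claude Code per-project memory: only */memory/ subdirs that hold Markdown.
    projects = home / ".claude" / "projects"
    if projects.is_dir():
        memory_dirs = sorted(
            mem
            for mem in projects.glob("*/memory")
            if mem.is_dir() and any(mem.glob("*.md"))
        )
        if memory_dirs:
            found["claude-memory"] = memory_dirs

    # Claude Code plans and Codex CLI memories are single fixed directories.
    plans = home / ".claude" / "plans"
    if plans.is_dir():
        found["claude-plans"] = [plans]

    codex = home / ".codex" / "memories"
    if codex.is_dir():
        found["codex"] = [codex]

    # ~/.gemini/ is never inspected in this sketch.
    return found
```

Categories whose directories do not exist are simply absent from the result, which is what lets a wizard skip them silently.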

Migration

indexing.auto_discover becomes a one-shot migration trigger. For existing installs with the flag True (default or explicit) AND a config.json on disk:

  1. Detect canonical provider dirs that exist.
  2. Append to memory_dirs.
  3. Flip flag to False.
  4. Atomic write. Single INFO-level log line.

Brand-new installs (no config.json) skip migration entirely — the wizard is the only path that adds provider dirs. Subsequent startups see auto_discover: false → no-op.
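
The four migration steps might look like the sketch below. It is an assumption-laden illustration rather than the PR's code: `migrate_auto_discover_once` here is a hypothetical free function, the config is treated as plain JSON with an `indexing` object, and a missing `memory_dirs` is treated as an empty list instead of the factory default that the real implementation persists.

```python
import json
import os
import tempfile
from pathlib import Path


def migrate_auto_discover_once(config_path: Path, provider_dirs: list[str]) -> bool:
    """Append provider dirs, flip the flag, persist atomically; True if rewritten."""
    if not config_path.exists():
        return False  # brand-new install: only the wizard adds provider dirs

    config = json.loads(config_path.read_text(encoding="utf-8"))
    indexing = config.setdefault("indexing", {})
    if not indexing.get("auto_discover", True):
        return False  # already migrated, or explicitly disabled

    # Keep whatever the user configured and append new dirs without duplicates.
    merged = list(indexing.get("memory_dirs", []))
    for directory in provider_dirs:
        if directory not in merged:
            merged.append(directory)
    indexing["memory_dirs"] = merged
    indexing["auto_discover"] = False

    # Atomic write: temp file in the same directory, then os.replace.
    fd, tmp_name = tempfile.mkstemp(dir=config_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(config, handle, indent=2)
        os.replace(tmp_name, config_path)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return True
```

Because the flag is written back as `false` together with the merged list, a second call returns `False` without touching the file, which is the idempotency the manual verification checks.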

Verification

  • Automated: 1837 tests pass (46 ollama-marker skipped), ruff check / ruff format --check clean, mypy 0 errors across 193 source files.
  • New tests:
    • TestProviderDirsStep — wizard step: no-detect skip, mixed accept/reject, empty-subdir filter, per-project enumeration via single prompt, state["provider_dirs"] → memory_dirs merge with dedup + auto_discover: false pin.
    • TestIncludeProviderFlag — non-interactive: category-to-dir resolution, silent skip when unavailable, no-flag = no provider dirs.
    • Migration tests: noop when flag False, noop when no config.json, appends + flips + persists, idempotent, env-var MEMTOMEM_INDEXING__AUTO_DISCOVER=false short-circuits.
    • _detect_provider_dirs structural tests: fixed categories, excludes Gemini, filters empty Claude memory subdirs, finds plans + codex.
  • Manual (4 paths with isolated HOME=/tmp/...):
    1. Fresh install with --include-provider codex → opt-in works, only selected dirs in memory_dirs, auto_discover: false persisted ✓
    2. Skip wizard (no config.json) → factory default only, no auto-indexing despite fake ~/.claude/projects/<proj>/memory/ present ✓
    3. Legacy install with auto_discover: true → all 3 canonical categories migrated, full list persisted (factory + discovered), flag flipped ✓
    4. Idempotent re-run → no log, no re-migration ✓

Test plan

  • Wizard flow with mixed provider combinations
  • Migration on legacy config
  • Fresh install with no config.json
  • Idempotent second load
  • Full test suite + lint + typecheck
  • Reviewer: confirm CHANGELOG migration note is clear (recommends mm index --rebuild after upgrade for cleanest index state)

Breaking change notes

  • Users whose old installs relied on auto-discovery of ~/.gemini/ will lose that surface. Documented in CHANGELOG + configuration.md; mm ingest gemini-memory is the replacement.
  • ~/.claude/projects/ previously walked wholesale (transcripts skipped only by supported_extensions, not excluded at walk time); now only the */memory/ subdirs appear in memory_dirs. Index entries for paths no longer in memory_dirs won't be auto-pruned — recommend mm index --rebuild post-upgrade for the cleanest state.
  • indexing.auto_discover field is deprecated; removal scheduled for a future minor.

🤖 Generated with Claude Code

Replace the silent `ensure_auto_discovered_dirs` runtime path with an
explicit `mm init` wizard step (Step 4 of 10: "Provider memory folders")
and a non-interactive `--include-provider {claude-memory,claude-plans,codex}`
flag. Accepted categories land directly in `indexing.memory_dirs`.

Auto-discovery scope is narrowed to each provider's canonical memory
surface (verified against the official documentation):

- Claude Code per-project memory: `~/.claude/projects/<project>/memory/`
  only (not the whole projects tree with session JSONL transcripts and
  staging/). Subdirs without any *.md files are skipped so empty session
  scaffolding doesn't pollute the index.
- Claude Code plans: `~/.claude/plans/`.
- Codex CLI memories: `~/.codex/memories/`.

Gemini CLI is removed from auto-discovery — its memory is the single
file `~/.gemini/GEMINI.md` (incompatible with the directory-based
`memory_dirs` abstraction) and the parent dir contains secrets like
`oauth_creds.json`. `mm ingest gemini-memory` remains as the supported
one-shot path for Gemini users.

`indexing.auto_discover` becomes a one-shot migration trigger. For
existing installs with the flag True AND a config.json on disk, the next
startup appends canonical provider dirs to `memory_dirs`, flips the flag
to False, and persists both atomically. Brand-new installs (no
config.json yet) skip migration entirely; the wizard is the only path
that adds provider dirs.

Manual verification across four paths (fresh install, skip-wizard,
legacy migration, idempotent re-run) passed; 1837 tests green.

Co-Authored-By: Claude <[email protected]>
@memtomem
Owner Author

Review (Opus 4.7)

Overall this is a solid design. The move from the silent ensure_auto_discovered_dirs to an explicit wizard opt-in plus a one-time migration is clean. As a side effect, because provider dirs now live in explicit memory_dirs, the discovery stage of build_comparand disappears, which is a welcome simplification.

🔴 Blocker: CI lint failure

ruff format --check is failing (log).

The PR body says "ruff check / ruff format --check clean", but CI reports that 2 files need reformatting: tests/test_config_overrides.py and tests/test_init_cmd.py (3 places in total, each a json.dumps(...) / json.loads(...) call to be collapsed onto one line). Passing ruff check does not imply passing ruff format --check (the lesson from #188/#190).

uv run ruff format packages/memtomem/tests/test_config_overrides.py packages/memtomem/tests/test_init_cmd.py

This is a mechanical fix with no logic changes.

Code quality: what works well

  • Splitting _detect_provider_dirs (grouped, for the wizard) from _canonical_provider_dirs (flat, for migration) serves both callers without duplicating the scope rules.
  • Migration idempotency: a double gate on the auto_discover flag plus config_path.exists(), and persisting the complete post-migration list so that the REPLACE on the next load does not silently drop the factory default. The inline explanation at config.py:1043-1049 is a good comment that says exactly why.
  • The memory/ directory filter (any(mem.glob("*.md"))) blocks scaffolding for projects that Claude visited but never created memory for.
  • The wizard on a fresh install explicitly pins auto_discover: false, so re-running the wizard on the same machine does not trigger the migration again.

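Minor comments (non-blocking)

  1. init_cmd.py:407: mixed path formats in the saved memory_dirs
    The dedup key uses the expanduser() result but the append uses the original entry, so a user memory_dir of ~/notes is saved as ~/notes, while provider dirs are saved as the absolute paths that _detect_provider_dirs has already expanded. Loading works for both, so the impact is cosmetic only. If this is intentional, leave it as is; otherwise unify with combined_dirs.append(key).

  2. config.py:1007 docstring "Pre-Z behaviour"
    The codename "Z" is not defined anywhere else, so it is ambiguous. Something concrete such as "previous releases" or "0.1.11 and earlier" would be better.

  3. is_dir() follows symlinks
    If the user places a symbolic link inside ~/.claude/projects/, it is followed. This is consistent with the rest of the codebase, so it is not a problem, just a note.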
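
To make comment 1 concrete, here is a small sketch of the dedup pattern it describes: the key is the expanded path, but the stored value is the original spelling. `merge_memory_dirs` is a hypothetical helper written for this note, not the code at init_cmd.py:407.

```python
from pathlib import Path


def merge_memory_dirs(user_dirs: list[str], provider_dirs: list[Path]) -> list[str]:
    """Merge user and provider dirs, deduplicating on the expanded path."""
    combined: list[str] = []
    seen: set[Path] = set()
    for entry in [*user_dirs, *map(str, provider_dirs)]:
        key = Path(entry).expanduser()  # dedup key: expanded form
        if key in seen:
            continue
        seen.add(key)
        combined.append(entry)  # stored value: original spelling, "~" kept
    return combined
```

A user entry such as ~/notes therefore stays as typed, while provider entries appear as absolute paths. Appending `str(key)` instead would normalize every entry to an absolute path.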

Test coverage: 1 gap

The 8 new tests cover the wizard branches, --include-provider, the migration matrix, and the _detect_provider_dirs scope well.

Missing: no test directly verifies that memory_dirs explicitly set by the user on a legacy install is preserved after migration. test_migration_appends_dirs_and_flips_flag and test_migration_persists_to_config_json both start from a {} base, yet the most common legacy install shape is one where the user has already set some memory_dirs. A single direct test for the very scenario that the persistence-correctness comment in _migrate_auto_discover_once is meant to protect would be good:

def test_migration_preserves_existing_user_memory_dirs(...):
    override_path.write_text(
        json.dumps({"indexing": {"memory_dirs": ["/user/explicit/path"]}}),
        encoding="utf-8",
    )
    # assert both /user/explicit/path AND fake provider dir are persisted

Docs

configuration.md / getting-started.md (step 9→10 renumber) / reference.md / README / CHANGELOG are all updated, following the fanout rule in feedback_default_change_fanout.md. The "Migration notes" section of the CHANGELOG and the mm index --rebuild recommendation are also clear.

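Summary

| Item | Severity | Action |
| --- | --- | --- |
| ruff format --check CI failure | Blocker | Run ruff format on 2 files |
| Missing preservation test for existing memory_dirs | Medium | Add 1 test |
| "Pre-Z" docstring | Low | Use a concrete version |
| Mixed path formats | Low (cosmetic) | Optional normalization |

The lint fix plus one migration preservation test resolves every concern. The rest can wait for a follow-up.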

🤖 Generated with Claude Code

Addresses review feedback on #292:

- ruff format on tests/test_config_overrides.py and tests/test_init_cmd.py
  (CI ran `ruff format --check` against the whole repo including tests/;
  the earlier local check was scoped to src/ only and missed 3 trivial
  json.dumps/loads line-collapse differences in the new tests).
- Add test_migration_preserves_existing_user_memory_dirs — directly
  exercises the ``_persist_auto_discover_migration`` preservation
  invariant for the most common legacy install shape: user-set
  ``memory_dirs`` alongside the default-True ``auto_discover`` flag.
  Asserts both in-memory and persisted state keep the user entry while
  appending the newly-discovered provider dir.
- Replace the vague "Pre-Z behaviour" phrasing in
  ``_migrate_auto_discover_once`` docstring with "Releases 0.1.11 and
  earlier" so future readers don't need to decode the codename.

No behaviour changes — format + test + doc only. 1838 tests green.

Co-Authored-By: Claude <[email protected]>
@memtomem
Owner Author

Thanks for the review. The Blocker and the Medium item are both addressed in one commit (718cb39):

Blocker (ruff format): applied uv run ruff format to tests/test_config_overrides.py and tests/test_init_cmd.py. The cause was that I ran ruff format --check packages/memtomem/src locally, which limited the scope to src/ and missed the formatting differences in tests/. CI runs against the whole repo, which exposed it, and stating "ruff format --check clean" so flatly in the PR body was also inaccurate (the lessons of feedback_ruff_format_check.md and feedback_claim_test_parity.md, confirmed again).

Medium (preservation test): added test_migration_preserves_existing_user_memory_dirs. As pointed out, it directly covers the typical legacy install shape, where the user has already set memory_dirs and auto_discover is left at its default of True. It asserts that both the user entry and the provider dir survive in the in-memory and the persisted state, so it tests the very invariant that the _persist_auto_discover_migration comment is meant to protect.

Low ("Pre-Z" docstring): replaced "Pre-Z behaviour" in _migrate_auto_discover_once with "Releases 0.1.11 and earlier". Future readers no longer need to decode the codename.

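Non-blocking (mixed path formats): I will leave this as is, intentionally. User dirs are stored exactly as typed into the wizard (~/notes is kept), while provider dirs are the absolute paths that _detect_provider_dirs has already expanded. Both behave identically once passed through Path.expanduser(), so the difference is cosmetic. Unifying them would have the side effect of turning the user's ~/notes into an absolute path on save, which seems better handled in a separate discussion.

Non-blocking (symlink following): comment noted. is_dir() is consistent with other places in the codebase (same as the original _auto_discovered_memory_dirs), so no separate action.

Local re-verification: 1838 tests green; ruff check, ruff format --check, and mypy are all clean. Please re-check CI.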

@memtomem memtomem merged commit 2e854c7 into main Apr 19, 2026
7 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 19, 2026
@memtomem memtomem deleted the feat/opt-in-provider-dirs branch April 19, 2026 08:01