Lazy DB init: defer ~/.memtomem/memtomem.db creation until first tool call #399

@memtomem


Continuation of the docs-only fix in #381 — addressing the broader behavior the comment thread on that PR identified.

Problem

Any MCP client connecting to memtomem — not just the memtomem-server verify command — creates ~/.memtomem/memtomem.db on handshake. #381 addressed only the docs verify command; the broader vector is that every client (Claude Code, Cursor, Windsurf, Gemini CLI, any MCP session) connecting to memtomem after registration instantiates the DB before the user runs mm init.

Reproduction (observed 2026-04-23, memtomem==0.1.23):

rm -rf ~/.memtomem
claude mcp add memtomem -s user -- uvx --from memtomem memtomem-server
ls ~/.memtomem                  # still absent
claude mcp list                 # → ~/.memtomem suddenly populated

Observed: the directory appeared at the moment of claude mcp list. The exact spawner is not yet pinpointed — claude mcp list itself does a health-check spawn, and a concurrent Claude Code session may also rescan and reconnect when registration lands in ~/.claude.json. Both paths execute the same memtomem-server startup.

Root cause

Startup eagerly creates state at two layers:

  1. server/__init__.py — line 149 pid_dir.mkdir(...) creates ~/.memtomem/; lines 150–173 open + flock .server.pid (advisory lock). Runs on every memtomem-server spawn, including short-lived health-check spawns.
  2. server/lifespan.py:105 — app_lifespan calls create_components(), which runs await storage.initialize() (server/component_factory.py:66), creating ~/.memtomem/memtomem.db with all schemas before the MCP initialize handshake yields.

MCP initialize + tools/list + resources/list do not need the DB — tool metadata is bound at import time in server/__init__.py:42-105 (decorators) and filtered at lines 127–139; neither path depends on component initialization. Only tool handlers and resource handlers actually touch the DB (mem_add / mem_index / mem_search / mem_recall / mem_list / mem_read and the memtomem://* resources in server/resources.py:12-78, all app.storage.* calls).

Note on config read paths (out of scope for lazy init): Mem2MemConfig() + load_config_d() + load_config_overrides() read ~/.memtomem/config.json and ~/.memtomem/config.d/*.json, but both are read-only and no-op when absent (config.py:787, config.py:947). They don't write or create the directory.

Proposal

Defer create_components() from lifespan startup to the first tool call that needs it. Sketch:

  • AppContext gains _components: Components | None, _init_lock: asyncio.Lock, async ensure_initialized() -> Components.
  • app_lifespan loads config, sets up logging, creates webhook manager (storage-free), yields. No create_components, no watcher / scheduler / watchdog start.
  • Tool handlers and resource handlers call await app.ensure_initialized() before touching storage / embedder / index_engine / search_pipeline.
  • First-call init runs create_components, then starts watcher / consolidation scheduler / policy scheduler / health watchdog — the tail of today's app_lifespan moves inside ensure_initialized.
  • embedding_broken moves from AppContext field to a property reading _components.embedding_broken post-init; gate helpers like _check_embedding_mismatch need to ensure-init before checking (or be called from already-init'd handlers).
  • Shutdown in app_lifespan.finally inspects _components; if still None, only webhook cleanup runs.
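The bullets above can be sketched roughly as follows. This is a minimal illustration, not the real implementation: Components, create_components, and the scheduler-startup tail are stand-ins, and only the lock/double-check shape is the point.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Components:
    """Stand-in for the real component bundle (storage, embedder, ...)."""
    storage: object = None
    embedding_broken: bool = False


async def create_components() -> Components:
    # Stand-in for the real factory, which is what opens
    # ~/.memtomem/memtomem.db and runs storage.initialize().
    return Components()


@dataclass
class AppContext:
    _components: Optional[Components] = None
    _init_lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    async def ensure_initialized(self) -> Components:
        # Fast path: already initialized, no lock needed.
        if self._components is not None:
            return self._components
        async with self._init_lock:
            # Double-check after acquiring the lock: a concurrent
            # first call may have finished init while we waited.
            if self._components is None:
                self._components = await create_components()
                # The tail of today's app_lifespan (watcher /
                # scheduler / watchdog startup) would move here.
            return self._components

    @property
    def embedding_broken(self) -> bool:
        # Post-init property; False until the first tool call initializes.
        return self._components is not None and self._components.embedding_broken
```

The double-checked pattern keeps the fast path lock-free after the first call while the lock serializes concurrent first calls into a single create_components invocation.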

Implementation staging

This is not a single-file change. Suggested PR breakdown (or commit ladder if single PR):

  1. Plumbing: AppContext._components/_init_lock/ensure_initialized + _get_app_initialized helper. No behavior change yet (call ensure_initialized from lifespan startup so existing tests pass). Tests for the lock semantics + concurrent first-call.
  2. Handler migration: every @mcp.tool and @mcp.resource and @register action handler — ~30+ files under server/tools/ and server/resources.py — switches from _get_app to _get_app_initialized (or inserts await app.ensure_initialized()). Same-PR audit: embedding_broken reads in tools migrate to property accessor.
  3. Lifespan slimming: remove create_components + watcher / scheduler / watchdog startup from app_lifespan; move into ensure_initialized. Now ~/.memtomem/memtomem.db is no longer created on handshake.
  4. Tests: fresh-state acceptance tests (DB absent after handshake), concurrent first-call race, in-process MCP client doing handshake-only.

Open questions / accepted regressions

  1. main()'s .server.pid + mkdir stays eager: the advisory lock needs early acquisition. This leaves ~/.memtomem/ present after a claude mcp list spawn (though empty, after the atexit unlink at server/__init__.py:173). Follow-up: relocate .server.pid to $XDG_RUNTIME_DIR (ties into #384 "mm uninstall liveness check only sees MCP server pid — silently ignores mm web and other DB writers" and #387 "memtomem-server leaves stale .server.pid on exit — risks mm uninstall liveness false-positive via PID recycling").

  2. Degraded-mode startup-warning visibility regression (#349, "MCP server should degrade gracefully on embedding mismatch instead of fail-fast crash"). Currently _log.warning("Embedding dimension mismatch detected at startup ...") (component_factory.py:78-84) fires when the server boots, so users see it on stderr immediately. Under lazy init the warning fires on the first tool call, and the user only learns of the mismatch via the actionable error in that tool's response. Recovery tools (mem_embedding_reset, mem_status, mem_stats) remain callable. Accepted regression — document in changelog.

  3. Background scheduler/watcher start delay is a real regression, not "near-zero." Today, an idle server (e.g. editor opened in evening, no tool calls until morning) still runs consolidation / policy / health-watchdog in the background. Under lazy init, schedulers don't start until first tool call. Two paths:

    • (a) Decoupled scheduler startup: spawn the schedulers as a separate lifespan task that itself calls ensure_initialized — but then the schedulers immediately trigger DB creation, defeating the goal.
    • (b) Accept the regression: schedulers start on first tool call. An idle server with zero tool calls does no maintenance — consistent with "no DB to maintain."

    (b) is simpler and consistent. Accepted regression — document in changelog; if anyone needs background-without-tool-calls semantics later, that's a separate feature.

  4. Concurrent first-call race. Two tools arriving simultaneously both see _components is None; _init_lock serializes them. Tricky path: the degraded-mode storage swap (component_factory.py:85-99) on second-storage-open failure — this needs to be exercised in a concurrent test.

  5. mem_status on uninitialized state. Simplest: ensure_initialized triggers init. Alternative (return "no state yet" without creating DB) makes status and real tools disagree. Pick the first.

  6. Lazy-init failure observability. When ensure_initialized raises (e.g. DB permission error, embedding provider import failure), the failure surfaces in the requesting tool's response. Format: stderr log via existing logger.error(...) + structured tool error {"error": "initialization_failed", "reason": "<message>"} so MCP clients can render. Record this in docs/troubleshoot.md.

Acceptance

  • rm -rf ~/.memtomem && claude mcp add memtomem -s user -- uvx --from memtomem memtomem-server && sleep 2 && ls ~/.memtomem/memtomem.db → file not found.
  • In-process MCP client integration test: handshake (initialize) + tools/list + resources/list + shutdown leaves ~/.memtomem/memtomem.db absent.
  • In-process MCP client integration test: handshake + ping (if supported) + shutdown leaves DB absent.
  • First mem_search (or any handler-bearing tool) on fresh state creates the DB and returns a result.
  • First memtomem://sources resource fetch on fresh state creates the DB.
  • Concurrent first-call from two tools → single create_components invocation, both responses succeed.
  • mem_embedding_reset callable on legacy-DB fresh state with config.embedding.provider != none (the #349 "degrade gracefully on embedding mismatch" scenario is still recoverable, just discovered on first call instead of at startup).
  • Changelog entry documenting (a) the #349 mismatch startup-warning visibility moving to the first tool call, (b) background schedulers starting on the first tool call rather than at handshake.
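The fresh-state acceptance behavior can be expressed as an in-process test sketch. The App class here is a toy stand-in for the server, not its real API: handshake touches nothing, and the first tool call creates the DB file under a temp dir standing in for ~/.memtomem.

```python
import asyncio
from pathlib import Path


class App:
    """Toy lazy-init app mirroring the proposed behavior: the handshake
    reads only import-time metadata; the first tool call creates
    <home>/memtomem.db (path layout mirrors the issue)."""

    def __init__(self, home: Path):
        self.home = home
        self.db = home / "memtomem.db"

    async def handshake(self) -> list:
        # initialize + tools/list: metadata only, no storage access.
        return ["mem_add", "mem_search"]

    async def mem_search(self, query: str) -> list:
        # First call that needs storage triggers DB creation.
        self.home.mkdir(parents=True, exist_ok=True)
        self.db.touch()
        return []


async def fresh_state_acceptance(home: Path) -> None:
    app = App(home)
    await app.handshake()
    assert not app.db.exists(), "handshake must not create the DB"
    await app.mem_search("anything")
    assert app.db.exists(), "first tool call creates the DB"
```

The real version of this test would drive an in-process MCP client against the actual server with HOME pointed at a temp dir, per the acceptance bullets above.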
