feat(server): lazy app_lifespan + ensure_initialized owns background loops (#399 phase 3)#411
Merged
feat(server): lazy app_lifespan + ensure_initialized owns background loops (#399 phase 3)

The final piece of #399: `~/.memtomem/memtomem.db` is no longer created on the MCP handshake. `app_lifespan` now starts/yields without calling `ensure_initialized`; component construction (storage / embedder / index engine / search pipeline) and the watcher / consolidation scheduler / policy scheduler / health watchdog all move into the first-tool-call path inside `ensure_initialized`, which now also owns their lifetime. `ctx.close()` stops anything `ensure_initialized` started before tearing the components down.

Two accepted regressions follow from the flip and are documented for the changelog:

1. The "embedding dimension mismatch detected at startup" warning for #349 degraded mode no longer fires on the lifespan startup stderr — it surfaces inside the first tool-call response that hits `ensure_initialized`. `mem_embedding_reset` / `mem_status` / `mem_stats` remain callable, so recovery is unchanged.
2. An idle server (no tool calls) runs no background maintenance. Consolidation / policy / health watchdog all start on the first tool call rather than the lifespan handshake; an MCP client that connects but never calls a tool will not see any scheduler ticks.

Lifespan `finally` is now webhook close → ctx.close (matches the PR #404 ordering rationale; webhook drops outstanding network state before the slower component teardown). The `_teardown_startup_resources` helper is gone; its order/idempotency invariants now live on `AppContext.close` plus a new `_stop_quietly` helper that both `ensure_initialized` failure cleanup and `close` share. `set_health_watchdog` is dropped because there is no more lifespan→ctx hand-off — ctx constructs the watchdog itself.

Tests:
- `test_lazy_init_acceptance.py` (new): handshake-only leaves the DB absent, the first `ensure_initialized` creates it, concurrent first calls share a single `create_components`.
- `test_server_app_context.py`: Phase 3 `ensure_initialized` / `close` coverage (watcher start under default config, watcher skipped in degraded mode, close stops a started watcher, `watcher.start` failure rolls back components).
- `test_server_lifespan.py`: rewritten around the slim shape — pins that the lifespan does not eagerly init, that the webhook closes before ctx, that webhook close failures don't skip `ctx.close`, that `CancelledError` propagates, and that an `AppContext` construction failure still cleans up the webhook.

Co-Authored-By: Claude <[email protected]>
Phase 3 of #399 flips a user-visible behaviour — the handshake no longer creates `~/.memtomem/memtomem.db` — and accepts two follow-on regressions (degraded-mode warning visibility, scheduler-on-idle removal). All three need to be in the changelog before this lands so users running the next release know to expect the new shape.

Co-Authored-By: Claude <[email protected]>
Final phase of #399 — flips the user-visible behaviour the prior two
PRs only set up. Stacks on #400 (Phase 1 plumbing) + #410 (Phase 2
handler migration).
What changes
`app_lifespan` no longer eagerly calls `ensure_initialized` and no longer constructs the `FileWatcher` / `ConsolidationScheduler` / `PolicyScheduler` / `HealthWatchdog` itself. Component construction and the background-loop start move into `AppContext.ensure_initialized`, which now also owns their lifetime — `ctx.close()` stops them in reverse-allocation order before tearing the components down.
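The resulting slim lifespan shape can be sketched roughly as follows. This is a minimal sketch, not the real memtomem code: the `Webhook` and `AppContext` stand-ins and their flags are hypothetical simplifications; the point is that the lifespan only allocates and yields, and its `finally` closes the webhook before the context (swallowing webhook-close failures so `ctx.close()` always runs).

```python
import contextlib
from dataclasses import dataclass


@dataclass
class AppContext:
    """Hypothetical stand-in: close() is the only lifespan duty left."""
    closed: bool = False

    async def close(self) -> None:
        self.closed = True


@dataclass
class Webhook:
    """Hypothetical stand-in for the optional WebhookManager."""
    closed: bool = False

    async def close(self) -> None:
        self.closed = True


@contextlib.asynccontextmanager
async def app_lifespan():
    webhook = Webhook()   # optional webhook manager
    ctx = AppContext()    # no ensure_initialized call here -- init is lazy
    try:
        yield ctx
    finally:
        # Webhook first: drop outstanding network state before the slower
        # component teardown (the #404 ordering rationale). A webhook-close
        # failure must not skip ctx.close().
        with contextlib.suppress(Exception):
            await webhook.close()
        await ctx.close()
```

Nothing in this path touches `~/.memtomem/`; the yielded context stays un-initialized until a tool call asks for components.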
Concretely:

- `app_lifespan`: optional `WebhookManager` → allocate `AppContext` → `yield`. None of these touch `~/.memtomem/`. On shutdown: `webhook.close()` first (drops outstanding network state — same rationale as #404, "server lifespan: resource leak if watcher/scheduler/policy_scheduler/watchdog startup raises before yield"), then `ctx.close()`.
- `AppContext.ensure_initialized`: keeps the `_init_lock` / cached-`Components` / failure-rollback semantics from Phase 1, and additionally allocates the watcher + (optional) schedulers + (optional) watchdog. Same degraded-mode gate as before — `embedding_broken` set → watcher constructed but `start()` skipped, schedulers/watchdog not even built, so #349 ("MCP server should degrade gracefully on embedding mismatch instead of fail-fast crash") recovery via `mem_embedding_reset` is unchanged.
- `AppContext.close`: stops watchdog → policy → consolidation → watcher → components, each through a new `_stop_quietly` helper that logs+continues on `Exception` and re-raises `CancelledError` (preserves the #404 / #406 "tear down startup resources on lifespan failure" invariants).
- Removed: `_teardown_startup_resources` (its order/idempotency invariants now live on `AppContext.close`) and `set_health_watchdog` (lifespan no longer hands a watchdog to ctx — ctx builds it).
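Under those constraints, the `ensure_initialized` / `close` / `_stop_quietly` contract might look like the sketch below. Everything here beyond the names quoted above (`_init_lock`, `_stop_quietly`, `embedding_broken`) is hypothetical — in particular the `components.start(...)` hook and the factory shape are invented for illustration, not the real memtomem API.

```python
import asyncio
import logging
from typing import Any

log = logging.getLogger("memtomem.sketch")


async def _stop_quietly(name: str, stop: Any) -> None:
    """Stop one background resource: log+continue on Exception, never swallow cancellation."""
    try:
        await stop()
    except asyncio.CancelledError:
        raise  # cancellation must propagate (#404/#406 invariant)
    except Exception:
        log.exception("error stopping %s during close; continuing", name)


class AppContext:
    def __init__(self, create_components, embedding_broken: bool = False):
        self._create_components = create_components
        self._init_lock = asyncio.Lock()
        self._components = None
        self.embedding_broken = embedding_broken
        self._started: list[tuple[str, Any]] = []  # stopped in reverse-allocation order

    async def ensure_initialized(self):
        if self._components is not None:          # fast path: already built
            return self._components
        async with self._init_lock:               # concurrent first calls share one build
            if self._components is None:
                components = await self._create_components()
                try:
                    await self._start_background(components)
                except BaseException:
                    await components.close()      # roll back if a start() raises
                    raise
                self._components = components
        return self._components

    async def _start_background(self, components) -> None:
        # Degraded mode (#349): skip background loops entirely.
        if self.embedding_broken:
            return
        for name in ("watcher", "consolidation", "policy", "watchdog"):
            resource = components.start(name)     # hypothetical start hook
            self._started.append((name, resource.stop))

    async def close(self) -> None:
        for name, stop in reversed(self._started):
            await _stop_quietly(name, stop)
        self._started.clear()
        if self._components is not None:
            await self._components.close()
            self._components = None
```

The reverse iteration in `close` is what yields the watchdog → policy → consolidation → watcher → components order described above.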
User-facing behaviour change

The DB is created on the first tool call (`mem_search`, `mem_add`, `mem_status`, …) instead of on the MCP handshake.

Accepted regressions (please call out in changelog)
- The "Embedding dimension mismatch detected at startup — entering degraded mode" warning previously fired on `memtomem-server` stderr at boot. With lazy init it fires when `ensure_initialized` runs, i.e. on the first tool call. Recovery tools (`mem_embedding_reset`, `mem_status`, `mem_stats`, `mem_list`, `mem_read`) stay callable; the user just learns about the mismatch from the first response rather than the boot stderr.
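A minimal sketch of where that warning now surfaces, assuming a hypothetical `LazyContext`/`mem_status` shape (the real handler plumbing and warning text differ; this only illustrates "first response instead of boot stderr"):

```python
class LazyContext:
    """Hypothetical: records the #349 mismatch on first init instead of at boot."""

    def __init__(self, stored_dim: int, model_dim: int):
        self._stored_dim = stored_dim
        self._model_dim = model_dim
        self.embedding_broken = False
        self.warning = None
        self._initialized = False

    def ensure_initialized(self) -> None:
        if self._initialized:
            return
        self._initialized = True
        if self._stored_dim != self._model_dim:
            self.embedding_broken = True
            self.warning = (
                f"Embedding dimension mismatch detected "
                f"(index={self._stored_dim}, model={self._model_dim}); "
                f"entering degraded mode"
            )


def mem_status(ctx: LazyContext) -> dict:
    ctx.ensure_initialized()              # first tool call triggers the check
    result = {"degraded": ctx.embedding_broken}
    if ctx.warning:
        result["warning"] = ctx.warning   # surfaces here, not on boot stderr
    return result
```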
- Previously an idle server (editor opened in the evening, no tool calls until morning) still ran consolidation / policy / health-watchdog every interval. With lazy init those start on the first tool call. An MCP client that connects but never calls a tool will not see any scheduler ticks — consistent with "no DB to maintain" but worth flagging for anyone whose maintenance schedule assumed background-without-tool-calls.
Both are documented in the issue's open-questions section as accepted
trade-offs.
Acceptance

Mapped from #399's acceptance checklist:

- Handshake-only leaves the DB absent (`test_handshake_only_leaves_db_absent`)
- `ensure_initialized` creates the DB (`test_first_ensure_initialized_creates_db`)
- A single `create_components` invocation under concurrent first calls (`test_concurrent_first_calls_invoke_create_components_once` + pre-existing `test_ensure_initialized_concurrent_calls_invoke_factory_once`)
- Degraded mode skips the watcher (`test_ensure_initialized_skips_watcher_in_degraded_mode` — same gate as the pre-Phase-3 code, just relocated)
- `ctx.close()` stops everything `ensure_initialized` started (`test_close_stops_started_watcher` + `test_lifespan_closes_webhook_before_ctx`)
- Watcher-start failure rolls back components (`test_ensure_initialized_rolls_back_components_when_watcher_start_fails`)
- `.server.pid` + `~/.memtomem/` mkdir at module import is still eager — out of scope for this PR (open question 1 in #399, "Lazy DB init: defer ~/.memtomem/memtomem.db creation until first tool call"; ties into the #384 ("mm uninstall liveness check only sees MCP server pid — silently ignores mm web and other DB writers") / #387 ("memtomem-server leaves stale .server.pid on exit — risks mm uninstall liveness false-positive via PID recycling") `$XDG_RUNTIME_DIR` relocation)
Test plan

- `uv run ruff check packages/memtomem/src && uv run ruff format --check packages/memtomem/src` — clean
- `uv run mypy packages/memtomem/src/memtomem/server/{context,lifespan}.py` — clean (advisory)
- `uv run pytest -m "not ollama"` — 2177 passed, 46 deselected
- Smoke test: `claude mcp add` + `claude mcp list` against the built branch in a fresh `~/.memtomem` (reviewer: please confirm the DB stays absent on your machine too)
🤖 Generated with Claude Code