feat(server): lazy app_lifespan + ensure_initialized owns background loops (#399 phase 3)#411
Merged
feat(server): lazy app_lifespan + ensure_initialized owns background loops (#399 phase 3)

The final piece of #399: `~/.memtomem/memtomem.db` is no longer created on the MCP handshake. `app_lifespan` now starts/yields without calling `ensure_initialized`; component construction (storage / embedder / index engine / search pipeline) and the watcher / consolidation scheduler / policy scheduler / health watchdog all move into the first-tool-call path inside `ensure_initialized`, which now also owns their lifetime. `ctx.close()` stops anything `ensure_initialized` started before tearing the components down.

Two accepted regressions follow from the flip and are documented for the changelog:

1. The "embedding dimension mismatch detected at startup" warning for #349 degraded mode no longer fires on the lifespan startup stderr — it surfaces inside the first tool-call response that hits `ensure_initialized`. `mem_embedding_reset` / `mem_status` / `mem_stats` remain callable, so recovery is unchanged.
2. An idle server (no tool calls) runs no background maintenance. Consolidation / policy / health watchdog all start on the first tool call rather than the lifespan handshake; an MCP client that connects but never calls a tool will not see any scheduler ticks.

Lifespan `finally` is now webhook close → ctx.close (matches the PR #404 ordering rationale; webhook drops outstanding network state before the slower component teardown). The `_teardown_startup_resources` helper is gone; its order/idempotency invariants now live on `AppContext.close` plus a new `_stop_quietly` helper that both `ensure_initialized` failure cleanup and `close` share. `set_health_watchdog` is dropped because there is no more lifespan→ctx hand-off — ctx constructs the watchdog itself.

Tests:
- `test_lazy_init_acceptance.py` (new): handshake-only leaves the DB absent, the first `ensure_initialized` creates it, concurrent first calls share a single `create_components`.
- `test_server_app_context.py`: Phase 3 `ensure_initialized` / `close` coverage (watcher start under default config, watcher skipped in degraded mode, close stops a started watcher, `watcher.start` failure rolls back components).
- `test_server_lifespan.py`: rewritten around the slim shape — pins that the lifespan does not eagerly init, that the webhook closes before ctx, that webhook close failures don't skip `ctx.close`, that `CancelledError` propagates, and that an `AppContext` construction failure still cleans up the webhook.

Co-Authored-By: Claude <[email protected]>
Phase 3 of #399 flips a user-visible behaviour — the handshake no longer creates `~/.memtomem/memtomem.db` — and accepts two follow-on regressions (degraded-mode warning visibility, scheduler-on-idle removal). All three need to be in the changelog before this lands so users running the next release know to expect the new shape.

Co-Authored-By: Claude <[email protected]>
Final phase of #399 — flips the user-visible behaviour the prior two
PRs only set up. Stacks on #400 (Phase 1 plumbing) + #410 (Phase 2
handler migration).
What changes
`app_lifespan` no longer eagerly calls `ensure_initialized` and no longer constructs the `FileWatcher` / `ConsolidationScheduler` / `PolicyScheduler` / `HealthWatchdog` itself. Component construction and the background-loop start move into `AppContext.ensure_initialized`, which now also owns their lifetime — `ctx.close()` stops them in reverse-allocation order before tearing the components down.
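The resulting slim lifespan shape can be sketched roughly as follows. This is a minimal sketch, not the real memtomem code: the `Webhook` and `AppContext` stand-ins and their flags are hypothetical simplifications; the point is that the lifespan only allocates and yields, and its `finally` closes the webhook before the context (swallowing webhook-close failures so `ctx.close()` always runs).

```python
import contextlib
from dataclasses import dataclass


@dataclass
class AppContext:
    """Hypothetical stand-in: close() is the only lifespan duty left."""
    closed: bool = False

    async def close(self) -> None:
        self.closed = True


@dataclass
class Webhook:
    """Hypothetical stand-in for the optional WebhookManager."""
    closed: bool = False

    async def close(self) -> None:
        self.closed = True


@contextlib.asynccontextmanager
async def app_lifespan():
    webhook = Webhook()   # optional webhook manager
    ctx = AppContext()    # no ensure_initialized call here -- init is lazy
    try:
        yield ctx
    finally:
        # Webhook first: drop outstanding network state before the slower
        # component teardown (the #404 ordering rationale). A webhook-close
        # failure must not skip ctx.close().
        with contextlib.suppress(Exception):
            await webhook.close()
        await ctx.close()
```

Nothing in this path touches `~/.memtomem/`; the yielded context stays un-initialized until a tool call asks for components.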
Concretely:

- `app_lifespan`: optional `WebhookManager` → allocate `AppContext` → `yield`. None of these touch `~/.memtomem/`. On shutdown: `webhook.close()` first (drops outstanding network state — same rationale as #404, "server lifespan: resource leak if watcher/scheduler/policy_scheduler/watchdog startup raises before yield"), then `ctx.close()`.
- `AppContext.ensure_initialized`: keeps the `_init_lock` / cached-`Components` / failure-rollback semantics from Phase 1, and additionally allocates the watcher + (optional) schedulers + (optional) watchdog. Same degraded-mode gate as before — `embedding_broken` set → watcher constructed but `start()` skipped, schedulers/watchdog not even built, so #349 ("MCP server should degrade gracefully on embedding mismatch instead of fail-fast crash") recovery via `mem_embedding_reset` is unchanged.
- `AppContext.close`: stops watchdog → policy → consolidation → watcher → components, each through a new `_stop_quietly` helper that logs+continues on `Exception` and re-raises `CancelledError` (preserves the #404 / #406 "tear down startup resources on lifespan failure" invariants).
- Removed: `_teardown_startup_resources` (its order/idempotency invariants now live on `AppContext.close`) and `set_health_watchdog` (lifespan no longer hands a watchdog to ctx — ctx builds it).
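Under those constraints, the `ensure_initialized` / `close` / `_stop_quietly` contract might look like the sketch below. Everything here beyond the names quoted above (`_init_lock`, `_stop_quietly`, `embedding_broken`) is hypothetical — in particular the `components.start(...)` hook and the factory shape are invented for illustration, not the real memtomem API.

```python
import asyncio
import logging
from typing import Any

log = logging.getLogger("memtomem.sketch")


async def _stop_quietly(name: str, stop: Any) -> None:
    """Stop one background resource: log+continue on Exception, never swallow cancellation."""
    try:
        await stop()
    except asyncio.CancelledError:
        raise  # cancellation must propagate (#404/#406 invariant)
    except Exception:
        log.exception("error stopping %s during close; continuing", name)


class AppContext:
    def __init__(self, create_components, embedding_broken: bool = False):
        self._create_components = create_components
        self._init_lock = asyncio.Lock()
        self._components = None
        self.embedding_broken = embedding_broken
        self._started: list[tuple[str, Any]] = []  # stopped in reverse-allocation order

    async def ensure_initialized(self):
        if self._components is not None:          # fast path: already built
            return self._components
        async with self._init_lock:               # concurrent first calls share one build
            if self._components is None:
                components = await self._create_components()
                try:
                    await self._start_background(components)
                except BaseException:
                    await components.close()      # roll back if a start() raises
                    raise
                self._components = components
        return self._components

    async def _start_background(self, components) -> None:
        # Degraded mode (#349): skip background loops entirely.
        if self.embedding_broken:
            return
        for name in ("watcher", "consolidation", "policy", "watchdog"):
            resource = components.start(name)     # hypothetical start hook
            self._started.append((name, resource.stop))

    async def close(self) -> None:
        for name, stop in reversed(self._started):
            await _stop_quietly(name, stop)
        self._started.clear()
        if self._components is not None:
            await self._components.close()
            self._components = None
```

The reverse iteration in `close` is what yields the watchdog → policy → consolidation → watcher → components order described above.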
User-facing behaviour change

The DB is created on the first tool call (`mem_search`, `mem_add`, `mem_status`, …) instead of on the MCP handshake.

Accepted regressions (please call out in changelog)
- The "Embedding dimension mismatch detected at startup — entering degraded mode" warning previously fired on `memtomem-server` stderr at boot. With lazy init it fires when `ensure_initialized` runs, i.e. on the first tool call. Recovery tools (`mem_embedding_reset`, `mem_status`, `mem_stats`, `mem_list`, `mem_read`) stay callable; the user just learns about the mismatch from the first response rather than the boot stderr.
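A minimal sketch of where that warning now surfaces, assuming a hypothetical `LazyContext`/`mem_status` shape (the real handler plumbing and warning text differ; this only illustrates "first response instead of boot stderr"):

```python
class LazyContext:
    """Hypothetical: records the #349 mismatch on first init instead of at boot."""

    def __init__(self, stored_dim: int, model_dim: int):
        self._stored_dim = stored_dim
        self._model_dim = model_dim
        self.embedding_broken = False
        self.warning = None
        self._initialized = False

    def ensure_initialized(self) -> None:
        if self._initialized:
            return
        self._initialized = True
        if self._stored_dim != self._model_dim:
            self.embedding_broken = True
            self.warning = (
                f"Embedding dimension mismatch detected "
                f"(index={self._stored_dim}, model={self._model_dim}); "
                f"entering degraded mode"
            )


def mem_status(ctx: LazyContext) -> dict:
    ctx.ensure_initialized()              # first tool call triggers the check
    result = {"degraded": ctx.embedding_broken}
    if ctx.warning:
        result["warning"] = ctx.warning   # surfaces here, not on boot stderr
    return result
```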
- Previously an idle server (editor opened in the evening, no tool calls until morning) still ran consolidation / policy / health-watchdog every interval. With lazy init those start on the first tool call. An MCP client that connects but never calls a tool will not see any scheduler ticks — consistent with "no DB to maintain" but worth flagging for anyone whose maintenance schedule assumed background-without-tool-calls.
Both are documented in the issue's open-questions section as accepted
trade-offs.
Acceptance

Mapped from #399's acceptance checklist:

- Handshake-only leaves the DB absent (`test_handshake_only_leaves_db_absent`)
- `ensure_initialized` creates the DB (`test_first_ensure_initialized_creates_db`)
- A single `create_components` invocation under concurrent first calls (`test_concurrent_first_calls_invoke_create_components_once` + pre-existing `test_ensure_initialized_concurrent_calls_invoke_factory_once`)
- Degraded mode skips the watcher (`test_ensure_initialized_skips_watcher_in_degraded_mode` — same gate as the pre-Phase-3 code, just relocated)
- `ctx.close()` stops everything `ensure_initialized` started (`test_close_stops_started_watcher` + `test_lifespan_closes_webhook_before_ctx`)
- Watcher-start failure rolls back components (`test_ensure_initialized_rolls_back_components_when_watcher_start_fails`)
- `.server.pid` + `~/.memtomem/` mkdir at module import is still eager — out of scope for this PR (open question 1 in #399, "Lazy DB init: defer ~/.memtomem/memtomem.db creation until first tool call"; ties into the #384 ("mm uninstall liveness check only sees MCP server pid — silently ignores mm web and other DB writers") / #387 ("memtomem-server leaves stale .server.pid on exit — risks mm uninstall liveness false-positive via PID recycling") `$XDG_RUNTIME_DIR` relocation)
Test plan

- `uv run ruff check packages/memtomem/src && uv run ruff format --check packages/memtomem/src` — clean
- `uv run mypy packages/memtomem/src/memtomem/server/{context,lifespan}.py` — clean (advisory)
- `uv run pytest -m "not ollama"` — 2177 passed, 46 deselected
- Smoke test: `claude mcp add` + `claude mcp list` against the built branch in a fresh `~/.memtomem` (reviewer: please confirm the DB stays absent on your machine too)
🤖 Generated with Claude Code