
server lifespan: resource leak if watcher/scheduler/policy_scheduler/watchdog startup raises before yield (#400 follow-up) #404

@memtomem

Description


Follow-up to #400 (phase 1 of #399).

Problem

#400 added a try/except close_components around ensure_initialized() in AppContext so that if a post-factory step raises, the already-created Components get torn down cleanly. This is covered by test_ensure_initialized_closes_components_if_post_factory_step_raises.
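For context, the #400 pattern described above has roughly the shape of the sketch below. It is a simplified illustration, not the actual memtomem code. Components and close_components are stand-ins whose signatures are assumed, and post_factory_step is a hypothetical hook representing "whatever runs after the factory". Only the idea is taken from the issue: tear down what the factory created, then re-raise.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class Components:
    """Stand-in for what the component factory allocates (DB handle, etc.)."""

    def __init__(self) -> None:
        self.closed = False


async def close_components(components: Components) -> None:
    # Assumed signature; the real helper would close the DB and other handles.
    components.closed = True


class AppContext:
    def __init__(self, post_factory_step: Callable[[Components], Awaitable[None]]) -> None:
        # post_factory_step is a hypothetical hook for the steps after the factory.
        self._post_factory_step = post_factory_step
        self._components: Optional[Components] = None

    async def ensure_initialized(self) -> Components:
        if self._components is not None:
            return self._components
        components = Components()  # factory succeeded: resources now exist
        try:
            await self._post_factory_step(components)
        except BaseException:
            # Roll back the factory's allocations, then let the error propagate.
            await close_components(components)
            raise
        self._components = components
        return components


async def demo() -> List[Components]:
    seen: List[Components] = []

    async def failing_step(components: Components) -> None:
        seen.append(components)
        raise RuntimeError("post-factory step failed")

    try:
        await AppContext(failing_step).ensure_initialized()
    except RuntimeError:
        pass
    return seen


if __name__ == "__main__":
    print(asyncio.run(demo())[0].closed)
```

Catching BaseException rather than Exception means a cancellation during a post-factory step also triggers the rollback.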

However, lifespan.py creates several other startup resources around the ensure_initialized call:

  • webhook_mgr (before ensure_initialized)
  • watcher, scheduler, policy_scheduler, watchdog (after ensure_initialized)

If any of these raises between AppContext.__init__ and yield, the finally/teardown block doesn't run (we never reached yield) and any resources already allocated at that point leak. Specifically:

  • webhook_mgr raise → _components never allocated, so nothing leaks from ensure_initialized, but webhook_mgr itself may have partially opened network state that is never cleaned up.
  • ensure_initialized raise → covered by #400, "feat(server): add AppContext.ensure_initialized plumbing (#399 phase 1)".
  • watcher/scheduler/policy_scheduler/watchdog raise → _components is allocated (ensure_initialized succeeded) and is never closed, nor are webhook_mgr or any of these services that had already started, because the startup block exits via exception before the teardown block runs.
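The failure mode can be reproduced with a toy asynccontextmanager lifespan. This is a sketch under assumptions: Resource, start and the events log are hypothetical stand-ins, and the order of the four post-init services is assumed to match the list above (the issue only fixes webhook_mgr before ensure_initialized and the services after it). Because the allocations sit outside the try around yield, an exception during startup skips the finally entirely.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List, Optional

# Assumed startup order (hypothetical; see lead-in).
STARTUP_ORDER = ("webhook_mgr", "components", "watcher", "scheduler",
                 "policy_scheduler", "watchdog")


class Resource:
    """Stand-in for webhook_mgr, _components, watcher, scheduler, ..."""

    def __init__(self, name: str, events: List[str]) -> None:
        self.name = name
        self._events = events

    async def close(self) -> None:
        self._events.append(f"close:{self.name}")


def start(name: str, events: List[str], fail_on: Optional[str]) -> Resource:
    # Simulates a startup step that raises for the resource named in fail_on.
    if name == fail_on:
        raise RuntimeError(f"{name} startup failed")
    events.append(f"start:{name}")
    return Resource(name, events)


@asynccontextmanager
async def leaky_lifespan(events: List[str], fail_on: Optional[str] = None) -> AsyncIterator[None]:
    # Current shape: every allocation happens outside the try block.
    started = [start(name, events, fail_on) for name in STARTUP_ORDER]
    try:
        yield
    finally:
        # Only reached if startup made it as far as the yield.
        for res in reversed(started):
            await res.close()


async def run(fail_on: Optional[str]) -> List[str]:
    events: List[str] = []
    try:
        async with leaky_lifespan(events, fail_on):
            pass
    except RuntimeError:
        pass
    return events


if __name__ == "__main__":
    # Three resources are started and none is closed.
    print(asyncio.run(run("scheduler")))
```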

Fix options

Option A — wrap the entire startup block (from first allocation through yield) in a single try/except that tears down in reverse order on failure.
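One possible reading of Option A is sketched below, reusing the same hypothetical Resource/start stand-ins and assumed startup order. It uses a single try/finally spanning the first allocation through the yield, so the startup-failure path and normal shutdown share one reverse-order teardown of whatever was actually started. A try/except variant would tear down and re-raise on failure while keeping the existing post-yield teardown as a separate path.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List, Optional

# Assumed startup order (hypothetical stand-ins, not memtomem's real objects).
STARTUP_ORDER = ("webhook_mgr", "components", "watcher", "scheduler",
                 "policy_scheduler", "watchdog")


class Resource:
    def __init__(self, name: str, events: List[str]) -> None:
        self.name = name
        self._events = events

    async def close(self) -> None:
        self._events.append(f"close:{self.name}")


def start(name: str, events: List[str], fail_on: Optional[str]) -> Resource:
    if name == fail_on:
        raise RuntimeError(f"{name} startup failed")
    events.append(f"start:{name}")
    return Resource(name, events)


@asynccontextmanager
async def lifespan_option_a(events: List[str], fail_on: Optional[str] = None) -> AsyncIterator[None]:
    started: List[Resource] = []  # tracks exactly what has been allocated so far
    try:
        for name in STARTUP_ORDER:
            started.append(start(name, events, fail_on))
        yield
    finally:
        # Shared teardown: last started is first stopped. A close() that raises
        # stops this loop; real code may prefer to log and keep going.
        while started:
            await started.pop().close()


async def run(fail_on: Optional[str]) -> List[str]:
    events: List[str] = []
    try:
        async with lifespan_option_a(events, fail_on):
            pass
    except RuntimeError:
        pass
    return events


if __name__ == "__main__":
    print(asyncio.run(run("scheduler")))
```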

Option B — nested try/finally per resource so each one rolls back only what was allocated so far.
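Option B could look like the sketch below, again with hypothetical Resource/start stand-ins and an assumed startup order. Each allocation gets its own try/finally, so a failure unwinds only the resources above it in the nesting. The cost is one indentation level per resource, which is the verbosity noted below.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List, Optional


class Resource:
    """Hypothetical stand-in for a startup resource."""

    def __init__(self, name: str, events: List[str]) -> None:
        self.name = name
        self._events = events

    async def close(self) -> None:
        self._events.append(f"close:{self.name}")


def start(name: str, events: List[str], fail_on: Optional[str]) -> Resource:
    if name == fail_on:
        raise RuntimeError(f"{name} startup failed")
    events.append(f"start:{name}")
    return Resource(name, events)


@asynccontextmanager
async def lifespan_option_b(events: List[str], fail_on: Optional[str] = None) -> AsyncIterator[None]:
    webhook_mgr = start("webhook_mgr", events, fail_on)
    try:
        components = start("components", events, fail_on)
        try:
            watcher = start("watcher", events, fail_on)
            try:
                scheduler = start("scheduler", events, fail_on)
                try:
                    policy_scheduler = start("policy_scheduler", events, fail_on)
                    try:
                        watchdog = start("watchdog", events, fail_on)
                        try:
                            yield
                        finally:
                            await watchdog.close()
                    finally:
                        await policy_scheduler.close()
                finally:
                    await scheduler.close()
            finally:
                await watcher.close()
        finally:
            await components.close()
    finally:
        await webhook_mgr.close()


async def run(fail_on: Optional[str]) -> List[str]:
    events: List[str] = []
    try:
        async with lifespan_option_b(events, fail_on):
            pass
    except RuntimeError:
        pass
    return events


if __name__ == "__main__":
    print(asyncio.run(run("scheduler")))
```

A side effect of the nesting is that every outer finally still runs even if an inner close() raises.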

Option A is simpler but couples teardown code paths; Option B is more surgical but verbose.

Timing

Blocker for Phase 2/3 of #399 — once lazy init moves the DB open to a handler, a late allocation failure is more likely than in the current eager-init world, and this leak becomes user-visible (e.g. stale file locks on ~/.memtomem/memtomem.db).

Not a blocker for #400 merge itself (Phase 1 is eager-init, so a failure here crashes the process before any request handling, and the OS reclaims fd/flock on process exit).

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests