fix: close previous SessionDB before replacing on cached agent#1421
fix: close previous SessionDB before replacing on cached agent#14211 commit merged intonesquena:masterfrom
Conversation
SessionDB WAL handles leak when streaming.py creates a new SessionDB instance per request and replaces the cached agent's _session_db without closing the old one. Each orphaned connection holds 2 FDs (.db + .db-wal), causing FD exhaustion and EMFILE crashes after ~73 messages. Fix: close the previous _session_db before replacing it on cached agents, mirroring the close-before-replace pattern used elsewhere in the codebase.
|
Thanks @wali-reheman — this is a good catch. Confirmed the leak: in Your fix shape matches what Two things I'll verify on my side before merging:
Will run the full pytest suite (3604 tests on master) plus the browser harness today and aim to merge in the next batch release. Repro detail you mentioned — 73 handles at the EMFILE crash matching exactly the message count — is solid evidence and worth pinning in the test. Thanks again. |
c0d50b3
…fore replacing on cached agent
…tion + CHANGELOG + 5 regression tests PR nesquena#1421 (SessionDB WAL handle leak fix on cached-agent reuse path) had a sibling leak at the LRU eviction site that I caught during pre-review: api/streaming.py SESSION_AGENT_CACHE.popitem(last=False) was discarding the evicted entry with `evicted_sid, _ = ...`. The agent's _session_db was dropped on the floor and only released when GC eventually finalized the agent — which on a long-running server may be never (cyclic refs, extension types holding C handles, etc.). Same fix shape as nesquena#1421: capture the evicted entry, call _evicted_agent._session_db.close() explicitly. SessionDB.close() is idempotent + thread-safe (with self._lock: if self._conn:), so the double-close-is-benign property still holds. 5 regression tests in test_v050259_sessiondb_fd_leak.py: - Source-level: cached-agent reuse path closes before replace - Source-level: LRU eviction path captures + closes evicted agent - Behavioral: SessionDB.close() is idempotent (3 calls safe) - Behavioral: cached-agent reuse with mock — close called exactly once - Behavioral: LRU eviction with mock — only evicted agent's DB closes Full suite: 3615 passed, 0 failed. Nathan explicitly authorized 'just go ahead and merge it as a small release' since the PR is 9 LOC, focused, has Opus pre-release follow-up + tests, and matches the empirically-confirmed leak shape (73-handle leak at EMFILE).
Problem
SessionDB WAL handles leak when
_run_agent_streamingcreates a newSessionDBinstance per request and replaces the cached agent's_session_dbwithout closing the old one.Each orphaned
SessionDBconnection holds 2 FDs (state.db + state.db-wal). After ~73 messages, leaked handles exhaust the 256 FD default limit causingEMFILEcrashes.Root cause
In
api/streaming.py, when reusing a cached agent:Fix
Close the previous
_session_dbbefore replacing it on cached agents:Testing
Closes #streaming FD leak.