fix: close previous SessionDB before replacing on cached agent by wali-reheman · Pull Request #1421 · nesquena/hermes-webui

wali-reheman · 2026-05-01T20:51:58Z

Problem

SessionDB WAL handles leak when _run_agent_streaming creates a new SessionDB instance per request and replaces the cached agent's _session_db without closing the old one.

Each orphaned SessionDB connection holds 2 FDs (state.db + state.db-wal). After ~73 messages, leaked handles exhaust the 256 FD default limit causing EMFILE crashes.

Root cause

In api/streaming.py, when reusing a cached agent:

agent._session_db = _session_db  # old _session_db never closed

Fix

Close the previous _session_db before replacing it on cached agents:

if hasattr(agent, '_session_db') and agent._session_db is not None:
    try:
        agent._session_db.close()
    except Exception:
        pass
agent._session_db = _session_db

Testing

Fresh server after fix: 2 DB handles (.db + .db-wal + .db-shm) vs 73 handles before
EMFILE crash at 16:28 ET reproduced the 73-handle leak exactly
Server has been stable since restart

Closes #streaming FD leak.

SessionDB WAL handles leak when streaming.py creates a new SessionDB instance per request and replaces the cached agent's _session_db without closing the old one. Each orphaned connection holds 2 FDs (.db + .db-wal), causing FD exhaustion and EMFILE crashes after ~73 messages. Fix: close the previous _session_db before replacing it on cached agents, mirroring the close-before-replace pattern used elsewhere in the codebase.

nesquena-hermes · 2026-05-01T21:54:26Z

Thanks @wali-reheman — this is a good catch.

Confirmed the leak: in api/streaming.py ~L1885, the cached-agent reuse path overwrites agent._session_db without closing the previous instance. Each SessionDB.__init__ opens a SQLite connection that holds three FDs once WAL kicks in (state.db, state.db-wal, state.db-shm), and they're never released because nothing else holds a reference to the old object once agent._session_db is rebound. After enough chat turns on a long-lived agent, EMFILE is the predictable end state.

Your fix shape matches what SessionDB.close() is designed for — it grabs the internal lock, runs PRAGMA wal_checkpoint(PASSIVE), then closes the connection and nulls _conn. Wrapping it in try/except Exception: pass is right: a busted WAL checkpoint shouldn't block the new session from being installed.

Two things I'll verify on my side before merging:

No double-close races. Want to make sure no other code path (cancel handler, finally block in the streaming generator, agent destructor) closes the same _session_db after we've already swapped it. SessionDB.close() is idempotent (if self._conn:), so worst case is benign, but I'd like to confirm the swap point is the only owner.
FD-count regression test. Going to add a small test that drives N streaming requests through the cached-agent reuse path and asserts the open-FD count for state.db* stays bounded.

Will run the full pytest suite (3604 tests on master) plus the browser harness today and aim to merge in the next batch release.

Repro detail you mentioned — 73 handles at the EMFILE crash matching exactly the message count — is solid evidence and worth pinning in the test. Thanks again.

…fore replacing on cached agent

…tion + CHANGELOG + 5 regression tests PR nesquena#1421 (SessionDB WAL handle leak fix on cached-agent reuse path) had a sibling leak at the LRU eviction site that I caught during pre-review: api/streaming.py SESSION_AGENT_CACHE.popitem(last=False) was discarding the evicted entry with `evicted_sid, _ = ...`. The agent's _session_db was dropped on the floor and only released when GC eventually finalized the agent — which on a long-running server may be never (cyclic refs, extension types holding C handles, etc.). Same fix shape as nesquena#1421: capture the evicted entry, call _evicted_agent._session_db.close() explicitly. SessionDB.close() is idempotent + thread-safe (with self._lock: if self._conn:), so the double-close-is-benign property still holds. 5 regression tests in test_v050259_sessiondb_fd_leak.py: - Source-level: cached-agent reuse path closes before replace - Source-level: LRU eviction path captures + closes evicted agent - Behavioral: SessionDB.close() is idempotent (3 calls safe) - Behavioral: cached-agent reuse with mock — close called exactly once - Behavioral: LRU eviction with mock — only evicted agent's DB closes Full suite: 3615 passed, 0 failed. Nathan explicitly authorized 'just go ahead and merge it as a small release' since the PR is 9 LOC, focused, has Opus pre-release follow-up + tests, and matches the empirically-confirmed leak shape (73-handle leak at EMFILE).

wali-reheman closed this May 1, 2026

wali-reheman reopened this May 1, 2026

nesquena-hermes mentioned this pull request May 1, 2026

v0.50.259 — SessionDB FD-leak hotfix (#1421) + LRU-eviction Opus follow-up #1427

Merged

nesquena-hermes closed this pull request by merging all changes into nesquena:master in c0d50b3 May 1, 2026

pull Bot pushed a commit to JamesWilliam1977/hermes-webui that referenced this pull request May 1, 2026

Merge PR nesquena#1421 from wali-reheman: close previous SessionDB be…

f058932

…fore replacing on cached agent

This was referenced May 2, 2026

bug(stability): persistent-host crashes — bootstrap.py double-fork breaks launchd + state.db FD leak #1458

Closed

fix(sqlite): close state.db connections explicitly to stop FD leak in sidebar polling (#1494) #1495

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: close previous SessionDB before replacing on cached agent#1421

fix: close previous SessionDB before replacing on cached agent#1421
1 commit merged intonesquena:masterfrom
wali-reheman:fix/sessiondb-wal-handle-leak

wali-reheman commented May 1, 2026

Uh oh!

nesquena-hermes commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

wali-reheman commented May 1, 2026

Problem

Root cause

Fix

Testing

Uh oh!

nesquena-hermes commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants