[Bug]: Diagnostic session state Map grows unbounded without cleanup #5136

@coygeek

Description

Severity: P1/High (Score: 100/150)
CWE: CWE-400 - Uncontrolled Resource Consumption
OWASP: A05:2021 - Security Misconfiguration
File: src/logging/diagnostic.ts:21

Factor          Assessment                    Score
Reachability    Every session creates entry   40/40
Impact          Memory exhaustion             20/50
Exploitability  Sustained usage               10/30
Verification    Code confirmed                30/30
Total                                         100/150

Summary

The sessionStates Map in the diagnostic module accumulates an entry for every unique session key but never evicts any. Unlike other caches in the codebase, this Map has no TTL, maximum size, or cleanup mechanism, which causes a slow memory leak over the gateway's lifetime.

Steps to reproduce

  1. Run the gateway for an extended period
  2. Process many unique sessions (different users, conversations)
  3. Monitor the size and memory usage of the sessionStates Map
  4. Observe that it grows continuously, without bound
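
The growth can also be shown in isolation, without a long-running gateway. The following is a minimal sketch, not the actual module: the SessionRef and SessionState shapes, the resolveSessionKey body, and the simulateSessions helper are assumptions made for illustration. Only the getSessionState logic mirrors the code quoted in this report.

```typescript
// Hypothetical stand-ins for the types in src/logging/diagnostic.ts.
type SessionRef = { sessionId?: string; sessionKey?: string };
type SessionState = {
  sessionId?: string;
  sessionKey?: string;
  lastActivity: number;
  state: string;
  queueDepth: number;
};

const sessionStates = new Map<string, SessionState>();

// Assumed key resolution; the real resolveSessionKey may differ.
function resolveSessionKey(ref: SessionRef): string {
  return ref.sessionKey ?? ref.sessionId ?? "unknown";
}

// Same get-or-create pattern as in the report: entries are added, never removed.
function getSessionState(ref: SessionRef): SessionState {
  const key = resolveSessionKey(ref);
  const existing = sessionStates.get(key);
  if (existing) {
    return existing;
  }
  const created: SessionState = {
    sessionId: ref.sessionId,
    sessionKey: ref.sessionKey,
    lastActivity: Date.now(),
    state: "idle",
    queueDepth: 0,
  };
  sessionStates.set(key, created);
  return created;
}

// Hypothetical helper: touch `count` sessions starting at index `start`
// and return the resulting Map size.
function simulateSessions(start: number, count: number): number {
  for (let i = start; i < start + count; i++) {
    getSessionState({ sessionId: `s-${i}`, sessionKey: `user:${i}` });
  }
  return sessionStates.size;
}

const sizeAfterFirstBatch = simulateSessions(0, 1000); // 1000 new keys
const sizeAfterSecondBatch = simulateSessions(1000, 1000); // 1000 more new keys
const sizeAfterRevisit = simulateSessions(0, 1000); // known keys: nothing added, nothing freed
```

In this sketch the size only ever stays flat or increases. Revisiting known sessions adds nothing, but no code path ever shrinks the Map.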

Expected behavior

Diagnostic session state should have TTL-based expiration or explicit cleanup when sessions end.
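
For the explicit-cleanup variant, a minimal sketch could look like the following. The clearSessionState function is hypothetical, and so are the simplified types and the resolveSessionKey body. Where the gateway would call it (for example, from whatever code path learns that a session has ended) is also an assumption.

```typescript
// Hypothetical, simplified stand-ins for the real types.
type SessionRef = { sessionId?: string; sessionKey?: string };
type SessionState = { lastActivity: number; state: string; queueDepth: number };

const sessionStates = new Map<string, SessionState>();

// Assumed key resolution; the real resolveSessionKey may differ.
function resolveSessionKey(ref: SessionRef): string {
  return ref.sessionKey ?? ref.sessionId ?? "unknown";
}

// Hypothetical cleanup hook: drop the diagnostic state for a finished session.
// Returns true if an entry existed and was removed.
function clearSessionState(ref: SessionRef): boolean {
  return sessionStates.delete(resolveSessionKey(ref));
}

// Usage: state is created while the session is live and released when it ends.
const ref: SessionRef = { sessionId: "s-1", sessionKey: "user:1" };
sessionStates.set(resolveSessionKey(ref), {
  lastActivity: Date.now(),
  state: "idle",
  queueDepth: 0,
});
const removedOnEnd = clearSessionState(ref);
const removedAgain = clearSessionState(ref); // second call finds nothing to remove
```

Explicit cleanup alone still leaks if a session ends without the hook firing (for example, after a crash or a dropped connection), so it is probably best paired with a TTL as a backstop.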

Actual behavior

Entries are created but never removed:

Affected code location:

Diagnostic Module (src/logging/diagnostic.ts:21):

const sessionStates = new Map<string, SessionState>();

function getSessionState(ref: SessionRef): SessionState {
  const key = resolveSessionKey(ref);
  const existing = sessionStates.get(key);
  if (existing) {
    // ...
    return existing;
  }
  const created: SessionState = {
    sessionId: ref.sessionId,
    sessionKey: ref.sessionKey,
    lastActivity: Date.now(),
    state: "idle",
    queueDepth: 0,
  };
  sessionStates.set(key, created);  // Creates entries that are NEVER removed
  return created;
}

Environment

  • Clawdbot version: latest (main branch)
  • OS: Any
  • Install method: Any (especially affects long-running deployments)

Logs or screenshots

N/A - manifests as memory growth over time

Impact

  • Memory leak: Map grows in proportion to the total number of unique sessions ever processed
  • OOM risk: Can eventually cause an out-of-memory crash in long-running deployments
  • Performance degradation: Large Maps slow down operations
  • Affects all sessions: Diagnostic module touches every session

Recommended fix

  1. Add TTL-based expiration:
const SESSION_STATE_TTL_MS = 60 * 60 * 1000;  // 1 hour

setInterval(() => {
  const now = Date.now();
  for (const [key, state] of sessionStates) {
    if (now - state.lastActivity > SESSION_STATE_TTL_MS) {
      sessionStates.delete(key);
    }
  }
}, 60000);  // Cleanup every minute
  2. Or add max size with LRU eviction:
import { LRUCache } from 'lru-cache';

const sessionStates = new LRUCache<string, SessionState>({
  max: 10000,
  ttl: SESSION_STATE_TTL_MS,
});
  3. Add monitoring:
metrics.gauge('diagnostic.sessionStates.size', sessionStates.size);
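
The TTL and max-size ideas can also be combined without an external dependency. The following is a sketch under stated assumptions, not a drop-in patch: pruneSessionStates and SESSION_STATE_MAX_ENTRIES are hypothetical names, the SessionState shape is simplified, and the clock is passed in as a parameter so the behaviour can be tested deterministically.

```typescript
// Simplified, hypothetical shape; only lastActivity matters for pruning.
type SessionState = { lastActivity: number; state: string; queueDepth: number };

const SESSION_STATE_TTL_MS = 60 * 60 * 1000; // 1 hour, as proposed above
const SESSION_STATE_MAX_ENTRIES = 10000; // assumed cap, mirroring `max: 10000`

// Removes expired entries first, then the least recently active entries
// until the Map is within maxEntries. Returns the number of entries removed.
function pruneSessionStates(
  states: Map<string, SessionState>,
  now: number,
  ttlMs: number = SESSION_STATE_TTL_MS,
  maxEntries: number = SESSION_STATE_MAX_ENTRIES
): number {
  let removed = 0;

  // Pass 1: TTL expiry. Deleting the current entry during Map.forEach is safe.
  states.forEach((state, key) => {
    if (now - state.lastActivity > ttlMs) {
      states.delete(key);
      removed++;
    }
  });

  // Pass 2: enforce the size cap by evicting the oldest remaining entries.
  if (states.size > maxEntries) {
    const excess = states.size - maxEntries;
    Array.from(states.entries())
      .sort((a, b) => a[1].lastActivity - b[1].lastActivity)
      .slice(0, excess)
      .forEach(([key]) => {
        states.delete(key);
        removed++;
      });
  }

  return removed;
}

// Usage with a fixed clock and a cap of 1 entry.
const demo = new Map<string, SessionState>();
const now = 10 * SESSION_STATE_TTL_MS; // arbitrary fixed "current time"
demo.set("stale", { lastActivity: now - 2 * SESSION_STATE_TTL_MS, state: "idle", queueDepth: 0 });
demo.set("older", { lastActivity: now - 1000, state: "idle", queueDepth: 0 });
demo.set("newer", { lastActivity: now - 10, state: "idle", queueDepth: 0 });
const removedCount = pruneSessionStates(demo, now, SESSION_STATE_TTL_MS, 1);
```

In the gateway, a function like this could be called from the periodic timer shown in option 1, and the resulting Map size could feed the gauge in option 3. In Node.js, calling unref() on the timer returned by setInterval keeps the sweep from holding the process open at shutdown.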

Labels: bug (Something isn't working), stale (Marked as stale due to inactivity)
