
[Bug]: Multiple in-memory caches grow unbounded, risking memory exhaustion #4948

@coygeek

Description

Summary

Severity: P1/High (Score: 85/150)
CWE: CWE-400 - Uncontrolled Resource Consumption
OWASP: A05:2021 - Security Misconfiguration
File: Multiple locations (see table below)

Several in-memory caches and maps in the codebase lack size limits or TTL-based eviction, which can lead to memory exhaustion in long-running gateway processes under heavy load.

Why this is critical: memory leaks from unbounded caches are insidious; they don't crash the process immediately, but degrade it over time. In production deployments serving many users, each unique account, room, or user adds entries that never expire. After days or weeks of operation, the gateway OOMs without warning. This is especially problematic in containerized deployments, where memory limits trigger sudden kills, and in shared hosting, where one gateway's leak affects others.

Triage Assessment

| Factor | Value | Score |
| --- | --- | --- |
| Reachability | Normal usage triggers growth | 30/40 |
| Impact | Memory exhaustion, OOM crash | 20/50 |
| Exploitability | Passive over time, or active via many unique IDs | 20/30 |
| Verification | Multiple file:line citations, code confirmed | 15/30 |
| **Total** | | **85/150** |

Steps to reproduce

1. Run the gateway process for an extended period
2. Process a large number of unique accounts/rooms/servers
3. Monitor memory usage over time
4. Observe memory continuously growing without bound

Expected behavior

Caches should have:

- Maximum size limits with LRU eviction
- TTL-based expiration for stale entries
- Periodic cleanup of unused entries
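The properties above can be combined in one small wrapper. The following is a hypothetical sketch (a `BoundedCache` class that does not exist in the codebase), pairing insertion-order eviction, which approximates LRU, with a per-entry TTL:

```typescript
// Hypothetical sketch, not code from the repo: a Map-based cache with a
// max size (insertion-order eviction approximating LRU) and per-entry TTL.
class BoundedCache<K, V> {
  private map = new Map<K, { value: V; expiresAt: number }>();

  constructor(private maxSize: number, private ttlMs: number) {}

  set(key: K, value: V): void {
    this.map.delete(key); // re-insert so the key moves to the newest slot
    this.map.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    while (this.map.size > this.maxSize) {
      const oldestKey = this.map.keys().next().value;
      if (oldestKey === undefined) break;
      this.map.delete(oldestKey); // oldest-first, as in src/infra/dedupe.ts
    }
  }

  get(key: K): V | undefined {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.map.delete(key); // lazily drop stale entries on read
      return undefined;
    }
    return entry.value;
  }

  get size(): number {
    return this.map.size;
  }
}
```

Lazy TTL eviction on `get` keeps the hot path cheap, but a periodic sweep is still needed so entries that are never read again do not linger until size pressure evicts them.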

Actual behavior

Multiple caches grow without limits:

Affected caches:

| Cache | Location | Risk |
| --- | --- | --- |
| `serverInfoCache` | `extensions/bluebubbles/src/probe.ts:20` | Has TTL but no max size; grows with unique accountIds |
| `authCache` | `extensions/googlechat/src/auth.ts:12` | No TTL, no max size; grows with unique accountIds |
| `directRoomCache` | `extensions/matrix/src/matrix/send/targets.ts:19` | No TTL, no max size; grows with unique user IDs |
| `memberCountCache` | `extensions/matrix/src/matrix/monitor/direct.ts:22` | Has TTL for freshness but no max size |
| `rateLimitMap` | `extensions/nostr/src/nostr-profile-http.ts:45` | Per-entry TTL but no removal of inactive entries |
| `presenceCache` | `src/discord/monitor/presence-cache.ts` | No TTL, no max size; grows with unique (account, user) pairs |
| Client manager registry | `extensions/twitch/src/client-manager-registry.ts:29` | Has cleanup functions but no automatic eviction |

Positive example (bounded cache):

```ts
// This IS properly bounded (src/infra/dedupe.ts):
while (cache.size > maxSize) {
  const oldestKey = cache.keys().next().value;
  if (!oldestKey) break;
  cache.delete(oldestKey);
}
```

Environment

- Version: latest (main branch)
- OS: any
- Install method: any (especially affects long-running Docker deployments)

Impact

- Memory exhaustion: in production, the gateway process can OOM after extended operation
- Performance degradation: very large maps increase GC pressure and make full-map scans slower
- Unpredictable crashes: OOM kills are hard to debug without memory profiling
- Rate limit bypass: in-memory rate limiting doesn't survive restarts

Recommended fix

1. Add maximum size limits to all caches with LRU eviction:

   ```ts
   import { LRUCache } from 'lru-cache';

   const serverInfoCache = new LRUCache<string, ServerInfo>({
     max: 10000,
     ttl: 1000 * 60 * 60, // 1 hour TTL
   });
   ```
2. For rate limiting specifically, consider:
   - Using Redis for distributed rate limiting
   - Adding periodic cleanup of expired entries
   - Documenting that in-memory limits don't survive restarts
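The periodic cleanup of expired entries could be sketched as follows. The entry shape (`count`/`resetAt`) is an assumption for illustration, not necessarily how `rateLimitMap` is structured in `nostr-profile-http.ts`:

```typescript
// Assumed entry shape for illustration: key -> request count plus the
// timestamp (ms) at which the rate-limit window resets.
const rateLimitMap = new Map<string, { count: number; resetAt: number }>();

// Delete entries whose window has already expired; returns how many were
// removed so the sweep itself can be monitored.
function sweepExpired(now: number = Date.now()): number {
  let removed = 0;
  for (const [key, entry] of rateLimitMap) {
    if (entry.resetAt <= now) {
      rateLimitMap.delete(key);
      removed++;
    }
  }
  return removed;
}

// Run the sweep on a timer so inactive keys do not accumulate forever;
// unref() keeps the interval from holding the process open:
// setInterval(sweepExpired, 60_000).unref();
```

Deleting from a `Map` while iterating it with `for...of` is well-defined in JavaScript, so the sweep needs no snapshot copy.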
3. Add monitoring for cache sizes:

   ```ts
   metrics.gauge('cache.serverInfo.size', serverInfoCache.size);
   ```
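One way such gauges could be wired up for all the caches listed above. The `metrics` facade and the registration map here are hypothetical, not an existing API in the repo:

```typescript
// Hypothetical metrics facade that records the latest value per gauge
// name; swap in the real metrics client.
const gauges = new Map<string, number>();
const metrics = {
  gauge(name: string, value: number): void {
    gauges.set(name, value);
  },
};

// Anything with a numeric `size` (Map, Set, LRUCache) can be registered.
const monitoredCaches: Record<string, { size: number }> = {
  'cache.serverInfo.size': new Map<string, unknown>(),
  // ...register authCache, directRoomCache, presenceCache, etc. here
};

// Report every registered cache's size; call this from a timer or a
// metrics-scrape hook.
function reportCacheSizes(): void {
  for (const [name, cache] of Object.entries(monitoredCaches)) {
    metrics.gauge(name, cache.size);
  }
}
```

Centralizing registration in one map means a newly added cache only needs one line here to become visible in dashboards.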

Metadata

Labels: bug (Something isn't working)