[Bug]: Gateway agentRunSeq Map Never Pruned Causes Memory Exhaustion #6036

@coygeek

CVSS Assessment

Metric   | Value
Score    | 6.5 / 10.0
Severity | Medium
Vector   | CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

Summary

The gateway server maintains an agentRunSeq Map to track sequence numbers for agent runs, but this Map is never pruned. Unlike other Maps in the gateway context (dedupe, chatAbortControllers, chatRunState.abortedRuns) which are cleaned up in the maintenance loop, agentRunSeq accumulates entries indefinitely, causing slow memory exhaustion on long-running gateway servers.

Affected Code

File: src/gateway/server-runtime-state.ts:155

const agentRunSeq = new Map<string, number>();

File: src/gateway/server-chat.ts:246-264

const last = agentRunSeq.get(evt.runId) ?? 0;
if (evt.seq <= last) {
  // skip duplicate
}
agentRunSeq.set(evt.runId, evt.seq);  // Entry added, never removed
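The excerpt above elides what happens to a duplicate. As a minimal, self-contained sketch of the same sequence check, assuming duplicates are dropped with an early return, the following shows how every distinct runId leaves one permanent entry. `AgentEvent` and `acceptEvent` are hypothetical names used for illustration; only `agentRunSeq`, `runId`, and `seq` come from the issue.

```typescript
// Hypothetical event shape; only runId and seq appear in the excerpt above.
interface AgentEvent {
  runId: string;
  seq: number;
}

const agentRunSeq = new Map<string, number>();

// Returns true when the event is new, false when it is a duplicate or stale.
// Assumption: duplicates are dropped before the map is updated.
function acceptEvent(evt: AgentEvent): boolean {
  const last = agentRunSeq.get(evt.runId) ?? 0;
  if (evt.seq <= last) {
    return false;
  }
  agentRunSeq.set(evt.runId, evt.seq); // one entry per runId, never deleted
  return true;
}

acceptEvent({ runId: "run-a", seq: 1 });
acceptEvent({ runId: "run-a", seq: 1 }); // duplicate, rejected
acceptEvent({ runId: "run-b", seq: 1 });
console.log(agentRunSeq.size); // 2: both runs stay tracked after they end
```

Nothing in this path removes an entry, so the map size equals the number of distinct runId values ever seen.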

File: src/gateway/server-maintenance.ts:75-117

// Maintenance loop prunes other maps but NOT agentRunSeq
const dedupeCleanup = setInterval(() => {
  // Prunes: dedupe, chatAbortControllers, chatRunState.abortedRuns
  // MISSING: agentRunSeq is never cleaned up
}, 60_000);

Attack Surface

How is this reached?

  [x] Network (HTTP/WebSocket endpoint, API call)
  [ ] Adjacent Network (same LAN, requires network proximity)
  [ ] Local (local file, CLI argument, environment variable)
  [ ] Physical (requires physical access to machine)

Authentication required?

  [ ] None (unauthenticated/public access)
  [x] Low (any authenticated user)
  [ ] High (admin/privileged user only)

Entry point: Any authenticated gateway client initiating agent runs via WebSocket connection

Exploit Conditions

Complexity:

  [x] Low (no special conditions, works reliably)
  [ ] High (requires race condition, specific config, or timing)

User interaction:

  [x] None (automatic, no victim action needed)
  [ ] Required (victim must click, visit, or perform action)

Prerequisites: Gateway server running with authenticated clients making agent requests over time

Impact Assessment

Scope:

  [x] Unchanged (impact limited to vulnerable component)
  [ ] Changed (can affect other components, escape sandbox)

What can an attacker do?

Impact Type     | Level | Description
Confidentiality | None  | No data disclosure
Integrity       | None  | No data modification
Availability    | High  | Gateway process memory exhaustion leading to OOM crash or degraded performance

Steps to Reproduce

  1. Start a gateway server with persistent uptime
  2. Make repeated agent runs from authenticated clients over days/weeks
  3. Monitor memory usage of the gateway process
  4. Observe that memory grows linearly with total unique runId values ever seen
  5. Eventually the gateway becomes unresponsive or crashes due to memory exhaustion
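The steps above take days or weeks against a real gateway. A rough way to see the same growth pattern quickly is to simulate it in isolation. The sketch below is not gateway code: `simulateRuns` is a hypothetical helper that replays the sequence-tracking logic against a local map and reports how many entries remain after all runs have finished.

```typescript
// Simulates many short-lived runs against a map that is never pruned.
// simulateRuns is a hypothetical helper, not part of the gateway.
function simulateRuns(totalRuns: number, eventsPerRun: number): number {
  const seqByRun = new Map<string, number>();
  for (let run = 0; run < totalRuns; run++) {
    const runId = `run-${run}`;
    for (let seq = 1; seq <= eventsPerRun; seq++) {
      const last = seqByRun.get(runId) ?? 0;
      if (seq <= last) continue; // duplicate, skipped
      seqByRun.set(runId, seq);
    }
    // The run has finished, but its entry stays in the map.
  }
  return seqByRun.size;
}

for (const runs of [1_000, 10_000, 100_000]) {
  console.log(`${runs} runs -> ${simulateRuns(runs, 5)} retained entries`);
}
```

Retained entries equal the number of runs regardless of events per run, which is the linear growth described in step 4. On a live Node.js process, sampling `process.memoryUsage().heapUsed` over time is one way to confirm the trend.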

Recommended Fix

Add agentRunSeq to the maintenance cleanup loop in server-maintenance.ts:

// Alongside the dedupeCleanup interval, add:
const agentRunSeqCleanup = setInterval(() => {
  // Either clear periodically, or track timestamps with entries
  // Option 1: Clear the entire map once it passes a size cap
  // (loses sequence tracking for every run, including active ones)
  if (agentRunSeq.size > 10_000) {
    agentRunSeq.clear();
  }
  // Option 2: Track a timestamp per entry and prune entries older than maxAge
  // const maxAge = 60 * 60 * 1000; // 1 hour
  // const now = Date.now();
}, 60_000);
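Option 2 is only named in the snippet above. One possible shape for it, as a sketch under assumptions rather than the project's actual fix, stores a last-seen timestamp next to each sequence number and deletes entries that have been idle longer than the one-hour maxAge. `SeqEntry`, `recordSeq`, `pruneStaleRuns`, and `agentRunSeqWithAge` are hypothetical names; changing the map's value type would also require updating the reader in server-chat.ts.

```typescript
// Hypothetical entry shape: the sequence number plus a last-seen timestamp.
interface SeqEntry {
  seq: number;
  lastSeenMs: number;
}

const MAX_AGE_MS = 60 * 60 * 1000; // 1 hour, the maxAge suggested above

const agentRunSeqWithAge = new Map<string, SeqEntry>();

// Records a sequence number and refreshes the timestamp for the run.
function recordSeq(runId: string, seq: number, nowMs: number = Date.now()): void {
  agentRunSeqWithAge.set(runId, { seq, lastSeenMs: nowMs });
}

// Deletes entries not updated within maxAgeMs; returns how many were removed.
function pruneStaleRuns(
  map: Map<string, SeqEntry>,
  nowMs: number = Date.now(),
  maxAgeMs: number = MAX_AGE_MS,
): number {
  let removed = 0;
  for (const [runId, entry] of map) {
    if (nowMs - entry.lastSeenMs > maxAgeMs) {
      map.delete(runId); // deleting during Map iteration is safe in JavaScript
      removed++;
    }
  }
  return removed;
}

// Example: one idle run and one active run, checked two hours later.
recordSeq("idle-run", 7, 0);
recordSeq("active-run", 3, 2 * MAX_AGE_MS);
console.log(pruneStaleRuns(agentRunSeqWithAge, 2 * MAX_AGE_MS)); // 1
```

Calling `pruneStaleRuns(agentRunSeqWithAge)` from the existing 60-second maintenance interval would keep active runs tracked while bounding the map by recent activity instead of total history. The trade-off against Option 1 is one extra timestamp per entry.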

References

  • CWE: CWE-770 - Allocation of Resources Without Limits or Throttling
  • Related: Similar to other unbounded cache issues in the codebase (presence cache, sticker cache, rate limit map)

Labels: bug, stale
