
fix(infra): lazy prune in DedupeCache to resolve Discord 33-122s message delays #52762

Open
Erprabhat8423 wants to merge 2 commits into openclaw:main from Erprabhat8423:fix/discord-message-listener-perf

Conversation

@Erprabhat8423

Fixes #4453

Root Cause

The dedupe fix (#51950) added recentInboundMessages.check() to the
Discord message handler. check() called prune() on every message,
which did a full linear scan of up to 5000 cache entries each time.

On active servers these scans cumulatively blocked the event loop,
producing the 33-122s delays reported in the issue.

Changes

  • src/infra/dedupe.ts — only prune when cache.size > maxSize
  • src/infra/map-size.ts — fixed corrupted duplicate code + rewrote
    pruneMapToMaxSize to delete in a single forward pass
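
The lazy-prune change can be sketched roughly like this (class and method names are assumed from the PR description; the real src/infra/dedupe.ts likely differs in detail):

```typescript
// Minimal sketch of the lazy-prune fix, assuming a simple DedupeCache shape.
class DedupeCache {
  private cache = new Map<string, number>(); // key -> last-seen timestamp (ms)

  constructor(private ttlMs: number, private maxSize: number) {}

  // Returns true when the key was already seen within ttlMs (a duplicate).
  check(key: string, now: number = Date.now()): boolean {
    const seenAt = this.cache.get(key);
    const duplicate = seenAt !== undefined && now - seenAt < this.ttlMs;

    // Delete + set moves the key to the tail of the Map's insertion order,
    // so the freshest entry is never a victim of the oldest-first prune.
    this.cache.delete(key);
    this.cache.set(key, now);

    // Lazy prune: scan the cache only when it has outgrown its cap,
    // not on every message.
    if (this.cache.size > this.maxSize) {
      this.prune(now);
    }
    return duplicate;
  }

  private prune(now: number): void {
    // First drop TTL-expired entries, then oldest entries if still over cap.
    for (const [key, seenAt] of this.cache) {
      if (now - seenAt >= this.ttlMs) this.cache.delete(key);
    }
    let excess = this.cache.size - this.maxSize;
    for (const key of this.cache.keys()) {
      if (excess-- <= 0) break;
      this.cache.delete(key);
    }
  }
}
```

The key point is that the O(n) scan moves off the per-message hot path: `check()` itself is a couple of Map operations, and the scan runs only when the cache actually exceeds its cap.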

Testing

  • src/infra/dedupe.test.ts passing (5 tests)
  • src/infra/map-size.test.ts passing (2 tests)
  • extensions/discord/src/monitor/message-handler.queue.test.ts passing (11 tests)


greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR fixes a real performance regression introduced by #51950: calling prune() on every check() caused a full O(n) linear scan over up to 5,000 cache entries, blocking the Node.js event loop and producing the 33–122 s message delays on active Discord servers.

Changes:

  • src/infra/dedupe.ts: prune() is now called lazily — only when cache.size > maxSize — eliminating the per-message O(n) scan. The newly-touched key is always placed at the tail of the Map, so it is safe from the subsequent pruneMapToMaxSize call.
  • src/infra/map-size.ts: The while-loop is replaced with a cleaner single forward-pass that pre-computes deleteCount = map.size - limit and breaks after exactly that many deletions. Mutating Map entries for already-yielded keys during for…of iteration is well-defined per the ECMAScript spec.
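
The single forward-pass described above might look like this (a sketch from the quoted diff; the actual src/infra/map-size.ts may differ):

```typescript
// Drop the oldest entries until the map fits within maxSize.
// A Map iterates in insertion order, so the first keys yielded are the oldest.
function pruneMapToMaxSize<K, V>(map: Map<K, V>, maxSize: number): void {
  const deleteCount = map.size - maxSize;
  let i = 0;
  for (const key of map.keys()) {
    // When deleteCount <= 0 this breaks immediately, making the call a no-op.
    if (i++ >= deleteCount) break;
    // Deleting a key the iterator has already yielded is well-defined
    // per the ECMAScript spec; iteration continues safely.
    map.delete(key);
  }
}
```
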

Minor behavioral trade-off: TTL-expired entries are no longer eagerly removed on every call; they linger until the cache grows beyond maxSize. For the dedupe use-case this is fine — individual lookups still perform per-key TTL checks in hasUnexpired, and a full prune fires whenever the cache tips over the size limit. Maximum retained entry count remains bounded at maxSize + 1.

The fix is well-tested (5 dedupe tests + 2 map-size tests) and logically sound.

Confidence Score: 5/5

  • Safe to merge — targeted, correct fix with no regressions and good test coverage.
  • The root cause is well understood, the fix is minimal and scoped entirely to the hot path, the Map-mutation-during-iteration pattern is spec-compliant, and the intentional TTL trade-off is acceptable for a bounded dedupe cache. All edge cases (maxSize=0, fractional maxSize, duplicate-key promotion, explicit clear) are covered by tests.
  • No files require special attention.



chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6dd2088bb5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +60 to +61:

    if (cache.size > maxSize) {
      prune(now);


P1: Avoid full-cache prune once the dedupe window saturates

check() still calls the full prune(now) path whenever cache.size > maxSize, so this only helps until the live working set reaches the cap. In the Discord handler the window is ttlMs=5*60_000 with maxSize=5000 (extensions/discord/src/monitor/message-handler.ts:34-35), which means any account receiving more than about 17 distinct inbound messages/sec will keep the cache saturated; after that, every new message still pays the same O(n) scan through ~5,000 entries, so the delay regression this patch is meant to fix will continue on the busiest servers.
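
For reference, the quoted threshold falls out of maxSize / ttl: 5000 entries / 300 s ≈ 16.7 messages/sec. One possible follow-up for the saturated case, sketched here with hypothetical names and not part of this PR, is to evict only the single oldest entry when inserting at the cap, which is O(1) per message even when the cache never drops below maxSize:

```typescript
// Hypothetical O(1)-per-insert eviction for a saturated cache: instead of a
// full prune when size > maxSize, drop just the single oldest entry (the
// first key in the Map's insertion order) to make room.
function insertWithEviction<K, V>(
  map: Map<K, V>,
  key: K,
  value: V,
  maxSize: number,
): void {
  map.delete(key); // re-inserting moves the key to the tail
  map.set(key, value);
  if (map.size > maxSize) {
    const oldest = map.keys().next();
    if (!oldest.done) map.delete(oldest.value);
  }
}
```

The trade-off is that TTL-expired entries are never batch-collected, but for a strictly size-bounded dedupe window that is acceptable as long as lookups still check TTL per key.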


Erprabhat8423 (Author) replied:

Thanks for the detailed review! The saturated-cache edge case is a valid follow-up optimization. For now this fix addresses the core regression for the majority of servers. Happy to open a separate issue for the high-throughput case if that's useful.

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7de07837c


Comment on lines +10 to +14:

    const deleteCount = map.size - limit;
    let i = 0;
    for (const key of map.keys()) {
      if (i++ >= deleteCount) break;
      map.delete(key);

P2: Preserve NaN/invalid-limit behavior in pruneMapToMaxSize

When maxSize is NaN at runtime, this loop now deletes every entry instead of leaving the map untouched. That matters because several in-repo callers still normalize with Math.max(..., Math.floor(...)), which also yields NaN for malformed numeric input (for example src/plugin-sdk/webhook-request-guards.ts:95-115 and src/plugin-sdk/webhook-memory-guards.ts:60-97). After this change, a limiter created with an invalid numeric config immediately forgets its active keys, so duplicate in-flight requests or rate-limited keys are no longer tracked at all; before this commit, pruneMapToMaxSize(..., NaN) was effectively a no-op and those guards kept working.
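
The NaN difference can be seen with a standalone copy of the single-pass loop (the real map-size.ts may differ):

```typescript
// Standalone copy of the single forward-pass loop, to show its NaN behavior.
function pruneSinglePass<K, V>(map: Map<K, V>, limit: number): void {
  const deleteCount = map.size - limit; // NaN when limit is NaN
  let i = 0;
  for (const key of map.keys()) {
    if (i++ >= deleteCount) break; // (anything >= NaN) is false: never breaks
    map.delete(key);
  }
}

const m = new Map([["a", 1], ["b", 2], ["c", 3]]);
pruneSinglePass(m, Number.NaN);
console.log(m.size); // 0: with a NaN limit, every entry is deleted

// The old form was a no-op for NaN, since (size > NaN) is false:
//   while (map.size > limit) { map.delete(map.keys().next().value); }
// A Number.isFinite guard at the top of the new function would restore
// that no-op behavior for malformed limits.
```
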



Successfully merging this pull request may close these issues.

[Bug]: Discord MessageListener performance regression (33-122s delays)
