fix(infra): lazy prune in DedupeCache to resolve Discord 33-122s message delays #52762

Erprabhat8423 wants to merge 2 commits into openclaw:main from
Conversation
The dedupe fix in openclaw#51950 introduced `recentInboundMessages.check()` in the Discord message handler. This called `prune()` on every single message, which did a full linear scan of up to 5000 cache entries on each invocation. On active Discord servers, this blocked the event loop for tens of milliseconds per message, stacking up to the 33-122s delays reported in openclaw#4453. Also fixed `map-size.ts`, which had corrupted duplicate code.

Changes:
- `dedupe.ts`: only call `prune()` when `cache.size` exceeds `maxSize`
- `map-size.ts`: rewrite `pruneMapToMaxSize` to delete in a single pass instead of calling `map.keys().next()` in a while loop

Fixes openclaw#4453
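The lazy-prune pattern described in this PR can be sketched roughly as follows. This is a minimal illustration only — the real `DedupeCache` in `src/infra/dedupe.ts` may have a different shape, and the field names here are assumptions:

```typescript
// Hypothetical sketch of the lazy-prune pattern; not the actual
// src/infra/dedupe.ts implementation.
class DedupeCache {
  private cache = new Map<string, number>(); // key -> expiry timestamp (ms)

  constructor(
    private ttlMs: number,
    private maxSize: number,
  ) {}

  // Returns true if the key was seen within the TTL window.
  check(key: string, now = Date.now()): boolean {
    const expiry = this.cache.get(key);
    if (expiry !== undefined && expiry > now) return true;
    this.cache.set(key, now + this.ttlMs);
    // Before the fix: prune() ran unconditionally here, an O(n) scan
    // on every message. After the fix: only prune once the cache
    // actually exceeds its cap.
    if (this.cache.size > this.maxSize) this.prune(now);
    return false;
  }

  private prune(now: number): void {
    for (const [key, expiry] of this.cache) {
      if (expiry <= now) this.cache.delete(key);
    }
  }
}
```

The trade-off is exactly the one Greptile notes below the fold: expired entries linger until the size cap is hit, trading a little memory for not paying the scan on every call.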
Greptile Summary

This PR fixes a real performance regression introduced by #51950: calling `prune()` on every message performed a full linear scan of the cache.
Minor behavioral trade-off: TTL-expired entries are no longer eagerly removed on every call; they linger until the cache grows beyond `maxSize`. The fix is well-tested (5 dedupe tests + 2 map-size tests) and logically sound. Confidence Score: 5/5
Reviews (1): Last reviewed commit: "fix(infra): lazy prune in DedupeCache to..."
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6dd2088bb5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```ts
if (cache.size > maxSize) {
  prune(now);
}
```
Avoid full-cache prune once the dedupe window saturates
check() still calls the full prune(now) path whenever cache.size > maxSize, so this only helps until the live working set reaches the cap. In the Discord handler the window is ttlMs=5*60_000 with maxSize=5000 (extensions/discord/src/monitor/message-handler.ts:34-35), which means any account receiving more than about 17 distinct inbound messages/sec will keep the cache saturated; after that, every new message still pays the same O(n) scan through ~5,000 entries, so the delay regression this patch is meant to fix will continue on the busiest servers.
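The ~17 messages/sec figure in this comment follows directly from the cache parameters: a window of `maxSize` entries that live for `ttlMs` saturates once distinct messages arrive faster than `maxSize / (ttlMs in seconds)`. A quick check, using the values quoted above:

```typescript
// Rough saturation threshold: the distinct-message rate at which the
// live working set fills the cap, so every check() still pays a full
// O(n) prune even after this patch.
const ttlMs = 5 * 60_000; // 5-minute dedupe window (from the Discord handler)
const maxSize = 5000;     // cache size cap

const saturationRate = maxSize / (ttlMs / 1000); // distinct messages per second
console.log(saturationRate.toFixed(1)); // ≈ 16.7 msgs/sec
```

Below that rate the lazy prune almost never fires; above it, the cache stays at the cap and the regression persists, which is the follow-up the reviewer is flagging.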
Thanks for the detailed review! The saturated-cache edge case is a valid follow-up optimization. For now this fix addresses the core regression for the majority of servers. Happy to open a separate issue for the high-throughput case if that's useful.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7de07837c
```ts
const deleteCount = map.size - limit;
let i = 0;
for (const key of map.keys()) {
  if (i++ >= deleteCount) break;
  map.delete(key);
}
```
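Put together, the single-pass version reads roughly like this. The signature and guard are assumptions — the actual `src/infra/map-size.ts` may differ — but the loop body matches the diff above:

```typescript
// Delete the oldest entries in one forward pass. JS Maps iterate in
// insertion order, and deleting the current key during iteration is
// safe, so this avoids repeatedly calling map.keys().next() in a
// while loop.
function pruneMapToMaxSize<K, V>(map: Map<K, V>, limit: number): void {
  if (map.size <= limit) return; // nothing to prune
  const deleteCount = map.size - limit;
  let i = 0;
  for (const key of map.keys()) {
    if (i++ >= deleteCount) break;
    map.delete(key);
  }
}
```

Note that this sketch intentionally reproduces the NaN behavior discussed in the next review comment: with `limit = NaN`, `map.size <= NaN` is false and `i++ >= NaN` is always false, so the loop deletes every entry.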
Preserve NaN/invalid-limit behavior in pruneMapToMaxSize
When maxSize is NaN at runtime, this loop now deletes every entry instead of leaving the map untouched. That matters because several in-repo callers still normalize with Math.max(..., Math.floor(...)), which also yields NaN for malformed numeric input (for example src/plugin-sdk/webhook-request-guards.ts:95-115 and src/plugin-sdk/webhook-memory-guards.ts:60-97). After this change, a limiter created with an invalid numeric config immediately forgets its active keys, so duplicate in-flight requests or rate-limited keys are no longer tracked at all; before this commit, pruneMapToMaxSize(..., NaN) was effectively a no-op and those guards kept working.
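One way to preserve the old no-op behavior for invalid limits is to bail out unless the limit is a finite, non-negative number. This guard is illustrative only — it is not code from this PR, and the function name is hypothetical:

```typescript
// Sketch: reject NaN/Infinity/negative limits before pruning, so a
// malformed numeric config leaves the map untouched (the pre-PR
// behavior the reviewer wants preserved).
function pruneMapToMaxSizeSafe<K, V>(map: Map<K, V>, limit: number): void {
  if (!Number.isFinite(limit) || limit < 0) return; // NaN-safe no-op
  const deleteCount = map.size - Math.floor(limit);
  if (deleteCount <= 0) return;
  let i = 0;
  for (const key of map.keys()) {
    if (i++ >= deleteCount) break;
    map.delete(key);
  }
}
```

With this guard, a limiter created from malformed config keeps tracking its active keys instead of silently forgetting them all.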
Fixes #4453
Root Cause
The dedupe fix (#51950) added `recentInboundMessages.check()` to the Discord message handler. `check()` called `prune()` on every message, which did a full linear scan of up to 5000 cache entries each time. On active servers this blocked the event loop cumulatively, producing the exact 33-122s delays reported in the issue.
Changes
- `src/infra/dedupe.ts` — only prune when `cache.size > maxSize`
- `src/infra/map-size.ts` — fixed corrupted duplicate code + rewrote `pruneMapToMaxSize` to delete in a single forward pass
Testing