
improvement: platfrm-184 update bullmq version for jobScheduler #5723

Merged
PrestigePvP merged 7 commits into main from tre/platfrm-184-update-bullmq-scheduler
Mar 19, 2026

Conversation

@PrestigePvP
Contributor

No description provided.

@linear

linear bot commented Mar 16, 2026

@maidul98
Collaborator

maidul98 commented Mar 16, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scan Engine | Critical | High | Medium | Low | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Passed | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@greptile-apps
Contributor

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR upgrades BullMQ from ^5.4.2 to ^5.67.3 and migrates all recurring queue jobs from the legacy repeatable jobs API (queue.add with repeat option) to the newer upsertJobScheduler API. This addresses uncapped Redis memory usage caused by the old repeatable jobs implementation, which accumulated Redis keys over time. The PR also adds three new methods to the queue service (upsertJobScheduler, removeJobScheduler, getJobSchedulers), marks the old APIs as @deprecated, and improves type safety by using Partial<Record> for queue/worker containers with optional chaining.
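The shape of the API change can be sketched as follows. This is an illustrative sketch, not code from the PR; the queue name, job name, payload, and cron pattern are assumptions.

```typescript
import { Queue } from "bullmq";

const queue = new Queue("telemetry", {
  connection: { host: "localhost", port: 6379 },
});

// Legacy repeatable-jobs API: each distinct repeat configuration creates its
// own repeat key in Redis, so changed options accumulate stale keys over time.
await queue.add(
  "instance-stats",
  { instanceId: "self" },
  { repeat: { pattern: "0 0 * * *" } }
);

// Job-scheduler API (BullMQ >= 5.16): upsert is keyed by the scheduler id, so
// re-running setup with new options replaces the scheduler instead of
// accumulating new repeat keys.
await queue.upsertJobScheduler(
  "instance-stats",                                     // scheduler id
  { pattern: "0 0 * * *" },                             // repeat options
  { name: "instance-stats", data: { instanceId: "self" } } // job template
);
```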

  • Migration gap — legacy repeatable jobs not cleaned up: The old code explicitly called stopRepeatableJob before setting up schedules to remove stale repeatable job configs from Redis. These cleanup calls are removed without replacement. Since BullMQ's job schedulers and legacy repeatable jobs are independent data structures in Redis, any existing legacy repeatable jobs will continue to fire alongside the new schedulers after deployment, causing duplicate job executions for every affected queue (~15 queues). A one-time cleanup step should be added.
  • Dynamic secret lease queue: Correctly migrated stopRepeatableJobByJobId → stopJobById for delayed (non-repeatable) jobs, and removed a redundant duplicate call.
  • Telemetry queue: Previously, stopRepeatableJob was called unconditionally (even when postHog was undefined) to ensure old jobs were always cleaned up. The new code only calls upsertJobScheduler inside the if (postHog) block, so legacy jobs on instances without PostHog will never be cleaned.
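A one-time cleanup step along the lines suggested above might look like this. This is a hypothetical sketch (the helper name is invented, not from the PR); it relies on BullMQ's `getRepeatableJobs` and `removeRepeatableByKey`, which operate on the legacy repeat data structures only.

```typescript
import { Queue } from "bullmq";

// Hypothetical one-time migration: remove every legacy repeatable-job
// configuration so that only the new job schedulers fire after the upgrade.
async function removeLegacyRepeatableJobs(queue: Queue): Promise<void> {
  const repeatableJobs = await queue.getRepeatableJobs();
  for (const job of repeatableJobs) {
    // removeRepeatableByKey deletes the repeat configuration; any job the
    // config already enqueued must be removed separately (see the review
    // discussion in this PR about leftover jobs blocking the new scheduler).
    await queue.removeRepeatableByKey(job.key);
  }
}
```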

Confidence Score: 2/5

  • This PR risks duplicate job executions on any deployment upgrading from the previous version with existing Redis state.
  • The migration from legacy repeatable jobs to job schedulers is well-structured, but the removal of all stopRepeatableJob cleanup calls without a replacement migration mechanism means existing deployments will have both legacy repeatable jobs AND new scheduler jobs firing simultaneously. This affects ~15 queues including critical ones like certificate rotation, secret rotation, telemetry, and resource cleanup. The fix is straightforward (add a legacy cleanup step), but without it the PR introduces a high-impact regression on upgrades.
  • Pay close attention to backend/src/queue/queue-service.ts (missing legacy job cleanup in upsertJobScheduler) and backend/src/services/telemetry/telemetry-queue.ts (conditional cleanup path change).

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/src/queue/queue-service.ts | Core queue service: adds new upsertJobScheduler, removeJobScheduler, and getJobSchedulers wrappers and converts the internal reconciliation cron to use job schedulers. Changes container types to Partial<Record> with optional chaining. Missing cleanup of legacy repeatable jobs risks duplicate executions on upgrades. |
| backend/package.json | BullMQ version bump from ^5.4.2 to ^5.67.3 to gain job scheduler API support and fix uncapped Redis memory growth from legacy repeatable jobs. |
| backend/src/ee/services/dynamic-secret-lease/dynamic-secret-lease-queue.ts | Correctly changed stopRepeatableJobByJobId to stopJobById for delayed (non-repeatable) revocation jobs. Removed a redundant duplicate stopRepeatableJobByJobId call. |
| backend/src/services/certificate-authority/certificate-authority-queue.ts | Migrated CA CRL rotation from legacy repeatable jobs (with explicit stopRepeatableJob cleanup) to upsertJobScheduler. Old cleanup removed; no replacement for legacy jobs still in Redis. |
| backend/src/services/resource-cleanup/resource-cleanup-queue.ts | Migrated both daily and frequent resource cleanup from legacy repeatable jobs to upsertJobScheduler. Previous stopRepeatableJob cleanup removed without replacement. |
| backend/src/services/telemetry/telemetry-queue.ts | Migrated telemetry instance stats and aggregated events from legacy repeatable jobs to upsertJobScheduler. Previous stopRepeatableJob calls (which ran even when postHog was disabled) removed without replacement. |

Last reviewed commit: 402ece2

@PrestigePvP PrestigePvP force-pushed the tre/platfrm-184-update-bullmq-scheduler branch from c9f14ff to a646e81 on March 17, 2026 16:09
Contributor

@victorvhs017 victorvhs017 left a comment


When I run the app, I get:

[screenshot of the error]

You can replicate by creating a secret rotation on main and then checking out this branch.

The issue here is that even though we stop the repeatable job creator, it may have already created the next job and put it in the queue.

And because our job scheduler uses the same id for its jobs, it won't be able to produce the next job while that leftover job remains unexecuted.

I see two options here:

  • Remove the repeatable jobs (what you've done) AND any job created by it from the queue
  • Change the id for the scheduler jobs. This would fix the issue with one drawback: the next job would be executed twice (the remaining job from the repeatable jobs producer + the new job from the new scheduler).

Looking at the queues, it shouldn't be an issue to execute these jobs twice instead of once. The biggest impact would be one duplicated email and a couple of duplicated notifications. But it's worth testing that the old job id is completely gone from Redis afterwards.
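The first option above (removing the repeatable configs and the jobs they already enqueued) could be sketched like this. This is a hypothetical helper, not code from the PR; the assumption that leftover jobs can be matched by a job-id prefix is illustrative and would need verifying against the actual ids in Redis.

```typescript
import { Queue } from "bullmq";

// Hypothetical sketch: after removing the legacy repeat configs, also drain
// any jobs they already enqueued, so the scheduler's fixed job id is free
// and the new job scheduler can produce its next job.
async function removeLeftoverRepeatableJobs(
  queue: Queue,
  jobIdPrefix: string // assumed prefix shared by the old jobs' ids
): Promise<void> {
  const pending = await queue.getJobs(["delayed", "waiting"]);
  for (const job of pending) {
    if (job.id?.startsWith(jobIdPrefix)) {
      await job.remove();
    }
  }
}
```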

@PrestigePvP PrestigePvP force-pushed the tre/platfrm-184-update-bullmq-scheduler branch from c378b95 to 45a5fd9 on March 18, 2026 22:09
@PrestigePvP PrestigePvP merged commit 3fef3a1 into main Mar 19, 2026
10 checks passed
