feat(queue): add WorkerRemovedResolver to resolve removed worker's tasks as exception/worker-shutdown #8304

Merged
matt-boris merged 7 commits into main from matt-boris/queueListenToWorkerRemovals
Feb 23, 2026

@matt-boris
Contributor

Fixes #7477. Fixes #7472.

The queue service now listens for workerRemoved events from worker-manager and immediately resolves any tasks claimed by that worker as exception/worker-shutdown, triggering an automatic retry.
Previously, when a worker disappeared (due to VM preemption, crash, or manual termination), its claimed tasks would wait up to 20 minutes for the claim to expire before being retried.
This new workerRemovedResolver background process runs alongside the existing claim-resolver and requires no configuration changes.
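
The resolution step described above can be sketched roughly as follows. This is an illustrative in-memory model, not the queue service's code: the `Run` and `Task` classes, the `resolve_worker_removed` function, and the `retries_left` counter are hypothetical names chosen for the example, and the real service also publishes messages and updates the dependency tracker, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    # Hypothetical, simplified run record
    state: str = "running"
    reason_resolved: Optional[str] = None
    worker_group: Optional[str] = None
    worker_id: Optional[str] = None


@dataclass
class Task:
    # Hypothetical, simplified task record
    task_id: str
    task_queue_id: str
    retries_left: int
    runs: List[Run] = field(default_factory=list)


def resolve_worker_removed(task: Task, run_id: int,
                           worker_group: str, worker_id: str) -> str:
    """Resolve one claimed run as exception/worker-shutdown.

    Returns "skipped", "retried", or "exception" so callers can see
    which branch was taken.
    """
    run = task.runs[run_id]
    # Leave the run alone if it is already resolved or is claimed
    # by a different worker than the one that was removed.
    if run.state != "running":
        return "skipped"
    if (run.worker_group, run.worker_id) != (worker_group, worker_id):
        return "skipped"

    run.state = "exception"
    run.reason_resolved = "worker-shutdown"

    # With retries remaining, add a fresh pending run right away
    # instead of waiting for the claim to expire.
    if task.retries_left > 0:
        task.retries_left -= 1
        task.runs.append(Run(state="pending"))
        return "retried"
    return "exception"
```

Calling the function a second time for the same run returns `"skipped"`, so a duplicate or late event in this model does not create an extra retry.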

Adds a new DB function to look up claimed tasks by worker identity
(task_queue_id, worker_group, worker_id). This will be used by the
queue's worker-removed-resolver to resolve tasks when a worker is removed.

Resolves claimed tasks as exception/worker-shutdown when a workerRemoved
Pulse event is received from worker-manager. Follows the same
post-resolution logic as ClaimResolver (publishing taskException,
scheduling retries, updating dependency tracker).

Adds the worker-removed-resolver as a new background process in the
queue service. Subscribes to worker-manager's workerRemoved Pulse
events to resolve orphaned tasks immediately. Includes tests for
retry, no-retry, already-resolved, and no-tasks scenarios.
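
The lookup by worker identity mentioned in the first commit message might behave like the filter below. The DB function's actual name, signature, and SQL are not given in this description, so `Claim` and `claimed_by_worker` are hypothetical stand-ins operating on an in-memory list; the sketch only shows matching on the full (task_queue_id, worker_group, worker_id) triple.

```python
from typing import List, NamedTuple, Tuple


class Claim(NamedTuple):
    # Hypothetical row shape for a claimed run
    task_id: str
    run_id: int
    task_queue_id: str
    worker_group: str
    worker_id: str


def claimed_by_worker(claims: List[Claim], task_queue_id: str,
                      worker_group: str, worker_id: str) -> List[Tuple[str, int]]:
    """Return (task_id, run_id) pairs claimed by exactly this worker."""
    identity = (task_queue_id, worker_group, worker_id)
    return [
        (c.task_id, c.run_id)
        for c in claims
        # All three fields must match; the same worker_id under a
        # different queue or group is treated as a different worker.
        if (c.task_queue_id, c.worker_group, c.worker_id) == identity
    ]
```

A worker with nothing claimed yields an empty list, which corresponds to the no-tasks scenario listed in the tests: the resolver then has nothing to resolve.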
@matt-boris matt-boris force-pushed the matt-boris/queueListenToWorkerRemovals branch from 24e5ed5 to 74c162c Compare February 20, 2026 20:34
@matt-boris matt-boris marked this pull request as ready for review February 23, 2026 13:57
@matt-boris matt-boris requested a review from a team as a code owner February 23, 2026 13:57
@matt-boris matt-boris requested review from lotas and petemoore and removed request for a team February 23, 2026 13:57
@matt-boris matt-boris force-pushed the matt-boris/queueListenToWorkerRemovals branch from 7956654 to 7af4657 Compare February 23, 2026 14:56
Contributor

@lotas lotas left a comment

Many thanks 🙏

@matt-boris matt-boris merged commit 667ca2a into main Feb 23, 2026
73 checks passed
@matt-boris matt-boris deleted the matt-boris/queueListenToWorkerRemovals branch February 23, 2026 15:14

2 participants