Improve error when version override does not have the version present in the task queue#9020
Improve error when version override does not have the version present in the task queue#9020
Conversation
service/worker/batcher/activities.go
Outdated
| // Check if error is non-retryable by checking if the error message contains | ||
| // any of the non-retryable error strings | ||
| nonRetryable := false | ||
| for _, nonRetryableErr := range batchOperation.NonRetryableErrors { | ||
| if strings.Contains(err.Error(), nonRetryableErr) { | ||
| nonRetryable = true | ||
| break | ||
| } | ||
| } |
There was a problem hiding this comment.
may need a review from your side @awln-temporal (since you last touched this bit): TLDR is that I don't want the error message that I am passing from the frontend to be retried over here. However, the error message (with the latest changes) is generated dynamically and needs a task queue name when it is being built. The frontend code, specifically in the batcher, does not have this information. Thus, I am required to pass in only a subset of the string which feels quite...ugly to me.
Having dug through this a bit more, I see that we use NonRetryableErrors, which is a slice of strings. I wonder if we can make this better by changing it to be for specific error types?
Moreover, if making the change that I described is kinda risky right now, well, is the change that I have made in activities.go safe?
There was a problem hiding this comment.
Yeah, checking for error types seems like a better way to do this. It also seems that the only place that appends to this non-retryable errors slice is for this pinned worker version check. Fmi, why do we need to configure the list of retriable errors at runtime based off the batch operation type?
afe071b to
21bc7b4
Compare
awln-temporal
left a comment
There was a problem hiding this comment.
Fine with current changes, but does it make sense to create separate error type for non-retryable, retryable error, create an error subtype for pinned version not in task queue that implements non-retryable error, instead of checking for substrings?
… in the task queue (#9020) ## What changed? - WISOTT ## Why? - User experience. ## How did you test it? - [x] built - [ ] run locally and tested manually - [x] covered by existing tests - [ ] added new unit test(s) - [ ] added new functional test(s) ## Potential risks - None <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Enhances error clarity and batch behavior around version overrides. > > - Introduces `ErrPinnedVersionNotInTaskQueueSubstring` and `FormatPinnedVersionNotInTaskQueueError` to produce detailed messages (deployment, build ID, task queue, type) > - Replaces generic errors in `validatePinnedVersionInTaskQueue` with formatted message; updates related tests to expect new text > - Adds operation-aware non-retryable detection in `batcher` (`isNonRetryableError`) for `BATCH_OPERATION_TYPE_UPDATE_EXECUTION_OPTIONS` using the new substring; integrates into `handleTaskResult` > - Removes frontend-added non-retryable string for update-execution-options; responsibility moved to worker > - Adds unit tests for new formatter and non-retryable detection > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit eb89423. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->
Backport changes from #905 to work with server v1.30.0-149.0. Server PR temporalio/temporal#9020 added validation that pinned version overrides must reference a version that exists in the task queue. This broke tests that were setting pinned overrides without first registering workers for those versions. Changes: 1. commands.workflow.go: Fix validation to require BOTH deployment name AND build ID for pinned behavior (change && to ||) 2. commands.workflow_test.go: Update versioning override tests to start workers for both version1 and version2 before setting overrides, ensuring versions exist in the task queue 3. commands.workflow_reset_update_options_test.go: Update pinned behavior reset tests to start a versioned worker before attempting reset with pinned override
What changed?
Why?
How did you test it?
Potential risks
Note
Enhances error clarity and batch behavior around version overrides.
ErrPinnedVersionNotInTaskQueueSubstringandFormatPinnedVersionNotInTaskQueueErrorto produce detailed messages (deployment, build ID, task queue, type)validatePinnedVersionInTaskQueuewith formatted message; updates related tests to expect new textbatcher(isNonRetryableError) forBATCH_OPERATION_TYPE_UPDATE_EXECUTION_OPTIONSusing the new substring; integrates intohandleTaskResultWritten by Cursor Bugbot for commit eb89423. This will update automatically on new commits. Configure here.