Skip to content

Trampolining Part 2: Avoid infinite loops for Pinned workflows#9374

Merged
Shivs11 merged 24 commits intomainfrom
trampolining-pt2
Mar 13, 2026
Merged

Trampolining Part 2: Avoid infinite loops for Pinned workflows#9374
Shivs11 merged 24 commits intomainfrom
trampolining-pt2

Conversation

@Shivs11
Copy link
Copy Markdown
Member

@Shivs11 Shivs11 commented Feb 20, 2026

What changed?

  • Consists of the change to prevent any Pinned workflows, that may have forgotten to have the initial CAN Behaviour as AU, from CAN'ing infinitely.
  • Also allows trampolining of a Pinned workflow onto the Unversioned if un-versioned is the current version of the worker deployment at that point in time. Note, the effective behaviour of the workflow would later then be unversioned.

Why?

  • Worker-Versioning correctness.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

  • Pre-existing workflows (started before this fix) have nil TargetVersionOnStart. On the first WFT after deployment, "" !=
    "build-v2" → spurious targetDeploymentVersionChanged=true. This is a one-time false-positive regression for those Pinned workflows.

Note

Medium Risk
Touches generated protobuf APIs used across internal services; while changes are mostly additive/regeneration, wire/type mismatches or partial rollouts could cause compatibility issues.

Overview
Adds a new declined_target_version_upgrade field to StartWorkflowExecutionRequest (historyservice API) so continue-as-new/retry runs can persist the prior run’s implicitly-declined target deployment version.

Regenerates protobuf Go bindings, updating import aliases/types across request_response.pb.go, and adds missing Marshal/Unmarshal/Size/Equal helpers for persistence message LastNotifiedTargetVersion.

Written by Cursor Bugbot for commit 7e039eb. This will update automatically on new commits. Configure here.

@Shivs11
Copy link
Copy Markdown
Member Author

Shivs11 commented Feb 20, 2026

In theory, this PR could have been quite small if I had not worried/thought about the replication side of things. I think it's important to note that if we don't persist this information in history (on the event side of things), then it becomes difficult for us to rebuild the mutable state when looking at just the events. This could also cause a workflow, on a MS that has been rebuilt, to undergo an additional CAN.

one could say that this is an overkill for something which is anyways the user's fault, but my knowledge on replication and how consistent things have to be kept is quite minimal, so I decided to take a stab at this.

Shivs11 and others added 3 commits March 9, 2026 22:59
# Conflicts:
#	api/historyservice/v1/request_response.pb.go
Co-Authored-By: Claude Opus 4.6 <[email protected]>
@Shivs11 Shivs11 marked this pull request as ready for review March 10, 2026 03:03
@Shivs11 Shivs11 requested review from a team as code owners March 10, 2026 03:03
// subsequent workflow tasks.
temporal.api.deployment.v1.InheritedAutoUpgradeInfo inherited_auto_upgrade_info = 16;
// The previous run's latest target Worker Deployment Version. Passed through
// during continue-as-new to be written on the WorkflowExecutionStartedEvent.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe also say "nil if this workflow is not initiated by continue-as-new"

Comment on lines +304 to +308
// True when this run was initiated by continue-as-new or retry and inherited
// a pinned version from the previous run. Used in the
// targetDeploymentVersionChanged condition to detect continuation runs that
// declined upgrade.
bool has_inherited_pinned_version_continue_as_new_or_retry = 114;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you including retry?

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Shivs11 and others added 4 commits March 12, 2026 09:11
distinguish "not set" from "set to unversioned (nil)". This fixes
the case where a pinned workflow transitioning to an unversioned
target was not receiving targetDeploymentVersionChanged=true.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
// After the first workflow task, the effective behavior of the workflow is determined by worker-sent values in
// subsequent workflow tasks.
temporal.api.deployment.v1.InheritedAutoUpgradeInfo inherited_auto_upgrade_info = 16;
// During the previous run of this workflow, if it exists, the server may have
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// During the previous run of this workflow, if it exists, the server may have
// During the previous run of this workflow, the server may have

Comment on lines +299 to +302
// The last target version for which the server set targetDeploymentVersionChanged
// to true on a workflow task started event. Updated on each workflow task start,
// set only when the server decides to set the targetDeploymentVersionChanged flag
// to true. Also read during continue-as-new/retry to pass to the new run.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also want to be explicit about that if the last wt started event had targetDeploymentVersionChanged=false then this value would be set to nil.

default:
// Otherwise — target changed + did not decline to upgrade on CaN/retry. Signal the SDK.
targetDeploymentVersionChanged = true
m.ms.executionInfo.LastNotifiedTargetVersion = &historypb.LastNotifiedTargetVersion{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should always unset this if targetDeploymentVersionChanged becomes false.

m.ms.executionInfo.LastNotifiedTargetVersion = &historypb.LastNotifiedTargetVersion{
DeploymentVersion: targetDeploymentVersion,
}
m.ms.executionInfo.NotificationSuppressedTargetVersion = nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run 1:

  • pinned to v1
  • then target version changes to v2
  • run 1 CaNs without upgrade -> declined value would be set to v2 for run 2
    run 2:
  • pinned to v1
  • declined value is v2
  • target version changes to v1 -> no notif -> last notif value is nil
    • here, because target version is now different from the declined value we can reset it. we should also check the revision number.
  • target version changes back to v2
  • Q1: should we notif run 2 now about v2?
    • yes, to give the wf another chance to go to v2
  • say at this point, wf CaNs but again without upgrade
  • Q2: should it pass v2 as declined to run 3?
    • yes, because it declined upgrade duh!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ty for this; have also added the rollback test.

// After the first workflow task, the effective behavior of the workflow is determined by worker-sent values in
// subsequent workflow tasks.
temporal.api.deployment.v1.InheritedAutoUpgradeInfo inherited_auto_upgrade_info = 16;
// The target version that the previous run's SDK declined to upgrade to.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that the previous run implicitly declined to upgrade to.

Shivs11 added a commit to temporalio/api that referenced this pull request Mar 12, 2026
…ned wfs (#721)

_**READ BEFORE MERGING:** All PRs require approval by both Server AND
SDK teams before merging! This is why the number of required approvals
is "2" and not "1"--two reviewers from the same team is NOT sufficient.
If your PR is not approved by someone in BOTH teams, it may be summarily
reverted._

<!-- Describe what has changed in this PR -->
 -WISOTT


<!-- Tell your future self why have you made these changes -->
- Users that could using trampolining in the near future could run into
a problem, which shall only surface if the workflow author does not have
AutoUpgrade as the initial CAN behaviour for their Pinned workflows. As
of today, if a user does hit this case, they would have the pinned
workflow CAN'ing infinite number of times. This is because a pinned
workflow would commence on the pinned version (since it would have
inherited pinned version set to true) and if the pinned version is
different than a worker-deployment's current version, the
targetDeploymentVersionChanged would be set to true in the server. This
would then result in the workflow just CAN'ing all the time!
- This change aims to change that by only allowing at-most one CAN for
such pinned workflow executions, and it does so by remembering what the
initial version was at the start of the continue-as-new.


<!-- Are there any breaking changes on binary or code level? -->
**Breaking changes**


<!-- If this breaks the Server, please provide the Server PR to merge
right after this PR was merged. -->
- [PR!](temporalio/temporal#9374)

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
temporal-cicd bot pushed a commit to temporalio/api-go that referenced this pull request Mar 12, 2026
…ned wfs (#721)

_**READ BEFORE MERGING:** All PRs require approval by both Server AND
SDK teams before merging! This is why the number of required approvals
is "2" and not "1"--two reviewers from the same team is NOT sufficient.
If your PR is not approved by someone in BOTH teams, it may be summarily
reverted._

<!-- Describe what has changed in this PR -->
 -WISOTT

<!-- Tell your future self why have you made these changes -->
- Users that could using trampolining in the near future could run into
a problem, which shall only surface if the workflow author does not have
AutoUpgrade as the initial CAN behaviour for their Pinned workflows. As
of today, if a user does hit this case, they would have the pinned
workflow CAN'ing infinite number of times. This is because a pinned
workflow would commence on the pinned version (since it would have
inherited pinned version set to true) and if the pinned version is
different than a worker-deployment's current version, the
targetDeploymentVersionChanged would be set to true in the server. This
would then result in the workflow just CAN'ing all the time!
- This change aims to change that by only allowing at-most one CAN for
such pinned workflow executions, and it does so by remembering what the
initial version was at the start of the continue-as-new.

<!-- Are there any breaking changes on binary or code level? -->
**Breaking changes**

<!-- If this breaks the Server, please provide the Server PR to merge
right after this PR was merged. -->
- [PR!](temporalio/temporal#9374)

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
@Shivs11 Shivs11 enabled auto-merge (squash) March 12, 2026 23:41
@semgrep-managed-scans
Copy link
Copy Markdown

Semgrep found 2 missing-explicit-permissions findings:

No explicit GITHUB_TOKEN permissions found at the workflow or job level. Add a permissions: block at the workflow root (applies to all jobs) or per job with least privilege (e.g., contents: read and only specific writes like pull-requests: write if needed).

@Shivs11 Shivs11 merged commit b6ae24e into main Mar 13, 2026
88 of 96 checks passed
@Shivs11 Shivs11 deleted the trampolining-pt2 branch March 13, 2026 01:20
birme pushed a commit to eyevinn-osaas/temporal that referenced this pull request Mar 23, 2026
…ralio#9374)

## What changed?
- Consists of the change to prevent any Pinned workflows, that may have
forgotten to have the initial CAN Behaviour as AU, from CAN'ing
infinitely.
- Also allows trampolining of a Pinned workflow onto the Unversioned if
un-versioned is the current version of the worker deployment at that
point in time. Note, the effective behaviour of the workflow would later
then be unversioned.

## Why?
- Worker-Versioning correctness.

## How did you test it?
- [x] built
- [x] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [x] added new functional test(s)

## Potential risks
- Pre-existing workflows (started before this fix) have nil
TargetVersionOnStart. On the first WFT after deployment, "" !=
"build-v2" → spurious targetDeploymentVersionChanged=true. This is a
one-time false-positive regression for those Pinned workflows.


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches generated protobuf API surface used by internal services;
while the change is additive, mismatched proto versions across services
could cause integration/compatibility issues.
> 
> **Overview**
> Adds `declined_target_version_upgrade` to
`StartWorkflowExecutionRequest` (HistoryService API) so
continue-as-new/retry chains can carry forward the SDK-declined target
deployment version and avoid pinned-workflow trampolining loops.
> 
> Regenerates protobuf Go bindings, updating import/type references
across `request_response.pb.go`, and adds missing
`Marshal`/`Unmarshal`/`Size`/`Equal` helpers for the persistence
`LastNotifiedTargetVersion` message.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
0af06d0. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants