Add SetWorkerDeploymentRampingVersion and `versioning_info` to DescribeTaskQueue by Shivs11 · Pull Request #524 · temporalio/api

Shivs11 · 2025-01-21T21:11:21Z

READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.

What changed?

Title

Why?

Versioning-3.1

Breaking changes

Server PR

soon to come (developing SetWorkerDeploymentCurrentVersion and Ramping functionality :) )

…beTaskQueue

Shivs11 · 2025-01-21T21:24:16Z

temporal/api/workflowservice/v1/service.proto

+    // Version if it is the Version being set as Current.
+    rpc SetWorkerDeploymentCurrentVersion (SetWorkerDeploymentCurrentVersionRequest) returns (SetWorkerDeploymentCurrentVersionResponse) {
        option (google.api.http) = {
-            post: "/namespaces/{namespace}/worker-deployment/{deployment_name}/set-current-version"


@cretz - following your advice from one of my previous PR's, I decided to make these into plurals too - let me know if this is agreeable

Definitely, looks great

cretz · 2025-01-21T22:11:31Z

temporal/api/workflowservice/v1/service.proto

+    // Version if it is the Version being set as Current.
+    rpc SetWorkerDeploymentCurrentVersion (SetWorkerDeploymentCurrentVersionRequest) returns (SetWorkerDeploymentCurrentVersionResponse) {
        option (google.api.http) = {
-            post: "/namespaces/{namespace}/worker-deployment/{deployment_name}/set-current-version"


Definitely, looks great

antlai-temporal · 2025-01-22T21:17:37Z

temporal/api/taskqueue/v1/message.proto

+    // Ramping Version is set during gradual rollout of a new Version. It takes a share of tasks
+    // according to `ramping_version_percentage`.
+    temporal.api.deployment.v1.WorkerDeploymentVersion ramping_version = 2;


Can the current version and the ramping version belong to different deployments? If not, you need to be explicit in the comments, and describe what would happen if we try to ramp a task queue within a different deployment...

They can belong to different deployment if a TQ migration to another deployment name. Adding the following clarifying comment about that in my next PR.

// It is possible that Ramping Version belong to a Deployment other than that of the Current // Version. This happens when a Task Queue is being moved from one Deployment to another.

antlai-temporal · 2025-01-22T21:23:05Z

temporal/api/taskqueue/v1/message.proto

+    float ramping_version_percentage = 3;
+    // Last time versioning information of this Task Queue changed.


we are using a float instead of an int, what is the range? if it can be 100.0 what does it mean for "current"? Explain...

Adding the following explanation in my next PR:

// Valid range: [0, 100]. A 100% value means the Ramping Version is receiving full traffic but // not yet "promoted" to be the Current Version, likely due to pending validations.

antlai-temporal · 2025-01-22T21:28:58Z

temporal/api/workflowservice/v1/request_response.proto

    // This map contains Task Queue information for each Build ID. Empty string as key value means unversioned.
    // Only set in `ENHANCED` mode.
    map<string, temporal.api.taskqueue.v1.TaskQueueVersionInfo> versions_info = 3;


Is this deprecated?

It will be once we add an alternative for task queue stats. Not quite yet though.

antlai-temporal · 2025-01-22T21:33:32Z

temporal/api/workflowservice/v1/request_response.proto

+    // When not present, it means the tasks are routed to "unversioned" workers. Unversioned workers
+    // are those with UNVERSIONED (or unspecified) WorkflowVersioningMode.


I thought that unversioned workers can also belong to a deployment, is there a way to know which deployment this TQ belongs to when they have unversioned workers?

Yes, unversioned workers can belong to a deployment but we are not implementing that part just yet in server.

About TQ belonging to a particular deployment, there is no direct membership, other than through the Current and Ramping versions (which can be different).

What is your thinking for unversioned workers within a deployment in the future? Are you keeping the data structure? Do you want to update the comment...
If they are different, I'm assuming Current wins, right? Or you are assuming a TQ can be in two deployments (one ramping) and we split traffic accordingly

TQ can be in two deployments (one ramping) and we split traffic accordingly

This one. it's better in that it allows gradual ramp to the other deployment.

What is your thinking for unversioned workers within a deployment in the future?

There has not been enough discussion about that I believe. Whether we want to restrict unversioned TQs to a single deployment or not I don't know. It seems we need to consider what happens when user wants to move the unversioned TQs between deployments and it seems restricting TQ to a single deployment name can add friction to existing rolling deployments of users, so it needs to have a compelling use case.

But if we decide to add the restriction, then I imagine the deployment name would be added to this struct as a separate field, something like unversioned_deployment_name maybe.

antlai-temporal · 2025-01-22T21:36:19Z

temporal/api/workflowservice/v1/request_response.proto

+// Set/unset the Current Version of a Worker Deployment.
+message SetWorkerDeploymentCurrentVersionRequest {


We need conflict tokens

Yes, that's gonna be added before we merge the feature branch.

antlai-temporal · 2025-01-22T21:42:47Z

temporal/api/workflowservice/v1/request_response.proto

+    // The version that was current before executing this operation.
+    string previous_version = 1;


We need previous deployment_name and conflict token

What about GetWorkerDeploymentCurrentVersion?

GetWorkerDeploymentCurrentVersion

No need for this now we have DescribeWorkerDeployment API which would return all the info including current_version. (some data that'd be returned by DescribeWorkerDeployment is not yet added in this feature branch but current_version et al will be there.)

Conflict token will be added separately, we're adding API PRs as things are implemented, hence the chunking.

Previous deployment_name is irrelevant, this API, as the name now suggest, is updating the current version OF a particular deployment. So the API is called on a given deployment.

I was thinking about the conflict token. Are you adding the token to DescribeWorkerDeployment? If not, we need a way to start and get a first token, so we will need a Getter...

Are you adding the token to DescribeWorkerDeployment?

yes.

antlai-temporal · 2025-01-22T21:47:27Z

temporal/api/workflowservice/v1/request_response.proto

    // Must be a Versioned Build. Pass empty string to unset.
-    string version = 3;
+    google.protobuf.StringValue version = 3;


The empty string means "unversioned" but there is a different meaning of unset that is "do not route anything" If we need the latter we will need to have an "unset" method...
In general, using an empty string to unset when empty already has a different meaning is confusing

The "do not route anything" feature is not part of the current scope and this API is not supposed to handle that. For that purpose there'd likely be a PauseWorkerDeployment or some other similar API. SetWorkerDeploymentCurrentVersion is meant to be used only for moving between versions or on/off versioning.
So I don't see two different meaning for empty string, it's always "unversioned". and consistent with the meaning of empty current_version in WorkerDeploymentInfo (which is not added in this PR).

I was thinking that for autoUpgrade workflows there could be a different meaning between "move them to unversioned" for next task, and do not change anything, leave them were they are...

No, there we won't have that.

antlai-temporal · 2025-01-22T21:53:07Z

temporal/api/workflowservice/v1/request_response.proto

+message SetWorkerDeploymentRampingVersionRequest {
+    string namespace = 1;


We need conflict tokens, In particular, to control interleavings between setCurrent and setRamping.

It should be an error to make inconsistent the deployment_name of ramping vs current.

It may be easier to make the changes atomic if we have a single call that sets current and ramp at the same time.

Conflict token is going to be added soon, separately.

It should be an error to make inconsistent the deployment_name of ramping vs current.

Not sure I understand this... both APIs are called against a particular deployment name. If use calls the API on a different deployment name it means they want to change the state of a different deployment. The API would fail if the given version doesn't exists in the given deployment name.

Yes, you are right, conflict can only happen in the TQ if there is a change of deployment for that TQ

antlai-temporal · 2025-01-22T21:54:50Z

temporal/api/workflowservice/v1/request_response.proto

+    // percentage without a Ramping Version which means unversioned workers are being ramped.
+    // (Useful for migrating to unversioned workers from Deployment's Current Version.)
+    google.protobuf.StringValue version = 3;
+    // Valid range: [0,100]. Zero value will unset the Ramping Version.


Zero value unsets ramping but 100.0 does not unset current. This is inconsistent...

Hmm... I don't see it as inconsistent. Zero is the empty value. 100% ramping version means the version is taking all the traffic but still not promoted to become Current because user is still doing final validations on the full load. So there is a value for it, IMO. Is there a value for 0 ramp?

Could be a way of pausing a ramp to set it to zero? If zero means ramp version disappears, people will assume that 100% will change current, and be surprised when it doesn't...

Fair, I'm making that possible in the follow up PR so user can set ramp to 0 without removing ramping version.

temporal/api/workflowservice/v1/request_response.proto

antlai-temporal · 2025-01-22T22:09:32Z

temporal/api/workflowservice/v1/request_response.proto

+    // Can be an empty string and paired with a non-zero `percentage`, in which case it will set the ramp
+    // percentage without a Ramping Version which means unversioned workers are being ramped.
+    // (Useful for migrating to unversioned workers from Deployment's Current Version.)
+    google.protobuf.StringValue version = 3;


StringValues are a bit of a pain in languages like go that give default values to everything. I would prefer if you use a struct with {deployment_name, version} that can be easily passed as null. I think we already have one WorkerDeploymentVersion?

This is just a wrapper to make receiving "" strings possible (as differentiated from not-given). I want to stay away from WorkerDeploymentVersion in here because:

The deployment_name should be given as a top-level field because this API is calling on that deployment name (and not the version), i.e. it is manipulating the state (current_version) of the deployment resource.

WorkerDeploymentVersion is meant to represent a version, it feels awkward to have it also represent unversioned and it having an empty version value.

I think an empty string meaning "ramp to unversioned" can be very error prone. We have never used before StringValue in the temporal APIs, preferring to add an extra enum or bool field, or a struct, to indicate a special value in a more explicit way. I see your first point of deployment_name being the primary key. The issue of representing unversioned within a deployment is going to be awkward anyway, it may be easier to have a global "unversioned"...
@drewhoskins-temporal what do you think?

@antlai-temporal I can change it to a different format if you prefer, something like

oneof ramping_version { string set_ramping_version bool unset_ramping_version }

that is a bit more confusing, if you set unset_ramping_version=false, what does it should do? Nothing?
I think the root problem is that unversioned should not be just "" in a setter, since it could have all sort of side-effects. Maybe use an Enum for it, with just one entry. Or reserve a string "_temporal_unversioned", or add a struct with a field bool isUnversioned and force the "version" to be "" if isUnversioned=true...

Add SetWorkerDeploymentRampingVersion and versioning_info to Descri…

dcbf3b0

…beTaskQueue

Shivs11 requested review from a team as code owners January 21, 2025 21:11

ShahabT approved these changes Jan 21, 2025

View reviewed changes

plurals

f7216d3

Shivs11 commented Jan 21, 2025

View reviewed changes

cretz approved these changes Jan 21, 2025

View reviewed changes

Shivs11 merged commit 50441d5 into versioning-3.1 Jan 21, 2025

Shivs11 deleted the ss/ramp+versioningInfo branch January 21, 2025 22:39

antlai-temporal reviewed Jan 22, 2025

View reviewed changes

temporal/api/workflowservice/v1/request_response.proto Show resolved Hide resolved

antlai-temporal reviewed Jan 22, 2025

View reviewed changes

		float ramping_version_percentage = 3;
		// Last time versioning information of this Task Queue changed.

		// When not present, it means the tasks are routed to "unversioned" workers. Unversioned workers
		// are those with UNVERSIONED (or unspecified) WorkflowVersioningMode.

		// Set/unset the Current Version of a Worker Deployment.
		message SetWorkerDeploymentCurrentVersionRequest {

		// The version that was current before executing this operation.
		string previous_version = 1;

		message SetWorkerDeploymentRampingVersionRequest {
		string namespace = 1;

Conversation

Shivs11 commented Jan 21, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects