Skip to content

Add SetWorkerDeploymentRampingVersion and versioning_info to DescribeTaskQueue#524

Merged
Shivs11 merged 2 commits intoversioning-3.1from
ss/ramp+versioningInfo
Jan 21, 2025
Merged

Add SetWorkerDeploymentRampingVersion and versioning_info to DescribeTaskQueue#524
Shivs11 merged 2 commits intoversioning-3.1from
ss/ramp+versioningInfo

Conversation

@Shivs11
Copy link
Copy Markdown
Member

@Shivs11 Shivs11 commented Jan 21, 2025

READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.

What changed?

  • Title

Why?

  • Versioning-3.1

Breaking changes

Server PR

  • soon to come (developing SetWorkerDeploymentCurrentVersion and Ramping functionality :) )

@Shivs11 Shivs11 requested review from a team as code owners January 21, 2025 21:11
// Version if it is the Version being set as Current.
rpc SetWorkerDeploymentCurrentVersion (SetWorkerDeploymentCurrentVersionRequest) returns (SetWorkerDeploymentCurrentVersionResponse) {
option (google.api.http) = {
post: "/namespaces/{namespace}/worker-deployment/{deployment_name}/set-current-version"
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cretz - following your advice from one of my previous PR's, I decided to make these into plurals too - let me know if this is agreeable

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, looks great

// Version if it is the Version being set as Current.
rpc SetWorkerDeploymentCurrentVersion (SetWorkerDeploymentCurrentVersionRequest) returns (SetWorkerDeploymentCurrentVersionResponse) {
option (google.api.http) = {
post: "/namespaces/{namespace}/worker-deployment/{deployment_name}/set-current-version"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, looks great

@Shivs11 Shivs11 merged commit 50441d5 into versioning-3.1 Jan 21, 2025
@Shivs11 Shivs11 deleted the ss/ramp+versioningInfo branch January 21, 2025 22:39
Comment on lines +66 to +68
// Ramping Version is set during gradual rollout of a new Version. It takes a share of tasks
// according to `ramping_version_percentage`.
temporal.api.deployment.v1.WorkerDeploymentVersion ramping_version = 2;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the current version and the ramping version belong to different deployments? If not, you need to be explicit in the comments, and describe what would happen if we try to ramp a task queue within a different deployment...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can belong to different deployment if a TQ migration to another deployment name. Adding the following clarifying comment about that in my next PR.

    // It is possible that Ramping Version belong to a Deployment other than that of the Current
    // Version. This happens when a Task Queue is being moved from one Deployment to another.

Comment on lines +73 to +74
float ramping_version_percentage = 3;
// Last time versioning information of this Task Queue changed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are using a float instead of an int, what is the range? if it can be 100.0 what does it mean for "current"? Explain...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the following explanation in my next PR:

    // Valid range: [0, 100]. A 100% value means the Ramping Version is receiving full traffic but
    // not yet "promoted" to be the Current Version, likely due to pending validations.

Comment on lines 1054 to 1056
// This map contains Task Queue information for each Build ID. Empty string as key value means unversioned.
// Only set in `ENHANCED` mode.
map<string, temporal.api.taskqueue.v1.TaskQueueVersionInfo> versions_info = 3;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this deprecated?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be once we add an alternative for task queue stats. Not quite yet though.

Comment on lines +1059 to +1060
// When not present, it means the tasks are routed to "unversioned" workers. Unversioned workers
// are those with UNVERSIONED (or unspecified) WorkflowVersioningMode.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that unversioned workers can also belong to a deployment, is there a way to know which deployment this TQ belongs to when they have unversioned workers?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unversioned workers can belong to a deployment but we are not implementing that part just yet in server.

About TQ belonging to a particular deployment, there is no direct membership, other than through the Current and Ramping versions (which can be different).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your thinking for unversioned workers within a deployment in the future? Are you keeping the data structure? Do you want to update the comment...
If they are different, I'm assuming Current wins, right? Or you are assuming a TQ can be in two deployments (one ramping) and we split traffic accordingly

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TQ can be in two deployments (one ramping) and we split traffic accordingly

This one. it's better in that it allows gradual ramp to the other deployment.

What is your thinking for unversioned workers within a deployment in the future?

There has not been enough discussion about that I believe. Whether we want to restrict unversioned TQs to a single deployment or not I don't know. It seems we need to consider what happens when user wants to move the unversioned TQs between deployments and it seems restricting TQ to a single deployment name can add friction to existing rolling deployments of users, so it needs to have a compelling use case.

But if we decide to add the restriction, then I imagine the deployment name would be added to this struct as a separate field, something like unversioned_deployment_name maybe.

Comment on lines +1989 to +1990
// Set/unset the Current Version of a Worker Deployment.
message SetWorkerDeploymentCurrentVersionRequest {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need conflict tokens

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's gonna be added before we merge the feature branch.

Comment on lines +2000 to +2001
// The version that was current before executing this operation.
string previous_version = 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need previous deployment_name and conflict token

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about GetWorkerDeploymentCurrentVersion?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetWorkerDeploymentCurrentVersion

No need for this now we have DescribeWorkerDeployment API which would return all the info including current_version. (some data that'd be returned by DescribeWorkerDeployment is not yet added in this feature branch but current_version et al will be there.)

Conflict token will be added separately, we're adding API PRs as things are implemented, hence the chunking.

Previous deployment_name is irrelevant, this API, as the name now suggest, is updating the current version OF a particular deployment. So the API is called on a given deployment.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the conflict token. Are you adding the token to DescribeWorkerDeployment? If not, we need a way to start and get a first token, so we will need a Getter...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you adding the token to DescribeWorkerDeployment?

yes.

Comment on lines 1993 to +1994
// Must be a Versioned Build. Pass empty string to unset.
string version = 3;
google.protobuf.StringValue version = 3;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty string means "unversioned" but there is a different meaning of unset that is "do not route anything" If we need the latter we will need to have an "unset" method...
In general, using an empty string to unset when empty already has a different meaning is confusing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "do not route anything" feature is not part of the current scope and this API is not supposed to handle that. For that purpose there'd likely be a PauseWorkerDeployment or some other similar API. SetWorkerDeploymentCurrentVersion is meant to be used only for moving between versions or on/off versioning.
So I don't see two different meaning for empty string, it's always "unversioned". and consistent with the meaning of empty current_version in WorkerDeploymentInfo (which is not added in this PR).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that for autoUpgrade workflows there could be a different meaning between "move them to unversioned" for next task, and do not change anything, leave them were they are...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there we won't have that.

Comment on lines +2008 to +2009
message SetWorkerDeploymentRampingVersionRequest {
string namespace = 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need conflict tokens, In particular, to control interleavings between setCurrent and setRamping.

It should be an error to make inconsistent the deployment_name of ramping vs current.

It may be easier to make the changes atomic if we have a single call that sets current and ramp at the same time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflict token is going to be added soon, separately.

It should be an error to make inconsistent the deployment_name of ramping vs current.

Not sure I understand this... both APIs are called against a particular deployment name. If use calls the API on a different deployment name it means they want to change the state of a different deployment. The API would fail if the given version doesn't exists in the given deployment name.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right, conflict can only happen in the TQ if there is a change of deployment for that TQ

// percentage without a Ramping Version which means unversioned workers are being ramped.
// (Useful for migrating to unversioned workers from Deployment's Current Version.)
google.protobuf.StringValue version = 3;
// Valid range: [0,100]. Zero value will unset the Ramping Version.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero value unsets ramping but 100.0 does not unset current. This is inconsistent...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I don't see it as inconsistent. Zero is the empty value. 100% ramping version means the version is taking all the traffic but still not promoted to become Current because user is still doing final validations on the full load. So there is a value for it, IMO. Is there a value for 0 ramp?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a way of pausing a ramp to set it to zero? If zero means ramp version disappears, people will assume that 100% will change current, and be surprised when it doesn't...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, I'm making that possible in the follow up PR so user can set ramp to 0 without removing ramping version.

// Can be an empty string and paired with a non-zero `percentage`, in which case it will set the ramp
// percentage without a Ramping Version which means unversioned workers are being ramped.
// (Useful for migrating to unversioned workers from Deployment's Current Version.)
google.protobuf.StringValue version = 3;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringValues are a bit of a pain in languages like go that give default values to everything. I would prefer if you use a struct with {deployment_name, version} that can be easily passed as null. I think we already have one WorkerDeploymentVersion?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a wrapper to make receiving "" strings possible (as differentiated from not-given). I want to stay away from WorkerDeploymentVersion in here because:

  • The deployment_name should be given as a top-level field because this API is calling on that deployment name (and not the version), i.e. it is manipulating the state (current_version) of the deployment resource.
  • WorkerDeploymentVersion is meant to represent a version, it feels awkward to have it also represent unversioned and it having an empty version value.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an empty string meaning "ramp to unversioned" can be very error prone. We have never used before StringValue in the temporal APIs, preferring to add an extra enum or bool field, or a struct, to indicate a special value in a more explicit way. I see your first point of deployment_name being the primary key. The issue of representing unversioned within a deployment is going to be awkward anyway, it may be easier to have a global "unversioned"...
@drewhoskins-temporal what do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antlai-temporal I can change it to a different format if you prefer, something like

oneof ramping_version {
  string set_ramping_version
  bool unset_ramping_version
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a bit more confusing, if you set unset_ramping_version=false, what does it should do? Nothing?
I think the root problem is that unversioned should not be just "" in a setter, since it could have all sort of side-effects. Maybe use an Enum for it, with just one entry. Or reserve a string "_temporal_unversioned", or add a struct with a field bool isUnversioned and force the "version" to be "" if isUnversioned=true...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants