Add SetWorkerDeploymentRampingVersion and versioning_info to DescribeTaskQueue#524
Add SetWorkerDeploymentRampingVersion and versioning_info to DescribeTaskQueue#524Shivs11 merged 2 commits intoversioning-3.1from
versioning_info to DescribeTaskQueue#524Conversation
| // Version if it is the Version being set as Current. | ||
| rpc SetWorkerDeploymentCurrentVersion (SetWorkerDeploymentCurrentVersionRequest) returns (SetWorkerDeploymentCurrentVersionResponse) { | ||
| option (google.api.http) = { | ||
| post: "/namespaces/{namespace}/worker-deployment/{deployment_name}/set-current-version" |
There was a problem hiding this comment.
@cretz - following your advice from one of my previous PR's, I decided to make these into plurals too - let me know if this is agreeable
| // Version if it is the Version being set as Current. | ||
| rpc SetWorkerDeploymentCurrentVersion (SetWorkerDeploymentCurrentVersionRequest) returns (SetWorkerDeploymentCurrentVersionResponse) { | ||
| option (google.api.http) = { | ||
| post: "/namespaces/{namespace}/worker-deployment/{deployment_name}/set-current-version" |
| // Ramping Version is set during gradual rollout of a new Version. It takes a share of tasks | ||
| // according to `ramping_version_percentage`. | ||
| temporal.api.deployment.v1.WorkerDeploymentVersion ramping_version = 2; |
There was a problem hiding this comment.
Can the current version and the ramping version belong to different deployments? If not, you need to be explicit in the comments, and describe what would happen if we try to ramp a task queue within a different deployment...
There was a problem hiding this comment.
They can belong to different deployment if a TQ migration to another deployment name. Adding the following clarifying comment about that in my next PR.
// It is possible that Ramping Version belong to a Deployment other than that of the Current
// Version. This happens when a Task Queue is being moved from one Deployment to another.
| float ramping_version_percentage = 3; | ||
| // Last time versioning information of this Task Queue changed. |
There was a problem hiding this comment.
we are using a float instead of an int, what is the range? if it can be 100.0 what does it mean for "current"? Explain...
There was a problem hiding this comment.
Adding the following explanation in my next PR:
// Valid range: [0, 100]. A 100% value means the Ramping Version is receiving full traffic but
// not yet "promoted" to be the Current Version, likely due to pending validations.
| // This map contains Task Queue information for each Build ID. Empty string as key value means unversioned. | ||
| // Only set in `ENHANCED` mode. | ||
| map<string, temporal.api.taskqueue.v1.TaskQueueVersionInfo> versions_info = 3; |
There was a problem hiding this comment.
Is this deprecated?
There was a problem hiding this comment.
It will be once we add an alternative for task queue stats. Not quite yet though.
| // When not present, it means the tasks are routed to "unversioned" workers. Unversioned workers | ||
| // are those with UNVERSIONED (or unspecified) WorkflowVersioningMode. |
There was a problem hiding this comment.
I thought that unversioned workers can also belong to a deployment, is there a way to know which deployment this TQ belongs to when they have unversioned workers?
There was a problem hiding this comment.
Yes, unversioned workers can belong to a deployment but we are not implementing that part just yet in server.
About TQ belonging to a particular deployment, there is no direct membership, other than through the Current and Ramping versions (which can be different).
There was a problem hiding this comment.
What is your thinking for unversioned workers within a deployment in the future? Are you keeping the data structure? Do you want to update the comment...
If they are different, I'm assuming Current wins, right? Or you are assuming a TQ can be in two deployments (one ramping) and we split traffic accordingly
There was a problem hiding this comment.
TQ can be in two deployments (one ramping) and we split traffic accordingly
This one. it's better in that it allows gradual ramp to the other deployment.
What is your thinking for unversioned workers within a deployment in the future?
There has not been enough discussion about that I believe. Whether we want to restrict unversioned TQs to a single deployment or not I don't know. It seems we need to consider what happens when user wants to move the unversioned TQs between deployments and it seems restricting TQ to a single deployment name can add friction to existing rolling deployments of users, so it needs to have a compelling use case.
But if we decide to add the restriction, then I imagine the deployment name would be added to this struct as a separate field, something like unversioned_deployment_name maybe.
| // Set/unset the Current Version of a Worker Deployment. | ||
| message SetWorkerDeploymentCurrentVersionRequest { |
There was a problem hiding this comment.
We need conflict tokens
There was a problem hiding this comment.
Yes, that's gonna be added before we merge the feature branch.
| // The version that was current before executing this operation. | ||
| string previous_version = 1; |
There was a problem hiding this comment.
We need previous deployment_name and conflict token
There was a problem hiding this comment.
What about GetWorkerDeploymentCurrentVersion?
There was a problem hiding this comment.
GetWorkerDeploymentCurrentVersion
No need for this now we have DescribeWorkerDeployment API which would return all the info including current_version. (some data that'd be returned by DescribeWorkerDeployment is not yet added in this feature branch but current_version et al will be there.)
Conflict token will be added separately, we're adding API PRs as things are implemented, hence the chunking.
Previous deployment_name is irrelevant, this API, as the name now suggest, is updating the current version OF a particular deployment. So the API is called on a given deployment.
There was a problem hiding this comment.
I was thinking about the conflict token. Are you adding the token to DescribeWorkerDeployment? If not, we need a way to start and get a first token, so we will need a Getter...
There was a problem hiding this comment.
Are you adding the token to DescribeWorkerDeployment?
yes.
| // Must be a Versioned Build. Pass empty string to unset. | ||
| string version = 3; | ||
| google.protobuf.StringValue version = 3; |
There was a problem hiding this comment.
The empty string means "unversioned" but there is a different meaning of unset that is "do not route anything" If we need the latter we will need to have an "unset" method...
In general, using an empty string to unset when empty already has a different meaning is confusing
There was a problem hiding this comment.
The "do not route anything" feature is not part of the current scope and this API is not supposed to handle that. For that purpose there'd likely be a PauseWorkerDeployment or some other similar API. SetWorkerDeploymentCurrentVersion is meant to be used only for moving between versions or on/off versioning.
So I don't see two different meaning for empty string, it's always "unversioned". and consistent with the meaning of empty current_version in WorkerDeploymentInfo (which is not added in this PR).
There was a problem hiding this comment.
I was thinking that for autoUpgrade workflows there could be a different meaning between "move them to unversioned" for next task, and do not change anything, leave them were they are...
There was a problem hiding this comment.
No, there we won't have that.
| message SetWorkerDeploymentRampingVersionRequest { | ||
| string namespace = 1; |
There was a problem hiding this comment.
We need conflict tokens, In particular, to control interleavings between setCurrent and setRamping.
It should be an error to make inconsistent the deployment_name of ramping vs current.
It may be easier to make the changes atomic if we have a single call that sets current and ramp at the same time.
There was a problem hiding this comment.
Conflict token is going to be added soon, separately.
It should be an error to make inconsistent the deployment_name of ramping vs current.
Not sure I understand this... both APIs are called against a particular deployment name. If use calls the API on a different deployment name it means they want to change the state of a different deployment. The API would fail if the given version doesn't exists in the given deployment name.
There was a problem hiding this comment.
Yes, you are right, conflict can only happen in the TQ if there is a change of deployment for that TQ
| // percentage without a Ramping Version which means unversioned workers are being ramped. | ||
| // (Useful for migrating to unversioned workers from Deployment's Current Version.) | ||
| google.protobuf.StringValue version = 3; | ||
| // Valid range: [0,100]. Zero value will unset the Ramping Version. |
There was a problem hiding this comment.
Zero value unsets ramping but 100.0 does not unset current. This is inconsistent...
There was a problem hiding this comment.
Hmm... I don't see it as inconsistent. Zero is the empty value. 100% ramping version means the version is taking all the traffic but still not promoted to become Current because user is still doing final validations on the full load. So there is a value for it, IMO. Is there a value for 0 ramp?
There was a problem hiding this comment.
Could be a way of pausing a ramp to set it to zero? If zero means ramp version disappears, people will assume that 100% will change current, and be surprised when it doesn't...
There was a problem hiding this comment.
Fair, I'm making that possible in the follow up PR so user can set ramp to 0 without removing ramping version.
| // Can be an empty string and paired with a non-zero `percentage`, in which case it will set the ramp | ||
| // percentage without a Ramping Version which means unversioned workers are being ramped. | ||
| // (Useful for migrating to unversioned workers from Deployment's Current Version.) | ||
| google.protobuf.StringValue version = 3; |
There was a problem hiding this comment.
StringValues are a bit of a pain in languages like go that give default values to everything. I would prefer if you use a struct with {deployment_name, version} that can be easily passed as null. I think we already have one WorkerDeploymentVersion?
There was a problem hiding this comment.
This is just a wrapper to make receiving "" strings possible (as differentiated from not-given). I want to stay away from WorkerDeploymentVersion in here because:
- The deployment_name should be given as a top-level field because this API is calling on that deployment name (and not the version), i.e. it is manipulating the state (current_version) of the deployment resource.
- WorkerDeploymentVersion is meant to represent a version, it feels awkward to have it also represent unversioned and it having an empty
versionvalue.
There was a problem hiding this comment.
I think an empty string meaning "ramp to unversioned" can be very error prone. We have never used before StringValue in the temporal APIs, preferring to add an extra enum or bool field, or a struct, to indicate a special value in a more explicit way. I see your first point of deployment_name being the primary key. The issue of representing unversioned within a deployment is going to be awkward anyway, it may be easier to have a global "unversioned"...
@drewhoskins-temporal what do you think?
There was a problem hiding this comment.
@antlai-temporal I can change it to a different format if you prefer, something like
oneof ramping_version {
string set_ramping_version
bool unset_ramping_version
}
There was a problem hiding this comment.
that is a bit more confusing, if you set unset_ramping_version=false, what does it should do? Nothing?
I think the root problem is that unversioned should not be just "" in a setter, since it could have all sort of side-effects. Maybe use an Enum for it, with just one entry. Or reserve a string "_temporal_unversioned", or add a struct with a field bool isUnversioned and force the "version" to be "" if isUnversioned=true...
READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.
What changed?
Why?
Breaking changes
Server PR
SetWorkerDeploymentCurrentVersionand Ramping functionality :) )