Track ownership for scale subresource of deployments #95921
Conversation
Hi @nodo. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from ca5b27c to 8582d50.
/assign @jennybuckley
Hm I wonder if, instead of the helper, it's better to get the managed fields at an index deployment.GetManagedFields()[i], assert the manager name, and assert that it contains the .spec.replicas field.
But for the helper:
- this probably could be named assertReplicasOwnership instead
- at the beginning, call t.Helper()
- at the end, if !managerSeen { t.Fatalf(...) } if .spec.replicas is not in managed fields at all for some reason
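For illustration, a minimal sketch of what such a helper could look like, assuming the assertReplicasOwnership name suggested above; the decoding of the FieldsV1 payload and the exact assertions are assumptions, not code from the PR:

import (
	"encoding/json"
	"testing"

	appsv1 "k8s.io/api/apps/v1"
)

// assertReplicasOwnership checks that expectedManager (and no other manager)
// owns .spec.replicas in the deployment's managed fields.
func assertReplicasOwnership(t *testing.T, deployment *appsv1.Deployment, expectedManager string) {
	t.Helper()
	managerSeen := false
	for _, entry := range deployment.GetManagedFields() {
		if entry.FieldsV1 == nil {
			continue
		}
		var fields map[string]interface{}
		if err := json.Unmarshal(entry.FieldsV1.Raw, &fields); err != nil {
			t.Fatalf("failed to decode managed fields of %q: %v", entry.Manager, err)
		}
		spec, ok := fields["f:spec"].(map[string]interface{})
		if !ok {
			continue
		}
		if _, owns := spec["f:replicas"]; !owns {
			continue
		}
		if entry.Manager != expectedManager {
			t.Fatalf("unexpected manager %q owns .spec.replicas", entry.Manager)
		}
		managerSeen = true
	}
	if !managerSeen {
		t.Fatalf(".spec.replicas is not in managed fields at all")
	}
}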
IIRC, the reason we have this helper is to ensure that a new manager owns the replicas field but also that no other manager does, right? I am not sure we could check both conditions using deployment.GetManagedFields()[i].
As for the suggestions, totally! Thanks for spotting, I will make the changes.
Nice -- could you add/update a unit test for this in addition to the integration test? I think in storage_test.go.
/ok-to-test
/lgtm
I typically try to avoid having variables that only make sense in a specific context and not in others. In this case, the ParentFieldManager member is always in scope but is only used for specific sub-resources. For the scale sub-resource, do we actually use both fieldmanagers? Is it possible that the fieldmanager above is misconfigured?
I haven't looked extensively, but it's possible that we actually run the FieldManager on the scale object itself before :-(.
Yes, if I am not mistaken, it's used on the scale object itself too (e.g. https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/patch.go#L553)
Yeah, for nothing, but yes :D
Yeah, for nothing, but yes :D
Does that mean that even if this PR enabled tracking ownership via /scale, server-side-applying to /scale would not notify of field manager conflicts on the replicas field? Shouldn't it?
Rather than trying to run two fieldmanager passes on Scale, could we:
- translate the managedFields entry in the Deployment containing spec.replicas to the Scale object in scaleFromDeployment?
- translate the managedFields entry for spec.replicas in the resulting Scale object (indicating the owner of the scale's spec.replicas field) back to the Deployment in scaleUpdatedObjectInfo#UpdatedObject?
something like this:
diff --git a/pkg/registry/apps/deployment/storage/storage.go b/pkg/registry/apps/deployment/storage/storage.go
index 8bbe4690cea..ebe2b46e093 100644
--- a/pkg/registry/apps/deployment/storage/storage.go
+++ b/pkg/registry/apps/deployment/storage/storage.go
@@ -329,6 +329,26 @@ func scaleFromDeployment(deployment *apps.Deployment) (*autoscaling.Scale, error
if err != nil {
return nil, err
}
+
+ var managedFields []metav1.ManagedFieldsEntry
+ // TODO: handle different paths for different API versions
+ if managedEntry, exists := findEntry(deployment.ManagedFields, "spec", "replicas"); exists {
+ // We found an owner for the spec.replicas field.
+ // Translate that into a single v1.Scale managed fields entry
+ managedFields = []metav1.ManagedFieldsEntry{{
+ // copy fields from relevant owner
+ Manager: managedEntry.Manager,
+ Operation: managedEntry.Operation,
+ Time: managedEntry.Time,
+ // Produce a FieldsV1 managed field entry
+ FieldsType: "FieldsV1",
+ // API version of v1.Scale
+ APIVersion: "v1",
+ // field path to v1.Scale replicas
+ FieldsV1: &metav1.FieldsV1{Raw: []byte(`{"f:spec":{"f:replicas":{}}}`)},
+ }}
+ }
+
return &autoscaling.Scale{
// TODO: Create a variant of ObjectMeta type that only contains the fields below.
ObjectMeta: metav1.ObjectMeta{
@@ -337,6 +357,7 @@ func scaleFromDeployment(deployment *apps.Deployment) (*autoscaling.Scale, error
UID: deployment.UID,
ResourceVersion: deployment.ResourceVersion,
CreationTimestamp: deployment.CreationTimestamp,
+ ManagedFields: managedFields,
},
Spec: autoscaling.ScaleSpec{
Replicas: deployment.Spec.Replicas,
@@ -404,5 +425,9 @@ func (i *scaleUpdatedObjectInfo) UpdatedObject(ctx context.Context, oldObj runti
// move replicas/resourceVersion fields to object and return
deployment.Spec.Replicas = scale.Spec.Replicas
deployment.ResourceVersion = scale.ResourceVersion
+
+ if managedEntry, exists := findEntry(scale.ManagedFields, "spec", "replicas"); exists {
+ // TODO: merge the Scale managed entry for spec.replicas back into the deployment
+ }
return deployment, nil
}
I'm wondering if that two-way translation of the relevant managed entry would make the existing field manager work correctly as is.
Right, I have added the test here: 2c14088, and it works as expected.
If I understand correctly, the field manager of the main resource is also invoked when running apply on scale, so it triggers a conflict. Am I missing something?
Ah! There's something tricky happening here. To verify that, I think you could write a test where apply will actually ALWAYS conflict on scale. When you apply an object that doesn't have a managedFields set (and that's the case of EVERY scale object here), we create one that we call "before-first-apply" with existing fields. Here, you're getting a conflict with that manager, not the actual manager that changed the value. Every time you apply, we'll re-create this "before-first-apply" manager (even though you're supposed to own the field), and you'll get a conflict.
I don't know yet how to solve that.
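To make that concrete, the synthesized entry looks roughly like the sketch below; this is an illustration of the behaviour described above rather than code from the PR, and the operation and field set shown are assumptions:

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Sketch: when an applied object carries no managedFields (as every Scale sent
// to /scale does), the server records the pre-existing fields under a synthetic
// "before-first-apply" manager; later applies to /scale then conflict with this
// manager instead of the one that actually last changed the value.
var beforeFirstApply = metav1.ManagedFieldsEntry{
	Manager:    "before-first-apply",
	Operation:  metav1.ManagedFieldsOperationUpdate, // assumed
	APIVersion: "autoscaling/v1",                    // assumed version for the scale request
	FieldsType: "FieldsV1",
	FieldsV1:   &metav1.FieldsV1{Raw: []byte(`{"f:spec":{"f:replicas":{}}}`)},
}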
Thanks a lot for spotting, that's quite a tricky edge case :)
I think a possible solution would be to not run Apply on scale objects. E.g. before this line, add something like this:
if scale, ok := patchObject.(*autoscaling.Scale); ok {
	return scale, nil
}
Applying the field manager here doesn't have any effect anyway.
I tried to implement it but I got stuck because I couldn't find a way to convert an *unstructured.Unstructured to *autoscaling.Scale and I wanted to check with you first.
What do you think about this solution? If it's a good way forward, how to do the conversion?
I have done some more tests and I was trying to convert the Unstructured object using the following code:
scaleGVK := schema.GroupVersionKind{Group: "autoscaling", Version: "v1", Kind: "Scale"}
if p.kind == scaleGVK {
	var scale autoscaling.Scale
	err := runtime.DefaultUnstructuredConverter.FromUnstructured(patchObj.UnstructuredContent(), &scale)
	return &scale, err
}
This doesn't work for a reason I don't fully understand. The resulting scale object doesn't have any metadata, and I am getting this error:
error the name of the object (deployment based on URL) was undeterminable: name must be provided
Digging a bit deeper, I have noticed that the Scale object does not specify any JSON annotations, do you know the reason for that? Other types such as Deployment have them. Could this be a problem?
Digging a bit deeper, I have noticed that the Scale object does not specify any JSON annotations, do you know the reason for that? Other types such as Deployment have them. Could this be a problem?
You're looking at the "internal type" rather than the "external type". Internal types are not converted to json and hence don't have the tags. You should be looking at the external type (note the difference in the URL, it's probably too subtle ...): https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/autoscaling/v1/types.go#L114-L127
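For illustration, a sketch of the conversion against the external autoscaling/v1 type; scaleFromUnstructured is a hypothetical helper, and the surrounding wiring is assumed rather than taken from the PR:

import (
	autoscalingv1 "k8s.io/api/autoscaling/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// scaleFromUnstructured converts an unstructured patch object into the external
// autoscaling/v1 Scale type, whose JSON tags let the metadata (name, namespace,
// ...) survive the conversion, unlike the internal type.
func scaleFromUnstructured(kind schema.GroupVersionKind, obj *unstructured.Unstructured) (*autoscalingv1.Scale, bool, error) {
	scaleGVK := schema.GroupVersionKind{Group: "autoscaling", Version: "v1", Kind: "Scale"}
	if kind != scaleGVK {
		return nil, false, nil
	}
	var scale autoscalingv1.Scale
	if err := runtime.DefaultUnstructuredConverter.FromUnstructured(obj.UnstructuredContent(), &scale); err != nil {
		return nil, false, err
	}
	return &scale, true, nil
}

Whatever shape the real fix takes, the handler presumably still needs the internal type further down, so a conversion back through the registered scheme would likely be required; the sketch only shows why the external type keeps its metadata.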
Are we guaranteed that options is always non-nil here?
It should be quite safe; options is created here, just before the Update call. Do you prefer me to add extra checks?
I'm assuming that we need to copy in case the Update changes the object in place. I'm pretty sure it doesn't (do you remember @jpbetz?). Either way, there is no reason to copy if we don't call Update; at least move the copy into the if statement. But I don't think the copy is needed.
I might be wrong, have you experimented with it? What made you think you needed to copy?
My assumption was that we need to pass two objects to Update: 1. the object "before" changing it and 2. the object "after" changing it, so that the managed fields are updated accordingly. If we don't copy it, we effectively pass the same object. I am probably missing something though.
If we need to keep the copy, and we decide we want to do it only when necessary, the resulting code is a bit convoluted. I would propose something like:
var live *apps.Deployment
if i.parentFieldManager != nil {
	live = deployment.DeepCopy()
}
// move replicas/resourceVersion fields to object
deployment.Spec.Replicas = scale.Spec.Replicas
deployment.ResourceVersion = scale.ResourceVersion
if i.parentFieldManager != nil {
	// update managed fields using the fieldmanager of the main resource
	return i.parentFieldManager.Update(live, deployment, i.manager)
}
return deployment, nil
WDYT?
LGTM once we understand if we need to copy the object.
/lgtm
I'm really not a big fan of this here. Other suggestion:
- Check that there is an entry for the manager used in the scale operation,
- Check for conflicts on changes to that field
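A rough sketch of what those two checks could look like in an integration test; the manager names, replica value, and helper shape are illustrative assumptions, not code from the PR:

import (
	"context"
	"testing"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// checkScaleOwnership verifies that the manager used in the scale operation has
// a managed fields entry, and that another manager applying a different
// spec.replicas value hits a conflict.
func checkScaleOwnership(ctx context.Context, t *testing.T, client kubernetes.Interface, ns, name string) {
	t.Helper()
	deployment, err := client.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		t.Fatal(err)
	}
	found := false
	for _, e := range deployment.GetManagedFields() {
		if e.Manager == "scale-manager" { // illustrative name for the scale operation's manager
			found = true
		}
	}
	if !found {
		t.Fatalf("no managed fields entry for the scale operation's manager")
	}

	// Apply a replicas value assumed to differ from the live one, under a
	// different manager: this should be rejected with a conflict.
	patch := []byte(`{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"name":"` + name + `"},"spec":{"replicas":5}}`)
	_, err = client.AppsV1().Deployments(ns).Patch(ctx, name, types.ApplyPatchType, patch,
		metav1.PatchOptions{FieldManager: "another-manager"})
	if !apierrors.IsConflict(err) {
		t.Fatalf("expected conflict on spec.replicas, got: %v", err)
	}
}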
I am not sure I fully understood.
Check that there is an entry for the manager used in the scale operation,
This is done here unless I missed something. fieldManager is the manager used in the scale operation, see for instance here.
Check for conflicts on changes to that field
This was my attempt to check for conflicts; since the check happens right after applying, I thought it made sense to check it outside a helper.
I couldn't find an easy way to check that the replicas field is owned by one field manager (and not by any other) without decoding. I am open to suggestions though.
I see your point about not checking internal structures, but we don't have unit tests for this feature and I would feel more confident with a stricter integration test. As we discussed, we could definitely refactor this bit into a clearer framework.
WDYT?
/lgtm
does this do the right thing when the field path of the replicas field is different between two versions of the object?
For example, given:
- a CRD with two versions defined:
  - v1, with the field path for replicas located at spec.replicasV1
  - v2, with the field path for replicas located at spec.replicasV2
- a conversion webhook that converts spec.replicasV1 <-> spec.replicasV2
- a scale request to v1 scale and v2 scale
Do the managed fields get set properly for each versioned request?
That would be my assumption, but I am not 100% sure. @apelisse do you know?
This is in apps/deployment, so I'm assuming the question is unrelated to the change?
With regard to the question, yes, we would do the right thing. "managedFields" are tracked per version, and we conflict when the value changes, which we can detect by converting the object to the version of the managedFields.
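As a rough illustration of what "tracked per version" means, using the hypothetical replicasV1/replicasV2 CRD from the question above (the group, manager names, and operations below are made up):

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Sketch: each entry records the API version whose field paths it uses, so the
// same logical field shows up as f:replicasV1 under v1 and f:replicasV2 under
// v2, and conflict detection converts objects to the entry's version to compare.
var exampleEntries = []metav1.ManagedFieldsEntry{
	{
		Manager:    "scaler",
		Operation:  metav1.ManagedFieldsOperationApply,
		APIVersion: "example.com/v1",
		FieldsType: "FieldsV1",
		FieldsV1:   &metav1.FieldsV1{Raw: []byte(`{"f:spec":{"f:replicasV1":{}}}`)},
	},
	{
		Manager:    "autoscaler",
		Operation:  metav1.ManagedFieldsOperationUpdate,
		APIVersion: "example.com/v2",
		FieldsType: "FieldsV1",
		FieldsV1:   &metav1.FieldsV1{Raw: []byte(`{"f:spec":{"f:replicasV2":{}}}`)},
	},
}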
yeah, sorry... the question was about what would happen if we extrapolated this approach to the scale subresource for a CRD
plumbing via request-scoped context is not what I expected here. is there a reason the storage that needs the field manager cannot get it more directly?
I'm not personally the most knowledgeable about wiring in the apiserver in general, but I had the same reaction when I saw that change. I've tried to find a better way and couldn't find one. That, of course, doesn't mean there isn't one.
Thanks a lot @liggitt for review! 🙏
Field manager will be notified on conflicts (see the integration test here). It just runs the logic of the deployment field manager.
The problem with this solution is that we would need to merge back the Scale managed fields entry into the deployment. That's why, in the end, this approach seemed more straightforward.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: apelisse, nodo
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Does that conflict error come from here: Do
@liggitt you raised very good points in your latest review. The conflicts were not raised properly when applying to scale. I have updated the PR so that now:
1. It produces a conflict when .spec.replicas is different from the main resource
2. It updates the deployment when forcing it
Force-pushed from 216e2db to b7f3ad6.
@nodo: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Closing in favour of #98377
What type of PR is this?
/kind bug
What this PR does / why we need it:
Track ownership for scale subresource of deployments. This is achieved using the main resource field manager.
Which issue(s) this PR fixes:
Part of #82046
Special notes for your reviewer:
Passing the main resource field manager using context is not great; however, it could be an acceptable compromise given that we might refactor this logic in #84530.
This PR uses some pieces of code and suggestions from #83294 and #83444.
Does this PR introduce a user-facing change?: