Skip to content

Mark all Kube feature dependencies, and require dependencies to be declared for all features #133912

Merged
k8s-ci-robot merged 2 commits intokubernetes:masterfrom
tallclair:kube-feature-deps
Oct 16, 2025
Merged

Mark all Kube feature dependencies, and require dependencies to be declared for all features #133912
k8s-ci-robot merged 2 commits intokubernetes:masterfrom
tallclair:kube-feature-deps

Conversation

@tallclair
Copy link
Copy Markdown
Member

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Follow up to #133697:

  1. Record dependencies for all existing kube features
  2. Add a unit test which requires dependencies to be declared (even if empty) for every kube feature (sig-arch discussion)

Which issue(s) this PR is related to:

#133533

Does this PR introduce a user-facing change?

NONE

/sig architecture

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver area/kubelet kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Sep 5, 2025
@tallclair
Copy link
Copy Markdown
Member Author

@pohly @johnbelamaric @LionelJouin - I assume that DRAResourceClaimDeviceStatus depends on DynamicResourceAllocation, but the new dependency checking flagged an issue: DRAResourceClaimDeviceStatus was default-enabled in v1.33, but DynamicResourceAllocation was default-disabled. Is the dependency accurate? If so, what do you think we should do here?

(and while you're here - any other DRA dependencies I should add, other than each DRA* feature depending on DynamicResourceAllocation?)

@pohly
Copy link
Copy Markdown
Contributor

pohly commented Sep 5, 2025

With DRA we had the problem that --features=AllAlpha=true led to a configuration where alpha features like DRAResourceClaimDeviceStatus were enabled while DynamicResourceAllocation was disable (off by default due to the API group). We decided to allow enabling DRAResourceClaimDeviceStatus independently (edit) although there was no way to use the feature without DRA also enabled.

If there's now a better way to support --features=AllAlpha=true such that it only enables alpha features which don't depend on a disabled feature, then adding the dependency would be a bit better. It still won't make much of a difference in practice.

Conceptually, all DRA* features depend on DynamicResourceAllocation. In addition, DRABindingConditions depends on DRADeviceStatus.

@tallclair
Copy link
Copy Markdown
Member Author

Hmm, the AllAlpha issue came up in the sig-arch discussion, and it was suggested that it's a reasonable expectation to enable all betas when enabling all alphas. I wonder whether we should just make setting AllAlpha=true automatically also set AllBeta=true?
/cc @liggitt @BenTheElder

@pohly
Copy link
Copy Markdown
Contributor

pohly commented Sep 5, 2025

Hmm, the AllAlpha issue came up in the sig-arch discussion, and it was suggested that it's a reasonable expectation to enable all betas when enabling all alphas.

We've not required that in the past, did we? I don't mind changing it, but right now we have one job which tests exactly that configuration.

I wonder whether we should just make setting AllAlpha=true automatically also set AllBeta=true?

That breaks if the beta feature depends on a beta API group, unless we make the same change also for --runtime-config such that api/alpha=true implies api/beta=true.

// depend on a less stable feature.
//
// Entries are alphabetized.
var defaultKubernetesFeatureGateDependencies = map[featuregate.Feature][]featuregate.Feature{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be a part of a FeatureSpec? So everything is declared together

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would have prefered that, but that would either mean declaring the deps at each version which we don't want, or changing the package API of the versioned features (type VersionedSpecs []FeatureSpec) to have a wrapping struct, which breaks some external tooling around feature gates. We briefly discussed it in sig-arch, and decided to keep it this way for the smaller delta, and consider combining them later.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for the need to repeat it for each version and keep it in the same collection, keeping the whole FG declaration in one place. But this opinion is not very strong.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think eventually we'll want to get to a single data structure (without per-version repetition), but doing it this way minimizes churn for now, and as long as unit tests verify the two are coherent, there won't be accidents

@BenTheElder
Copy link
Copy Markdown
Member

Hmm, the AllAlpha issue came up in the sig-arch discussion, and it was suggested that it's a reasonable expectation to enable all betas when enabling all alphas.

I missed this part, need to think more about this, buy my first reaction was the runtime-config aspect..

Maybe we should actually move to link featuregates and APIs ...

@aojea
Copy link
Copy Markdown
Member

aojea commented Sep 6, 2025

We should think this more carefully as we can end blocking some case we are missing, we want through this with the tagging and coordination between features flags and runtimes config on the e2e and turned out to be really complex

EDIT

I just want to mean that this is great but it is always more difficult than we think at the beginning

@haircommander haircommander moved this from Triage to Archive-it in SIG Node CI/Test Board Oct 8, 2025
@tallclair
Copy link
Copy Markdown
Member Author

Blocked on #134365 for integration test failures.

Comment thread pkg/features/kube_features.go Outdated
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 15, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 15, 2025
@tallclair
Copy link
Copy Markdown
Member Author

OK, I think this is ready for a another round of review. I broke out a few pieces, which have already merged, so this should be more manageable to review now. Let me know if you'd like me to break out the feature-specific fixes into separate PRs too.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 15, 2025
Comment thread hack/verify-test-featuregates.sh
Comment thread pkg/kubelet/apis/config/validation/validation_test.go
Comment thread pkg/registry/resource/resourceclaim/strategy_test.go
@pohly
Copy link
Copy Markdown
Contributor

pohly commented Oct 15, 2025

/skip

A build change unrelated to DRA broke the DRA jobs. Please ignore for now.

/test pull-kubernetes-unit

Copy link
Copy Markdown
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the full PR. The registry/resource changes look okay to me.

We wanted to expose all feature gates that a test depends in the test name, even if the test is only annotated with a single WithFeature call. I believe the necessary APIs exist in the feature gate package. I can try to follow up with a different PR which adds this.

Comment thread pkg/registry/resource/resourceclaim/strategy_test.go
Comment thread pkg/registry/resource/resourceslice/strategy_test.go
Comment thread pkg/registry/resource/resourceslice/strategy_test.go
@pohly
Copy link
Copy Markdown
Contributor

pohly commented Oct 16, 2025

/retest

#134632 fixed the DRA jobs.

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

k8s-ci-robot commented Oct 16, 2025

@tallclair: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-unit-windows-master debe026 link false /test pull-kubernetes-unit-windows-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@helayoty helayoty moved this to Needs Review in SIG Scheduling Oct 16, 2025
@liggitt
Copy link
Copy Markdown
Member

liggitt commented Oct 16, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2025
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: f16e12f29db873d736146c54629c304515c5d55a

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2025
@pohly
Copy link
Copy Markdown
Contributor

pohly commented Oct 17, 2025

We wanted to expose all feature gates that a test depends in the test name, even if the test is only annotated with a single WithFeature call. I believe the necessary APIs exist in the feature gate package. I can try to follow up with a different PR which adds this.

#134686

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Projects

Archived in project
Archived in project
Archived in project
Archived in project
Archived in project

Development

Successfully merging this pull request may close these issues.