Skip to content

CAP_NET_RAW should be NET_RAW#95613

Merged
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
MHBauer:CAP_NET_RAW
Dec 9, 2020
Merged

CAP_NET_RAW should be NET_RAW#95613
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
MHBauer:CAP_NET_RAW

Conversation

@MHBauer
Copy link
Copy Markdown
Contributor

@MHBauer MHBauer commented Oct 15, 2020

Format of Capability adds CAP_ prefix automatically.
Add commentary as test progress output.
Clear up some existing comments.

What type of PR is this?
/kind bug
/kind failing-test

What this PR does / why we need it:
Test currently fails 100% of the time with Error: failed to create containerd task: OCI runtime create failed: container_linux.go:370: starting container process caused: unknown capability "CAP_CAP_NET_RAW": unknown

Which issue(s) this PR fixes:
Fixes #95612

Special notes for your reviewer:
I have no idea where this behavior is documented, I'm reversing my understanding of the test output.

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


Format of Capability adds CAP_ prefix automatically.
Add commentary as test progress output.
Clear up some existing comments.
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 15, 2020
@k8s-ci-robot k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 15, 2020
@MHBauer
Copy link
Copy Markdown
Contributor Author

MHBauer commented Oct 15, 2020

/sig node

@MHBauer
Copy link
Copy Markdown
Contributor Author

MHBauer commented Oct 15, 2020

Creator of code segment
/cc @harche

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@MHBauer: GitHub didn't allow me to request PR reviews from the following users: harche.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

Creator of code segment
/cc @harche

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@harche
Copy link
Copy Markdown
Contributor

harche commented Oct 15, 2020

It had worked fine on CRIO, but I can test it again on Monday.

@MHBauer
Copy link
Copy Markdown
Contributor Author

MHBauer commented Oct 15, 2020

Alright, so, the documentation for kube seems to inherit the behavior of docker.

https://kubernetes.io/docs/concepts/policy/pod-security-policy/#capabilities
The following fields take a list of capabilities, specified as the capability name in ALL_CAPS without the CAP_ prefix.

Right above this link
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-seccomp-profile-for-a-container
Note: Linux capability constants have the form CAP_XXX. But when you list capabilities in your Container manifest, you must omit the CAP_ portion of the constant. For example, to add CAP_SYS_TIME, include SYS_TIME in your list of capabilities.

Docker, where the first support comes from
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities

SecurityContext: &v1.SecurityContext{
Capabilities: &v1.Capabilities{
Add: []v1.Capability{"CAP_NET_RAW"},
Add: []v1.Capability{"NET_RAW"},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main change.

@MHBauer
Copy link
Copy Markdown
Contributor Author

MHBauer commented Oct 15, 2020

/cc @dims
maybe interested?

@k8s-ci-robot k8s-ci-robot requested a review from dims October 15, 2020 21:58
@mrunalp
Copy link
Copy Markdown
Contributor

mrunalp commented Oct 15, 2020

@MHBauer This looks good to me (CRI-O maintainer).

@mrunalp
Copy link
Copy Markdown
Contributor

mrunalp commented Oct 15, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 15, 2020
@MHBauer
Copy link
Copy Markdown
Contributor Author

MHBauer commented Oct 15, 2020

As part of a NodeConformance test, I'm concerned about the original change assuming our defaults are lacking CAP_NET_DEFAULT.

Reading deeper into the link:
DefaultAddCapabilities - The capabilities which are added to containers by default, in addition to the runtime defaults. See the Docker documentation for the default list of capabilities when using the Docker runtime.
https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities
By default, Docker has a default list of capabilities that are kept. The following table lists the Linux capability options which are allowed by default and can be dropped.
In that table is : NET_RAW | Use RAW and PACKET sockets.

I think we need to roll back #95321

@MHBauer
Copy link
Copy Markdown
Contributor Author

MHBauer commented Oct 20, 2020

Reversing my feelings on this, there may be a bunch of tests where we're assuming docker permissions, and it would be good to have them all explicitly set as requirements for the test to run.

@mikebrow
Copy link
Copy Markdown
Member

mikebrow commented Oct 20, 2020

Reversing my feelings on this, there may be a bunch of tests where we're assuming docker permissions, and it would be good to have them all explicitly set as requirements for the test to run.

^ depends what you are testing the defaults, or that you can run tests in spite of default deltas... The problem is if you fix the test cases, customers will break if they don't change their specs like we did for our tests.

Suggestions: 1) test if a container runs with all necessary caps explicitly set 2) test that a container runs as it did in prior releases using defaults (e.g. on docker shim when migrating to CRI runtimes) 3) test that a container fails to run as it failed in prior releases without additional caps set due to docker default 4) and if we are going to change the k8s expected defaults, or accept different defaults for particular runtimes, document and test the differences

@MHBauer
Copy link
Copy Markdown
Contributor Author

MHBauer commented Oct 20, 2020

I'll make some long term issues to follow up on setting Capabilities in tests and updating the documentation beyond "when using docker, we use docker defaults." Seems implied that "when using a specific runtime we use the specific runtime defaults". Other things is obviously that the defaults can be configured for each runtime on each node, so consistency is probably needed on a cluster basis. It makes me wonder if there is already some explicit setting of Capabilities in kubelet and whether that can be configured to ensure that Capabilities are set consistently over a cluster without needing to explicitly set them on each node.

Calling some other containerd people from https://github.com/containerd/project/blob/master/MAINTAINERS
/cc @AkihiroSuda @crosbymichael @estesp @dchen1107

@mikebrow
Copy link
Copy Markdown
Member

@dims still broke here.. should be an easy merge to bring us back on-line

@dims
Copy link
Copy Markdown
Member

dims commented Nov 19, 2020

/assign @sjenning @mrunalp

Seth, Mrunal,
One line change essentially. Can you please approve?

@dims
Copy link
Copy Markdown
Member

dims commented Nov 19, 2020

/triage accepted
/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Nov 19, 2020
@mrunalp
Copy link
Copy Markdown
Contributor

mrunalp commented Nov 19, 2020

/approve

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MHBauer, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 19, 2020
@harche
Copy link
Copy Markdown
Contributor

harche commented Nov 19, 2020

/test pull-kubernetes-node-crio-e2e

@harche
Copy link
Copy Markdown
Contributor

harche commented Nov 19, 2020

Please let this pull-kubernetes-node-crio-e2e pass, just to make sure we aren't breaking anything on CRIO.

@harche
Copy link
Copy Markdown
Contributor

harche commented Nov 19, 2020

/test pull-kubernetes-node-crio-e2e

@harche
Copy link
Copy Markdown
Contributor

harche commented Nov 19, 2020

pull-kubernetes-node-crio-e2e recently started passing on PRs, https://prow.k8s.io/job-history/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-crio-e2e?buildId=

But here, it's failing even before starting any tests.

@harche
Copy link
Copy Markdown
Contributor

harche commented Nov 19, 2020

I just started that test job on this PR, #96716, let's see how does that go.

@harche
Copy link
Copy Markdown
Contributor

harche commented Nov 19, 2020

@MHBauer after checking out your PR locally and rebasing to master, the job is passing fine.

make test-e2e-node REMOTE=true RUNTIME=remote IMAGE_CONFIG_FILE="/home/harshal/Downloads/image-config-cgrpv1.yaml" INSTANCE_PREFIX="harpatil" CLEANUP=false CONTAINER_RUNTIME_ENDPOINT="unix:///var/run/crio/crio.sock" FOCUS="\[NodeConformance\]|\[NodeFeature:.+\]" SKIP="\[Flaky\]|\[Slow\]|\[Serial\]" TEST_ARGS='--kubelet-flags="--cgroup-driver=systemd --cgroups-per-qos=true --cgroup-root=/ --runtime-cgroups=/system.slice/crio.service --non-masquerade-cidr=0.0.0.0/0" --extra-log="{\"name\": \"crio.log\", \"journalctl\": [\"-u\", \"crio\"]}"'
Ran 196 of 336 Specs in 727.346 seconds
SUCCESS! -- 196 Passed | 0 Failed | 0 Pending | 140 Skipped

I am guessing rebasing will help it pass here too.

@harche
Copy link
Copy Markdown
Contributor

harche commented Nov 19, 2020

I just started that test job on this PR, #96716, let's see how does that go.

It passed there.

@fejta-bot
Copy link
Copy Markdown

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@fejta-bot
Copy link
Copy Markdown

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@MHBauer
Copy link
Copy Markdown
Contributor Author

MHBauer commented Dec 4, 2020

Can I put it into the milestone or do I need a release manager?

/milestone v1.20

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@MHBauer: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your and have them propose you as an additional delegate for this responsibility.

Details

In response to this:

Can I put it into the milestone or do I need a release manager?

/milestone v1.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@MHBauer
Copy link
Copy Markdown
Contributor Author

MHBauer commented Dec 4, 2020

Phew, good.

@k8s-ci-robot k8s-ci-robot merged commit 5cdc3e6 into kubernetes:master Dec 9, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Dec 9, 2020
@mikebrow
Copy link
Copy Markdown
Member

mikebrow commented Dec 9, 2020

need this revert pr on 1.20 as well can we get a cherry pick?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Capability set incorrectly. CAP_NET_RAW should be NET_RAW.

9 participants