Conversation
If systemd takes more than 1 second to create a unit, startUnit() times out with a warning and runc proceeds (creating cgroups via the fs manager, and so on). At that point runc and systemd are racing, and multiple scenarios are possible. In one such scenario, by the time runc calls the systemd manager's Apply(), the unit is not yet created; the dbusConnection.SetUnitProperties() call fails with "unit xxx.scope not found", and the whole container start fails as well.

To eliminate the race, return an error when the timeout is hit. To reduce the chance of failure, increase the timeout from 1 to 30 seconds, so as not to error out too early on a busy/slow system (times of 3-5 seconds are not unrealistic). While at it, as the timeout is quite long now, make sure not to leave a stray timer behind.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 3844789)
Signed-off-by: Kir Kolyshkin <[email protected]>
It is not good practice to use the name `err` for a function's named error return value, as it is too easily shadowed by local variables. Switch to `retErr`.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 8699596)
Signed-off-by: Kir Kolyshkin <[email protected]>
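A minimal sketch of why the rename helps (the function names here are illustrative, not runc's): inside a deferred cleanup, `retErr` unambiguously refers to the function's own return value, while an inner `err` is clearly a local.

```go
package main

import (
	"errors"
	"fmt"
)

func cleanup() error { return errors.New("cleanup failed") }

// start uses retErr as its named return. Inside the deferred function,
// retErr (the function's return value) cannot be confused with or
// shadowed by the local err from cleanup().
func start() (retErr error) {
	defer func() {
		if retErr != nil {
			if err := cleanup(); err != nil {
				retErr = fmt.Errorf("%w (cleanup: %v)", retErr, err)
			}
		}
	}()
	return errors.New("start failed")
}

func main() {
	fmt.Println(start())
}
```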
When the cgroup configuration is invalid (some parameters can't be set, etc.), p.manager.Set fails, the error is returned, and then we try to remove the cgroups (by calling p.manager.Destroy) in a defer. The problem is, the container init is not yet killed (as it is killed by the caller, i.e. (*linuxContainer).start), so cgroup removal fails like this:

> time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy"
> ...
> time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy"

The above is repeated for every controller, and looks quite scary.

To fix, move the init termination into the defer mentioned above. Do the same in (*setnsProcess).start() for uniformity.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit dc42459)
Signed-off-by: Kir Kolyshkin <[email protected]>
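The ordering fix can be sketched like this. All types and methods below are illustrative stand-ins (not runc's real manager/process types): the point is that the deferred cleanup kills init *before* calling Destroy, so the cgroup's rmdir no longer fails with "device or resource busy" because a live process still occupies it.

```go
package main

import "fmt"

// process stands in for the container init; alive mimics "still in the cgroup".
type process struct{ alive bool }

func (p *process) terminate() { p.alive = false }

// manager stands in for a cgroup manager whose Destroy does rmdir.
type manager struct{ init *process }

func (m *manager) Destroy() error {
	if m.init.alive {
		// Mimics rmdir on a non-empty cgroup.
		return fmt.Errorf("remove cgroup: device or resource busy")
	}
	return nil
}

func start(p *process, m *manager) (retErr error) {
	defer func() {
		if retErr != nil {
			p.terminate() // kill init *before* removing the cgroup
			if err := m.Destroy(); err != nil {
				fmt.Println("failed to remove cgroup:", err)
			}
		}
	}()
	// Mimics p.manager.Set failing on invalid cgroup configuration.
	return fmt.Errorf("setting cgroup config for procHooks process failed")
}

func main() {
	p := &process{alive: true}
	fmt.Println(start(p, &manager{init: p})) // Destroy succeeds quietly now
}
```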
When we call terminate(), we kill the process, and wait returns an error indicating the process was killed. This is exactly what we expect here, so there is no reason to treat it as an error.

Before this patch, when a container with invalid cgroup parameters is started:

> WARN[0000] unable to terminate initProcess error="signal: killed"
> ERRO[0000] container_linux.go:366: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: failed to write "555": open /sys/fs/cgroup/blkio/user.slice/xx33/blkio.weight: permission denied

After:

> ERRO[0000] container_linux.go:366: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: failed to write "555": open /sys/fs/cgroup/blkio/user.slice/xx33/blkio.weight: permission denied

I.e. the useless warning is gone.

NOTE this breaks a couple of integration test cases, since they expected a particular message on the second line; with "signal: killed" removed, it is now on the first line. Fix those, too.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit a4d5e8a)
Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin pushed a commit that referenced this pull request on Feb 4, 2021:
Aleksa Sarai (2):
  VERSION: back to development
  VERSION: release 1.0.0~rc93

Vote: +6 -0 #1
Closes opencontainers#2784
kolyshkin added a commit that referenced this pull request on May 15, 2023:
When a hook has failed, the error message looks like this:

> error running hook: error running hook #1: exit status 1, stdout: ...

The two problems here are:
1. it is impossible to know what kind of hook it was;
2. "error running hook" stuttering.

Change that to:

> error running createContainer hook #1: exit status 1, stdout: ...

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit that referenced this pull request on Jun 23, 2024:
MAINTAINERS: add Rodrigo Campos

LGTMs: crosbymichael mrunalp dqminh hqhq cyphar AkihiroSuda kolyshkin thaJeztah lifubang
Vote: +8 -0 #1