
testing golangci-lint #1

Closed
kolyshkin wants to merge 4 commits into master from bz1883640

Conversation

@kolyshkin
Owner

No description provided.

In case it takes more than 1 second for systemd to create a unit,
startUnit() times out with a warning and then runc proceeds
(to create cgroups using fs manager and so on).

Now runc and systemd are racing, and multiple scenarios are possible.

In one such scenario, by the time runc calls systemd manager's Apply()
the unit is not yet created, the dbusConnection.SetUnitProperties()
call fails with "unit xxx.scope not found", and the whole container
start also fails.

To eliminate the race, we need to return an error in case the timeout is
hit.

To reduce the chance of failure, increase the timeout from 1 to 30 seconds,
so as not to error out too early on a busy or slow system (times of 3-5
seconds are not unrealistic).

While at it, as the timeout is quite long now, make sure to not leave
a stray timer.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 3844789)
Signed-off-by: Kir Kolyshkin <[email protected]>
It is not good practice to use the name `err` for a function's named
return error. Switch to `retErr`.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit 8699596)
Signed-off-by: Kir Kolyshkin <[email protected]>
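A minimal sketch of why the rename matters (function and step names here are hypothetical): with a named return called `retErr`, a deferred cleanup can unambiguously inspect the function's final error without being confused with (or shadowed by) local `err` variables.

```go
package main

import (
	"errors"
	"fmt"
)

// start uses a named return retErr so the deferred cleanup can
// distinguish the function's final error from any local err.
func start() (retErr error) {
	defer func() {
		if retErr != nil {
			// cleanup runs only when start() is returning an error
			fmt.Println("cleanup after failure:", retErr)
		}
	}()
	if err := doStep(); err != nil {
		return fmt.Errorf("step failed: %w", err)
	}
	return nil
}

func doStep() error { return errors.New("boom") }

func main() {
	fmt.Println(start())
}
```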
In case cgroup configuration is invalid (some parameters can't be set
etc.), p.manager.Set fails, the error is returned, and then we try to
remove cgroups (by calling p.manager.Destroy) in a defer.

The problem is, the container init is not yet killed (as it is killed in
the caller, i.e. (*linuxContainer).start), so cgroup removal fails like
this:

> time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy"
> ...
> time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy"

The above is repeated for every controller, and looks quite scary.

To fix this, move the init termination into the above-mentioned defer.

Do the same for (*setnsProcess).start() for uniformity.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit dc42459)
Signed-off-by: Kir Kolyshkin <[email protected]>
When we call terminate(), we kill the process, and wait
returns the error indicating the process was killed.
This is exactly what we expect here, so there is no reason
to treat it as an error.

Before this patch, when a container with invalid cgroup parameters is
started:

> WARN[0000] unable to terminate initProcess               error="signal: killed"
> ERRO[0000] container_linux.go:366: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: failed to write "555": open /sys/fs/cgroup/blkio/user.slice/xx33/blkio.weight: permission denied

After:

> ERRO[0000] container_linux.go:366: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: failed to write "555": open /sys/fs/cgroup/blkio/user.slice/xx33/blkio.weight: permission denied

I.e. the useless warning is gone.

NOTE: this breaks a couple of integration test cases, since they were
expecting a particular message on the second line, and now that
"signal: killed" is removed it is on the first line. Fix those, too.

Signed-off-by: Kir Kolyshkin <[email protected]>
(cherry picked from commit a4d5e8a)
Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin kolyshkin closed this Sep 30, 2020
@kolyshkin kolyshkin reopened this Sep 30, 2020
@kolyshkin kolyshkin force-pushed the master branch 6 times, most recently from e78b8fd to ba0c20e Compare November 17, 2020 02:11
kolyshkin pushed a commit that referenced this pull request Feb 4, 2021
Aleksa Sarai (2):
  VERSION: back to development
  VERSION: release 1.0.0~rc93

Vote: +6 -0 #1
Closes opencontainers#2784
@kolyshkin kolyshkin closed this Jun 1, 2021
kolyshkin added a commit that referenced this pull request May 15, 2023
When a hook has failed, the error message looks like this:

> error running hook: error running hook #1: exit status 1, stdout: ...

The two problems here are:
1. it is impossible to tell what kind of hook it was;
2. the "error running hook" prefix stutters.

Change that to

> error running createContainer hook #1: exit status 1, stdout: ...

Signed-off-by: Kir Kolyshkin <[email protected]>
kolyshkin added a commit that referenced this pull request May 16, 2023
kolyshkin added a commit that referenced this pull request May 18, 2023
kolyshkin added a commit that referenced this pull request Jun 13, 2023
kolyshkin added a commit that referenced this pull request Aug 11, 2023
kolyshkin added a commit that referenced this pull request Aug 25, 2023
kolyshkin added a commit that referenced this pull request Jun 23, 2024
MAINTAINERS: add Rodrigo Campos

LGTMs: crosbymichael mrunalp dqminh hqhq cyphar AkihiroSuda kolyshkin thaJeztah lifubang
Vote: +8 -0 #1