What happened:
The CRI-O CI tests have been in bad shape recently. While debugging, I found that the kubelet logs are filled with:
Timed out while waiting for StopUnit(kubepods-besteffort-pod867fd309_03ba_4715_a044_29393f495cea.slice) completion signal from dbus. Continuing...
grep 'Timed out' /tmp/kubelet.log | wc -l
352562
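To see whether the timeouts are spread across many pods or concentrated on a few units, the count above can be broken down per unit. The following is only a minimal sketch: the function name `count_stopunit_timeouts` is hypothetical, and the regular expression assumes the message format matches the log line quoted above.

```python
import re
from collections import Counter

# Pattern derived from the kubelet log line quoted above (an assumption:
# other versions may word the message differently).
TIMEOUT_RE = re.compile(
    r"Timed out while waiting for StopUnit\((?P<unit>[^)]+)\) "
    r"completion signal from dbus"
)


def count_stopunit_timeouts(lines):
    """Return a Counter mapping systemd unit name -> number of timeout messages."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("unit")] += 1
    return counts
```

For example, `count_stopunit_timeouts(open("/tmp/kubelet.log", errors="replace"))` returns the per-unit counts, and `sum(counts.values())` should agree with the `grep | wc -l` total as long as each log line contains at most one such message.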
AFAICT, this comes from the combination of the bump to Go 1.14.4 and commit a03db63: I ran two different PRs, each dropping one of these changes, and neither showed similar problems.
I am fairly certain this is NOT a problem with Kubernetes directly, but rather some odd interaction between Go 1.14 and libcontainer, go-systemd, or godbus. I figured it could be opened here to start the conversation.
What you expected to happen:
StopUnit should not time out
How to reproduce it (as minimally and precisely as possible):
Run a node with cgroup v1.
Build hyperkube with Go 1.14.4 (as is now required).
Run hack/local-up.sh.
Create and remove a pod, and observe that the pod's cgroup fails to be torn down.
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): master
- Cloud provider or hardware configuration: AWS
- OS (e.g: cat /etc/os-release):
ID=fedora
VERSION_ID=30
VERSION_CODENAME=""
PLATFORM_ID="platform:f30"
PRETTY_NAME="Fedora 30 (Cloud Edition)"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:30"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f30/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=30
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=30
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Cloud Edition"
VARIANT_ID=cloud
This also happens on our RHEL 7 boxes.
- Kernel (e.g. uname -a):
Linux ip-172-18-11-215.ec2.internal 5.6.13-100.fc30.x86_64 #1 SMP Fri May 15 00:36:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: built locally
- Network plugin and version (if this is a network-related bug):
- Others: