ci: log the status of GitHub Actions' VM at the end #4990

Merged: mxpv merged 1 commit into containerd:master from kzys:host-status on Mar 18, 2021

kzys (Member) commented Feb 2, 2021

To investigate issues like #4969, it would be helpful to understand
the status of the VM at the end.

Signed-off-by: Kazuyoshi Kato [email protected]

@k8s-ci-robot

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

theopenlab-ci Bot commented Feb 2, 2021

Build succeeded.

kzys (Member, Author) commented Feb 2, 2021

https://github.com/containerd/containerd/pull/4990/checks?check_run_id=1817209925 has reproduced #4969 successfully.

--- FAIL: TestRwLoop (0.06s)
Error:     losetup_linux_test.go:93: write /dev/loop1: no space left on device
FAIL
FAIL	github.com/containerd/containerd/mount	1.486s
ok  	github.com/containerd/containerd/snapshots/btrfs	23.411s
ok  	github.com/containerd/containerd/snapshots/devmapper	42.035s
ok  	github.com/containerd/containerd/snapshots/devmapper/dmsetup	0.505s
ok  	github.com/containerd/containerd/snapshots/native	10.711s
make: *** [root-test] Error 1
ok  	github.com/containerd/containerd/snapshots/overlay	17.432s
FAIL
Error: Makefile:177: recipe for target 'root-test' failed
Error: Process completed with exit code 2.

However, the output from the newly added Host Status section does not explain the "no space left on device" error: per the df output, / is only 78% used.

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=3542256k,nr_inodes=885564,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=712132k,mode=755)
/dev/sdb1 on / type ext4 (rw,relatime,discard)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=24,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=14402)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
configfs on /sys/kernel/config type configfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
/dev/sdb15 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro,discard)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
/dev/sda1 on /mnt type ext4 (rw,relatime,x-systemd.requires=cloud-init.service)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
Filesystem     1K-blocks     Used Available Use% Mounted on
udev             3542256        0   3542256   0% /dev
tmpfs             712132      680    711452   1% /run
/dev/sdb1       87218124 67193068  20008672  78% /
tmpfs            3560644        8   3560636   1% /dev/shm
tmpfs               5120        0      5120   0% /run/lock
tmpfs            3560644        0   3560644   0% /sys/fs/cgroup
/dev/sdb15        106858     3696    103162   4% /boot/efi
/dev/sda1       14382088  4235296   9396508  32% /mnt
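
As a rough illustration of where the df figures above come from, the following is a minimal Go sketch that reads the total and available space of a filesystem through statfs(2). It assumes a Linux or other Unix host; the helper name freeSpace is hypothetical and is not part of containerd or of this PR.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// freeSpace is a hypothetical helper. It returns the total and available
// bytes of the filesystem that contains path, as reported by statfs(2).
// The values are close to, but not guaranteed identical to, what df prints.
func freeSpace(path string) (total, avail uint64, err error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, 0, err
	}
	// Field widths differ between platforms, so convert explicitly.
	bsize := uint64(st.Bsize)
	return uint64(st.Blocks) * bsize, uint64(st.Bavail) * bsize, nil
}

func main() {
	total, avail, err := freeSpace("/")
	if err != nil {
		fmt.Fprintln(os.Stderr, "statfs failed:", err)
		os.Exit(1)
	}
	fmt.Printf("/: %d bytes total, %d bytes available\n", total, avail)
}
```

A check like this only describes the filesystem that holds the path. It says nothing about the capacity of a block device such as /dev/loop1, which is why a healthy df output and an ENOSPC on the loop device can coexist.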

kzys (Member, Author) commented Feb 2, 2021

I can reproduce the issue on my Amazon Linux 2 box.

[pid 62105] openat(AT_FDCWD, "/tmp/losetupTestRwLoop956864572", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600 <unfinished ...>
[pid 62105] <... openat resumed> )      = 5
[pid 62105] epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=565341432, u64=139891945139448}}) = -1 EPERM (Operation not permitted)
[pid 62105] epoll_ctl(4, EPOLL_CTL_DEL, 5, 0xc000116b9c) = -1 EPERM (Operation not permitted)
[pid 62105] ftruncate(5, 512)           = 0
[pid 62105] close(5 <unfinished ...>
[pid 62105] <... close resumed> )       = 0
[pid 62105] openat(AT_FDCWD, "/dev/loop-control", O_RDWR|O_CLOEXEC) = 5
[pid 62105] epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=565341432, u64=139891945139448}}) = -1 EPERM (Operation not permitted)
[pid 62105] epoll_ctl(4, EPOLL_CTL_DEL, 5, 0xc000116bec) = -1 EPERM (Operation not permitted)
[pid 62105] ioctl(5, LOOP_CTL_GET_FREE <unfinished ...>
[pid 62105] <... ioctl resumed> )       = 0
[pid 62105] close(5)                    = 0
[pid 62105] openat(AT_FDCWD, "/tmp/losetupTestRwLoop956864572", O_RDWR|O_CLOEXEC) = 5
[pid 62105] epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=565341432, u64=139891945139448}}) = -1 EPERM (Operation not permitted)
[pid 62105] epoll_ctl(4, EPOLL_CTL_DEL, 5, 0xc000116a94 <unfinished ...>
[pid 62105] <... epoll_ctl resumed> )   = -1 EPERM (Operation not permitted)
[pid 62105] openat(AT_FDCWD, "/dev/loop0", O_RDWR|O_CLOEXEC) = 6
[pid 62105] epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=565341432, u64=139891945139448}}) = -1 EPERM (Operation not permitted)
[pid 62105] epoll_ctl(4, EPOLL_CTL_DEL, 6, 0xc000116a94) = -1 EPERM (Operation not permitted)
[pid 62105] ioctl(6, LOOP_SET_FD, 5 <unfinished ...>
[pid 62105] <... ioctl resumed> )       = 0
[pid 62105] ioctl(6, LOOP_SET_STATUS64, {lo_offset=0, lo_number=0, lo_flags=LO_FLAGS_AUTOCLEAR, lo_file_name="/tmp/losetupTestRwLoop956864572", ...} <unfinished ...>
[pid 61851] sched_yield( <unfinished ...>
[pid 61851] <... sched_yield resumed> ) = 0
[pid 61851] sched_yield()               = 0
[pid 61851] sched_yield( <unfinished ...>
[pid 61851] <... sched_yield resumed> ) = 0
[pid 61851] sched_yield()               = 0
[pid 61851] sched_yield()               = 0
[pid 62105] <... ioctl resumed> )       = 0
[pid 62105] close(6 <unfinished ...>
[pid 61850] sched_yield( <unfinished ...>
[pid 61850] <... sched_yield resumed> ) = 0
[pid 61850] sched_yield( <unfinished ...>
[pid 61850] <... sched_yield resumed> ) = 0
[pid 61850] sched_yield()               = 0
[pid 61850] sched_yield()               = 0
[pid 61850] sched_yield()               = 0
[pid 61850] sched_yield()               = 0
[pid 61850] sched_yield()               = 0
[pid 62105] <... close resumed> )       = 0
[pid 62105] close(5 <unfinished ...>
[pid 62105] <... close resumed> )       = 0
[pid 62105] openat(AT_FDCWD, "/dev/loop0", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0777) = 5
[pid 62105] epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=565341432, u64=139891945139448}}) = -1 EPERM (Operation not permitted)
[pid 62105] epoll_ctl(4, EPOLL_CTL_DEL, 5, 0xc000116ccc <unfinished ...>
[pid 62105] <... epoll_ctl resumed> )   = -1 EPERM (Operation not permitted)
[pid 62105] write(5, "randomdata", 10)  = -1 ENOSPC (No space left on device)
[pid 62105] close(5)                    = 0

theopenlab-ci Bot commented Feb 2, 2021

Build succeeded.

kzys (Member, Author) commented Feb 2, 2021

// setupLoop looks for (and possibly creates) a free loop device, and
// then attaches backingFile to it.
//
// When autoclear is true, caller should take care to close it when
// done with the loop device. The loop device file handle keeps
// loFlagsAutoclear in effect and we rely on it to clean up the loop
// device. If caller closes the file handle after mounting the device,
// kernel will clear the loop device after it is umounted. Otherwise
// the loop device is cleared when the file handle is closed.
//
// When autoclear is false, caller should be responsible to remove
// the loop device when done with it.
//
// Upon success, the file handle to the loop device is returned.
func setupLoop(backingFile string, param LoopParams) (string, error) {

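This function's autoclear handling looks suspicious. The comment says that the caller should close the file handle, but the function returns the file path rather than a handle. So all of the associated file handles are already closed by the time the function returns. With autoclear set, the kernel may then clear the loop device before the caller writes to it, which would explain the ENOSPC in the trace above.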

Review comment thread on .github/workflows/ci.yml (outdated)

Member:

How do you feel about if: failure(), so that we execute this step only if something went wrong?

Member:

I think this should remain always(). It is useful for comparing failing VMs with passing VMs.

kzys marked this pull request as ready for review on February 19, 2021, 18:58

theopenlab-ci Bot commented Feb 19, 2021

Build succeeded.

kzys (Member, Author) commented Feb 19, 2021

The CI failure seems to come from a 429 Too Many Requests response from Docker Hub.

     default:     lease_test.go:61: failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/library/busybox/manifests/sha256:29f5d56d12684887bdfa50dcd29fc31eea4aaf4ad3bec43daf19026a7ce69912: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
    default: --- FAIL: TestLeaseResources (0.65s)

estesp (Member) commented Feb 19, 2021

Re: the 429: I think @thaJeztah was looking into why actions inside the vagrant/fedora-based VM (on macos instances) are not whitelisted like everything else.

estesp (Member) left a comment

LGTM

thaJeztah (Member) commented

> Re: the 429: I think @thaJeztah was looking into why actions inside the vagrant/fedora-based VM (on macos instances) are not whitelisted like everything else.

Let me ask if there's progress on that

thaJeztah (Member) commented

@estesp (on my phone, so lazy); can we get the IP address of the machine where the rate limit occurred? I asked, and the macOS machines should have been added some time ago (so the Hub team was wondering if GitHub perhaps added new ranges for the macOS machines).

To investigate issues like containerd#4969, it would be helpful to understand
the status of the VM at the end.

Signed-off-by: Kazuyoshi Kato <[email protected]>

theopenlab-ci Bot commented Mar 12, 2021

Build succeeded.

kzys (Member, Author) commented Mar 17, 2021

@mxpv or @AkihiroSuda, could you take a look?

mxpv merged commit 2d5f9bf into containerd:master on Mar 18, 2021