Very strict checking of subsystems' existence while loading cgroup  #58

@ChrsMark

I am trying to load the cgroup of a running container with Load:

func Load(hierarchy Hierarchy, path Path) (Cgroup, error) {

The call fails with this error:

failed to load cgroup at path /docker/d840ada2fb82be4259d6875453e735f7ec19fb5ee4205d4d11445a77dce614e8: cgroups: cgroup deleted

The code that produces this error is:

package main

import (
	"fmt"
	cntdcg "github.com/containerd/cgroups"
)

type Cgroup struct {
	cgroup cntdcg.Cgroup
	Cpath  string
}

// Load loads an existing cgroup using the specified cgroup path, returning a new
// Cgroup instance. If the cgroup does not exist, an error will be returned.
func Load(cpath string) (*Cgroup, error) {
	// Guard against an empty path before indexing into it.
	if len(cpath) == 0 || cpath[0] != '/' {
		return nil, fmt.Errorf("cgroup path %s without leading /", cpath)
	}
	cg, err := cntdcg.Load(cntdcg.V1, cntdcg.StaticPath(cpath))
	if err != nil {
		return nil, fmt.Errorf("failed to load cgroup at path %s: %v", cpath, err)
	}
	return &Cgroup{cgroup: cg, Cpath: cpath}, nil
}

func main() {
	id := "d840ada2fb82be4259d6875453e735f7ec19fb5ee4205d4d11445a77dce614e8"
	path := "/docker/" + id
	cgroup, err := Load(path)
	if err != nil {
		fmt.Println(err)
	} else {
		fmt.Println(cgroup.cgroup)
	}
}

Digging in, I noticed that the cgroup path does not exist under the rdma subsystem, while it does exist under others such as cpu:

$ ls /sys/fs/cgroup/cpu/docker/ | grep d840ada2fb82
d840ada2fb82be4259d6875453e735f7ec19fb5ee4205d4d11445a77dce614e8
$ ls /sys/fs/cgroup/rdma/docker/ | grep d840ada2fb82

Moreover, the same code works for a container that has been running for a longer time:

$ ls /sys/fs/cgroup/rdma/docker/ | grep 1530cdbc2ef5
1530cdbc2ef503601aa5721a05434d3b2bd5f271688bbdf225309056cd65810c
$ go run main.go
&{0x624e80 [0xc42000c240 0xc420010630 0xc420010640 0xc420010650 0xc420010660 0xc420010670 0xc420010680 0xc420010690 0xc4200106a0 0xc4200106b0 0xc4200106c0 0xc4200106d0 0xc4200106e0 0xc42009b020] {0 0} <nil>}

So it looks to me like the rdma subsystem takes some time to be updated, for reasons I have not identified.
System information:

$ uname -r
4.15.0-36-generic
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.1 LTS
Release:        18.04
Codename:       bionic
$ docker --version
Docker version 18.06.1-ce, build e68fc7a

Finally, does Load need to fail in this case? How about being more lenient at this check:

if _, err := os.Lstat(s.Path(p)); err != nil {

Instead of returning an error, Load could skip any subsystem whose path cannot be confirmed and drop it from the subsystem list stored in the returned struct:

subsystems: subsystems,

Would this be acceptable? Patching my local copy of the library this way unblocks me. Would a PR with this change be welcome?

cc: @crosbymichael
