TL;DR
$ ulimit -n -u
open files (-n) 1024
max user processes (-u) 62435
$ docker run --rm ubuntu bash -c "ulimit -n -u"
open files (-n) 1048576
max user processes (-u) unlimited
Problem
ulimit, an archaic resource management mechanism (see this pdf, slide 18), is not completely obsoleted by cgroup controllers and is still an essential part of system administration.
Default ulimits for a new container are derived from those of containerd itself. They are set in the containerd.service systemd unit file to unlimited (or near-unlimited) values:
$ grep ^Limit /lib/systemd/system/containerd.service
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
This is required for containerd itself, but is way too generous for containers it runs. For comparison, ulimits for a user (including root) on the host system are pretty modest (this is an example from Ubuntu 18.04):
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 62435
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 62435
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
This can create a number of problems, such as a container abusing system resources (e.g. for a DoS attack). In general, cgroup limits should be used to prevent that, yet I think ulimits should still be set to saner values.
In particular, RLIMIT_NOFILE, the open files limit, which is set to 2^20 (i.e. 1048576), causes a slowdown in a number of programs: they use the upper limit value to iterate over all potentially open file descriptors, closing them (or setting the CLOEXEC bit) before every fork/exec. I am aware of the following cases:
Attacking those one by one proved complicated and not very fruitful: some software is obsolete, some is hard to fix, and so on. In addition, the above list is not exhaustive, so there may be more cases like this we are not aware of.
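To illustrate the cost, here is a minimal Python sketch (not taken from any of the affected programs) that times failed close() calls on descriptors that are certainly not open, then extrapolates to a full sweep up to the soft limit. The fd range used for timing is an arbitrary choice for the demonstration:

```python
import os
import resource
import time

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft RLIMIT_NOFILE:", soft)

# Time a slice of the naive "close everything before exec" loop on fds
# that are not open; each iteration still pays for a failed close(2).
n = 100_000
start = time.monotonic()
for fd in range(10_000, 10_000 + n):
    try:
        os.close(fd)
    except OSError:
        pass  # EBADF, as expected for an unused fd
elapsed = time.monotonic() - start

# Extrapolate: a per-exec sweep of the whole range costs O(soft limit).
per_exec = elapsed / n * soft
print(f"estimated time to sweep 0..{soft}: {per_exec:.4f}s per fork/exec")
```

With the soft limit at 1024 the sweep is negligible; at 2^20 the same loop makes a million syscalls on every spawn.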
Workaround
It is sufficient to add something like the following to dockerd configuration, and do systemctl reload docker:
kir@kd:~/go/src/github.com/docker/docker$ cat /etc/docker/daemon.json-ulimits
{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 1024,
      "Soft": 1024
    },
    "nproc": {
      "Name": "nproc",
      "Soft": 65536,
      "Hard": 65536
    }
  }
}
(side note: yes, the format is ugly but that's what we have)
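If you generate daemon.json from a template, the same structure can be built programmatically. This is just a sketch reproducing the format shown above (including the redundant Name field); the limit values are the ones from the example, not anything mandated by Docker:

```python
import json

# Desired per-resource (soft, hard) pairs, matching the example above.
limits = {"nofile": (1024, 1024), "nproc": (65536, 65536)}

# Each entry repeats its own name alongside the Soft/Hard values --
# that is simply the shape the daemon expects.
default_ulimits = {
    "default-ulimits": {
        name: {"Name": name, "Soft": soft, "Hard": hard}
        for name, (soft, hard) in limits.items()
    }
}
print(json.dumps(default_ulimits, indent=2))
```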
Proposed solution
Add built-in (i.e. compiled-in) default ulimits for RLIMIT_NOFILE and RLIMIT_NPROC, so that, unless overridden via daemon.json or the dockerd command line, they are applied to all containers.
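The intended precedence can be sketched as follows; the function and names here are hypothetical illustrations, not actual dockerd code:

```python
# Hypothetical built-in (soft, hard) defaults; explicit configuration
# from daemon.json or the CLI must always take precedence over them.
BUILTIN_DEFAULTS = {"nofile": (1024, 1024), "nproc": (65536, 65536)}

def effective_ulimits(configured):
    """Merge built-in defaults with explicitly configured ulimits.

    `configured` maps resource name to a (soft, hard) pair, as set via
    daemon.json or the dockerd command line; anything it sets wins.
    """
    merged = dict(BUILTIN_DEFAULTS)
    merged.update(configured)
    return merged
```

With no configuration, containers get the built-in defaults; setting, say, nofile in daemon.json overrides only that entry.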