Skip to content

Spawning PTY processes is many times slower on Docker 18.09 #502

@spantaleev

Description

@spantaleev
  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

Expected behavior

Spawning processes which use a pseudo TTY is consistently fast.

Actual behavior

Sometimes, spawning processes that use a pseudo TTY inside a container is many times slower than usual.

The first time I've encountered this problem (on a really slow machine), the time to spawn such a process in a container jumped from something reasonable (~0.5 seconds) to something much slower (8-10 seconds).

On faster machines, it's still a reproducible problem (see below). It's much less noticeable as we're talking about smaller numbers, but there's at least a factor of 4-5x slowdown (process spawning goes from taking 0.1 seconds to 0.5+ seconds).

Steps to reproduce the behavior

As described in Additional environment details (below), I've managed to reproduce it on various CentOS 7 machines (I've tried Hetzner Cloud servers and Digital Ocean droplets and it's reproducible on both).

Before describing the steps, let me first define a few files for a "benchmark" Docker image that would be used to illustrate the performance problem.

bench.py:

print('Starting')

import pexpect
import time

s = time.time()
print('Spawn..')
# Well, you would usually do this with an interactive program.
# This is just for illustrative purposes.
pexpect.spawn('whoami')
print(time.time() - s)

bench.sh:

#!/bin/bash

# Strace is also available in the image, so running with it is possible
#time docker run -it --rm --security-opt seccomp:unconfined docker-benchmark strace -r python bench.py

time docker run -it --rm docker-benchmark python bench.py

Steps to reproduce on a clean CentOS 7 install:

# Make sure we're up to date and have the latest Docker (18.09)
yum update -y
yum install -y wget
wget https://download.docker.com/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
yum install -y docker-ce
systemctl start docker # just start, do not enable (!) on startup

mkdir docker-benchmark
cd docker-benchmark
# Copy over the necessary files (bench.sh, bench.py, Dockerfile)

docker build -t docker-benchmark -f Dockerfile .

# Run this a couple of times. It should be fast (e.g. 0.1 sec.)
./bench.sh

# Restarting will likely cause performance to degrade
systemctl restart docker

# Try benchmarking again now. It should be consistently slower (e.g. 0.4+ sec)
./bench.sh

reboot -f

cd docker-benchmark
systemctl start docker

# Run this a couple of times. It's likely fast again.
./bench.sh

# Restarting will likely cause performance to degrade
systemctl restart docker

# Try benchmarking again now. It should be consistently slower.
./bench.sh


# Downgrade Docker (18.09 -> 18.06)
systemctl stop docker
yum remove -y docker-ce docker-ce-cli
yum install -y docker-ce-18.06.1.ce-3.el7

systemctl start docker

# Try benchmarking again now. It should be fast.
# Restarting Docker, rebooting or whatever, will not slow it down.
./bench.sh

Output of docker version:

Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        4d60db4
 Built:             Wed Nov  7 00:48:22 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:19:08 2018
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 5
Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-862.14.4.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 991.7MiB
Name: centos-s-1vcpu-1gb-fra1-01
ID: HJ5J:T65L:WR5U:W7GA:X3E7:NSEQ:HBIE:F6DF:B47K:C4OG:K33Q:I3XY
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

Additional environment details (AWS, VirtualBox, physical, etc.)

I have first encountered this problem on a CentOS 7.5 KVM machine running on Proxmox.

I have since also managed to reproduce it using CentOS 7 on:

  • Hetzner Cloud VM servers (€2.99 server in the Falkenstein region)
  • Digital Ocean droplets ($5 droplet in the Frankfurt region)

As the flow for reproducing in Steps to reproduce on a clean CentOS 7 install (above) says, this is only an issue with Docker 18.09. Downgrading to 18.06 makes the issue go away.

I have tried to reproduce the problem on Ubuntu 18.04 (LTS), but I haven't managed to.
Or maybe I have.. Actually the performance I've managed to achieve on Ubuntu 18.04 matches the slow benchmarks from Centos 7 + Docker 18.09. Even if I downgrade Docker to 18.06 on Ubuntu 18.04, it's still just as slow.. In summary, it's always slow on Ubuntu, regardless of the Docker version.. So maybe there is a problem there too.

Additional debugging output (strace):

bench.sh (above) also contains a line to run the same benchmark with strace -r.

I've received the following results (last part is most interesting as it shows where the slowdown is).

Slowness at SIGCHLD:

    ... more stuff above ...
     0.000026 stat("/dev/pts/1", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
     0.000033 open("/dev/pts/1", O_RDWR|O_NOCTTY) = 6
     0.000035 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f9d30ff89d0) = 8
     0.000232 close(6)                  = 0
     0.000276 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
     0.000066 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
     0.000117 fstat(5, {st_mode=S_IFCHR|0666, st_rdev=makedev(5, 2), ...}) = 0
     0.000039 ioctl(5, TCGETS, {B38400 opost isig icanon echo ...}) = 0
     0.000031 fstat(5, {st_mode=S_IFCHR|0666, st_rdev=makedev(5, 2), ...}) = 0
     0.000033 ioctl(5, TCGETS, {B38400 opost isig icanon echo ...}) = 0
     0.000062 lseek(5, 0, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
     0.000056 lseek(5, 0, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
     0.000114 close(4)                  = 0
     0.000039 read(3, "", 4096)         = 0
     0.648181 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8, si_uid=0, si_status=0, si_utime=27, si_stime=37} ---
     0.000037 close(3)                  = 0
     0.000102 ioctl(5, TIOCSWINSZ, {ws_row=24, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0
     0.000125 close(5)                  = 0
     0.000050 select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=100000}) = 0 (Timeout)
     0.100275 wait4(8, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 8
     0.000154 write(1, "0.7546336650848389\n", 190.7546336650848389
) = 19
     0.000185 rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f9d3067e0c0}, {sa_handler=0x7f9d30a96af0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f9d3067e0c0}, 8) = 0
     0.006374 sigaltstack(NULL, {ss_sp=0x55aca4608c90, ss_flags=0, ss_size=8192}) = 0
     0.000051 sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}, NULL) = 0
     0.000242 exit_group(0)             = ?
     0.001359 +++ exited with 0 +++

or slowness at close():

    ... more stuff above ...
     0.000027 stat("/dev/pts/1", {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
     0.000032 open("/dev/pts/1", O_RDWR|O_NOCTTY) = 6
     0.000038 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f35d72f39d0) = 8
     0.000235 close(6)                  = 0
     0.000289 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
     0.000063 ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
     0.000114 fstat(5, {st_mode=S_IFCHR|0666, st_rdev=makedev(5, 2), ...}) = 0
     0.000040 ioctl(5, TCGETS, {B38400 opost isig icanon echo ...}) = 0
     0.000031 fstat(5, {st_mode=S_IFCHR|0666, st_rdev=makedev(5, 2), ...}) = 0
     0.000034 ioctl(5, TCGETS, {B38400 opost isig icanon echo ...}) = 0
     0.000052 lseek(5, 0, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
     0.000057 lseek(5, 0, SEEK_CUR)     = -1 ESPIPE (Illegal seek)
     0.000083 close(4)                  = 0
     0.000035 read(3, "", 4096)         = 0
     0.647527 close(3)                  = 0
     0.000104 ioctl(5, TIOCSWINSZ, {ws_row=24, ws_col=80, ws_xpixel=0, ws_ypixel=0}) = 0
     0.000129 close(5)                  = 0
     0.000047 select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=100000}) = ? ERESTARTNOHAND (To be restarted if no handler)
     0.000495 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8, si_uid=0, si_status=0, si_utime=24, si_stime=39} ---
     0.000018 select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=99545}) = 0 (Timeout)
     0.099814 wait4(8, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 8
     0.000204 write(1, "0.7540490627288818\n", 190.7540490627288818
) = 19
     0.000292 rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f35d69790c0}, {sa_handler=0x7f35d6d91af0, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f35d69790c0}, 8) = 0
     0.008504 sigaltstack(NULL, {ss_sp=0x5564cd06cc90, ss_flags=0, ss_size=8192}) = 0
     0.000052 sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}, NULL) = 0
     0.000216 exit_group(0)             = ?
     0.001180 +++ exited with 0 +++

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions