Description
During switch bootup supervisorctl is called lot of times from different SONiC containers to start daemons. It consumes a lot of CPU. It makes containers to start slower and has impact on cold/fast/warm boot timings.
Steps to reproduce the issue:
No specific steps, just perform any kind of reload/reboot and start some profiling tool (bootchart, perf&flamegraphs)
Describe the results you received:
supervisorctl is a very CPU intensive utility, however it is used everywhere, causing slow start.
Fast boot suffers because platform SDK may not be able to perfrom switch init and reconfiguration fast enough if other CPU intensive tasks are running in parallel.
Fast/Warm boot suffers because switch control plane downtime is increased.
Describe the results you expected:
More new containers will use supervisorctl to start daemons. To my understanding the job of supervisor is to fork the daemon and wait - which is blocking operation, so no intensive tasks are performed. However, supervisorctl (mainly) and supervisord take a lot of CPU during startup. Ideally there should not be high CPU usage for those processes.
Additional information you deem important (e.g. issue happens only occasionally):
This is very platform specific, depending on platform CPU you may have different results.
Output of show version:
The version is debug version compiled with SONIC_PROFILING_ON=y and '-fno-omit-frame-pointer':
Attached is system perf recording and generated flamegraph during bootup. Perf was started at /etc/rc.local phase with command:
perf_4.9 record -F 99 -a -g -o /home/admin/perf -- sleep 100 &
system-perf.svg.gz
We can see a lot of supervisorctl samples collected, more than any critical SONiC component, like SDK, syncd, orchagent or redis-server.
Bootchart plot (https://elinux.org/Bootchart)

We can see lot of supervisor invocations during SDK start and configuration.
SONiC Software Version: SONiC.201911.0-8367dfeb
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: 8367dfeb
Build date: Wed May 6 16:14:29 UTC 2020
Built by: stepanb@r-build-sonic02
Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700-D48C8
ASIC: mellanox
Serial Number: MT1822K07823
Uptime: 10:36:48 up 8 min, 1 user, load average: 1.61, 1.69, 0.99
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-platform-monitor-dbg 201911.0-8367dfeb af5e7e595e0c 641MB
docker-platform-monitor-dbg latest af5e7e595e0c 641MB
docker-platform-monitor 201911.0-8367dfeb af5e7e595e0c 641MB
docker-platform-monitor latest af5e7e595e0c 641MB
docker-sflow-dbg 201911.0-8367dfeb 7a794dfac076 332MB
docker-sflow-dbg latest 7a794dfac076 332MB
docker-sflow 201911.0-8367dfeb 7a794dfac076 332MB
docker-sflow latest 7a794dfac076 332MB
docker-fpm-frr-dbg 201911.0-8367dfeb 231d4d8e8fbd 385MB
docker-fpm-frr-dbg latest 231d4d8e8fbd 385MB
docker-fpm-frr 201911.0-8367dfeb 231d4d8e8fbd 385MB
docker-fpm-frr latest 231d4d8e8fbd 385MB
docker-lldp-sv2-dbg 201911.0-8367dfeb 63600bb51fad 326MB
docker-lldp-sv2-dbg latest 63600bb51fad 326MB
docker-lldp-sv2 201911.0-8367dfeb 63600bb51fad 326MB
docker-lldp-sv2 latest 63600bb51fad 326MB
docker-orchagent-dbg 201911.0-8367dfeb 1f2d3cbf41c4 376MB
docker-orchagent-dbg latest 1f2d3cbf41c4 376MB
docker-orchagent 201911.0-8367dfeb 1f2d3cbf41c4 376MB
docker-orchagent latest 1f2d3cbf41c4 376MB
docker-snmp-sv2-dbg 201911.0-8367dfeb 36b4cb692870 357MB
docker-snmp-sv2-dbg latest 36b4cb692870 357MB
docker-snmp-sv2 201911.0-8367dfeb 36b4cb692870 357MB
docker-snmp-sv2 latest 36b4cb692870 357MB
docker-nat-dbg 201911.0-8367dfeb 8b9e98228c50 362MB
docker-nat-dbg latest 8b9e98228c50 362MB
docker-nat 201911.0-8367dfeb 8b9e98228c50 362MB
docker-nat latest 8b9e98228c50 362MB
docker-sonic-mgmt-framework-dbg 201911.0-8367dfeb f6f222fbb114 443MB
docker-sonic-mgmt-framework-dbg latest f6f222fbb114 443MB
docker-sonic-mgmt-framework 201911.0-8367dfeb f6f222fbb114 443MB
docker-sonic-mgmt-framework latest f6f222fbb114 443MB
docker-teamd-dbg 201911.0-8367dfeb aea02a411903 360MB
docker-teamd-dbg latest aea02a411903 360MB
docker-teamd 201911.0-8367dfeb aea02a411903 360MB
docker-teamd latest aea02a411903 360MB
docker-syncd-mlnx-dbg 201911.0-8367dfeb a6067b546dec 416MB
docker-syncd-mlnx-dbg latest a6067b546dec 416MB
docker-syncd-mlnx 201911.0-8367dfeb a6067b546dec 416MB
docker-syncd-mlnx latest a6067b546dec 416MB
docker-sonic-telemetry-dbg 201911.0-8367dfeb 2653e5a6978d 369MB
docker-sonic-telemetry-dbg latest 2653e5a6978d 369MB
docker-sonic-telemetry 201911.0-8367dfeb 2653e5a6978d 369MB
docker-sonic-telemetry latest 2653e5a6978d 369MB
docker-database-dbg 201911.0-8367dfeb e224f6c86f35 303MB
docker-database-dbg latest e224f6c86f35 303MB
docker-database 201911.0-8367dfeb e224f6c86f35 303MB
docker-database latest e224f6c86f35 303MB
docker-router-advertiser-dbg 201911.0-8367dfeb eddcecf2bd70 306MB
docker-router-advertiser-dbg latest eddcecf2bd70 306MB
docker-router-advertiser 201911.0-8367dfeb eddcecf2bd70 306MB
docker-router-advertiser latest eddcecf2bd70 306MB
docker-dhcp-relay-dbg 201911.0-8367dfeb e1cc48d4ce38 312MB
docker-dhcp-relay-dbg latest e1cc48d4ce38 312MB
docker-dhcp-relay 201911.0-8367dfeb e1cc48d4ce38 312MB
docker-dhcp-relay latest e1cc48d4ce38 312MB
Attach debug file sudo generate_dump:
sonic_dump.tar.gz
Description
During switch bootup supervisorctl is called lot of times from different SONiC containers to start daemons. It consumes a lot of CPU. It makes containers to start slower and has impact on cold/fast/warm boot timings.
Steps to reproduce the issue:
No specific steps, just perform any kind of reload/reboot and start some profiling tool (bootchart, perf&flamegraphs)
Describe the results you received:
supervisorctl is a very CPU intensive utility, however it is used everywhere, causing slow start.
Fast boot suffers because platform SDK may not be able to perfrom switch init and reconfiguration fast enough if other CPU intensive tasks are running in parallel.
Fast/Warm boot suffers because switch control plane downtime is increased.
Describe the results you expected:
More new containers will use supervisorctl to start daemons. To my understanding the job of supervisor is to fork the daemon and wait - which is blocking operation, so no intensive tasks are performed. However, supervisorctl (mainly) and supervisord take a lot of CPU during startup. Ideally there should not be high CPU usage for those processes.
Additional information you deem important (e.g. issue happens only occasionally):
This is very platform specific, depending on platform CPU you may have different results.
Output of
show version:The version is debug version compiled with SONIC_PROFILING_ON=y and '-fno-omit-frame-pointer':
Attached is system perf recording and generated flamegraph during bootup. Perf was started at /etc/rc.local phase with command:
perf_4.9 record -F 99 -a -g -o /home/admin/perf -- sleep 100 &system-perf.svg.gz
We can see a lot of supervisorctl samples collected, more than any critical SONiC component, like SDK, syncd, orchagent or redis-server.
Bootchart plot (https://elinux.org/Bootchart)

We can see lot of supervisor invocations during SDK start and configuration.
Attach debug file
sudo generate_dump:sonic_dump.tar.gz