Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I am trying to build a bigger container image that contains all my usual desktop/applications, and use it with toolbox. With a "fairly large" image, podman create already takes some 30s, and once I add TeXlive to the image, it never finishes any more and eventually kills the machine.
I stripped off the numerous toolbox options/layers and reduced that to a podman command. The crucial option is --userns=keep-id, which sets off some storage-chown-by-maps process.
Steps to reproduce the issue:
- This is the "fairly large" image:
podman pull ghcr.io/martinpitt/swaypod:latest
time podman create --userns=keep-id ghcr.io/martinpitt/swaypod:latest
- This is the image that adds TeXlive (which makes it a few hundred MB larger):
podman pull ghcr.io/martinpitt/swaypod:allpkgs
time podman create --userns=keep-id ghcr.io/martinpitt/swaypod:allpkgs
Describe the results you received:
Step 1 takes 4 s on a Fedora 37 cloud VM (2 CPUs, 4 GiB RAM) with the default btrfs. On a standard RHEL 9.2 VM with XFS and on my laptop's Fedora 37 VM with /home being on ext4, it takes about 20 seconds. In top I see a process called "exe" which is taking 100% CPU:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    972 admin     20   0 1351936  65172  28028 S  96.0   1.7   0:12.33 exe
That is really this:
admin 1972 95.0 1.3 1351680 49344 pts/0 Sl+ 04:04 0:01 storage-chown-by-maps /home/admin/.local/share/containers/storage/overlay/3cc2d72c07248c18a9185b6a5bba0e7932b0ce5c26dbc763e476eb50c2a7ea94/merged
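The mismatch between "exe" in top and the full command in the process list most likely arises because top shows the short process name by default, while ps aux shows the whole argument vector. A minimal sketch for resolving a PID to its full command line on Linux follows; the helper name show_cmdline is made up for this illustration.

```shell
#!/bin/sh
# show_cmdline PID  (hypothetical helper, not part of podman)
# /proc/<pid>/cmdline stores the arguments separated by NUL bytes,
# so translate them to spaces to get a readable line.
show_cmdline() {
    tr '\0' ' ' < "/proc/$1/cmdline"
    echo
}

# Example: inspect the current shell; substitute the PID reported by top.
show_cmdline "$$"
```

Pressing c inside top toggles the same full command-line view interactively.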
With the larger image in step 2, the Fedora 37 btrfs VM takes merely 6 s. However, on both the RHEL 9.2 XFS VM and my ext4 real-iron Fedora 37 laptop, the storage-chown-by-maps process never ends. After maybe half a minute it kills the VM (ssh dead, cannot log into the virsh console either), and my laptop becomes so sluggish that I cannot even start top any more. Trying to kill -9 or even sudo kill -9 (!) that storage-chown-by-maps process does not work either; it's just unkillable.
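A process that ignores SIGKILL is often stuck in uninterruptible sleep (state D) inside the kernel. That is only an assumption about the cause here, since this report does not confirm it. A minimal sketch for checking the state of a PID on Linux, with proc_state as a made-up helper name:

```shell
#!/bin/sh
# proc_state PID  (hypothetical helper, not part of podman)
# Prints the state letter and its description from /proc/<pid>/status,
# for example "D (disk sleep)" for uninterruptible sleep.
proc_state() {
    awk '/^State:/ { print $2, $3 }' "/proc/$1/status"
}

# Example: inspect the current shell; substitute the PID of the stuck process.
proc_state "$$"
```

If the stuck process does report state D, signals stay pending until the kernel operation it is blocked in returns.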
Describe the results you expected:
The storage-chown-by-maps process should finish eventually, but ideally reasonably fast. This is more or less a glorified chown -R, no? That shouldn't take more than a few seconds.
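To illustrate the "glorified chown -R" intuition, here is a rough sketch of the per-file ID translation such a pass would need, using the container_id/host_id/size triples that podman info reports under idMappings. The function map_id is hypothetical; the real storage-chown-by-maps logic is not shown in this report and may differ.

```shell
#!/bin/sh
# map_id ID MAPPING...  (hypothetical helper for illustration only)
# Each MAPPING is "container_id:host_id:size", mirroring the idMappings fields.
# Prints the host ID for a container ID, or returns 1 if the ID is unmapped.
map_id() {
    id=$1
    shift
    for m in "$@"; do
        cid=${m%%:*}        # first field: start of the container ID range
        rest=${m#*:}
        hid=${rest%%:*}     # second field: start of the host ID range
        size=${rest#*:}     # third field: length of the range
        if [ "$id" -ge "$cid" ] && [ "$id" -lt $((cid + size)) ]; then
            echo $((hid + id - cid))
            return 0
        fi
    done
    return 1
}

# Sample data: the default rootless mapping from the podman info output in this report.
map_id 0 0:1000:1 1:524288:65536      # prints 1000
map_id 1000 0:1000:1 1:524288:65536   # prints 525287
```

The lookup itself is cheap, so any slowdown would have to come from applying the result to every file in the image rather than from computing it.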
Additional information you deem important (e.g. issue happens only occasionally): 100% reproducible, also in a synthetic cloud instance.
Output of podman version:
From Fedora 37:
Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.19.2
Built:        Fri Nov 11 16:01:27 2022
OS/Arch:      linux/amd64
Current RHEL 9.2 also has podman 4.3.
Output of podman info:
host:
  arch: amd64
  buildahVersion: 1.28.0
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.5-1.fc37.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: '
  cpuUtilization:
    idlePercent: 67.88
    systemPercent: 6.19
    userPercent: 25.92
  cpus: 8
  distribution:
    distribution: fedora
    version: "37"
  eventLogger: journald
  hostname: abakus
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.0.12-300.fc37.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 11440615424
  memTotal: 16533999616
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.7.2-1.fc37.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.7.2
      commit: 0356bf4aff9a133d655dc13b1d9ac9424706cac4
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 94h 16m 47.00s (Approximately 3.92 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/martin/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 2
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/martin/.local/share/containers/storage
  graphRootAllocated: 228501188608
  graphRootUsed: 174334164992
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 2
  runRoot: /run/user/1000/containers
  volumePath: /home/martin/.local/share/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1668178887
  BuiltTime: Fri Nov 11 16:01:27 2022
  GitCommit: ""
  GoVersion: go1.19.2
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1
Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):
podman-4.3.1-1.fc37.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?
Yes -- it's the latest version. The troubleshooting guide even recommends --userns=keep-id for some use cases, but that's what is broken.
Additional environment details (AWS, VirtualBox, physical, etc.): physical and QEMU.