Driver "zfs" failed to remove root filesystem: "cannot open '<dataset-path>': dataset does not exist" #43080

@Qubad786

Description

We are using openebs/zfs, and we have multiple user reports of issues that boil down to Docker with the zfs storage driver.

Pods are stuck in terminating state:

container <ID> driver "zfs" failed to remove root filesystem: exit status 1: "zfs fs destroy -r <dataset-path>" => cannot open '<dataset-path>': dataset does not exist"

Reports:

There is a workaround: manually recreate the missing datasets so that Docker can complete the removal, after which things return to normal. This is tedious, however.
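For reference, the manual workaround looks roughly like this (a sketch, not an exact procedure; the dataset path and container ID below are taken from the error logs further down and will differ on each affected system):

```shell
# Recreate the (empty) dataset the zfs driver expects to destroy; the path
# comes from the "dataset does not exist" error message.
zfs create -p storage_404/ix-applications/docker/3114cdddca0eb01f1de848925eba4d027796b424b91f12d843673e174667bd24

# Retry the container removal; "zfs destroy -r" now has a dataset to remove.
docker rm d39f9d831680
```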

Steps to reproduce the issue:
Unable to find concrete steps to reproduce. It appears to be a race condition or non-atomic operation: the driver couldn't finish removing the layer datasets and gets into a crash loop. Most of the time it happens after a system restart / power cut, when pods are stuck in the terminating state with the following logs:

Describe the results you received:

truenas# docker ps -a
CONTAINER ID   IMAGE                         COMMAND                  CREATED       STATUS                     PORTS     NAMES
d39f9d831680   openebs/zfs-driver            "/usr/local/bin/zfs-…"   4 weeks ago   Removal In Progress                  k8s_openebs-zfs-plugin_openebs-zfs-node-ds5sd_kube-system_d692ac39-ac25-4da6-adbb-93615d9cef3b_203
4f7053116229   rancher/pause:3.1             "/pause"                 4 weeks ago   Exited (255) 4 weeks ago             k8s_POD_openebs-zfs-node-ds5sd_kube-system_d692ac39-ac25-4da6-adbb-93615d9cef3b_27


truenas# k get pods -A
NAMESPACE          NAME                                    READY   STATUS        RESTARTS   AGE
kube-system        openebs-zfs-node-ds5sd                  0/2     Terminating   315        106d
kube-system        coredns-7448499f4d-hmsv9                1/1     Running       0          8h
kube-system        openebs-zfs-controller-0                5/5     Running       0          8h

#######
## LOGS:
#######
Oct 27 01:36:10 truenas k3s[8629]: I1027 01:36:10.703743    8629 scope.go:111] "RemoveContainer" containerID="d39f9d83168090d640c8473d2311ec52be722e58951549aef32f187e59f90f8b"
Oct 27 01:36:10 truenas dockerd[8071]: time="2021-10-27T01:36:10.739172443+02:00" level=error msg="Error removing mounted layer d39f9d83168090d640c8473d2311ec52be722e58951549aef32f187e59f90f8b: exit status 1: \"/usr/sbin/zfs fs destroy -r storage_404/ix-applications/docker/3114cdddca0eb01f1de848925eba4d027796b424b91f12d843673e174667bd24\" => cannot open 'storage_404/ix-applications/docker/3114cdddca0eb01f1de848925eba4d027796b424b91f12d843673e174667bd24': dataset does not exist\n"
Oct 27 01:36:10 truenas dockerd[8071]: time="2021-10-27T01:36:10.739870694+02:00" level=error msg="Handler for DELETE /v1.41/containers/d39f9d83168090d640c8473d2311ec52be722e58951549aef32f187e59f90f8b returned error: container d39f9d83168090d640c8473d2311ec52be722e58951549aef32f187e59f90f8b: driver \"zfs\" failed to remove root filesystem: exit status 1: \"/usr/sbin/zfs fs destroy -r storage_404/ix-applications/docker/3114cdddca0eb01f1de848925eba4d027796b424b91f12d843673e174667bd24\" => cannot open 'storage_404/ix-applications/docker/3114cdddca0eb01f1de848925eba4d027796b424b91f12d843673e174667bd24': dataset does not exist\n"
Oct 27 01:36:10 truenas k3s[8629]: E1027 01:36:10.740889    8629 remote_runtime.go:296] "RemoveContainer from runtime service failed" err="rpc error: code = Unknown desc = failed to remove container \"d39f9d83168090d640c8473d2311ec52be722e58951549aef32f187e59f90f8b\": Error response from daemon: container d39f9d83168090d640c8473d2311ec52be722e58951549aef32f187e59f90f8b: driver \"zfs\" failed to remove root filesystem: exit status 1: \"/usr/sbin/zfs fs destroy -r storage_404/ix-applications/docker/3114cdddca0eb01f1de848925eba4d027796b424b91f12d843673e174667bd24\" => cannot open 'storage_404/ix-applications/docker/3114cdddca0eb01f1de848925eba4d027796b424b91f12d843673e174667bd24': dataset does not exist" containerID="d39f9d83168090d640c8473d2311ec52be722e58951549aef32f187e59f90f8b"
Oct 27 01:36:10 truenas k3s[8629]: E1027 01:36:10.741271    8629 kuberuntime_gc.go:146] "Failed to remove container" err="rpc error: code = Unknown desc = failed to remove container \"d39f9d83168090d640c8473d2311ec52be722e58951549aef32f187e59f90f8b\": Error response from daemon: container d39f9d83168090d640c8473d2311ec52be722e58951549aef32f187e59f90f8b: driver \"zfs\" failed to remove root filesystem: exit status 1: \"/usr/sbin/zfs fs destroy -r storage_404/ix-applications/docker/3114cdddca0eb01f1de848925eba4d027796b424b91f12d843673e174667bd24\" => cannot open 'storage_404/ix-applications/docker/3114cdddca0eb01f1de848925eba4d027796b424b91f12d843673e174667bd24': dataset does not exist" containerID="d39f9d83168090d640c8473d2311ec52be722e58951549aef32f187e59f90f8b"
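One way to confirm that the dataset referenced in the logs is actually gone (a diagnostic sketch, using the parent dataset reported by `docker info` below; substitute your own pool):

```shell
# List all layer datasets the zfs storage driver currently has on disk.
# The dataset named in the "dataset does not exist" error should be absent
# from this output, while Docker still holds a record of the layer.
zfs list -r -t filesystem -o name evo/ix-applications/docker
```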

Describe the results you expected:

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.11
 API version:       1.41
 Go version:        go1.16.9
 Git commit:        dea9396
 Built:             Thu Nov 18 00:37:22 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.11
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.9
  Git commit:       847da18
  Built:            Thu Nov 18 00:35:31 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)
  scan: Docker Scan (Docker Inc., v0.9.0)

Server:
 Containers: 36
  Running: 33
  Paused: 0
  Stopped: 3
 Images: 23
 Server Version: 20.10.11
 Storage Driver: zfs
  Zpool: evo
  Zpool Health: ONLINE
  Parent Dataset: evo/ix-applications/docker
  Space Used By Parent: 2543288320
  Space Available: 16198967296
  Parent Quota: no
  Compression: lz4
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.10.70+truenas
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 13.53GiB
 Name: truenas.local
 ID: NDIM:5V6V:MQ5E:QNFP:VPAB:4EYO:YKNL:EF55:WW5R:2IQM:ARAF:5NIG
 Docker Root Dir: /mnt/evo/ix-applications/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
