
No option to disable freelist synchronization in order to recover corrupted bolt meta.db #4838

@jar349

Description

After an unexpected power outage, my containerd service fails to start with the following output:

Started Service for snap application microk8s.daemon-containerd.
+ '[' -d /sys/kernel/security/apparmor ']'
++ cat /proc/self/attr/current
+ '[' 'snap.microk8s.daemon-containerd (complain)' '!=' unconfined ']'
+ exec aa-exec -p unconfined -- /snap/microk8s/1864/run-containerd-with-args
+ '[' -d /sys/kernel/security/apparmor ']'
++ cat /proc/self/attr/current
+ '[' unconfined '!=' unconfined ']'
+ export PATH=/snap/microk8s/1864/usr/sbin:/snap/microk8s/1864/usr/bin:/snap/microk8s/1864/sbin:/snap/microk8s/1864
+ PATH=/snap/microk8s/1864/usr/sbin:/snap/microk8s/1864/usr/bin:/snap/microk8s/1864/sbin:/snap/microk8s/1864/bin:/u
++ /snap/microk8s/1864/bin/uname -m
+ ARCH=x86_64
+ export LD_LIBRARY_PATH=:/snap/microk8s/1864/lib:/snap/microk8s/1864/usr/lib:/snap/microk8s/1864/lib/x86_64-linux-
+ LD_LIBRARY_PATH=:/snap/microk8s/1864/lib:/snap/microk8s/1864/usr/lib:/snap/microk8s/1864/lib/x86_64-linux-gnu:/sn
+ export LD_LIBRARY_PATH=/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void::/snap/microk8s/1864/lib
+ LD_LIBRARY_PATH=/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void::/snap/microk8s/1864/lib:/snap/
+ export XDG_RUNTIME_DIR=/var/snap/microk8s/common/run
+ XDG_RUNTIME_DIR=/var/snap/microk8s/common/run
+ mkdir -p /var/snap/microk8s/common/run
+ source /snap/microk8s/1864/actions/common/utils.sh
+ '[' -d /etc/apparmor.d ']'
+ echo 'Using a default profile template'
Using a default profile template
+ cp /snap/microk8s/1864/containerd-profile /etc/apparmor.d/cri-containerd.apparmor.d
+ echo 'Reloading AppArmor profiles'
Reloading AppArmor profiles
+ service apparmor reload
+ app=containerd
+ '[' -e /var/snap/microk8s/1864/var/lock/gpu ']'
+ RUNTIME=runc
++ snapshotter
+++ stat -f -c %T /var/snap/microk8s/common
++ FSTYPE=ext2/ext3
++ '[' ext2/ext3 = zfs ']'
++ echo overlayfs
+ SNAPSHOTTER=overlayfs
+ sed 's@${SNAP}@/snap/microk8s/1864@g;s@${SNAP_DATA}@/var/snap/microk8s/1864@g;s@${SNAPSHOTTER}@overlayfs@g;s@${RU
++ is_service_expected_to_start flanneld
++ local service=flanneld
++ '[' -f /var/snap/microk8s/1864/var/lock/no-flanneld ']'
++ echo 1
+ run_flanneld=1
+ '[' 1 == 1 ']'
+ sed 's@${SNAP}@/snap/microk8s/1864@g;s@${SNAP_DATA}@/var/snap/microk8s/1864@g;s@${SNAP_COMMON}@/var/snap/microk8s
++ cat /var/snap/microk8s/1864/args/containerd
+ declare -a 'args=(--config ${SNAP_DATA}/args/containerd.toml
--root ${SNAP_COMMON}/var/lib/containerd
--state ${SNAP_COMMON}/run/containerd
--address ${SNAP_COMMON}/run/containerd.sock)'
+ set -a
+ . /var/snap/microk8s/1864/args/containerd-env
++ ulimit -n 65536
+ set +a
+ n=0
+ '[' 0 -ge 20 ']'
+ ip route
+ grep default
+ break
+ export CILIUM_SOCK=/var/snap/microk8s/1864/var/run/cilium/cilium.sock
+ CILIUM_SOCK=/var/snap/microk8s/1864/var/run/cilium/cilium.sock
+ exec /snap/microk8s/1864/bin/containerd --config /var/snap/microk8s/1864/args/containerd.toml --root /var/snap/mi
time="2020-12-13T15:23:51.894314928-05:00" level=info msg="starting containerd" revision=8fba4e9a7d01810a393d5d25a3
time="2020-12-13T15:23:51.941096210-05:00" level=info msg="loading plugin \"io.containerd.content.v1.content\"..."
time="2020-12-13T15:23:51.941202333-05:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.btrfs\"...
time="2020-12-13T15:23:51.941756325-05:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.btrfs
time="2020-12-13T15:23:51.941823204-05:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\
time="2020-12-13T15:23:51.941865856-05:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.de
time="2020-12-13T15:23:51.941894042-05:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.aufs\"..."
time="2020-12-13T15:23:51.945160930-05:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..
time="2020-12-13T15:23:51.945222461-05:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\
time="2020-12-13T15:23:51.945379035-05:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..."
time="2020-12-13T15:23:51.945812181-05:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"
time="2020-12-13T15:23:51.945857614-05:00" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." ty
time="2020-12-13T15:23:51.945895442-05:00" level=warning msg="could not use snapshotter devmapper in metadata plugi
time="2020-12-13T15:23:51.945920168-05:00" level=info msg="metadata content store policy set" policy=shared
panic: invalid freelist page: 1113, page type is leaf
goroutine 1 [running]:
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.(*freelist).read(0xc000018480, 0x7fbacd454000)
        /build/microk8s/parts/containerd/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/freelist.g
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.(*DB).loadFreelist.func1()
        /build/microk8s/parts/containerd/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:316
sync.(*Once).doSlow(0xc00018c568, 0xc0001d82c0)
        /snap/go/6745/src/sync/once.go:66 +0xee
sync.(*Once).Do(...)
        /snap/go/6745/src/sync/once.go:57
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.(*DB).loadFreelist(0xc00018c400)
        /build/microk8s/parts/containerd/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:309
github.com/containerd/containerd/vendor/go.etcd.io/bbolt.Open(0xc000023980, 0x53, 0x55cf000001a4, 0x55cf66c03b80, 0
        /build/microk8s/parts/containerd/go/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:286
github.com/containerd/containerd/services/server.LoadPlugins.func2(0xc000018400, 0x0, 0x0, 0x0, 0x0)
        /build/microk8s/parts/containerd/go/src/github.com/containerd/containerd/services/server/server.go:379 +0x8
github.com/containerd/containerd/plugin.(*Registration).Init(0xc000688120, 0xc000018400, 0x55cf660cf8e0)
        /build/microk8s/parts/containerd/go/src/github.com/containerd/containerd/plugin/plugin.go:110 +0x3a
github.com/containerd/containerd/services/server.New(0x55cf66310d20, 0xc000122000, 0xc000572300, 0x1, 0x1, 0xc00068
        /build/microk8s/parts/containerd/go/src/github.com/containerd/containerd/services/server/server.go:167 +0xd
github.com/containerd/containerd/cmd/containerd/command.App.func1(0xc00002a000, 0x55cf66c03ac0, 0xc00027c2e0)
        /build/microk8s/parts/containerd/go/src/github.com/containerd/containerd/cmd/containerd/command/main.go:177
github.com/containerd/containerd/vendor/github.com/urfave/cli.HandleAction(0x55cf6609df40, 0x55cf662b55b0, 0xc00002
        /build/microk8s/parts/containerd/go/src/github.com/containerd/containerd/vendor/github.com/urfave/cli/app.g
github.com/containerd/containerd/vendor/github.com/urfave/cli.(*App).Run(0xc000028000, 0xc00012a000, 0x9, 0x9, 0x0,
        /build/microk8s/parts/containerd/go/src/github.com/containerd/containerd/vendor/github.com/urfave/cli/app.g
main.main()
        github.com/containerd/containerd/cmd/containerd/main.go:33 +0x51
snap.microk8s.daemon-containerd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
snap.microk8s.daemon-containerd.service: Failed with result 'exit-code'.

Based on the commentary here, I believe that opening the bolt DB with NoFreelistSync: true would recover the database by rebuilding the freelist from a full scan, which should get past the failure shown above. However, the bolt plugin currently has no configuration option for setting that value when the containerd server opens the database.

In summary, I believe there is a way to recover containerd's bolt-based metadata DB, but containerd doesn't expose it. I would argue that this is a bug, though you may consider it a feature request.

Steps to reproduce the issue:

  1. Experience an unexpected power outage that corrupts containerd's bolt-based metadata DB
  2. Try to start the containerd systemd service
  3. containerd crashes with the log shown above

Describe the results you received:
The bolt-based metadata DB does not recover

Describe the results you expected:
The bolt-based metadata DB should be recoverable

Output of containerd --version:

containerd github.com/containerd/containerd v1.3.7 8fba4e9a7d01810a393d5d25a3621dc101981175

This containerd instance is part of microk8s, which is why the service name is snap.microk8s.daemon-containerd.service.
