
Devmapper: consistency issue bugfixes#3436

Closed
renzhengeek wants to merge 3 commits into containerd:master from renzhengeek:devmapper/consistency-bugfixes

Conversation

@renzhengeek

@renzhengeek renzhengeek commented Jul 21, 2019

Hi,

These are several patches for problems we've seen in our testing.

In extreme test cases, such as a hard reboot or forcibly unplugging a disk while k8s is actively working, we've seen the two kinds of errors below. They are caused by an inconsistency between the devmapper database and the thin-pool, which then makes subsequent snapshot/pod creation fail:

  • "file exists" error:
failed to create containerd container: failed to create snapshot
\"vg0-mythinpool-snap-1083\" (dev: 59) from \"vg0-mythinpool-snap-19\"
(dev: 6): 1 error occurred:\n\n* file exists"
  • "object already exists" error:
time="2019-07-19T15:40:02.331427856+08:00" level=error msg="failed to
create snapshot device from parent vg0-mythinpool-snap-1" error="failed
to save initial metadata for snapshot "vg0-mythinpool-snap-4": object
already exists"
time="2019-07-19T15:40:02.331436436+08:00" level=debug msg="snapshotter
error" error="1 error occurred:

* failed to save initial metadata for snapshot "vg0-mythinpool-snap-4":
object already exists"

Please help review and give feedback to improve the fixes, thanks!

@renzhengeek
Author

@mxpv @fuweid @Ace-Tang

@theopenlab-ci

theopenlab-ci Bot commented Jul 21, 2019

Build succeeded.

@renzhengeek force-pushed the devmapper/consistency-bugfixes branch from 79fb413 to 3e2f3bf on July 22, 2019 01:36
@theopenlab-ci

theopenlab-ci Bot commented Jul 22, 2019

Build succeeded.

@renzhengeek force-pushed the devmapper/consistency-bugfixes branch from 3e2f3bf to c21ab01 on July 22, 2019 05:10
@theopenlab-ci

theopenlab-ci Bot commented Jul 22, 2019

Build succeeded.

In extreme cases, like a crash or the underlying disk
disappearing, this code path can be taken when deleting a
snapshot:

1. activating the thin device fails, and rollback follows;
2. deleting the thin device fails;
3. removing the device info from metadata succeeds.

After recovery, it will fail to work:
"""
failed to create containerd container: failed to create snapshot
\"vg0-mythinpool-snap-1083\" (dev: 59) from \"vg0-mythinpool-snap-19\"
(dev: 6): 1 error occurred:\n\n* file exists"
"""

because dev 59 is free in the database, but exists in the kernel.

Signed-off-by: Eric Ren <[email protected]>
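The mismatch described in this commit message (a device ID marked free in the database while the device still exists in the kernel) can be detected by comparing the two views. Below is a minimal, self-contained sketch, not containerd code: `metadataFree` and `kernelDevices` are hypothetical stand-ins for the devmapper metadata and the kernel's thin-device list.

```go
package main

import "fmt"

// conflictingIDs returns device IDs that the metadata database considers
// free but that still exist as thin devices in the kernel. Such IDs must
// not be reused until the stale kernel device is removed, otherwise the
// thin-pool reports "file exists".
func conflictingIDs(metadataFree map[uint32]bool, kernelDevices map[uint32]string) []uint32 {
	var conflicts []uint32
	for id := range metadataFree {
		if name, ok := kernelDevices[id]; ok {
			fmt.Printf("id %d is free in metadata but active in kernel as %q\n", id, name)
			conflicts = append(conflicts, id)
		}
	}
	return conflicts
}

func main() {
	// dev 59 was "freed" in metadata during a failed rollback, but the
	// kernel device survived -- the situation from the commit message.
	free := map[uint32]bool{59: true, 60: true}
	kernel := map[uint32]string{59: "vg0-mythinpool-snap-1083", 6: "vg0-mythinpool-snap-19"}
	fmt.Println(conflictingIDs(free, kernel))
}
```

A startup check like this would let the snapshotter skip (or repair) conflicting IDs instead of failing on reuse.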
@renzhengeek renzhengeek force-pushed the devmapper/consistency-bugfixes branch from c21ab01 to cb7e03a Compare July 22, 2019 06:20
@theopenlab-ci

theopenlab-ci Bot commented Jul 22, 2019

Build succeeded.

@renzhengeek force-pushed the devmapper/consistency-bugfixes branch from cb7e03a to e27a829 on July 22, 2019 08:03
@theopenlab-ci

theopenlab-ci Bot commented Jul 22, 2019

Build succeeded.

@mxpv
Member

mxpv commented Jul 23, 2019

Can you elaborate a bit on the story behind this patch. In which cases AddDevice can be called with duplicated name? Why would we want to maintain orphans table instead of returning ErrAlreadyExists?

@renzhengeek
Author

renzhengeek commented Jul 24, 2019

Hi,

Can you elaborate a bit on the story behind this patch. In which cases AddDevice can be called with duplicated name? Why would we want to maintain orphans table instead of returning ErrAlreadyExists?

As said in the commit log, the result is as below. What is missing from the commit log is a reproducer, so I'll present one below:

"""
All cases where a crash/panic happens between the steps
that create/delete a snapshot will cause
the error below after containerd starts again:

time="2019-07-19T15:40:02.331427856+08:00" level=error msg="failed to
create snapshot device from parent vg0-mythinpool-snap-1" error="failed
to save initial metadata for snapshot "vg0-mythinpool-snap-4": object
already exists"
time="2019-07-19T15:40:02.331436436+08:00" level=debug msg="snapshotter
error" error="1 error occurred:

* failed to save initial metadata for snapshot "vg0-mythinpool-snap-4":
object already exists"
"""

The "snap-id" is auto-incremented by the upper snapshotter code. Once there is a broken snapshot holding a "snap-id" that was reported to the upper code as not taken, devmapper can never create that snapshot again.

So containerd keeps trying to create the snapshot with this "snap-id", but every attempt fails with the "object already exists" error.
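The retry loop described above can be modeled in a few lines. This is a toy simulation, not the real devmapper API: `metadataStore` and `saveInitial` are hypothetical stand-ins for the snapshot metadata database.

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyExists = errors.New("object already exists")

// metadataStore simulates the devmapper snapshot metadata: saving initial
// metadata for a name that already has an entry fails, just like the
// half-created snapshot left behind by a crash.
type metadataStore struct{ entries map[string]bool }

func (s *metadataStore) saveInitial(name string) error {
	if s.entries[name] {
		return fmt.Errorf("failed to save initial metadata for snapshot %q: %w", name, errAlreadyExists)
	}
	s.entries[name] = true
	return nil
}

func main() {
	// snap-4 was partially created before the crash, so its entry survives.
	s := &metadataStore{entries: map[string]bool{"vg0-mythinpool-snap-4": true}}

	// The upper snapshotter keeps handing out the same ID because it was
	// never reported as taken, so every retry targets the same name.
	for i := 0; i < 3; i++ {
		fmt.Println(s.saveInitial("vg0-mythinpool-snap-4"))
	}
}
```

Every iteration fails identically, which matches the repeating "object already exists" log lines.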

A fake reproducer, by fault injection in code:

  1. Apply the fault patch:
diff --git a/snapshots/devmapper/pool_device.go b/snapshots/devmapper/pool_device.go
index 680d8b1..7db66d8 100644
--- a/snapshots/devmapper/pool_device.go
+++ b/snapshots/devmapper/pool_device.go
@@ -22,6 +22,7 @@ import (
"context"
"path/filepath"
"strconv"
+ "fmt"

"github.com/hashicorp/go-multierror"
"github.com/pkg/errors"
@@ -90,6 +91,10 @@ func (p *PoolDevice) transition(ctx context.Context, deviceName string, tryingSt
result = multierror.Append(result, err)
}

+ //log.G(ctx).Fatalf("Inject fault: dev: %s, state: %v --> %v", deviceName, tryingState, successState)
+ msg := fmt.Sprintf("Inject fault: dev: %s, state: %v --> %v", deviceName, tryingState, successState)
+ panic(msg)
+
// If operation succeeded transition to success state, otherwise save error details
uerr = p.metadata.UpdateDevice(ctx, deviceName, func(deviceInfo *DeviceInfo) error {
if err == nil {
  2. Revert the fault patch and restart containerd, and you will see the problem :-)

I know that my proposed fix is not very good :-/
Pro: the problematic devices in the orphan table are few enough for an admin or a helper tool to read;
Con: introducing an orphan bucket is a little invasive and dirty...

Another idea that occurs to me: since the issue only happens in extreme cases, like a machine crash or a disk disappearing, after which a containerd restart is likely required anyway, can we remove the problematic device info in place when reloading the devmapper plugin?

Thank you for looking at this issue!

@jiazhiguang
Contributor

I also encountered the same problem.
A disk was unplugged while a thin device was being created, and the creation failed with the following error:

operation not supported

After repairing the disk and restarting the machine, creating the device still fails with the following error:

failed to create snapshot "vg0-1000" (dev: 178) from "vg0-33" (dev: 23): 1 error occurred:

* file exists

The problem seems to occur in the CreateThinDevice method of pool_device.go:

  1. After the disk was unplugged, activating the device failed.
  2. Two deferred methods then ran: p.deleteDevice to delete the device, and p.metadata.RemoveDevice to delete its metadata.
  3. p.metadata.RemoveDevice deleted the device's metadata successfully, but p.deleteDevice failed because the disk had been unplugged, so the device remained on the disk.
  4. After restarting the machine, the device's status is free. Trying to create the device again fails with the "file exists" error.
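The failure sequence above can be reproduced with a simplified, self-contained model. This is not the real CreateThinDevice: the maps and the `diskUnplugged` flag are invented stand-ins for the metadata database and the kernel's device table.

```go
package main

import (
	"errors"
	"fmt"
)

// createThinDevice models the rollback in pool_device.go: on failure, the
// deferred cleanup tries to delete the kernel device and the metadata
// entry independently, and one can succeed while the other fails.
func createThinDevice(metadata, kernel map[string]bool, name string, diskUnplugged bool) (err error) {
	defer func() {
		if err == nil {
			return
		}
		// Rollback: delete the thin device. This fails when the
		// underlying disk is gone, so the kernel device survives.
		if diskUnplugged {
			fmt.Println("rollback: deleteDevice failed: disk unplugged")
		} else {
			delete(kernel, name)
		}
		// Rollback: remove the metadata entry. This succeeds either way,
		// so the device ID is marked free while the kernel device may
		// still exist -- the inconsistency behind "file exists".
		delete(metadata, name)
	}()

	metadata[name] = true
	kernel[name] = true

	if diskUnplugged {
		// Step 1 above: activation fails after the disk is unplugged.
		return errors.New("failed to activate device: operation not supported")
	}
	return nil
}

func main() {
	metadata, kernel := map[string]bool{}, map[string]bool{}
	_ = createThinDevice(metadata, kernel, "vg0-1000", true)
	fmt.Printf("in metadata: %v, in kernel: %v\n", metadata["vg0-1000"], kernel["vg0-1000"])
}
```

After the failed call the metadata entry is gone but the kernel entry remains, which is exactly the free-in-database/exists-in-kernel state described in steps 3 and 4.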

@renzhengeek
Author

Hi,

The problem seems to occur in the CreateThinDevice method of pool_device.go:

  1. After the disk was unplugged, activating the device failed.
  2. Two deferred methods then ran: p.deleteDevice to delete the device, and p.metadata.RemoveDevice to delete its metadata.
  3. p.metadata.RemoveDevice deleted the device's metadata successfully, but p.deleteDevice failed because the disk had been unplugged, so the device remained on the disk.
  4. After restarting the machine, the device's status is free. Trying to create the device again fails with the "file exists" error.

The first patch is exactly trying to fix this problem. Thanks for confirming!

@renzhengeek force-pushed the devmapper/consistency-bugfixes branch 2 times, most recently from 6be89b9 to 1dde9a3 on July 24, 2019 11:05
Eric Ren added 2 commits July 24, 2019 19:13
dmsetup remove with the "--deferred" option may remove
the device in the kernel later than the time its device
info is removed from the database, causing the "file
exists" error as in the previous commit.

This error really happens in our extreme testing:
powering off the server and the disk disappearing while
k8s is creating/deleting pods.

The fix reveals a testcase bug in
"TestSnapshotterSuite/RemoveIntermediateSnapshot":
for a block device snapshot, we should unmount the
snapshot before removing it.

Signed-off-by: Eric Ren <[email protected]>
All cases where a crash/panic happens between the steps
that create/delete a snapshot will cause
the error below after containerd starts again:

"""
time="2019-07-19T15:40:02.331427856+08:00" level=error msg="failed to
create snapshot device from parent vg0-mythinpool-snap-1" error="failed
to save initial metadata for snapshot "vg0-mythinpool-snap-4": object
already exists"
time="2019-07-19T15:40:02.331436436+08:00" level=debug msg="snapshotter
error" error="1 error occurred:

* failed to save initial metadata for snapshot "vg0-mythinpool-snap-4":
object already exists"
"""

My idea was to move the aborted-state device into an orphan bucket,
and remove the orphan devices when the devmapper plugin is reloaded;
making that configurable would be a plus.

However, removing an orphan device is not that easy without invasive
change.

This is a corner case, so the devices in the orphan bucket are few
enough for people to look at and take actions such as:
1. stop containerd;
2. delete the thin device if it exists;
3. remove the orphan device info.

Signed-off-by: Eric Ren <[email protected]>
@renzhengeek renzhengeek force-pushed the devmapper/consistency-bugfixes branch from 1dde9a3 to 76d612c Compare July 24, 2019 11:13
@theopenlab-ci

theopenlab-ci Bot commented Jul 24, 2019

Build succeeded.

@codecov-io

Codecov Report

Merging #3436 into master will decrease coverage by 0.02%.
The diff coverage is 42.85%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3436      +/-   ##
==========================================
- Coverage   44.11%   44.08%   -0.03%     
==========================================
  Files         124      124              
  Lines       13760    13768       +8     
==========================================
  Hits         6070     6070              
- Misses       6759     6764       +5     
- Partials      931      934       +3
Flag     | Coverage Δ
#linux   | 47.83% <42.85%> (-0.04%) ⬇️
#windows | 39.76% <ø> (ø) ⬆️

Impacted Files                     | Coverage Δ
snapshots/devmapper/pool_device.go | 59.77% <33.33%> (-1.64%) ⬇️
snapshots/devmapper/metadata.go    | 64.7% <50%> (-1.71%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f776141...76d612c. Read the comment docs.

@mxpv
Member

mxpv commented Jul 29, 2019

I have a few ideas I'd like to experiment with. I'll update here or will post an additional PR.

@renzhengeek
Author

renzhengeek commented Jul 31, 2019

I have a few ideas I'd like to experiment with. I'll update here or will post an additional PR.

Hi,
Thanks, I like your idea of adding a new state to mark problematic devices instead of adding a new table, though I haven't gone through the details yet.

However, one problem I can think of: when manual handling is needed, how can I find the faulty devices to inspect and repair?

We could develop a handy tool that scans the database for "faulty" devices, so please make it as easy as possible for such a tool to distinguish them :-)

@mxpv
Member

mxpv commented Jul 31, 2019

However, one problem I can think up is, when it needs manual handling, how can I find the faulty devices to look at and repair?

This should be straightforward. There is a table device_ids, which maps <devID>=<state> for every id ever used by the devmapper snapshotter. There are 3 possible states here:
0=free
1=taken
2=faulty
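The proposed scheme maps directly onto a small state enum. A hypothetical sketch of what a scanning tool could filter on (the constant names and the in-memory map are invented for illustration; real bucket access via the bolt database is omitted):

```go
package main

import "fmt"

// DeviceState mirrors the proposed values in the device_ids bucket.
type DeviceState byte

const (
	DeviceFree   DeviceState = 0
	DeviceTaken  DeviceState = 1
	DeviceFaulty DeviceState = 2
)

// faultyIDs filters <devID>=<state> entries down to the devices that
// need manual inspection or repair.
func faultyIDs(ids map[uint32]DeviceState) []uint32 {
	var out []uint32
	for id, st := range ids {
		if st == DeviceFaulty {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	ids := map[uint32]DeviceState{6: DeviceTaken, 19: DeviceFree, 59: DeviceFaulty}
	fmt.Println(faultyIDs(ids)) // only dev 59 needs attention
}
```

Because faulty devices carry a distinct state value rather than living in a separate bucket, a single linear scan of device_ids is enough to find them.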

I think it's a good idea to develop a tool that would simplify devmapper database introspection and recovery.

@renzhengeek
Author

@mxpv has resolved these issues in #3470 and #3489, thanks a lot!
