
Devmapper: consistency issue bugfixes#3436

Closed
renzhengeek wants to merge 3 commits into containerd:master from renzhengeek:devmapper/consistency-bugfixes

Conversation

@renzhengeek

@renzhengeek renzhengeek commented Jul 21, 2019

Hi,

These are several patches for problems we've seen in our testing.

In extreme test cases, such as a hard reboot or forcibly unplugging a disk while k8s is actively working, we've seen the two kinds of errors below. They are caused by an inconsistency between the devmapper database and the thin-pool, which then makes subsequent snapshot/pod creation fail:

  • "file exists" error:
failed to create containerd container: failed to create snapshot
\"vg0-mythinpool-snap-1083\" (dev: 59) from \"vg0-mythinpool-snap-19\"
(dev: 6): 1 error occurred:\n\n* file exists"
  • "object already exists" error:
time="2019-07-19T15:40:02.331427856+08:00" level=error msg="failed to
create snapshot device from parent vg0-mythinpool-snap-1" error="failed
to save initial metadata for snapshot "vg0-mythinpool-snap-4": object
already exists"
time="2019-07-19T15:40:02.331436436+08:00" level=debug msg="snapshotter
error" error="1 error occurred:

* failed to save initial metadata for snapshot "vg0-mythinpool-snap-4":
object already exists"

Please help review and give feedback to improve the fixes, thanks!

@renzhengeek
Author

@mxpv @fuweid @Ace-Tang

@theopenlab-ci

theopenlab-ci Bot commented Jul 21, 2019

Build succeeded.

@renzhengeek force-pushed the devmapper/consistency-bugfixes branch from 79fb413 to 3e2f3bf on July 22, 2019 01:36
@theopenlab-ci

theopenlab-ci Bot commented Jul 22, 2019

Build succeeded.

@renzhengeek force-pushed the devmapper/consistency-bugfixes branch from 3e2f3bf to c21ab01 on July 22, 2019 05:10
@theopenlab-ci

theopenlab-ci Bot commented Jul 22, 2019

Build succeeded.

In extreme cases, like a crash or the underlying disk
disappearing, this code path can be taken when deleting a
snapshot:

1. activating the thin device fails, and rollback follows;
2. deleting the thin device fails;
3. removing the device info from metadata succeeds.

After recovery, it will fail to work:
"""
failed to create containerd container: failed to create snapshot
\"vg0-mythinpool-snap-1083\" (dev: 59) from \"vg0-mythinpool-snap-19\"
(dev: 6): 1 error occurred:\n\n* file exists"
"""

because dev 59 is free in the database, but exists in the kernel.

Signed-off-by: Eric Ren <[email protected]>
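The mismatch described in this commit message (a device ID marked free in the database while the device still exists in the kernel) can be detected by comparing the two views. Below is a minimal, self-contained sketch, not containerd code: `metadataFree` and `kernelDevices` are hypothetical stand-ins for the devmapper metadata and the kernel's thin-device list.

```go
package main

import "fmt"

// conflictingIDs returns device IDs that the metadata database considers
// free but that still exist as thin devices in the kernel. Such IDs must
// not be reused until the stale kernel device is removed, otherwise the
// thin-pool reports "file exists".
func conflictingIDs(metadataFree map[uint32]bool, kernelDevices map[uint32]string) []uint32 {
	var conflicts []uint32
	for id := range metadataFree {
		if name, ok := kernelDevices[id]; ok {
			fmt.Printf("id %d is free in metadata but active in kernel as %q\n", id, name)
			conflicts = append(conflicts, id)
		}
	}
	return conflicts
}

func main() {
	// dev 59 was "freed" in metadata during a failed rollback, but the
	// kernel device survived -- the situation from the commit message.
	free := map[uint32]bool{59: true, 60: true}
	kernel := map[uint32]string{59: "vg0-mythinpool-snap-1083", 6: "vg0-mythinpool-snap-19"}
	fmt.Println(conflictingIDs(free, kernel))
}
```

A startup check like this would let the snapshotter skip (or repair) conflicting IDs instead of failing on reuse.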
@renzhengeek renzhengeek force-pushed the devmapper/consistency-bugfixes branch from c21ab01 to cb7e03a Compare July 22, 2019 06:20
@theopenlab-ci

theopenlab-ci Bot commented Jul 22, 2019

Build succeeded.

@renzhengeek force-pushed the devmapper/consistency-bugfixes branch from cb7e03a to e27a829 on July 22, 2019 08:03
@theopenlab-ci

theopenlab-ci Bot commented Jul 22, 2019

Build succeeded.

@mxpv
Member

mxpv commented Jul 23, 2019

Can you elaborate a bit on the story behind this patch. In which cases AddDevice can be called with duplicated name? Why would we want to maintain orphans table instead of returning ErrAlreadyExists?

@renzhengeek
Author

renzhengeek commented Jul 24, 2019

Hi,

Can you elaborate a bit on the story behind this patch. In which cases AddDevice can be called with duplicated name? Why would we want to maintain orphans table instead of returning ErrAlreadyExists?

As said in the commit log, the result is as below. What is missing from the commit log is a reproducer, so I'll present one below:

"""
All cases where a crash/panic happens between the steps
that create/delete a snapshot will cause
the error below after containerd starts again:

time="2019-07-19T15:40:02.331427856+08:00" level=error msg="failed to
create snapshot device from parent vg0-mythinpool-snap-1" error="failed
to save initial metadata for snapshot "vg0-mythinpool-snap-4": object
already exists"
time="2019-07-19T15:40:02.331436436+08:00" level=debug msg="snapshotter
error" error="1 error occurred:

* failed to save initial metadata for snapshot "vg0-mythinpool-snap-4":
object already exists"
"""

The "snap-id" is auto-incremented by the upper snapshotter code. Once there is a broken snapshot holding a "snap-id" that was reported to the upper code as not taken, devmapper can never create that snapshot again.

So containerd keeps trying to create the snapshot with this "snap-id", but every attempt fails with the "object already exists" error.
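The retry loop described above can be modeled in a few lines. This is a toy simulation, not the real devmapper API: `metadataStore` and `saveInitial` are hypothetical stand-ins for the snapshot metadata database.

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyExists = errors.New("object already exists")

// metadataStore simulates the devmapper snapshot metadata: saving initial
// metadata for a name that already has an entry fails, just like the
// half-created snapshot left behind by a crash.
type metadataStore struct{ entries map[string]bool }

func (s *metadataStore) saveInitial(name string) error {
	if s.entries[name] {
		return fmt.Errorf("failed to save initial metadata for snapshot %q: %w", name, errAlreadyExists)
	}
	s.entries[name] = true
	return nil
}

func main() {
	// snap-4 was partially created before the crash, so its entry survives.
	s := &metadataStore{entries: map[string]bool{"vg0-mythinpool-snap-4": true}}

	// The upper snapshotter keeps handing out the same ID because it was
	// never reported as taken, so every retry targets the same name.
	for i := 0; i < 3; i++ {
		fmt.Println(s.saveInitial("vg0-mythinpool-snap-4"))
	}
}
```

Every iteration fails identically, which matches the repeating "object already exists" log lines.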

A fake reproducer, by fault injection in code:

  1. Apply the fault patch:
diff --git a/snapshots/devmapper/pool_device.go b/snapshots/devmapper/pool_device.go
index 680d8b1..7db66d8 100644
--- a/snapshots/devmapper/pool_device.go
+++ b/snapshots/devmapper/pool_device.go
@@ -22,6 +22,7 @@ import (
"context"
"path/filepath"
"strconv"
+ "fmt"

"github.com/hashicorp/go-multierror"
"github.com/pkg/errors"
@@ -90,6 +91,10 @@ func (p *PoolDevice) transition(ctx context.Context, deviceName string, tryingSt
result = multierror.Append(result, err)
}

+ //log.G(ctx).Fatalf("Inject fault: dev: %s, state: %v --> %v", deviceName, tryingState, successState)
+ msg := fmt.Sprintf("Inject fault: dev: %s, state: %v --> %v", deviceName, tryingState, successState)
+ panic(msg)
+
// If operation succeeded transition to success state, otherwise save error details
uerr = p.metadata.UpdateDevice(ctx, deviceName, func(deviceInfo *DeviceInfo) error {
if err == nil {
  2. Revert the fault patch and restart containerd, and you will see the problem :-)

I know that my proposed fix is not very good :-/
Pro: the problematic devices in the orphan table are few enough for an admin or a helper tool to read;
Con: introducing an orphan bucket is a little invasive and dirty...

Another idea that occurs to me: since the issue only happens in extreme cases, like a machine crash or a disk disappearing, after which a containerd restart is likely required anyway, can we remove the problematic device info in place when reloading the devmapper plugin?

Thank you for looking at this issue!

@jiazhiguang
Contributor

I also encountered the same problem.
A disk was unplugged while a thin device was being created, and the creation failed with the following error:

operation not supported

After repairing the disk and restarting the machine, creating the device still fails with the following error:

failed to create snapshot "vg0-1000" (dev: 178) from "vg0-33" (dev: 23): 1 error occurred:

* file exists

The problem seems to occur in the CreateThinDevice method of pool_device.go:

  1. After the disk was unplugged, activating the device failed.
  2. Two deferred methods then ran: p.deleteDevice to delete the device, and p.metadata.RemoveDevice to delete its metadata.
  3. p.metadata.RemoveDevice deleted the device's metadata successfully, but p.deleteDevice failed because the disk had been unplugged, so the device remained on the disk.
  4. After restarting the machine, the device's status is free. Trying to create the device again fails with the "file exists" error.
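The failure sequence above can be reproduced with a simplified, self-contained model. This is not the real CreateThinDevice: the maps and the `diskUnplugged` flag are invented stand-ins for the metadata database and the kernel's device table.

```go
package main

import (
	"errors"
	"fmt"
)

// createThinDevice models the rollback in pool_device.go: on failure, the
// deferred cleanup tries to delete the kernel device and the metadata
// entry independently, and one can succeed while the other fails.
func createThinDevice(metadata, kernel map[string]bool, name string, diskUnplugged bool) (err error) {
	defer func() {
		if err == nil {
			return
		}
		// Rollback: delete the thin device. This fails when the
		// underlying disk is gone, so the kernel device survives.
		if diskUnplugged {
			fmt.Println("rollback: deleteDevice failed: disk unplugged")
		} else {
			delete(kernel, name)
		}
		// Rollback: remove the metadata entry. This succeeds either way,
		// so the device ID is marked free while the kernel device may
		// still exist -- the inconsistency behind "file exists".
		delete(metadata, name)
	}()

	metadata[name] = true
	kernel[name] = true

	if diskUnplugged {
		// Step 1 above: activation fails after the disk is unplugged.
		return errors.New("failed to activate device: operation not supported")
	}
	return nil
}

func main() {
	metadata, kernel := map[string]bool{}, map[string]bool{}
	_ = createThinDevice(metadata, kernel, "vg0-1000", true)
	fmt.Printf("in metadata: %v, in kernel: %v\n", metadata["vg0-1000"], kernel["vg0-1000"])
}
```

After the failed call the metadata entry is gone but the kernel entry remains, which is exactly the free-in-database/exists-in-kernel state described in steps 3 and 4.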

@renzhengeek
Author

Hi,

The problem seems to occur in the CreateThinDevice method of pool_device.go:

  1. After the disk was unplugged, activating the device failed.
  2. Two deferred methods then ran: p.deleteDevice to delete the device, and p.metadata.RemoveDevice to delete its metadata.
  3. p.metadata.RemoveDevice deleted the device's metadata successfully, but p.deleteDevice failed because the disk had been unplugged, so the device remained on the disk.
  4. After restarting the machine, the device's status is free. Trying to create the device again fails with the "file exists" error.

The first patch is exactly trying to fix this problem. Thanks for confirming!

@renzhengeek force-pushed the devmapper/consistency-bugfixes branch 2 times, most recently from 6be89b9 to 1dde9a3 on July 24, 2019 11:05
Eric Ren added 2 commits July 24, 2019 19:13
dmsetup remove with the "--deferred" option may remove
the device in the kernel later than the time its device
info is removed from the database, causing the "file
exists" error as in the previous commit.

This error really happens in our extreme testing:
powering off the server and the disk disappearing while
k8s is creating/deleting pods.

The fix reveals a testcase bug in
"TestSnapshotterSuite/RemoveIntermediateSnapshot":
for a block device snapshot, we should unmount the
snapshot before removing it.

Signed-off-by: Eric Ren <[email protected]>
All cases where a crash/panic happens between the steps
that create/delete a snapshot will cause
the error below after containerd starts again:

"""
time="2019-07-19T15:40:02.331427856+08:00" level=error msg="failed to
create snapshot device from parent vg0-mythinpool-snap-1" error="failed
to save initial metadata for snapshot "vg0-mythinpool-snap-4": object
already exists"
time="2019-07-19T15:40:02.331436436+08:00" level=debug msg="snapshotter
error" error="1 error occurred:

* failed to save initial metadata for snapshot "vg0-mythinpool-snap-4":
object already exists"
"""

My idea was to move the aborted-state device into an orphan bucket,
and remove the orphan devices when the devmapper plugin is reloaded;
making that configurable would be a plus.

However, removing an orphan device is not that easy without invasive
change.

This is a corner case, so the devices in the orphan bucket are few
enough for people to look at and take actions such as:
1. stop containerd;
2. delete the thin device if it exists;
3. remove the orphan device info.

Signed-off-by: Eric Ren <[email protected]>
@renzhengeek renzhengeek force-pushed the devmapper/consistency-bugfixes branch from 1dde9a3 to 76d612c Compare July 24, 2019 11:13
@theopenlab-ci

theopenlab-ci Bot commented Jul 24, 2019

Build succeeded.

@codecov-io

Codecov Report

Merging #3436 into master will decrease coverage by 0.02%.
The diff coverage is 42.85%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3436      +/-   ##
==========================================
- Coverage   44.11%   44.08%   -0.03%     
==========================================
  Files         124      124              
  Lines       13760    13768       +8     
==========================================
  Hits         6070     6070              
- Misses       6759     6764       +5     
- Partials      931      934       +3
Flag     | Coverage Δ
#linux   | 47.83% <42.85%> (-0.04%) ⬇️
#windows | 39.76% <ø> (ø) ⬆️

Impacted Files                     | Coverage Δ
snapshots/devmapper/pool_device.go | 59.77% <33.33%> (-1.64%) ⬇️
snapshots/devmapper/metadata.go    | 64.7% <50%> (-1.71%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f776141...76d612c. Read the comment docs.

@mxpv
Member

mxpv commented Jul 29, 2019

I have a few ideas I'd like to experiment with. I'll update here or will post an additional PR.

@renzhengeek
Author

renzhengeek commented Jul 31, 2019

I have a few ideas I'd like to experiment with. I'll update here or will post an additional PR.

Hi,
Thanks, I like your idea of adding a new state to mark problematic devices instead of adding a new table, though I haven't gone through the details yet.

However, one problem I can think of: when manual handling is needed, how can I find the faulty devices to inspect and repair?

We could develop a handy tool that scans the database for "faulty" devices, so please make it as easy as possible for such a tool to distinguish them :-)

@mxpv
Member

mxpv commented Jul 31, 2019

However, one problem I can think up is, when it needs manual handling, how can I find the faulty devices to look at and repair?

This should be straightforward. There is a table device_ids, which maps <devID>=<state> for every id ever used by the devmapper snapshotter. There are 3 possible states here:
0=free
1=taken
2=faulty
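The proposed scheme maps directly onto a small state enum. A hypothetical sketch of what a scanning tool could filter on (the constant names and the in-memory map are invented for illustration; real bucket access via the bolt database is omitted):

```go
package main

import "fmt"

// DeviceState mirrors the proposed values in the device_ids bucket.
type DeviceState byte

const (
	DeviceFree   DeviceState = 0
	DeviceTaken  DeviceState = 1
	DeviceFaulty DeviceState = 2
)

// faultyIDs filters <devID>=<state> entries down to the devices that
// need manual inspection or repair.
func faultyIDs(ids map[uint32]DeviceState) []uint32 {
	var out []uint32
	for id, st := range ids {
		if st == DeviceFaulty {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	ids := map[uint32]DeviceState{6: DeviceTaken, 19: DeviceFree, 59: DeviceFaulty}
	fmt.Println(faultyIDs(ids)) // only dev 59 needs attention
}
```

Because faulty devices carry a distinct state value rather than living in a separate bucket, a single linear scan of device_ids is enough to find them.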

I think it's a good idea to develop a tool that would simplify devmapper database introspection and recovery.

@renzhengeek
Author

@mxpv has resolved these issues in #3470 and #3489, thanks a lot!
