snapshots/devmapper: deactivate thin device after committed#3262

Merged
crosbymichael merged 2 commits into containerd:master from renzhengeek:renzhen/devmapper-bugfix
May 9, 2019
Conversation

@renzhengeek

  1. Reason to deactivate a committed snapshot

The thin device will not be used for IO after it is committed,
and further thin snapshotting works fine with an inactive thin
device as the origin. The benefits of deactivating are:

  • the device is not unnecessarily visible, avoiding any unexpected IO;
  • the kernel data structures needed to maintain an active dm device are freed.

Quote from the kernel doc (Documentation/device-mapper/thin-provisioning.txt):

"
ii) Using an internal snapshot.

Once created, the user doesn't have to worry about any connection
between the origin and the snapshot. Indeed the snapshot is no
different from any other thinly-provisioned device and can be
snapshotted itself via the same method. It's perfectly legal to
have only one of them active, and there's no ordering requirement on
activating or removing them both. (This differs from conventional
device-mapper snapshots.)
"

  2. A thin-pool metadata bug goes away naturally

A problem occurs when suspending/resuming the origin thin device
fails while creating a snapshot:

"failed to create snapshot device from parent vg0-mythinpool-snap-3"
error="failed to save initial metadata for snapshot "vg0-mythinpool-snap-19":
object already exists"

This issue occurs because, when snapshot creation fails, the
snapshotter.store transaction can be rolled back, but the thin-pool
metadata boltdb fails to roll back in PoolDevice.CreateSnapshotDevice().
The metadata therefore becomes inconsistent: the snapshot ID is not
taken in snapshotter.store, but is saved in the pool metadata boltdb.

The cause is that, in PoolDevice.CreateSnapshotDevice(), the defer
calls are invoked in last-in, first-out order. When the error happens
in the "resume device" defer call, the metadata has already been saved
and the snapshot created, with no chance to be rolled back.

Signed-off-by: Eric Ren [email protected]

@renzhengeek
Author

The following error:

"failed to create snapshot device from parent vg0-mythinpool-snap-3"
error="failed to save initial metadata for snapshot "vg0-mythinpool-snap-19":
object already exists"

can be reproduced by fault injection in code like this:

in snapshots/devmapper/pool_device.go::CreateSnapshotDevice()

                                                                
	// Resume back base thin device on exit
	defer func() {
		//retErr = multierror.Append(retErr, p.resumeDevice(ctx, baseInfo)).ErrorOrNil()
		log.G(context.TODO()).Debugf("In Resume back base thin device")
		retErr = multierror.Append(retErr, errors.New("fake resumeDevice failure!")).ErrorOrNil()
	}()

@renzhengeek
Author

@mxpv Hi, please help review; any comments would be appreciated :)

@renzhengeek
Author

@mxpv Hi, if you don't like the idea of deactivating the committed snapshot, another idea I can think of to solve the rollback issue is this - use a single defer call, and put the device resume action first:

// CreateSnapshotDevice creates and activates new thin-device from parent thin-device (makes snapshot)
func (p *PoolDevice) CreateSnapshotDevice(ctx context.Context, deviceName string, snapshotName string, virtualSizeBytes uint64) (retErr error) {
	var (
		devSuspended = false
		metaSaved    = false
		snapCreated  = false
	)
...
	defer func() {
		// Resume back base thin device on exit
		if devSuspended {
			retErr = multierror.Append(retErr, p.resumeDevice(ctx, baseInfo))
		}

		// Rollback snapshot creation
		if snapCreated && retErr != nil {
			retErr = multierror.Append(retErr, p.deleteDevice(ctx, snapInfo))
		}

		// Rollback metadata
		if metaSaved && retErr != nil {
			retErr = multierror.Append(retErr, p.metadata.RemoveDevice(ctx, snapInfo.Name))
		}
	}()
...

@mxpv
Member

mxpv commented May 7, 2019

In general I like the idea of deactivating a device after Commit, as no IO is expected anymore and it simplifies the logic (we don't have to submit extra activate/deactivate/check-active calls to devmapper).

Some snapshotter tests are failing now, can you have a look at those?

@renzhengeek
Author

In general I like the idea of deactivating a device after Commit, as no IO is expected anymore and it simplifies the logic (we don't have to submit extra activate/deactivate/check-active calls to devmapper).

Thanks.

Some snapshotter tests are failing now, can you have a look at those?

Sure:)

@renzhengeek
Author

renzhengeek commented May 8, 2019

Hmm, strange - the CI failed at this case: FAIL: TestSnapshotterSuite/128LayersMount,
while my local testing passes; I tried 10 times and none failed:
go test -test.root -run TestSnapshotterSuite/128LayersMount

@renzhengeek renzhengeek force-pushed the renzhen/devmapper-bugfix branch from 68a9760 to 8045d2b Compare May 8, 2019 07:05
@renzhengeek
Author

Hmm, the AppVeyor failure seems unrelated to this PR:

[00:07:51]     --- FAIL: TestContentClient/CommitErrorState (4.00s)
[00:07:51]         testsuite.go:117: Cleanup failed: unknown
[00:07:51]             github.com/containerd/containerd/errdefs.init.ializers
[00:07:51]             	C:/gopath/src/github.com/containerd/containerd/errdefs/errors.go:39
[00:07:51]             runtime.main
[00:07:51]             	C:/go/src/runtime/proc.go:188
[00:07:51]             runtime.goexit
[00:07:51]             	C:/go/src/runtime/asm_amd64.s:1337
[00:07:51]             remove C:\Program Files\containerd\root-test\io.containerd.content.v1.content\ingest\9101c5f249b4cb38a50dddda86a787c09f31099573c87c525fc50774ac21d05d\.tmp-updatedat747542151: The process cannot access the file because it is being used by another process.

@fuweid
Member

fuweid commented May 8, 2019

@renzhengeek the AppVeyor failure is a flaky case. You can repush to make it happy...

@renzhengeek renzhengeek force-pushed the renzhen/devmapper-bugfix branch 3 times, most recently from af4daee to f87245b Compare May 8, 2019 11:26
@renzhengeek
Author

Hmm, strange - the CI failed at this case: FAIL: TestSnapshotterSuite/128LayersMount,
while my local testing passes; I tried 10 times and none failed:
go test -test.root -run TestSnapshotterSuite/128LayersMount

The failure shows that the new snapshot doesn't capture all the data, which seems to be caused by the umount flags:

UMOUNT(2)

MNT_DETACH (since Linux 2.4.11)
              Perform a lazy unmount: make the mount point unavailable for
              new accesses, immediately disconnect the filesystem and all
              filesystems mounted below it from each other and from the
              mount table, and actually perform the unmount when the mount
              point ceases to be busy.

I am wondering whether there is a proper way to flush the data before deactivating the snapshot.

@mxpv
Member

mxpv commented May 8, 2019

I ran tests on Ubuntu 18. Everything works as expected except TransitivityTest, which failed because parseDmsetupError couldn't recognize the ENXIO error. Something like this fixed the problem:

func parseDmsetupError(output string) string {
 ...
	line := lines[0]

	// Handle output like "Device /dev/mapper/snapshotter-suite-pool-snap-1 not found"
	if strings.HasSuffix(line, "not found") {
		return unix.ENXIO.Error()
	}

	const failedSubstr = "failed: "

As for Travis, can you try it without MNT_DETACH ?

@renzhengeek renzhengeek force-pushed the renzhen/devmapper-bugfix branch 3 times, most recently from f36ca76 to e637578 Compare May 9, 2019 01:21
@renzhengeek
Author

Hi,

I ran tests on Ubuntu 18. Everything works as expected except TransitivityTest, which failed because parseDmsetupError couldn't recognize the ENXIO error. Something like this fixed the problem:

func parseDmsetupError(output string) string {
 ...
	line := lines[0]

	// Handle output like "Device /dev/mapper/snapshotter-suite-pool-snap-1 not found"
	if strings.HasSuffix(line, "not found") {
		return unix.ENXIO.Error()
	}

	const failedSubstr = "failed: "

Done and pushed, thanks!

As for Travis, can you try it without MNT_DETACH ?

I'm afraid changing testutil.Unmount() may affect other places; instead, I added a SuspendDevice call before removal in Commit() to ensure the IO has reached disk. I think Commit() is a much less frequent operation than Prepare()/View() (creating a snapshot), so adding a suspend operation is acceptable :)

Travis passes now. Will trigger it a few more times to see if it's stable enough.

@renzhengeek renzhengeek force-pushed the renzhen/devmapper-bugfix branch 2 times, most recently from 971d292 to 70e0c16 Compare May 9, 2019 02:12
Member

@mxpv mxpv left a comment


instead, I added a SuspendDevice call before removal in Commit() to ensure the IO has reached disk

Yep, suspend will do the trick too, great idea.
LGTM

@renzhengeek renzhengeek force-pushed the renzhen/devmapper-bugfix branch from 70e0c16 to d03417d Compare May 9, 2019 02:59
@renzhengeek
Author

Travis failed with one failure unrelated to this PR:

Summarizing 1 Failure:
[Fail] [k8s.io] Container runtime should support basic operations on container [BeforeEach] runtime should support removing container [Conformance] 
/home/travis/gopath/src/github.com/kubernetes-sigs/cri-tools/pkg/framework/util.go:213
Ran 66 of 73 Specs in 55.701 seconds
FAIL! -- 65 Passed | 1 Failed | 0 Pending | 7 Skipped
Ginkgo ran 1 suite in 55.814877871s
Test Suite Failed
--- FAIL: TestCRISuite (55.82s)
    cri_test.go:121: Failed to run tests in paralllel: exit status 1
FAIL

Let's trigger CI again.

@renzhengeek renzhengeek force-pushed the renzhen/devmapper-bugfix branch from d03417d to c560040 Compare May 9, 2019 03:32
@codecov-io

codecov-io commented May 9, 2019

Codecov Report

Merging #3262 into master will decrease coverage by 0.02%.
The diff coverage is 55.55%.


@@            Coverage Diff             @@
##           master    #3262      +/-   ##
==========================================
- Coverage   44.56%   44.53%   -0.03%     
==========================================
  Files         113      113              
  Lines       12194    12195       +1     
==========================================
- Hits         5434     5431       -3     
- Misses       5928     5930       +2     
- Partials      832      834       +2
Flag        Coverage Δ
#linux      48.5% <55.55%> (-0.04%) ⬇️
#windows    39.8% <ø> (ø) ⬆️

Impacted Files                            Coverage Δ
snapshots/devmapper/dmsetup/dmsetup.go    80.23% <20%> (-1.97%) ⬇️
snapshots/devmapper/snapshotter.go        66.39% <33.33%> (-0.96%) ⬇️
snapshots/devmapper/pool_device.go        61.4% <75%> (+0.5%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d68b593...6f463d3.

For a block device like devicemapper, umounting before committing an
active snapshot syncs its data onto disk. Otherwise:

1. deactivating a block device still in use will fail, or cause IO errors
when forced;
2. a new snapshot taken from it will not capture data not yet written to disk.

Signed-off-by: Eric Ren <[email protected]>
@renzhengeek renzhengeek force-pushed the renzhen/devmapper-bugfix branch from c560040 to 6f463d3 Compare May 9, 2019 04:27
Member

@fuweid fuweid left a comment


LGTM

This is a good improvement. When we run several instances from the same image, it can save time.

@crosbymichael
Member

LGTM
