Skip to content

Do not remove containers from memory on error#31012

Merged
cpuguy83 merged 1 commit intomoby:masterfrom
cpuguy83:do_not_remove_containers_on_error
May 10, 2017
Merged

Do not remove containers from memory on error#31012
cpuguy83 merged 1 commit intomoby:masterfrom
cpuguy83:do_not_remove_containers_on_error

Conversation

@cpuguy83
Copy link
Copy Markdown
Member

Before this, if forceRemove is set the container data will be removed
no matter what, including if there are issues with removing containeron-disk state (rw layer, container root).

In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the EBUSY errors on remove are so
prevalent. So for now let's not keep this behavior.

@cpuguy83
Copy link
Copy Markdown
Member Author

Pretty interesting that tests are getting stuck waiting for a delete event, seems it must be hitting an error on remove...

@cpuguy83 cpuguy83 force-pushed the do_not_remove_containers_on_error branch 3 times, most recently from 938f073 to 55294dd Compare February 17, 2017 22:04
Comment thread pkg/system/rm.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else return err? Why try again if no action was taken

Comment thread pkg/system/rm.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we confident this is always caused by a race? If it isn't, this could loop forever...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could track paths in the error like I'm doing for ebusy.

Comment thread pkg/system/rm.go Outdated
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also figured I would mention I added this only after CI was failing (waiting for a destroy event that never comes)

@cpuguy83 cpuguy83 force-pushed the do_not_remove_containers_on_error branch 2 times, most recently from f47d9e1 to 6197d92 Compare February 21, 2017 22:34
@cpuguy83
Copy link
Copy Markdown
Member Author

Ugh, I know what the problem here is...
Container removal is failing in one or more of the containers created for the events test (due to EBUSY on the container's shm mount), as such the destroy event is never sent to the client which just sits waiting for an event that will never come.

@cpuguy83 cpuguy83 force-pushed the do_not_remove_containers_on_error branch 4 times, most recently from 6f5feae to b647f29 Compare February 23, 2017 18:34
@LK4D4
Copy link
Copy Markdown
Contributor

LK4D4 commented Mar 2, 2017

@cpuguy83 any news here?

@cpuguy83 cpuguy83 force-pushed the do_not_remove_containers_on_error branch 9 times, most recently from 8c6f990 to d6ffbc6 Compare March 4, 2017 20:07
@cpuguy83
Copy link
Copy Markdown
Member Author

cpuguy83 commented Mar 4, 2017

Ok, I have this "working".
In order for tests to pass I had to beef up autoremove to retry a few times + a backoff delay. This is because mounts seem to be leaking in the test system. I've implemented this retry with a
The situation as it is today we just leak containers if the autoremove fails.

Now this is a perfectly valid issue to have since older kernels (< 3.15) can run into this as well, and cause the client to hang if it's not able to resolve within the max retry attempts.
The reason it works for tests is because the tests are specifically spinning up a bunch of short-lived containers in parallel causing the leakage and the retry just gives enough time for things to get cleaned up after the containers are all stopped.

One potential thing I've thought about for dealing with the client hangs (for legitimate issues) is to have a timeout period waiting for the destroy event to occur after the die event comes. Open to other suggestions.

No matter what we do to fix the client issue, 1.13/17.03 clients will have a problem unless we emit a destroy event. We could fix this situation with a minor patch to fix this bug in cases where a destroy event is not emitted.

@tonistiigi
Copy link
Copy Markdown
Member

LGTM

@cpuguy83 cpuguy83 force-pushed the do_not_remove_containers_on_error branch from 098f284 to 7266dd4 Compare May 4, 2017 00:00
@cpuguy83
Copy link
Copy Markdown
Member Author

cpuguy83 commented May 4, 2017

Squashed commits.

@thaJeztah thaJeztah added this to the 17.06.0 milestone May 4, 2017
@cpuguy83 cpuguy83 requested a review from justincormack May 5, 2017 11:37
Before this, if `forceRemove` is set the container data will be removed
no matter what, including if there are issues with removing container
on-disk state (rw layer, container root).

In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the `EBUSY` errors on remove are so
prevalent. So for now let's not keep this behavior.

Signed-off-by: Brian Goff <[email protected]>
@cpuguy83 cpuguy83 force-pushed the do_not_remove_containers_on_error branch from 7266dd4 to 54dcbab Compare May 5, 2017 21:02
@thaJeztah thaJeztah added the status/needs-attention Calls for a collective discussion during a review session label May 9, 2017
@dmcgowan
Copy link
Copy Markdown
Member

dmcgowan commented May 9, 2017

LGTM

Copy link
Copy Markdown
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🦁

@vdemeester vdemeester added status/4-merge and removed status/2-code-review status/needs-attention Calls for a collective discussion during a review session labels May 10, 2017
@cpuguy83 cpuguy83 merged commit 815e8bb into moby:master May 10, 2017
@cpuguy83 cpuguy83 deleted the do_not_remove_containers_on_error branch May 10, 2017 10:28
@thaJeztah thaJeztah removed this from the 17.06.0 milestone Jul 4, 2017
liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017
Before this, if `forceRemove` is set the container data will be removed
no matter what, including if there are issues with removing container
on-disk state (rw layer, container root).

In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the `EBUSY` errors on remove are so
prevalent. So for now let's not keep this behavior.

Cherry-pick from moby#31012

Signed-off-by: Brian Goff <[email protected]>
Signed-off-by: Wentao Zhang <[email protected]>
(cherry picked from commit 54dcbab)

Conflicts:
	daemon/delete.go
	daemon/graphdriver/aufs/aufs.go
	daemon/graphdriver/btrfs/btrfs.go
	daemon/graphdriver/devmapper/driver.go
	daemon/graphdriver/overlay/overlay.go
	pkg/mount/mount.go
liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017
Do not remove containers from memory on error

Before this, if `forceRemove` is set the container data will be removed<br>
no matter what, including if there are issues with removing container<br>
on-disk state (rw layer, container root).   

<br>

In practice this causes a lot of issues with leaked data sitting on<br>
disk that users are not able to clean up themselves.

<br>
This is particularly a problem while the `EBUSY` errors on remove are so<br>
prevalent. So for now let's not keep this behavior.     

<br>

Cherry-pick from <a href="moby#31012" rel="nofollow noreferrer" target="_blank">https://github.com/moby/moby/pull/31012</a>

<br>

Signed-off-by: Brian Goff <a href="mailto:[email protected]">[email protected]</a><br>
Signed-off-by: Wentao Zhang <a href="mailto:[email protected]">[email protected]</a><br>
(cherry picked from commit <a href="/docker/docker/commit/54dcbab25ea4771da303fa95e0c26f2d39487b49" data-original="54dcbab2" data-link="false" data-project="7713" data-commit="54dcbab25ea4771da303fa95e0c26f2d39487b49" data-reference-type="commit" title="" class="gfm gfm-commit has-tooltip" data-original-title="Do not remove containers from memory on error">54dcbab2</a>)     

<br>

Conflicts:<br>
    daemon/delete.go<br>
    daemon/graphdriver/aufs/aufs.go<br>
    daemon/graphdriver/btrfs/btrfs.go<br>
    daemon/graphdriver/devmapper/driver.go<br>
    daemon/graphdriver/overlay/overlay.go<br>
    daemon/monitor.go<br>
    pkg/mount/mount.go



See merge request docker/docker!519
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants