Fix leaking task resources when nodes are deleted#2806
Merged
dperny merged 1 commit intomoby:masterfrom Mar 21, 2019
Merged
Conversation
When a node is deleted, its tasks are asked to restart, which involves putting them into a desired state of Shutdown. However, the Allocator will not deallocate a task which is not in an actual state of a terminal state. Once a node is deleted, the only opportunity for its tasks to recieve updates and be moved to a terminal state is when the function moving those tasks to TaskStateOrphaned is called, 24 hours after the node enters the Down state. However, if a leadership change occurs, then that function will never be called, and the tasks will never be moved to a terminal state, leaking resources. With this change, upon node deletion, all of its tasks will be moved to TaskStateOrphaned, allowing those tasks' resources to be cleaned up. Signed-off-by: Drew Erny <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #2806 +/- ##
==========================================
+ Coverage 61.93% 61.94% +0.01%
==========================================
Files 137 137
Lines 22131 22132 +1
==========================================
+ Hits 13706 13710 +4
+ Misses 6944 6940 -4
- Partials 1481 1482 +1 |
anshulpundir
suggested changes
Jan 17, 2019
Contributor
anshulpundir
left a comment
There was a problem hiding this comment.
Are there any other implications of deleting tasks right away instead of waiting for 24hrs?
| // should only be 3 tasks left | ||
| require.Len(t, tasks, 4) | ||
| // and the list should not contain task1 | ||
| // and the list should not contain task1 or task2 |
Contributor
There was a problem hiding this comment.
if the task1 and task2 should have been removed, why are we testing for those tasks below?
| // Now, call the function with our nodeID. make sure it returns no error | ||
| err = ts.Store.Update(func(tx store.Tx) error { | ||
| return removeNodeAttachments(tx, "id2") | ||
| return orphanNodeTasks(tx, "id2") |
Contributor
There was a problem hiding this comment.
the orphaned tasks are deleted by the task reaper, correct? Can we also test for task deletions?
This was referenced Mar 26, 2019
thaJeztah
added a commit
to thaJeztah/docker
that referenced
this pull request
Oct 7, 2019
full diff: moby/swarmkit@7dded76...a8bbe7d changes included: - moby/swarmkit#2867 Only update non-terminal tasks on node removal - related to moby/swarmkit#2806 Fix leaking task resources when nodes are deleted - moby/swarmkit#2880 Bump to golang 1.12.9 - moby/swarmkit#2886 Bump vendoring to match current docker/docker master - regenerates protobufs - moby/swarmkit#2890 Remove hardcoded IPAM config subnet value for ingress network - fixes [ENGORC-2651] Specifying --default-addr-pool for docker swarm init is not picked up by ingress network Signed-off-by: Sebastiaan van Stijn <[email protected]>
thaJeztah
added a commit
to thaJeztah/docker
that referenced
this pull request
Oct 21, 2019
full diff: moby/swarmkit@7dded76...a8bbe7d changes included: - moby/swarmkit#2867 Only update non-terminal tasks on node removal - related to moby/swarmkit#2806 Fix leaking task resources when nodes are deleted - moby/swarmkit#2880 Bump to golang 1.12.9 - moby/swarmkit#2886 Bump vendoring to match current docker/docker master - regenerates protobufs - moby/swarmkit#2890 Remove hardcoded IPAM config subnet value for ingress network - fixes [ENGORC-2651] Specifying --default-addr-pool for docker swarm init is not picked up by ingress network Signed-off-by: Sebastiaan van Stijn <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Signed-off-by: Drew Erny [email protected]
- What I did
When a node is deleted, its tasks are asked to restart, which involves putting them into a desired state of Shutdown. However, the Allocator will not deallocate a task which is not in an actual state of a terminal state. Once a node is deleted, the only opportunity for its tasks to recieve updates and be moved to a terminal state is when the function moving those tasks to
TaskStateOrphanedis called, 24 hours after the node enters the Down state. However, if a leadership change occurs, then that function will never be called, and the tasks will never be moved to a terminal state, leaking resources.With this change, upon node deletion, all of its tasks will be moved to
TaskStateOrphaned, allowing those tasks' resources to be cleaned up.- How I did it
Altered the controlapi to move all tasks belonging to the node to the actual state of Orphaned.
- How to test it
Includes a unit test. However, testing for the full failure condition is nontrivial and likely not worth the effort.
- Description for the changelog
Fix bug where deleting nodes could result in task resources leaking.