Skip to content

Avoid deadlock in unpacker.#3825

Merged
estesp merged 1 commit intocontainerd:masterfrom
Random-Liu:fix-unpacker
Nov 18, 2019
Merged

Avoid deadlock in unpacker.#3825
estesp merged 1 commit intocontainerd:masterfrom
Random-Liu:fix-unpacker

Conversation

@Random-Liu
Copy link
Copy Markdown
Member

@Random-Liu Random-Liu commented Nov 13, 2019

Fixes #3816.

It turns out that some layers are downloaded with error response.status="206 Partial Content". However, we don't stop unpacker when there are layer download errors, thus it will wait for those layers forever.

This PR:

  1. Stoped the unpacker if any error happens during fetch;
  2. Avoided sending update when unpacker is stopped, this can prevent another potential deadlock when updateCh is full.

/cc @dmcgowan @ungureanuvladvictor

Signed-off-by: Lantao Liu [email protected]

Comment thread unpacker.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this cause problems if there are multiple unpacks going on?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this then? :)

Please note that if the unpacker returns without error, the context won't be cancelled, in theory the updateCh could still be full if there are 128 more unknown layers. :)

Comment thread unpacker.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning the whole error group, maybe return a function which will wait then cancel?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eg itself does wait and cancel.

We need cancel and wait when fetch returns error.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been more clear. I agree you need to explicitly cancel on error. However cancel must always be run, and if it hasn't been canceled, then do so after wait (or just always do after wait since double cancellation is fine)

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Nov 13, 2019

Build succeeded.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Nov 13, 2019

Codecov Report

Merging #3825 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3825   +/-   ##
=======================================
  Coverage   41.98%   41.98%           
=======================================
  Files         130      130           
  Lines       14577    14577           
=======================================
  Hits         6120     6120           
  Misses       7543     7543           
  Partials      914      914
Flag Coverage Δ
#linux 45.42% <ø> (ø) ⬆️
#windows 36.94% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec661e8...c560591. Read the comment docs.

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Nov 13, 2019

Build succeeded.

@crosbymichael
Copy link
Copy Markdown
Member

LGTM

Copy link
Copy Markdown
Member

@dmcgowan dmcgowan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My previous comment was collapsed, the cancel needs to be updated to always run. I think having a single function which does cancel and wait is good, the cancel can still be called inside as well as in the first revision.

@Random-Liu
Copy link
Copy Markdown
Member Author

My previous comment was collapsed, the cancel needs to be updated to always run. I think having a single function which does cancel and wait is good, the cancel can still be called inside as well as in the first revision.

How about the new version. :)

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Nov 15, 2019

Build succeeded.

Copy link
Copy Markdown
Member

@dmcgowan dmcgowan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@estesp estesp merged commit 2e29387 into containerd:master Nov 18, 2019
@Random-Liu Random-Liu deleted the fix-unpacker branch November 19, 2019 05:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deadlock in pulling/unpacking image with containerd 1.3.0

6 participants