Skip to content

[release/1.7] fix: ImagePull should close http connection if there is no available data to read.#9409

Merged
samuelkarp merged 3 commits intocontainerd:release/1.7from
ruiwen-zhao:progress-fix-1.7
Nov 22, 2023
Merged

[release/1.7] fix: ImagePull should close http connection if there is no available data to read.#9409
samuelkarp merged 3 commits intocontainerd:release/1.7from
ruiwen-zhao:progress-fix-1.7

Conversation

@ruiwen-zhao
Copy link
Copy Markdown
Member

@ruiwen-zhao ruiwen-zhao commented Nov 21, 2023

The new active request is filed and there is no bytes read yet when the
progress reporter just wakes up. If the timeout / 2 is less than the
minPullProgressReportInternal, it's easy to file false alert.

We should remove the minPullProgressReportInternal limit.

Fixes: containerd#8024

Signed-off-by: Wei Fu <[email protected]>
Close connection if no more data. It's to fix false alert filed by image
pull progress.

```
dst = OpenWriter (--> Content Store)

src = Fetch
        Open (--> Registry)
        Mark it as active request

Copy(dst, src) (--> Keep updating total received bytes)

   ^
   |  (Active Request > 0, but total received bytes won't be updated)
   v

defer src.Close()
content.Commit(dst)
```

Before migrating to transfer service, CRI plugin doesn't limit global
concurrent downloads for ImagePulls. Each ImagePull requests have 3 concurrent
goroutines to download blob and 1 goroutine to unpack blob. Like ext4
filesystem [1][1], the fsync from content.Commit may sync unrelated dirty pages
into disk. The host is running under IO pressure, and then the content.Commit
will take long time and block other goroutines. If httpreadseeker
doesn't close the connection after io.EOF, this connection will be
considered as active. The pull progress reporter reports there is no
bytes transfered and cancels the ImagePull.

The original 1-minute timeout[2][2] is from kubelet settting. Since CRI-plugin
can't limit the total concurrent downloads, this patch is to update 1-minute
to 5-minutes to prevent from unexpected cancel.

[1]: https://lwn.net/Articles/842385/
[2]: https://github.com/kubernetes/kubernetes/blob/release-1.23/pkg/kubelet/config/flags.go#L45-L48

Signed-off-by: Wei Fu <[email protected]>
@ruiwen-zhao
Copy link
Copy Markdown
Member Author

cc @samuelkarp @qiutongs

@samuelkarp samuelkarp added the area/cri Container Runtime Interface (CRI) label Nov 21, 2023
@ruiwen-zhao
Copy link
Copy Markdown
Member Author

cc @fuweid @henry118

Copy link
Copy Markdown
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
thanks!

@samuelkarp samuelkarp merged commit 21b85e9 into containerd:release/1.7 Nov 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) size/L

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants