Skip to content

Commit 2068061

Browse files
fuweidruiwen-zhao
authored andcommitted
remotes/docker: close connection if no more data
Close connection if no more data. It's to fix false alert filed by image pull progress. ``` dst = OpenWriter (--> Content Store) src = Fetch Open (--> Registry) Mark it as active request Copy(dst, src) (--> Keep updating total received bytes) ^ | (Active Request > 0, but total received bytes won't be updated) v defer src.Close() content.Commit(dst) ``` Before migrating to transfer service, CRI plugin doesn't limit global concurrent downloads for ImagePulls. Each ImagePull requests have 3 concurrent goroutines to download blob and 1 goroutine to unpack blob. Like ext4 filesystem [1][1], the fsync from content.Commit may sync unrelated dirty pages into disk. The host is running under IO pressure, and then the content.Commit will take long time and block other goroutines. If httpreadseeker doesn't close the connection after io.EOF, this connection will be considered as active. The pull progress reporter reports there is no bytes transfered and cancels the ImagePull. The original 1-minute timeout[2][2] is from kubelet settting. Since CRI-plugin can't limit the total concurrent downloads, this patch is to update 1-minute to 5-minutes to prevent from unexpected cancel. [1]: https://lwn.net/Articles/842385/ [2]: https://github.com/kubernetes/kubernetes/blob/release-1.23/pkg/kubelet/config/flags.go#L45-L48 Signed-off-by: Wei Fu <[email protected]>
1 parent 3284939 commit 2068061

5 files changed

Lines changed: 38 additions & 7 deletions

File tree

pkg/cri/config/config.go

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,30 @@ import (
2828
"github.com/containerd/containerd/plugin"
2929
)
3030

31+
const (
32+
// defaultImagePullProgressTimeoutDuration is the default value of imagePullProgressTimeout.
33+
//
34+
// NOTE:
35+
//
36+
// This ImagePullProgressTimeout feature is ported from kubelet/dockershim's
37+
// --image-pull-progress-deadline. The original value is 1m0. Unlike docker
38+
// daemon, the containerd doesn't have global concurrent download limitation
39+
// before migrating to Transfer Service. If kubelet runs with concurrent
40+
// image pull, the node will run under IO pressure. The ImagePull process
41+
// could be impacted by self, if the target image is large one with a
42+
// lot of layers. And also both container's writable layers and image's storage
43+
// share one disk. The ImagePull process commits blob to content store
44+
// with fsync, which might bring the unrelated files' dirty pages into
45+
// disk in one transaction [1]. The 1m0 value isn't good enough. Based
46+
// on #9347 case and kubernetes community's usage [2], the default value
47+
// is updated to 5m0. If end-user still runs into unexpected cancel,
48+
// they need to config it based on their environment.
49+
//
50+
// [1]: Fast commits for ext4 - https://lwn.net/Articles/842385/
51+
// [2]: https://github.com/kubernetes/kubernetes/blob/1635c380b26a1d8cc25d36e9feace9797f4bae3c/cluster/gce/util.sh#L882
52+
defaultImagePullProgressTimeoutDuration = 5 * time.Minute
53+
)
54+
3155
type SandboxControllerMode string
3256

3357
const (

pkg/cri/config/config_unix.go

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -19,8 +19,6 @@
1919
package config
2020

2121
import (
22-
"time"
23-
2422
"github.com/containerd/containerd"
2523
"github.com/containerd/containerd/pkg/cri/streaming"
2624
"github.com/pelletier/go-toml"
@@ -109,7 +107,7 @@ func DefaultConfig() PluginConfig {
109107
},
110108
EnableCDI: false,
111109
CDISpecDirs: []string{"/etc/cdi", "/var/run/cdi"},
112-
ImagePullProgressTimeout: time.Minute.String(),
110+
ImagePullProgressTimeout: defaultImagePullProgressTimeoutDuration.String(),
113111
DrainExecSyncIOTimeout: "0s",
114112
}
115113
}

pkg/cri/config/config_windows.go

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -19,7 +19,6 @@ package config
1919
import (
2020
"os"
2121
"path/filepath"
22-
"time"
2322

2423
"github.com/containerd/containerd"
2524
"github.com/containerd/containerd/pkg/cri/streaming"
@@ -85,7 +84,7 @@ func DefaultConfig() PluginConfig {
8584
ImageDecryption: ImageDecryption{
8685
KeyModel: KeyModelNode,
8786
},
88-
ImagePullProgressTimeout: time.Minute.String(),
87+
ImagePullProgressTimeout: defaultImagePullProgressTimeoutDuration.String(),
8988
DrainExecSyncIOTimeout: "0s",
9089
}
9190
}

pkg/cri/server/image_pull.go

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -637,9 +637,9 @@ func (reporter *pullProgressReporter) start(ctx context.Context) {
637637
WithField("activeReqs", activeReqs).
638638
WithField("totalBytesRead", bytesRead).
639639
WithField("lastSeenBytesRead", lastSeenBytesRead).
640-
WithField("lastSeenTimestamp", lastSeenTimestamp).
640+
WithField("lastSeenTimestamp", lastSeenTimestamp.Format(time.RFC3339)).
641641
WithField("reportInterval", reportInterval).
642-
Tracef("progress for image pull")
642+
Debugf("progress for image pull")
643643

644644
if activeReqs == 0 || bytesRead > lastSeenBytesRead {
645645
lastSeenBytesRead = bytesRead

remotes/docker/httpreadseeker.go

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,16 @@ func (hrs *httpReadSeeker) Read(p []byte) (n int, err error) {
7676
if _, err2 := hrs.reader(); err2 == nil {
7777
return n, nil
7878
}
79+
} else if err == io.EOF {
80+
// The CRI's imagePullProgressTimeout relies on responseBody.Close to
81+
// update the process monitor's status. If the err is io.EOF, close
82+
// the connection since there is no more available data.
83+
if hrs.rc != nil {
84+
if clsErr := hrs.rc.Close(); clsErr != nil {
85+
log.L.WithError(clsErr).Error("httpReadSeeker: failed to close ReadCloser after io.EOF")
86+
}
87+
hrs.rc = nil
88+
}
7989
}
8090
return
8191
}

0 commit comments

Comments
 (0)