What is the problem you're trying to solve?
containerd currently fetches layers in parallel, but unpacks them sequentially: one layer at a time, on a single thread.
Describe the solution you'd like
Proposal:
We propose the config option below in containerd (/etc/containerd/config.toml) to support unpacking images in parallel.
[plugins."io.containerd.grpc.v1.cri".containerd.overlayfs]
  unpacking_mode = "parallel"
Key Changes: ContainerD Handlers + Content
This option reuses the existing FetchHandler and Content store to pre-decompress each layer during the fetch phase: the actual decompression runs directly after each layer's fetch completes, as illustrated in the diagrams below. The unpack/snapshot handling then becomes a light operation that simply renames the pre-decompression folder to the desired overlay path.
[Diagram: Fetch]
[Diagram: Unpack and Snapshot processing]
Additional context
Who should enable this feature?
- Those whose disks handle parallel IO well. For example, PD or Local SSD are designed with deep IO queues and achieve better throughput under parallel IO operations.
- Those who run large containers and are sensitive to slow pod cold starts. For example, containers bundling GPU libraries and frameworks (> 4 GB); effectively all GPU workloads fall into this category. By contrast, popular non-GPU containers have typically been significantly smaller (< 500 MB).
Who should NOT enable this feature?
- Those using HDDs, whose high seek times make parallel random reads and writes slow.
Potential Benefit
If the user's disk performance allows, this proposal can reduce image pull latency significantly. For example, Tao achieved a 3X faster image pull (120 seconds -> 40 seconds) for a popular container, gcr.io/deeplearning-platform-release/base-cu113:m106 (5.4 GB), on a common deep learning node setup (2500 GB PD-SSD, 32 vCPUs).