
WIP RFC: erofs-snapshotter: support .erofs+{zstd|gzip} images #12506

Closed

anniecherk wants to merge 1 commit into containerd:main from anniecherk:ac/erofs-plus-zstd

Conversation

@anniecherk

Goal:

The current implementation does not support erofs images that are compressed into .erofs+{zstd|gzip} images. Erofs itself supports compression natively, and so the current implementation has three options:

- use erofs images without native compression (uncompressed during image pulls, uncompressed on disk), or
- use erofs images with native compression (compressed during image pulls, compressed on disk), or
- use overlayfs+zstd images and convert to erofs images on apply (compressed during image pulls, converted at unpack time, uncompressed on disk)

The option that would give us the fastest startup + best runtime performance is:
- using erofs images (so that we don't wait to convert from overlayfs to erofs on apply)
- that are compressed when we're pulling (for fast pulls), and
- decompressed on disk (for fast reads at runtime)

That corresponds to .erofs layers that are natively uncompressed, but compressed / decompressed by the diff processor.

-----
Initial experimentation:

I created a small cli tool for converting existing overlayfs images to .erofs+zstd
images. I converted some images to .erofs+zstd, then uploaded them to a registry, and then timed the pull + unpack. I saw 1.5-2x faster pulls over the "overlayfs + convert to erofs at unpack" path.

-----
Looking for feedback:

I'm looking for some quick feedback-- does this approach make sense?

In particular, I'm curious for more context on this comment:
```
// Since `images.DiffCompression` doesn't support arbitrary media types,
// disallow non-empty suffixes for now.
```
My code changes images.DiffCompression to support this -- does that change make sense, or is there a reason why it shouldn't support arbitrary media types?

The code here is a draft; if this approach is sound I can polish the code, run some more principled experiments & add a test.
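
For concreteness, the kind of suffix handling I have in mind looks roughly like the sketch below (hypothetical helper names, not the actual change in this PR, and not the real signature of `images.DiffCompression`):

```
package main

import (
	"fmt"
	"strings"
)

// splitSuffix is a hypothetical helper: it splits a layer media type such as
// "application/vnd.erofs.layer.overlayfs.v1.erofs+zstd" into a base type and a
// compression suffix. containerd's real images.DiffCompression works differently;
// this only illustrates the suffix handling the quoted comment is about.
func splitSuffix(mediaType string) (base, compression string) {
	if i := strings.LastIndex(mediaType, "+"); i >= 0 {
		return mediaType[:i], mediaType[i+1:]
	}
	return mediaType, ""
}

func main() {
	for _, mt := range []string{
		"application/vnd.erofs.layer.overlayfs.v1.erofs",      // example media type, no wrapper
		"application/vnd.erofs.layer.overlayfs.v1.erofs+zstd", // zstd-wrapped
		"application/vnd.erofs.layer.overlayfs.v1.erofs+gzip", // gzip-wrapped
	} {
		base, comp := splitSuffix(mt)
		fmt.Printf("base=%s compression=%q\n", base, comp)
	}
}
```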
@hsiangkao
Member

hsiangkao commented Nov 11, 2025

Sorry, please ignore my previous comment.

In principle, this can be supported. However, compared to native EROFS compressed images, this approach may lack erofs's native random filesystem access capability (especially since it's already a converted, non-OCI format).
Instead, if we introduce a chunked zstd stream (so that the zstd wrapper is just applied on the wire), we could potentially enable random access for .erofs+zstd as well.

> The option that would give us the fastest startup + best runtime performance is:
>
> - using erofs images (so that we don't wait to convert from overlayfs to erofs on apply)
> - that are compressed when we're pulling (for fast pulls), and
> - decompressed on disk (for fast reads at runtime)

Yes, if we consider the best runtime performance, uncompressed erofs images are preferred since there is no decompression overhead at runtime, and the network transmission can be optimized with zstd or other wrapper formats.

Anyway, I think there are a bunch of available options, and this one can be supported since it can at least be parsed seamlessly without noticeable additional code complexity, I think. Also cc @dmcgowan.
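
(To illustrate the chunked zstd idea above -- a rough sketch only, not a concrete format proposal, and assuming github.com/klauspost/compress/zstd: split the uncompressed erofs blob into fixed-size chunks and compress each chunk as an independent zstd frame, so a reader holding the chunk index can fetch and decompress only the ranges it needs.)

```
package erofschunk // illustrative package, not part of this PR

import (
	"github.com/klauspost/compress/zstd" // assumed zstd library
)

// chunk is one independently compressed piece of an uncompressed erofs blob.
type chunk struct {
	uncompressedOff int64  // offset of this chunk within the original erofs image
	compressed      []byte // self-contained zstd frame for this chunk
}

// chunkCompress splits blob into chunkSize pieces and compresses each piece as a
// separate zstd frame. Because every frame is self-contained, any chunk can be
// fetched and decompressed on its own (random access), unlike a single zstd
// stream over the whole layer.
func chunkCompress(blob []byte, chunkSize int) ([]chunk, error) {
	enc, err := zstd.NewWriter(nil) // nil writer: encoder is only used via EncodeAll
	if err != nil {
		return nil, err
	}
	defer enc.Close()

	var chunks []chunk
	for off := 0; off < len(blob); off += chunkSize {
		end := off + chunkSize
		if end > len(blob) {
			end = len(blob)
		}
		chunks = append(chunks, chunk{
			uncompressedOff: int64(off),
			compressed:      enc.EncodeAll(blob[off:end], nil),
		})
	}
	return chunks, nil
}
```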

@anniecherk
Author

Hi @hsiangkao @dmcgowan

I ran a small experiment timing the pull & unpack of the latest pytorch image with this code, and wrote about the setup + results here. Quick summary is I'm seeing half the pull+unpack time relative to the overlayfs snapshotter & the erofs snapshotter doing the conversion at unpack time. In that writeup I describe the small CLI tool that I built to produce the erofs+zstd image, and it's on my todo list to clean that up and open that as a separate PR to complement this one.

Would y'all be willing to review this code & let me know (1) whether we're aligned on supporting / allowing erofs+zstd, and if so, (2) thoughts on the current implementation?

I had gotten pulled away after opening this PR about a month ago but now have lots of bandwidth for the next few weeks. I'd be excited to have this functionality in containerd & am more than happy to iterate on any feedback y'all have.

@anniecherk
Author

@hsiangkao re: chunked zstd stream, that sounds like a great idea to me. Does it sound good to put in a basic implementation without chunking first and then iterate on that as a later pass? I'd be interested to work on that but would ideally like to decouple from this PR to make incremental progress.

@hsiangkao
Member

> Hi @hsiangkao @dmcgowan
>
> I ran a small experiment timing the pull & unpack of the latest pytorch image with this code, and wrote about the setup + results here. Quick summary is I'm seeing half the pull+unpack time relative to the overlayfs snapshotter & the erofs snapshotter doing the conversion at unpack time. In that writeup I describe the small CLI tool that I built to produce the erofs+zstd image, and it's on my todo list to clean that up and open that as a separate PR to complement this one.
>
> Would y'all be willing to review this code & let me know (1) whether we're aligned on supporting / allowing erofs+zstd, and if so, (2) thoughts on the current implementation?
>
> I had gotten pulled away after opening this PR about a month ago but now have lots of bandwidth for the next few weeks. I'd be excited to have this functionality in containerd & am more than happy to iterate on any feedback y'all have.

Personally I'm totally fine with supporting this feature since it doesn't introduce extra logic and it benefits AI use cases.

> re: chunked zstd stream, that sounds like a great idea to me. Does it sound good to put in a basic implementation without chunking first and then iterate on that as a later pass? I'd be interested to work on that but would ideally like to decouple from this PR to make incremental progress.

Fine with me.

@dmcgowan
Member

> @hsiangkao re: chunked zstd stream, that sounds like a great idea to me. Does it sound good to put in a basic implementation without chunking first and then iterate on that as a later pass? I'd be interested to work on that but would ideally like to decouple from this PR to make incremental progress.

I think we can be way more restrictive here and only support compressed blobs in a way we know we can handle it efficiently and with random access. If the goal is to just support transport compression, that should be done at the transport layer. In hindsight, referencing compressed tars was a mistake and we should be careful not to just copy that for consistency.

If we need compression at rest and native compression is not suitable, I would be +1 for only supporting zstd chunked. We can always add more compression support later if someone comes up with a compelling case for it, but removing support for those compressions will be difficult and it may complicate our ability to support lazy pulling.

@hsiangkao
Member

hsiangkao commented Dec 17, 2025

> @hsiangkao re: chunked zstd stream, that sounds like a great idea to me. Does it sound good to put in a basic implementation without chunking first and then iterate on that as a later pass? I'd be interested to work on that but would ideally like to decouple from this PR to make incremental progress.

> I think we can be way more restrictive here and only support compressed blobs in a way we know we can handle it efficiently and with random access. If the goal is to just support transport compression, that should be done at the transport layer. In hindsight, referencing compressed tars was a mistake and we should be careful not to just copy that for consistency.
>
> If we need compression at rest and native compression is not suitable, I would be +1 for only supporting zstd chunked. We can always add more compression support later if someone comes up with a compelling case for it, but removing support for those compressions will be difficult and it may complicate our ability to support lazy pulling.

Hi Derek @dmcgowan, I think people who care more about runtime performance might be concerned about native EROFS compression (for example, lz4 can outperform uncompressed images in many setups, but it produces larger images on the wire since lz4 compresses less; and zstd or lzma EROFS native compression has noticeable runtime overhead anyway, so they might not be useful for high-performance cloud environments, for example).

I wonder if it's possible to support +zstd for now to achieve transport compression only; and if people would like to lazily pull this, then, just as we expect, use zstd chunked to split the uncompressed images into 2MiB chunks for example, and dm-verity can still apply to the original (uncompressed) image.

For native erofs images, it's just application/vnd.erofs.layer.overlayfs.v1.erofs for example, and containerd doesn't need to know any internal implementation details since it just uses erofs raw blobs -- containerd doesn't need to know anything for this setup, and we could wrap them up in go-erofs for full Go support.

For zstd wrappers, it seems it should be application/vnd.erofs.layer.overlayfs.v1.erofs+zstd, which containerd decompresses first, like the current zstd stream processor -- as a first step, the detailed chunked format doesn't need to be discussed here.
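
(As a rough sketch of that "decompress first" path -- assuming github.com/klauspost/compress/zstd; containerd's actual stream processor and diff applier interfaces differ:)

```
package main

import (
	"io"
	"os"

	"github.com/klauspost/compress/zstd" // assumed zstd library
)

// applyZstdWrappedLayer streams a .erofs+zstd blob and writes the uncompressed
// erofs image to dst, so the on-disk layer stays uncompressed for fast reads.
// Sketch only: containerd's real diff processors have their own interfaces.
func applyZstdWrappedLayer(compressed io.Reader, dst io.Writer) error {
	dec, err := zstd.NewReader(compressed)
	if err != nil {
		return err
	}
	defer dec.Close()
	_, err = io.Copy(dst, dec)
	return err
}

func main() {
	// Hypothetical file names for illustration.
	in, err := os.Open("layer.erofs.zst")
	if err != nil {
		panic(err)
	}
	defer in.Close()
	out, err := os.Create("layer.erofs")
	if err != nil {
		panic(err)
	}
	defer out.Close()
	if err := applyZstdWrappedLayer(in, out); err != nil {
		panic(err)
	}
}
```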

@hsiangkao
Member

hsiangkao commented Dec 17, 2025

@anniecherk, Derek @dmcgowan just mentioned another possibility: Is it possible to just enable http zstd compression for EROFS-formatted blobs in the container registry?
That way, the blob digest would still be the original erofs sha256 rather than a randomly wrapped one, zstd-compressed blobs could also be cached on the registry side, and it would also save transport bandwidth.

Does that sound like a better alternative?
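
(For illustration only: roughly what that transport-level negotiation could look like from a client, assuming a registry or CDN that honors Accept-Encoding for blob GETs and assuming github.com/klauspost/compress/zstd; the URL is a placeholder, not a real endpoint.)

```
package main

import (
	"fmt"
	"io"
	"net/http"

	"github.com/klauspost/compress/zstd" // assumed zstd library
)

// fetchBlob requests a blob and opts in to zstd compression on the wire.
// The stored blob (and its digest) stays the original uncompressed erofs image;
// compression only happens in transport when the server chooses to apply it.
func fetchBlob(blobURL string) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, blobURL, nil)
	if err != nil {
		return nil, err
	}
	// Go's transport will not decode zstd for us, so we handle it explicitly.
	req.Header.Set("Accept-Encoding", "zstd")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var body io.Reader = resp.Body
	if resp.Header.Get("Content-Encoding") == "zstd" {
		dec, err := zstd.NewReader(resp.Body)
		if err != nil {
			return nil, err
		}
		defer dec.Close()
		body = dec
	}
	return io.ReadAll(body)
}

func main() {
	blob, err := fetchBlob("https://registry.example/v2/library/app/blobs/sha256:0000") // placeholder URL
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("fetched", len(blob), "bytes of uncompressed erofs")
}
```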

@anniecherk
Author

> Is it possible to just enable http zstd compression for EROFS-formatted blobs in the container registry?

That's a clean solution, but unfortunately it doesn't work with our setup. Our registry redirects to an object store to serve the actual blob bytes, and the backing object store doesn't support the Accept-Encoding header or other dynamic compression requests.

Let me think through what an implementation supporting a chunked zstd stream would look like.

@k8s-ci-robot

PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@anniecherk
Author

closing as this is now superseded by #12764
