WIP RFC: erofs-snapshotter: support .erofs+{zstd|gzip} images #12506
anniecherk wants to merge 1 commit into containerd:main
Conversation
Goal:
The current implementation does not support erofs images that are compressed into .erofs+{zstd|gzip} images. Erofs itself supports compression natively, and so the current implementation has three options:
- use erofs images without native compression (uncompressed during image pulls, uncompressed on disk), or
- use erofs images with native compression (compressed during image pulls, compressed on disk), or
- use overlayfs+zstd images and convert to erofs images on apply (compressed during image pulls, converted at unpack time, uncompressed on disk)
The option that would give us the fastest startup + best runtime performance is:
- using erofs images (so that we don't wait to convert from overlayfs to erofs on apply)
- that are compressed when we're pulling (for fast pulls), and
- decompressed on disk (for fast reads at runtime)
That corresponds to .erofs layers that are natively uncompressed, but compressed / decompressed by the diff processor.
-----
Initial experimentation:
I created a small CLI tool for converting existing overlayfs images to .erofs+zstd images. I converted some images to .erofs+zstd, then uploaded them to a registry, and then timed the pull + unpack. I saw 1.5-2x faster pulls over the "overlayfs + convert to erofs at unpack" path.
-----
Looking for feedback:
I'm looking for some quick feedback: does this approach make sense?
In particular, I'm curious for more context on this comment:
```
// Since `images.DiffCompression` doesn't support arbitrary media types,
// disallow non-empty suffixes for now.
```
My code changes images.DiffCompression to support this; does that change make sense, or is there a reason why it shouldn't support arbitrary media types?
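For illustration, a suffix-based lookup along these lines is roughly what "supporting arbitrary media types" would mean. This is a hedged sketch, not the PR's implementation: the real `images.DiffCompression` in containerd has a different signature (it takes a context and returns an error), and the media-type string below is a made-up example.

```go
// Sketch: deriving the compression from a media type's "+<comp>"
// suffix instead of matching a fixed list of full media types.
package main

import (
	"fmt"
	"strings"
)

// diffCompression is a hypothetical variant of images.DiffCompression:
// it accepts arbitrary base media types and returns the compression
// suffix, or "" when the blob is uncompressed.
func diffCompression(mediaType string) string {
	// Drop any parameters, then look for a trailing "+<compression>".
	base, _, _ := strings.Cut(mediaType, ";")
	if i := strings.LastIndex(base, "+"); i >= 0 {
		switch c := base[i+1:]; c {
		case "zstd", "gzip":
			return c
		}
	}
	return ""
}

func main() {
	fmt.Println(diffCompression("application/vnd.example.erofs+zstd"))
}
```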
The code here is a draft; if this approach is sound I can polish the code, run some more principled experiments & add a test.
|
Sorry, please ignore my previous comment. In principle, this can be supported. However, compared to native EROFS compressed images, this approach lacks erofs's native random filesystem access capability (especially since it's already a converted, non-OCI format). Yes, if we consider the best runtime performance, uncompressed erofs images are preferred, since there is no decompression overhead at runtime and the network transfer can be optimized by zstd or other wrapper formats. Anyway, I think there are a bunch of available options, and this one can be supported since it can be parsed seamlessly without noticeable additional code complexity. Also try to cc @dmcgowan |
|
I ran a small experiment timing the pull & unpack of the latest pytorch image with this code, and wrote about the setup + results here. Quick summary is I'm seeing half the pull+unpack time relative to the overlayfs snapshotter & the erofs snapshotter doing the conversion at unpack time. In that writeup I describe the small CLI tool that I built to produce the erofs+zstd image, and it's on my todo list to clean that up and open that as a separate PR to complement this one. Would y'all be willing to review this code & let me know (1) whether we're aligned on supporting / allowing erofs+zstd, and if so, (2) thoughts on the current implementation? I had gotten pulled away after opening this PR about a month ago but now have lots of bandwidth for the next few weeks. I'd be excited to have this functionality in containerd & am more than happy to iterate on any feedback y'all have. |
|
@hsiangkao re: chunked zstd stream, that sounds like a great idea to me. Does it sound good to put in a basic implementation without chunking first and then iterate on that as a later pass? I'd be interested to work on that but would ideally like to decouple it from this PR to make incremental progress. |
Personally I'm totally fine with supporting this feature, since it doesn't introduce extra logic and benefits AI use cases.
Fine with me. |
I think we can be way more restrictive here and only support compressed blobs in a way we know we can handle it efficiently and with random access. If the goal is to just support transport compression, that should be done at the transport layer. In hindsight, referencing compressed tars was a mistake and we should be careful not to just copy that for consistency. If we need compression at rest and native compression is not suitable, I would be +1 for only supporting zstd chunked. We can always add more compression support later if someone comes up with a compelling case for it, but removing support for those compressions will be difficult and it may complicate our ability to support lazy pulling. |
Hi Derek @dmcgowan, I think people who care more about runtime performance might be concerned about native erofs compression (although, for example, lz4 can outperform uncompressed images in many setups, it produces larger images on the wire since lz4 compresses less; and for zstd or lzma EROFS native compression, there is noticeable runtime overhead anyway, so they might not be useful for high-performance cloud environments, for example). I wonder if it's possible to support +zstd for now to achieve transport compression only; if people would like lazy pulling, then, just as we expect, use zstd chunked to split the uncompressed images into 2MiB chunks for example, and dm-verity can still apply to the original (uncompressed) image. For native erofs images, it's just … For zstd wrappers, it seems it should be …
|
@anniecherk, Derek @dmcgowan just mentioned another possibility: is it possible to just enable HTTP zstd compression for EROFS-formatted blobs in the container registry? Does that sound like a better alternative? |
That's a clean solution, but unfortunately it doesn't work with our setup. Our registry redirects to an object store to serve the actual blob bytes, and the backing object store doesn't support the Accept-Encoding header or other dynamic compression requests. Let me think through what an implementation supporting a chunked zstd stream would look like. |
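As a starting point for thinking about a chunked zstd stream, here is a minimal sketch of the chunk-index arithmetic, assuming fixed 2 MiB uncompressed chunks as suggested above. All names here are hypothetical; a real implementation would also record per-chunk frame offsets (sketched as `chunkEntry`) so a reader can fetch and decompress only the frame it needs.

```go
// Sketch: random access into an image split into fixed-size
// uncompressed chunks, each compressed as an independent frame.
package main

import "fmt"

const chunkSize = 2 << 20 // 2 MiB uncompressed chunks

// chunkEntry maps one uncompressed chunk to the byte range of its
// independently-compressed frame within the blob (hypothetical).
type chunkEntry struct {
	compressedOff  int64 // start of this chunk's frame in the blob
	compressedSize int64 // length of the frame
}

// locate returns which chunk holds uncompressed offset off, and the
// offset within that chunk, so only one frame must be decompressed.
func locate(off int64) (chunk int64, within int64) {
	return off / chunkSize, off % chunkSize
}

func main() {
	c, w := locate(5 << 20) // 5 MiB into the image
	fmt.Println(c, w)       // 2 1048576
}
```

One nice property of this scheme is that dm-verity hashes the original uncompressed chunks, so integrity checking is unaffected by the wire compression.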
|
PR needs rebase. |
|
closing as this is now superseded by #12764 |