avoid repeated decompression and further utilize the --GH option to improve conversion speed by escapefreeg · Pull Request #2145 · containerd/stargz-snapshotter

escapefreeg · 2025-10-13T08:05:50Z

In the current image conversion process, the converted blob is decompressed in estargz/build.go to calculate its hash value. It is then decompressed again in nativeconverter/estargz/estargz.go (estargz format) and nativeconverter/zstdchunked/zstdchunked.go (zstd format) to calculate its size. Two separate decompression passes are unnecessary, and decompression is time-consuming. This PR therefore merges them into a single pass to improve image conversion speed.
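For illustration only, here is a minimal sketch of computing both values in one decompression pass. It assumes a gzip-compressed blob and uses hypothetical helper names (`digestAndSize`, `compressAndMeasure`); it is not the code in estargz/build.go.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// digestAndSize decompresses a gzip stream exactly once and returns the
// SHA-256 digest of the uncompressed bytes together with their length.
// Hypothetical helper for illustration; not the project's actual code.
func digestAndSize(compressed io.Reader) (string, int64, error) {
	zr, err := gzip.NewReader(compressed)
	if err != nil {
		return "", 0, err
	}
	defer zr.Close()

	h := sha256.New()
	// The hash consumes the uncompressed stream, and the byte count of the
	// same stream is the uncompressed size, so no second pass is needed.
	n, err := io.Copy(h, zr)
	if err != nil {
		return "", 0, err
	}
	return hex.EncodeToString(h.Sum(nil)), n, nil
}

// compressAndMeasure gzips payload in memory and runs digestAndSize on it.
func compressAndMeasure(payload []byte) (string, int64, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(payload); err != nil {
		return "", 0, err
	}
	if err := zw.Close(); err != nil {
		return "", 0, err
	}
	return digestAndSize(bytes.NewReader(buf.Bytes()))
}

func main() {
	digest, size, err := compressAndMeasure([]byte("example layer contents"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("sha256:%s size=%d\n", digest, size)
}
```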

Furthermore, according to #2117, command-line tools can decompress gzip archives faster than Go's built-in gzip decompression library. When the --estargz-gzip-helper (--GH) option is specified, the gzip helper can also be used to speed up decompression of the converted image.
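As a rough sketch of the idea (not how the snapshotter actually wires up the helper), decompression could be delegated to an external tool and fall back to compress/gzip when the tool is missing. The function names and the `-d -c` invocation are assumptions for this example; `-d -c` is the usual "decompress to stdout" form accepted by both gzip and pigz.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os/exec"
)

// decompressGzip writes the uncompressed form of src to dst and returns the
// number of uncompressed bytes. If helper (for example "pigz") is found in
// PATH it is run as "<helper> -d -c"; otherwise compress/gzip is used.
// Hypothetical function for illustration only.
func decompressGzip(dst io.Writer, src io.Reader, helper string) (int64, error) {
	if helper != "" {
		if path, err := exec.LookPath(helper); err == nil {
			cmd := exec.Command(path, "-d", "-c")
			cmd.Stdin = src
			stdout, err := cmd.StdoutPipe()
			if err != nil {
				return 0, err
			}
			if err := cmd.Start(); err != nil {
				return 0, err
			}
			n, copyErr := io.Copy(dst, stdout)
			if copyErr != nil {
				stdout.Close() // unblock the helper if it is still writing
			}
			// Wait must run after all reads from the pipe have finished.
			if waitErr := cmd.Wait(); copyErr == nil {
				copyErr = waitErr
			}
			return n, copyErr
		}
	}
	zr, err := gzip.NewReader(src)
	if err != nil {
		return 0, err
	}
	defer zr.Close()
	return io.Copy(dst, zr)
}

// roundTrip gzips payload in memory and decompresses it via decompressGzip.
func roundTrip(payload []byte, helper string) ([]byte, int64, error) {
	var compressed bytes.Buffer
	zw := gzip.NewWriter(&compressed)
	if _, err := zw.Write(payload); err != nil {
		return nil, 0, err
	}
	if err := zw.Close(); err != nil {
		return nil, 0, err
	}
	var out bytes.Buffer
	n, err := decompressGzip(&out, &compressed, helper)
	return out.Bytes(), n, err
}

func main() {
	out, n, err := roundTrip([]byte("example layer contents"), "pigz")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("decompressed %d bytes: %q\n", n, out)
}
```

The byte count comes back from the same copy that drains the helper's stdout, so the size is obtained without an extra pass.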

I conducted tests on the related changes. Results from 10 images show that the improvement reduces conversion time by an average of 26.7%. For more details, see:

Time improvement from applying the current optimization on pigz GH is calculated as (With pigz GH - With pigz GH and current optimization) / With pigz GH. Time improvement from combining pigz GH and the current optimization is calculated as (Without GH - With pigz GH and current optimization) / Without GH.

Note: Although ioutils.CountWriter ensures correct byte counting in concurrent environments, its usage in nativeconverter/estargz/estargz.go and nativeconverter/zstdchunked/zstdchunked.go does not involve concurrency. Therefore, the size of the converted blob can be directly obtained from the return value of io.Copy.
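To illustrate the note, the sketch below compares a byte-counting writer with the value returned by io.Copy in a single-goroutine copy. `countWriter` here is a hypothetical, simplified stand-in and not the ioutils.CountWriter implementation.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countWriter is a hypothetical, simplified byte-counting writer used only
// for comparison. It has no locking, which is fine without concurrency.
type countWriter struct{ n int64 }

func (c *countWriter) Write(p []byte) (int, error) {
	c.n += int64(len(p))
	return len(p), nil
}

// sizes copies s twice: once through countWriter and once straight to
// io.Discard, returning the size observed by each approach.
func sizes(s string) (counted, copied int64, err error) {
	cw := &countWriter{}
	if _, err = io.Copy(cw, strings.NewReader(s)); err != nil {
		return 0, 0, err
	}
	// io.Copy already returns the number of bytes written.
	copied, err = io.Copy(io.Discard, strings.NewReader(s))
	return cw.n, copied, err
}

func main() {
	counted, copied, err := sizes("converted blob")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(counted == copied)
}
```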

ktock · 2025-10-16T05:19:49Z

estargz/build.go

+		uncompressedSizeChan <- uncompressedSize
+		close(uncompressedSizeChan)


Does this make UncompressedSize return a zero value when it is called more than once? Also, I guess this goroutine leaks, blocking at L295, if UncompressedSize is never called.
Instead of using a channel, I think it can just use an int64 pointer with atomic operations. UncompressedSize can have a comment saying that the value is valid only after the full read.

Thanks for the suggestion. I've updated it based on your advice.
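
A minimal sketch of the atomic-based approach suggested in this thread might look like the following. The type and function names are hypothetical and chosen for this example; the merged code in estargz/build.go may differ.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync/atomic"
)

// sizeTrackingReader is a hypothetical type for illustration. It counts the
// bytes that pass through Read and publishes the running total atomically.
type sizeTrackingReader struct {
	r    io.Reader
	size *int64
}

func newSizeTrackingReader(r io.Reader) *sizeTrackingReader {
	return &sizeTrackingReader{r: r, size: new(int64)}
}

func (s *sizeTrackingReader) Read(p []byte) (int, error) {
	n, err := s.r.Read(p)
	atomic.AddInt64(s.size, int64(n))
	return n, err
}

// UncompressedSize returns the number of bytes read so far. The value is
// valid only after the reader has been fully consumed. Unlike a channel
// receive, it never blocks and can be called any number of times.
func (s *sizeTrackingReader) UncompressedSize() int64 {
	return atomic.LoadInt64(s.size)
}

// readAllTwice drains the reader, then queries the size twice to show that
// repeated calls return the same value.
func readAllTwice(payload string) (first, second int64, err error) {
	tr := newSizeTrackingReader(strings.NewReader(payload))
	if _, err = io.Copy(io.Discard, tr); err != nil {
		return 0, 0, err
	}
	return tr.UncompressedSize(), tr.UncompressedSize(), nil
}

func main() {
	first, second, err := readAllTwice("uncompressed layer bytes")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(first, second)
}
```

Because nothing is sent on a channel, no goroutine can be left blocked when the size is never requested.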

ktock

Thanks

escapefreeg force-pushed the main branch 3 times, most recently from 210513c to c8413f3 on October 14, 2025 02:59

ktock reviewed Oct 16, 2025

avoid repeated decompression and further utilize the --GH option to s…

62acf05

…peed up conversion Signed-off-by: clarehkli <[email protected]>

escapefreeg force-pushed the main branch from c8413f3 to 62acf05 on October 17, 2025 08:22

ktock approved these changes Oct 18, 2025

ktock merged commit 73ac8ff into containerd:main Oct 18, 2025
44 checks passed

ktock merged 1 commit into containerd:main from escapefreeg:main
