
[WIP] Add span-based Deflate, ZLib and GZip encoder/decoder APIs #123145

Draft
iremyux wants to merge 51 commits into dotnet:main from iremyux:62113-zlib-encoder-decoder

iremyux (Contributor) commented Jan 13, 2026

This PR introduces new span-based, streamless compression and decompression APIs for Deflate, ZLib, and GZip formats, matching the existing BrotliEncoder/BrotliDecoder pattern.

New APIs

  • DeflateEncoder / DeflateDecoder
  • ZLibEncoder / ZLibDecoder
  • GZipEncoder / GZipDecoder

These classes provide:

  • Instance-based API for streaming/chunked compression with Compress(), Decompress(), and Flush()
  • Static one-shot API via TryCompress() and TryDecompress() for simple scenarios
  • GetMaxCompressedLength() to calculate buffer sizes

Closes #62113
Closes #39327
Closes #44793
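For reference, the existing BrotliEncoder/BrotliDecoder one-shot pattern that the new types are described as matching looks roughly like the sketch below. It uses only the shipped Brotli APIs (GetMaxCompressedLength, TryCompress, TryDecompress); the helper name RoundTripBrotli is illustrative, and the new Deflate/ZLib/GZip types are deliberately not shown because their final signatures are still under review in this draft.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Illustrative helper: one-shot compress + decompress with the shipped Brotli span APIs.
byte[] RoundTripBrotli(byte[] input)
{
    // GetMaxCompressedLength gives a worst-case destination size for one-shot compression.
    byte[] compressed = new byte[BrotliEncoder.GetMaxCompressedLength(input.Length)];
    if (!BrotliEncoder.TryCompress(input, compressed, out int compressedSize))
    {
        throw new InvalidOperationException("Destination buffer was too small.");
    }

    // One-shot decompression needs a destination at least as large as the original data.
    byte[] decompressed = new byte[input.Length];
    if (!BrotliDecoder.TryDecompress(compressed.AsSpan(0, compressedSize), decompressed, out int written)
        || written != input.Length)
    {
        throw new InvalidDataException("Decompression did not restore the original length.");
    }

    return decompressed;
}

byte[] sample = Encoding.UTF8.GetBytes("span-based compression, span-based compression, span-based compression");
Console.WriteLine(RoundTripBrotli(sample).AsSpan().SequenceEqual(sample)); // True
```

The caller has to know (or bound) the decompressed size up front, which is the main trade-off of the one-shot API compared with the instance-based Compress/Decompress loop.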

/// <returns>One of the enumeration values that describes the status with which the operation finished.</returns>
public OperationStatus Flush(Span<byte> destination, out int bytesWritten)
{
return Compress(ReadOnlySpan<byte>.Empty, destination, out _, out bytesWritten, isFinalBlock: false);

Member

Does this force pending output to be written (if any is available)? I think this should pass FlushCode.SyncFlush to the native API.
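As a point of comparison for the SyncFlush question, the shipped BrotliEncoder.Flush forces buffered data out without ending the stream, so more Compress calls and a final block can follow. The sketch below exercises that chunked pattern with the existing Brotli types only; CompressWithFlush and Check are illustrative helpers, and the 1024 bytes of slack in the output buffer is an assumption that is ample for this tiny input but is not a general bound.

```csharp
using System;
using System.Buffers;
using System.IO.Compression;
using System.Text;

// Illustrative helper: compress two chunks as one Brotli stream, flushing after the first.
// flushedLength reports how many compressed bytes had been emitted once Flush returned.
byte[] CompressWithFlush(byte[] first, byte[] second, out int flushedLength)
{
    // Assumption: 1024 bytes of slack is plenty for this small demo; production code
    // would grow the buffer or loop whenever a call returns DestinationTooSmall.
    byte[] output = new byte[first.Length + second.Length + 1024];
    int total = 0;
    var encoder = new BrotliEncoder(quality: 4, window: 22);
    try
    {
        // Not the final block, so the encoder is free to keep this data buffered.
        Check(encoder.Compress(first, output.AsSpan(total), out _, out int written, isFinalBlock: false));
        total += written;

        // Flush pushes everything buffered so far to the output without ending the stream.
        Check(encoder.Flush(output.AsSpan(total), out written));
        total += written;
        flushedLength = total;

        // Final block: compress the remaining data and terminate the stream.
        Check(encoder.Compress(second, output.AsSpan(total), out _, out written, isFinalBlock: true));
        total += written;
    }
    finally
    {
        encoder.Dispose();
    }

    return output.AsSpan(0, total).ToArray();
}

static void Check(OperationStatus status)
{
    if (status != OperationStatus.Done)
    {
        throw new InvalidOperationException($"Unexpected status: {status}");
    }
}

byte[] partOne = Encoding.UTF8.GetBytes("first chunk, ");
byte[] partTwo = Encoding.UTF8.GetBytes("second chunk");
byte[] compressedStream = CompressWithFlush(partOne, partTwo, out int flushedBytes);

byte[] restored = new byte[partOne.Length + partTwo.Length];
bool ok = BrotliDecoder.TryDecompress(compressedStream, restored, out int restoredLength);
Console.WriteLine($"{ok}, {restoredLength} bytes, flushed prefix = {flushedBytes} bytes");
```

A zlib-backed Flush that maps to SyncFlush would give the same observable property: after Flush returns, everything passed to Compress so far has been emitted as complete, byte-aligned output.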

/// <param name="source">A read-only span of bytes containing the source data to compress.</param>
/// <param name="destination">When this method returns, a span of bytes where the compressed data is stored.</param>
/// <param name="bytesWritten">When this method returns, the total number of bytes that were written to <paramref name="destination"/>.</param>
/// <param name="compressionLevel">A number representing compression level. -1 is default, 0 is no compression, 1 is best speed, 9 is best compression.</param>

Member

We should be clearer about which default we mean.

Suggested change
- /// <param name="compressionLevel">A number representing compression level. -1 is default, 0 is no compression, 1 is best speed, 9 is best compression.</param>
+ /// <param name="compressionLevel">A number representing compression level. -1 means implementation default, 0 is no compression, 1 is best speed, 9 is best compression.</param>

iremyux changed the title from "[WIP] Add span-based ZlibEncoder and ZlibDecoder APIs" to "[WIP] Add span-based Deflate, ZLib and GZip encoder/decoder APIs" on Jan 19, 2026
CompressionLevel.Fastest => ZLibNative.CompressionLevel.BestSpeed,
CompressionLevel.NoCompression => ZLibNative.CompressionLevel.NoCompression,
CompressionLevel.SmallestSize => ZLibNative.CompressionLevel.BestCompression,
_ => throw new ArgumentOutOfRangeException(nameof(compressionLevel)),

Member

This would reject valid native compression levels that are not covered by the CompressionLevel enum. Instead, I think it should check whether the value is < -1 or > 9 and throw ArgumentOutOfRangeException only in that case.

AraHaan (Member) commented Feb 4, 2026

To add to the above: callers who want a native compression level whose numeric value happens to equal a CompressionLevel enum value won't be able to use those levels either. Perhaps a solution is to expose two constructor overloads: one taking CompressionLevel, and one taking an int that is cast to ZLibNative.CompressionLevel after a range check.
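To make the enum/int collision concrete: CompressionLevel's underlying values are Optimal = 0, Fastest = 1, NoCompression = 2 and SmallestSize = 3, which overlap the native zlib range -1..9 with different meanings. The sketch below illustrates the two-overload idea suggested above: an enum mapping plus an int path guarded only by a range check. ToNativeLevel, IsValidNativeLevel and ValidateNativeLevel are hypothetical helpers, and mapping Optimal to -1 (zlib's default level) is an assumption, since the Optimal arm is not shown in the excerpt.

```csharp
using System;
using System.IO.Compression;

// Hypothetical enum-based overload: map the managed enum onto native zlib levels.
int ToNativeLevel(CompressionLevel level) => level switch
{
    CompressionLevel.Optimal => -1,       // assumption: defer to zlib's default level
    CompressionLevel.Fastest => 1,        // best speed
    CompressionLevel.NoCompression => 0,  // stored blocks only
    CompressionLevel.SmallestSize => 9,   // best compression
    _ => throw new ArgumentOutOfRangeException(nameof(level)),
};

// Hypothetical int-based overload: accept any native level, rejecting only out-of-range values.
bool IsValidNativeLevel(int level) => level >= -1 && level <= 9;

int ValidateNativeLevel(int level) =>
    IsValidNativeLevel(level)
        ? level
        : throw new ArgumentOutOfRangeException(nameof(level), level, "Expected -1 (default) or 0 through 9.");

// The same number means different things on the two paths:
Console.WriteLine((int)CompressionLevel.NoCompression);           // 2 (enum value)
Console.WriteLine(ToNativeLevel(CompressionLevel.NoCompression)); // 0 (native "no compression")
Console.WriteLine(ValidateNativeLevel(2));                        // 2 (native level 2, light compression)
```

Keeping the two paths as separate overloads means an int argument is never reinterpreted as an enum value, so every native level stays reachable.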

Copilot AI review requested due to automatic review settings February 18, 2026 11:46
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

Comment on lines +81 to +85
// GZip has a larger header than raw deflate, so add extra overhead
long baseLength = DeflateEncoder.GetMaxCompressedLength(inputLength);

// GZip adds 18 bytes: 10-byte header + 8-byte trailer (CRC32 + original size)
return baseLength + 18;

Copilot AI, Feb 18, 2026

The GetMaxCompressedLength calculation adds excessive overhead. The compressBound() function used by DeflateEncoder.GetMaxCompressedLength already includes 6 bytes for the zlib format wrapper (2-byte header + 4-byte Adler-32) in its 13-byte constant overhead. For the GZip format, with its 18-byte wrapper (10-byte header + 8-byte trailer), only 12 additional bytes are needed (18 - 6 = 12), not 18.

Change line 85 to: return baseLength + 12;

While the current overestimation is safe and doesn't cause correctness issues, it wastes memory by requesting buffers that are 6 bytes larger than necessary.

Suggested change
- // GZip has a larger header than raw deflate, so add extra overhead
- long baseLength = DeflateEncoder.GetMaxCompressedLength(inputLength);
- // GZip adds 18 bytes: 10-byte header + 8-byte trailer (CRC32 + original size)
- return baseLength + 18;
+ // GZip has a larger wrapper than the zlib wrapper accounted for by DeflateEncoder, so add the additional overhead
+ long baseLength = DeflateEncoder.GetMaxCompressedLength(inputLength);
+ // GZip wrapper is 18 bytes (10-byte header + 8-byte trailer), zlib wrapper is 6 bytes, so add the 12-byte difference
+ return baseLength + 12;

Copilot uses AI. Check for mistakes.
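The arithmetic behind the suggested + 12 can be checked against the classic zlib compressBound() formula, whose constant 13 includes the 6-byte zlib wrapper. This is a sketch under that assumption: zlib-ng computes its bound with a different formula, so the absolute numbers may differ from what CompressionNative returns, but the wrapper difference (18 - 6 = 12 bytes) is the same. ZlibCompressBound and GZipBound are illustrative names.

```csharp
using System;

// Classic zlib compressBound(); the trailing 13 covers the 6-byte zlib wrapper plus deflate slack.
// zlib-ng uses a different formula, so treat these numbers as illustrative only.
long ZlibCompressBound(long sourceLen) =>
    sourceLen + (sourceLen >> 12) + (sourceLen >> 14) + (sourceLen >> 25) + 13;

const int ZlibWrapperBytes = 2 + 4;  // CMF/FLG header + Adler-32 trailer
const int GZipWrapperBytes = 10 + 8; // minimal gzip header + CRC-32 and ISIZE trailer

// Reuse the zlib-format bound and add only the difference between the two wrappers.
long GZipBound(long sourceLen) => ZlibCompressBound(sourceLen) + (GZipWrapperBytes - ZlibWrapperBytes);

Console.WriteLine(ZlibCompressBound(4096)); // 4110
Console.WriteLine(GZipBound(4096));         // 4122
```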
Comment on lines 58 to 64
ArgumentNullException.ThrowIfNull(compressionOptions);

// Compute windowBits for raw deflate format: negative windowLog
int windowLog = compressionOptions.WindowLog == -1 ? DeflateEncoder.DefaultWindowLog : compressionOptions.WindowLog;
int windowBits = -windowLog;

ArgumentNullException.ThrowIfNull(stream);

Copilot AI, Feb 18, 2026

Parameter validation order could be optimized. Currently, compressionOptions is validated and used to compute windowBits before stream is validated for null. If stream is null, the windowBits computation is wasted work.

Consider reordering to validate both parameters first:

ArgumentNullException.ThrowIfNull(stream);
ArgumentNullException.ThrowIfNull(compressionOptions);

int windowLog = compressionOptions.WindowLog == -1 ? DeflateEncoder.DefaultWindowLog : compressionOptions.WindowLog;
int windowBits = -windowLog;

InitializeDeflater(stream, ...);

This matches the pattern in the internal constructor at line 68-74 and fails fast with minimal wasted work.

Suggested change
- ArgumentNullException.ThrowIfNull(compressionOptions);
- // Compute windowBits for raw deflate format: negative windowLog
- int windowLog = compressionOptions.WindowLog == -1 ? DeflateEncoder.DefaultWindowLog : compressionOptions.WindowLog;
- int windowBits = -windowLog;
- ArgumentNullException.ThrowIfNull(stream);
+ ArgumentNullException.ThrowIfNull(stream);
+ ArgumentNullException.ThrowIfNull(compressionOptions);
+ // Compute windowBits for raw deflate format: negative windowLog
+ int windowLog = compressionOptions.WindowLog == -1 ? DeflateEncoder.DefaultWindowLog : compressionOptions.WindowLog;
+ int windowBits = -windowLog;
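For context on the windowBits value computed above: zlib's deflateInit2/inflateInit2 select the container from windowBits itself, with 8..15 meaning a zlib wrapper, -8..-15 meaning raw deflate, and windowLog + 16 meaning a gzip wrapper. A minimal sketch of that mapping, assuming the -1 sentinel resolves to a default window log of 15 as the WindowLog documentation in this PR states; the helper names are hypothetical.

```csharp
using System;

// Assumption: -1 resolves to a 15-bit (32 KiB) window, as documented for ZLibCompressionOptions.WindowLog.
const int DefaultWindowLog = 15;

int ResolveWindowLog(int windowLog) => windowLog == -1 ? DefaultWindowLog : windowLog;

// zlib encodes the container format in the sign/offset of windowBits.
int RawDeflateWindowBits(int windowLog) => -ResolveWindowLog(windowLog); // -8..-15: no header or trailer
int ZLibWindowBits(int windowLog) => ResolveWindowLog(windowLog);        // 8..15: 2-byte header, Adler-32 trailer
int GZipWindowBits(int windowLog) => ResolveWindowLog(windowLog) + 16;   // 24..31: gzip header, CRC-32 and size trailer

Console.WriteLine(RawDeflateWindowBits(-1)); // -15
Console.WriteLine(ZLibWindowBits(12));       // 12
Console.WriteLine(GZipWindowBits(-1));       // 31
```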

Copilot AI review requested due to automatic review settings February 25, 2026 12:36
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.

internal static unsafe partial uint crc32(uint crc, byte* buffer, int len);

[LibraryImport(Libraries.CompressionNative, EntryPoint = "CompressionNative_CompressBound")]
internal static partial uint compressBound(uint sourceLen);

Copilot AI, Feb 25, 2026

The interop signature for compressBound uses uint for both the input and the return value. zlib-ng's compressBound takes and returns z_uintmax_t (potentially 64-bit), so this P/Invoke can truncate the bound on 64-bit platforms. Consider switching to nuint/ulong (matching the ZSTD_compressBound pattern), together with a native export that returns a pointer-sized/64-bit value.

Suggested change
- internal static partial uint compressBound(uint sourceLen);
+ internal static partial nuint compressBound(nuint sourceLen);

Comment on lines +154 to +158
/// <exception cref="ArgumentOutOfRangeException"><paramref name="inputLength"/> is negative or exceeds <see cref="uint.MaxValue"/>.</exception>
public static long GetMaxCompressedLength(long inputLength)
{
ArgumentOutOfRangeException.ThrowIfNegative(inputLength);
ArgumentOutOfRangeException.ThrowIfGreaterThan(inputLength, uint.MaxValue);

Copilot AI, Feb 25, 2026

GetMaxCompressedLength allows inputLength up to uint.MaxValue, but the underlying compressBound() can produce a bound larger than 4GiB for large inputs. With the current uint-returning interop this can wrap/truncate and return a value that is not an upper bound. Either change the interop/native export to return a 64-bit value, or further restrict inputLength so the computed bound is guaranteed to fit in the chosen return type.

Suggested change
- /// <exception cref="ArgumentOutOfRangeException"><paramref name="inputLength"/> is negative or exceeds <see cref="uint.MaxValue"/>.</exception>
- public static long GetMaxCompressedLength(long inputLength)
- {
- ArgumentOutOfRangeException.ThrowIfNegative(inputLength);
- ArgumentOutOfRangeException.ThrowIfGreaterThan(inputLength, uint.MaxValue);
+ /// <exception cref="ArgumentOutOfRangeException"><paramref name="inputLength"/> is negative or exceeds <see cref="int.MaxValue"/>.</exception>
+ public static long GetMaxCompressedLength(long inputLength)
+ {
+ ArgumentOutOfRangeException.ThrowIfNegative(inputLength);
+ ArgumentOutOfRangeException.ThrowIfGreaterThan(inputLength, int.MaxValue);
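A quick way to see the truncation described in these two comments is to evaluate the bound in 64-bit arithmetic and then keep only the low 32 bits, which is all a uint return value would preserve. The sketch uses the classic zlib compressBound() formula as a stand-in (zlib-ng's formula differs, but any valid bound exceeds the input length, so near uint.MaxValue it also exceeds 4 GiB); CompressBound64 and TruncatedBound are illustrative helpers.

```csharp
using System;

// Classic zlib compressBound() formula evaluated in 64-bit arithmetic (stand-in for zlib-ng's).
ulong CompressBound64(ulong sourceLen) =>
    sourceLen + (sourceLen >> 12) + (sourceLen >> 14) + (sourceLen >> 25) + 13;

// What a uint-returning P/Invoke would hand back: only the low 32 bits of the true bound.
uint TruncatedBound(ulong sourceLen) => unchecked((uint)CompressBound64(sourceLen));

ulong largest = uint.MaxValue;
Console.WriteLine(CompressBound64(largest)); // 4296278153 (does not fit in uint)
Console.WriteLine(TruncatedBound(largest));  // 1310857 (wrapped: far below the input size)

// Capping inputLength at int.MaxValue keeps this bound within uint range.
Console.WriteLine(CompressBound64(int.MaxValue) <= uint.MaxValue); // True
```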

Comment on lines +54 to +75
/// <summary>
/// Gets or sets the base-2 logarithm of the window size for a compression stream.
/// </summary>
/// <exception cref="ArgumentOutOfRangeException">The value is less than -1 or greater than 15, or between 0 and 7.</exception>
/// <remarks>
/// Can accept -1 or any value between 8 and 15 (inclusive). Larger values result in better compression at the expense of memory usage.
/// -1 requests the default window log which is currently equivalent to 15 (32KB window). The default value is -1.
/// </remarks>
public int WindowLog
{
get => _windowLog;
set
{
if (value != -1)
{
ArgumentOutOfRangeException.ThrowIfLessThan(value, ZLibNative.MinWindowLog);
ArgumentOutOfRangeException.ThrowIfGreaterThan(value, ZLibNative.MaxWindowLog);
}

_windowLog = value;
}
}

Copilot AI, Feb 25, 2026

ZLibCompressionOptions.WindowLog is a new public configuration knob, but there are no corresponding unit tests validating default value, accepted range (-1, 8-15), and expected ArgumentOutOfRangeException parameter name for invalid values. Adding coverage alongside the existing CompressionLevel/CompressionStrategy option tests would help prevent regressions.
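A standalone sketch of the cases such tests would cover, replicating the setter logic shown above rather than calling the real property. MinWindowLog = 8 and MaxWindowLog = 15 are assumptions taken from the documented range, and ValidateWindowLog and RejectedParamName are hypothetical helpers. Because ArgumentOutOfRangeException.ThrowIfLessThan and ThrowIfGreaterThan (.NET 8 and later) capture the caller's argument expression, the reported ParamName is "value", which is the kind of detail the suggested tests would pin down.

```csharp
using System;

// Assumption: these mirror ZLibNative.MinWindowLog / MaxWindowLog per the documented 8..15 range.
const int MinWindowLog = 8;
const int MaxWindowLog = 15;

// Hypothetical stand-in for the WindowLog setter: -1 means "default", otherwise 8..15.
int ValidateWindowLog(int value)
{
    if (value != -1)
    {
        ArgumentOutOfRangeException.ThrowIfLessThan(value, MinWindowLog);
        ArgumentOutOfRangeException.ThrowIfGreaterThan(value, MaxWindowLog);
    }
    return value;
}

// Returns the ParamName of the exception thrown for the candidate, or null if it was accepted.
string? RejectedParamName(int candidate)
{
    try
    {
        ValidateWindowLog(candidate);
        return null;
    }
    catch (ArgumentOutOfRangeException ex)
    {
        return ex.ParamName;
    }
}

foreach (int candidate in new[] { -2, -1, 0, 7, 8, 15, 16 })
{
    Console.WriteLine($"{candidate}: {RejectedParamName(candidate) ?? "accepted"}");
}
```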


// 0xFF is an invalid first byte for all three formats:
// - GZip requires magic bytes 0x1F 0x8B
// - ZLib requires a valid CMF byte (0x78 for deflate with window size)

Copilot AI, Feb 25, 2026

The comment implies ZLib requires CMF byte 0x78, but valid CMF values vary with window size (0x78 is just the common 32KiB-window case). Consider rewording to state that the first byte must be a valid CMF for deflate+window size, without implying a single fixed value.

Suggested change
- // - ZLib requires a valid CMF byte (0x78 for deflate with window size)
+ // - ZLib requires the first byte (CMF) to indicate deflate with a supported window size; 0xFF is not a valid CMF value
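The header rules behind the 0xFF comment can be spelled out directly: RFC 1950 puts the compression method (CM = 8 for deflate) in the low nibble of CMF and CINFO = windowLog - 8 in the high nibble, RFC 1952 requires a gzip member to start with 0x1F 0x8B, and RFC 1951 reserves block type 11 in bits 1-2 of the first byte of raw deflate data. A minimal sketch of those first-byte checks, with illustrative helper names; real decoders validate more than the first byte (for example the FCHECK bits in the zlib FLG byte).

```csharp
using System;

// RFC 1950: CMF = CINFO (high nibble) | CM (low nibble); CM 8 = deflate, CINFO = windowLog - 8.
byte ZLibCmf(int windowLog) => (byte)(((windowLog - 8) << 4) | 8);

// RFC 1950: CM must be 8 and CINFO at most 7 (window of at most 32 KiB).
bool IsPlausibleZLibCmf(byte cmf) => (cmf & 0x0F) == 8 && (cmf >> 4) <= 7;

// RFC 1952: a gzip member starts with ID1 = 0x1F (followed by ID2 = 0x8B).
bool IsPlausibleGZipStart(byte first) => first == 0x1F;

// RFC 1951: bit 0 is BFINAL, bits 1-2 are BTYPE; BTYPE 3 (binary 11) is reserved.
bool HasValidDeflateBlockType(byte first) => ((first >> 1) & 0b11) != 3;

Console.WriteLine($"0x{ZLibCmf(15):X2}"); // 0x78
Console.WriteLine($"0x{ZLibCmf(8):X2}");  // 0x08
Console.WriteLine(IsPlausibleZLibCmf(0xFF) || IsPlausibleGZipStart(0xFF) || HasValidDeflateBlockType(0xFF)); // False
```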

Comment on lines +677 to +681
encoder.Compress(input, compressed, out _, out int compressedSize, isFinalBlock: true);

byte[] decompressed = new byte[input.Length];
using var decoder = CreateDecoder();
decoder.Decompress(compressed.AsSpan(0, compressedSize), decompressed, out _, out _);

Copilot AI, Feb 25, 2026

RoundTrip_AllCompressionLevels doesn't assert the OperationStatus (or bytesConsumed/bytesWritten) from Compress/Decompress. If these calls start returning DestinationTooSmall/NeedMoreData/InvalidData, the test failure will be less actionable (or could miss partial-progress scenarios). Consider asserting status == Done, consumed == input.Length, and written == input.Length for each iteration.

Suggested change
- encoder.Compress(input, compressed, out _, out int compressedSize, isFinalBlock: true);
- byte[] decompressed = new byte[input.Length];
- using var decoder = CreateDecoder();
- decoder.Decompress(compressed.AsSpan(0, compressedSize), decompressed, out _, out _);
+ OperationStatus compressStatus = encoder.Compress(input, compressed, out int bytesConsumed, out int compressedSize, isFinalBlock: true);
+ Assert.Equal(OperationStatus.Done, compressStatus);
+ Assert.Equal(input.Length, bytesConsumed);
+ byte[] decompressed = new byte[input.Length];
+ using var decoder = CreateDecoder();
+ OperationStatus decompressStatus = decoder.Decompress(compressed.AsSpan(0, compressedSize), decompressed, out int decompressedBytesConsumed, out int decompressedBytesWritten);
+ Assert.Equal(OperationStatus.Done, decompressStatus);
+ Assert.Equal(compressedSize, decompressedBytesConsumed);
+ Assert.Equal(input.Length, decompressedBytesWritten);
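The same check-every-result style can be tried today against the shipped ZLibStream, since the new encoder/decoder types only exist on this branch. The sketch below round-trips a buffer at every CompressionLevel value and fails with the level name if the output differs; RoundTrip is an illustrative helper and the input pattern is arbitrary.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Arbitrary, mildly repetitive input.
byte[] input = new byte[4096];
for (int i = 0; i < input.Length; i++)
{
    input[i] = (byte)(i % 64);
}

// Illustrative helper: compress with ZLibStream at the given level, then decompress.
byte[] RoundTrip(byte[] data, CompressionLevel level)
{
    using var compressed = new MemoryStream();
    using (var zlib = new ZLibStream(compressed, level, leaveOpen: true))
    {
        zlib.Write(data, 0, data.Length);
    } // disposing the ZLibStream writes the final block and the Adler-32 trailer

    compressed.Position = 0;
    using var decompressor = new ZLibStream(compressed, CompressionMode.Decompress);
    using var result = new MemoryStream();
    decompressor.CopyTo(result);
    return result.ToArray();
}

foreach (CompressionLevel level in Enum.GetValues<CompressionLevel>())
{
    byte[] output = RoundTrip(input, level);
    if (!output.AsSpan().SequenceEqual(input))
    {
        throw new InvalidDataException($"Round trip failed at {level}.");
    }
    Console.WriteLine($"{level}: {output.Length} bytes restored");
}
```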


Development

Successfully merging this pull request may close these issues:

  • [API Proposal]: Add Deflate, ZLib and GZip encoder/decoder APIs
  • Add static compression helper methods
  • Span-based (non-stream) compression APIs

4 participants