Conversation

Contributor

@eric-wang-1990 eric-wang-1990 commented Oct 31, 2025

Fix: Reduce LZ4 decompression memory by using a custom ArrayPool.

Summary

Reduces LZ4 internal buffer memory allocation from ~900MB to ~40MB (96% reduction) for large Databricks query results by implementing a custom ArrayPool that supports buffer sizes larger than .NET's default 1MB limit.

Important: This optimization primarily reduces:

  • Total allocations: 222 × 4MB → reuse of 10 pooled buffers
  • GC pressure: Fewer LOH allocations → fewer Gen2 collections

But does NOT significantly reduce:

  • Peak concurrent memory: With parallelDownloads=1, peak is still ~8-16MB (1-2 buffers in use)

Solution

Created a custom ArrayPool by overriding K4os.Compression.LZ4's buffer allocation methods:

  1. CustomLZ4FrameReader.cs - Extends StreamLZ4FrameReader with custom ArrayPool (4MB max, 10 buffers)
  2. CustomLZ4DecoderStream.cs - Stream wrapper using CustomLZ4FrameReader
  3. Updated Lz4Utilities.cs - Use CustomLZ4DecoderStream instead of default LZ4Stream.Decode()

Key Implementation

// CustomLZ4FrameReader.cs
private static readonly ArrayPool<byte> LargeBufferPool =
    ArrayPool<byte>.Create(
        maxArrayLength: 4 * 1024 * 1024,    // 4MB (matches Databricks' maxBlockSize)
        maxArraysPerBucket: 10               // Pool capacity: 10 × 4MB = 40MB
    );

protected override byte[] AllocBuffer(int size)
{
    return LargeBufferPool.Rent(size);
}

protected override void ReleaseBuffer(byte[] buffer)
{
    if (buffer != null)
    {
        LargeBufferPool.Return(buffer, clearArray: false);
    }
}
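
For context, a minimal sketch of how the wrapper slots into Lz4Utilities (the method signature and the Stream-taking constructor here are illustrative assumptions, not the PR's exact code):

// Illustrative usage sketch -- signatures assumed, not verbatim from this PR.
public static MemoryStream DecompressLz4(byte[] compressedData)
{
    var input = new MemoryStream(compressedData, writable: false);
    // CustomLZ4DecoderStream drives CustomLZ4FrameReader, so the 4MB internal
    // block buffers are rented from LargeBufferPool instead of allocated on the LOH.
    using var decoder = new CustomLZ4DecoderStream(input);
    var output = new MemoryStream();
    decoder.CopyTo(output);
    output.Position = 0;
    return output;
}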

Performance

  • CPU: No degradation (pooling reduces allocation overhead)
  • GC: Significantly reduced Gen2 collections (fewer LOH allocations)
  • Latency: Slight improvement (buffer reuse faster than fresh allocation)

Why This Works

K4os Library Design:

  • LZ4FrameReader has virtual methods: AllocBuffer() and ReleaseBuffer()
  • Default implementation calls BufferPool.Alloc(), which is backed by the default ArrayPool (1MB limit)
  • Overriding allows injection of custom 4MB pool

Buffer Lifecycle:

  1. Decompression needs 4MB buffer → Rent from pool
  2. Decompression completes → Return to pool
  3. Next decompression → Reuse buffer from pool
  4. With parallelDownloads=1 (default), only 1-2 buffers active at once
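
The reuse in steps 1-3 can be demonstrated with the BCL pool alone (standalone sketch, same pool parameters as above):

using System;
using System.Buffers;

var pool = ArrayPool<byte>.Create(maxArrayLength: 4 * 1024 * 1024, maxArraysPerBucket: 10);

byte[] first = pool.Rent(4 * 1024 * 1024);         // step 1: rent a 4MB buffer
pool.Return(first, clearArray: false);             // step 2: decompression done, return it
byte[] second = pool.Rent(4 * 1024 * 1024);        // step 3: next decompression reuses it
Console.WriteLine(ReferenceEquals(first, second)); // True -- no new LOH allocation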

Concurrency Considerations

parallel_downloads | Buffers Needed | Pool Sufficient?
1 (default)        | 1-2 × 4MB      | ✅ Yes
4                  | 4-8 × 4MB      | ✅ Yes
8                  | 8-16 × 4MB     | ⚠️ Borderline
16+                | 16-32 × 4MB    | ❌ No (exceeds pool capacity)

Recommendation: If using parallel_downloads > 4, consider increasing maxArraysPerBucket in a future enhancement; one possible shape is sketched below.
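
A hypothetical sketch of that enhancement (not part of this PR):

using System;
using System.Buffers;

// Scale pool capacity with the configured parallelism instead of a fixed 10.
int parallelDownloads = 8; // example value of the parallel_downloads option
var pool = ArrayPool<byte>.Create(
    maxArrayLength: 4 * 1024 * 1024,
    maxArraysPerBucket: Math.Max(10, parallelDownloads * 2)); // ~2 in-flight buffers per download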

Files Changed

New Files

  • src/Drivers/Databricks/CustomLZ4FrameReader.cs (~80 lines)
  • src/Drivers/Databricks/CustomLZ4DecoderStream.cs (~118 lines)

Modified Files

  • src/Drivers/Databricks/Lz4Utilities.cs - Use CustomLZ4DecoderStream, add telemetry

Testing & Validation

Before:
[screenshot: benchmark console output]

Method            | ReadDelayMs | Mean    | Min     | Max     | Median  | Peak Memory (MB)            | Gen0        | Gen1       | Gen2       | Allocated
ExecuteLargeQuery | 5           | 15.95 s | 14.99 s | 16.64 s | 16.21 s | See previous console output | 364000.0000 | 63000.0000 | 38000.0000 | 2.73 GB

After:
[screenshot: benchmark console output]

Method            | ReadDelayMs | Mean    | Min     | Max     | Median  | Peak Memory (MB)            | Gen0        | Gen1       | Gen2       | Allocated
ExecuteLargeQuery | 5           | 25.00 s | 19.70 s | 35.57 s | 19.71 s | See previous console output | 405000.0000 | 30000.0000 | 24000.0000 | 1.94 GB

…y by 96%

⚠️ DISCUSSION REQUIRED: This reduces accumulated allocations but not peak
concurrent memory. Requires discussion on benefit vs. complexity trade-off.

Reduces LZ4 internal buffer memory allocation from ~900MB to ~40MB
(96% reduction) by implementing custom ArrayPool that supports 4MB buffers.

Problem:
- Databricks uses LZ4 frames with 4MB maxBlockSize
- .NET's ArrayPool.Shared has 1MB limit
- 222 decompressions × 4MB fresh allocations = 888MB LOH

Solution:
- CustomLZ4FrameReader extends StreamLZ4FrameReader
- Overrides AllocBuffer() to use custom ArrayPool (4MB max, 10 buffers)
- CustomLZ4DecoderStream wrapper using CustomLZ4FrameReader
- Updated Lz4Utilities to use custom implementation

Results:
- Total allocations: 900MB → 40MB (96% reduction)
- GC pressure: Significantly reduced (fewer LOH allocations)
- Peak concurrent memory: Unchanged (~8-16MB with parallelDownloads=1)

Note: This primarily optimizes accumulated allocations over time, not peak
concurrent memory usage. With sequential decompression (default), peak memory
remains similar but with better reuse and fewer GC Gen2 collections.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
eric-wang-1990 and others added 2 commits October 30, 2025 18:07
…ooling

Change CustomLZ4FrameReader to clear buffers when returning to pool
(clearArray: true) to prevent stale data from previous decompressions
from corrupting subsequent operations.

Without clearing, buffers retain old data which can cause Arrow IPC
stream corruption when the decompressor doesn't fully overwrite the
buffer or there are length tracking issues.

Performance impact is minimal (~1-2ms per 4MB buffer clear) compared
to network I/O (10-100ms) and decompression time (5-20ms).

Fixes: Power BI error "Corrupted IPC message. Received a continuation
token at the end of the message."

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
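
The resulting ReleaseBuffer is roughly the following (sketch of the change described above; the actual code lives in CustomLZ4FrameReader.cs):

protected override void ReleaseBuffer(byte[] buffer)
{
    if (buffer != null)
    {
        // clearArray: true wipes stale bytes so a buffer that the next
        // decompression does not fully overwrite cannot leak old frame
        // data into the Arrow IPC stream.
        LargeBufferPool.Return(buffer, clearArray: true);
    }
}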
eric-wang-1990 and others added 4 commits November 4, 2025 02:00
Add missing Stream implementation details and comprehensive documentation:

- Check inner stream CanRead state in addition to disposed flag
- Add explicit timeout property implementations (not supported)
- Add FlushAsync() override for complete Stream contract
- Document why K4os base classes cannot be inherited (private protected constructor)
- Document intentionally omitted features (timeouts, write operations, DisposeAsync)
- Add inline comments explaining disposal logic and buffer pooling

These changes ensure CustomLZ4DecoderStream is a complete, well-documented
Stream implementation that properly uses CustomLZ4FrameReader for 4MB buffer pooling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
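
A sketch of the Stream-contract members listed above (shapes and field names like _disposed/_inner are assumptions; see CustomLZ4DecoderStream.cs for the real code):

public override bool CanRead => !_disposed && _inner.CanRead; // reflect inner stream state
public override bool CanTimeout => false;                     // timeouts intentionally unsupported

public override int ReadTimeout
{
    get => throw new InvalidOperationException("Timeouts are not supported.");
    set => throw new InvalidOperationException("Timeouts are not supported.");
}

public override Task FlushAsync(CancellationToken cancellationToken) =>
    Task.CompletedTask; // read-only stream: nothing to flush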
…ize estimation

Remove pre-allocation and telemetry code from LZ4 decompression:
- Remove estimated size calculation based on compression ratios
- Remove LZ4 frame header peeking in async version
- Remove Activity telemetry events tracking buffer waste
- Remove unused imports (System.Diagnostics, Apache.Arrow.Adbc.Tracing)

Rationale:
- MemoryStream grows efficiently on its own (doubles capacity when needed)
- Estimation adds complexity without significant benefit
- Focus is on pooling 4MB LZ4 internal buffers, not output buffer sizing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
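
The "grows efficiently on its own" point is easy to verify in isolation (standalone demonstration):

using System;
using System.IO;

using var ms = new MemoryStream();
for (int i = 0; i < 4; i++)
{
    ms.Write(new byte[1024], 0, 1024);
    Console.WriteLine(ms.Capacity); // 1024, 2048, 4096, 4096 -- doubles only when needed
}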
@eric-wang-1990 eric-wang-1990 changed the title [DO NOT MERGE] fix(csharp/databricks): Reduce LZ4 decompression memory by 96% fix(csharp/databricks): Reduce LZ4 decompression memory by 96% Nov 5, 2025
@eric-wang-1990 eric-wang-1990 changed the title fix(csharp/databricks): Reduce LZ4 decompression memory by 96% fix(csharp/databricks): Reduce LZ4 decompression memory by using Custom Array Pool Nov 5, 2025
@eric-wang-1990 eric-wang-1990 changed the title fix(csharp/databricks): Reduce LZ4 decompression memory by using Custom Array Pool fix(csharp/src/Drivers/Databricks): Reduce LZ4 decompression memory by using Custom Array Pool Nov 5, 2025
@eric-wang-1990
Contributor Author

@CurtHagenlocher This is ready to merge; please let me know if you have any concerns.

@CurtHagenlocher
Contributor

There's a little bit of overlap between this PR and #3683. Is that deliberate? This one just changes decompression while the other one just replaces the implementation of the stream being returned?

Contributor

@CurtHagenlocher CurtHagenlocher left a comment


Thanks! Again, I'm worried about the lack of eventual cleanup in environments where things other than Databricks connections might be happening in the same process.

/// This allows the 4MB buffers required by Databricks LZ4 frames to be pooled and reused.
/// maxArraysPerBucket=10 means we keep up to 10 buffers of each size in the pool.
/// </summary>
private static readonly ArrayPool<byte> LargeBufferPool =
Contributor


I'm going to echo my comment on the other PR and point out that this will keep the memory allocated for the remainder of the process lifetime. It's not quite as bad as RecyclableMemoryStream because it has a default per-bucket limit and will discard memory past that point once it's freed, but with ten arrays per bucket and a maximum bucket size of 40MB, it could (in the worst case) still end up being over 70 MB.

Assuming that testing doesn't show a problem, I'd recommend a similar strategy here as there -- which is to associate the pool with either the driver, the database or the connection so that it will eventually get freed.

Contributor Author


Sounds good, I attached this to the Database level.
Connection level would still be bad: with multiple connections in the connection pool, we would end up creating an identical ArrayPool inside each connection.

@eric-wang-1990
Contributor Author

There's a little bit of overlap between this PR and #3683. Is that deliberate? This one just changes decompression while the other one just replaces the implementation of the stream being returned?

This PR and the other one you tagged serve different purposes.
If you look at the .NET memory dump, there are two big allocations:

  1. From MemoryStream set_Capacity, this is addressed in feat(csharp/src/Drivers/Databricks): CloudFetch Use RecyclableMemoryStream for LZ4 decompression to reduce memory pressure #3683
  2. From Rent in the ArrayPool, this is addressed in this PR

eric-wang-1990 added a commit to eric-wang-1990/arrow-adbc that referenced this pull request Nov 7, 2025
…se instance

Address code review feedback by moving ArrayPool lifecycle management
from process-level (static) to Database instance-level.

**Changes:**
- DatabricksDatabase: Creates ArrayPool (one per database instance)
- DatabricksConnection: Receives pool from database via property
- CustomLZ4FrameReader: Accepts ArrayPool as constructor parameter
- CustomLZ4DecoderStream: Accepts ArrayPool as constructor parameter
- Lz4Utilities: Methods require ArrayPool parameter (no fallback)
- CloudFetchDownloader: Changed to use IHiveServer2Statement (was ITracingStatement)
- Readers: Get pool directly from connection and pass to LZ4 utilities

**Benefits:**
- Pool lifecycle tied to Database, not process lifetime
- Memory freed when Database is disposed/GC'd
- Perfect for connection pooling: multiple connections share one pool
- Cleaner design: no optional/nullable complexity

**Memory footprint:**
- With connection pooling: 10 connections → 1 shared pool = 40MB
- Without: Each Database instance → 1 pool = 40MB

Addresses review comment: apache#3654 (comment)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
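
In outline, the ownership change looks like this (simplified sketch; the real classes carry far more state):

using System.Buffers;

public class DatabricksDatabase
{
    // One pool per Database instance: the 40MB is reclaimable once the
    // Database is disposed/GC'd, unlike a process-lifetime static. Every
    // connection created by this Database shares the same pool.
    internal ArrayPool<byte> Lz4BufferPool { get; } =
        ArrayPool<byte>.Create(maxArrayLength: 4 * 1024 * 1024, maxArraysPerBucket: 10);
}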
Improves on the property-setter approach by passing Lz4BufferPool via an internal constructor parameter, matching the RecyclableMemoryStreamManager pattern.

Changes:
- Added internal constructor that accepts Lz4BufferPool parameter
- Public constructor chains to internal constructor with null (creates own pool)
- DatabricksDatabase passes shared pool via internal constructor
- Property changed from mutable to immutable (get-only)
- No public API changes - backward compatible

Benefits:
- Compile-time safety - Database must explicitly pass the pool
- Immutable after construction - can't be changed accidentally
- Predictable initialization - pool created at construction time
- Consistent pattern with RecyclableMemoryStreamManager

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
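
Sketched, the constructor chaining looks like this (parameters simplified; shapes assumed rather than verbatim):

public DatabricksConnection(IReadOnlyDictionary<string, string> properties)
    : this(properties, lz4BufferPool: null) // standalone use: connection creates its own pool
{
}

internal DatabricksConnection(
    IReadOnlyDictionary<string, string> properties,
    ArrayPool<byte>? lz4BufferPool)
{
    // Immutable after construction; DatabricksDatabase passes its shared pool here.
    Lz4BufferPool = lz4BufferPool
        ?? ArrayPool<byte>.Create(maxArrayLength: 4 * 1024 * 1024, maxArraysPerBucket: 10);
}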
@CurtHagenlocher CurtHagenlocher merged commit ac4b890 into apache:main Nov 7, 2025
6 checks passed
eric-wang-1990 added a commit to eric-wang-1990/arrow-adbc that referenced this pull request Nov 7, 2025
…mory-stream-lz4

Resolved conflicts by accepting ArrayPool approach from PR apache#3654:
- DatabricksConnection.cs: Use Lz4BufferPool (ArrayPool<byte>)
- DatabricksDatabase.cs: Use Lz4BufferPool instance
- Lz4Utilities.cs: Use ArrayPool<byte> parameter
- CloudFetchDownloader.cs: Use Lz4BufferPool
- Added CustomLZ4DecoderStream and CustomLZ4FrameReader

This replaces the RecyclableMemoryStreamManager approach with ArrayPool
for better memory management of LZ4 buffers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
eric-wang-1990 added a commit to eric-wang-1990/arrow-adbc that referenced this pull request Nov 7, 2025
This merge combines both memory management approaches:
- RecyclableMemoryStreamManager (for output streams)
- ArrayPool (for internal LZ4 decompression buffers)

Changes:
- DatabricksConnection: Added both RecyclableMemoryStreamManager and Lz4BufferPool properties
- DatabricksDatabase: Instantiates both pools and passes to connections
- Lz4Utilities: DecompressLz4Async now accepts both managers
- CloudFetchDownloader: Uses both for decompression operations
- Added CustomLZ4DecoderStream and CustomLZ4FrameReader from PR apache#3654