
Conversation

@eric-wang-1990 (Contributor)

Summary

This PR consolidates LZ4 decompression code paths and ensures proper resource cleanup across both CloudFetch and non-CloudFetch readers in the Databricks C# driver.

Changes

  • Lz4Utilities.cs

    • Add configurable buffer size parameter to DecompressLz4()
    • Add async variant DecompressLz4Async() for CloudFetch pipeline
    • Add proper using statements for MemoryStream disposal
    • Add default buffer size constant (80KB)
  • CloudFetchDownloader.cs

    • Update to use shared Lz4Utilities.DecompressLz4Async()
    • Reduce code duplication (~12 lines consolidated)
    • Improve telemetry with compression ratio calculation
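Putting these changes together, the consolidated helper might look roughly like the sketch below. This is a hedged illustration, not the actual driver code: the method signatures and return types are guesses based on the PR description, and `OpenDecodeStream` is an identity pass-through standing in for the driver's real LZ4 decoder so the snippet runs without external packages.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

internal static class Lz4Utilities
{
    // Default buffer size from the PR: 80KB (81920 bytes).
    public const int DefaultBufferSize = 81920;

    // Stand-in for the driver's LZ4 decode stream (assumption, not the real decoder).
    private static Stream OpenDecodeStream(Stream compressed) => compressed;

    // Synchronous path (non-CloudFetch reader); buffer size is now configurable.
    public static MemoryStream DecompressLz4(byte[] input, int bufferSize = DefaultBufferSize)
    {
        var output = new MemoryStream();
        using (var inputStream = new MemoryStream(input))
        using (var decoded = OpenDecodeStream(inputStream))
        {
            decoded.CopyTo(output, bufferSize);
        }
        output.Position = 0;
        return output;
    }

    // Async variant for the CloudFetch pipeline; returns the backing buffer
    // and its valid length so the caller can wrap it without copying.
    public static async Task<(byte[] Buffer, int Length)> DecompressLz4Async(
        byte[] input, int bufferSize, CancellationToken cancellationToken)
    {
        var output = new MemoryStream();
        using (var inputStream = new MemoryStream(input))
        using (var decoded = OpenDecodeStream(inputStream))
        {
            await decoded.CopyToAsync(output, bufferSize, cancellationToken);
        }
        int length = (int)output.Length;
        byte[] buffer = output.GetBuffer();
        output.Dispose(); // buffer stays valid: Dispose does not free the array
        return (buffer, length);
    }
}
```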

Benefits

  • Code Quality: Both code paths now share the same decompression implementation, reducing duplication
  • Resource Management: Explicit MemoryStream disposal improves memory hygiene (though GC would handle cleanup)
  • Maintainability: Single source of truth for LZ4 decompression logic
  • Consistency: Same error handling and telemetry patterns across both paths

Technical Details

  • Default buffer size remains 80KB (81920 bytes) - no behavioral changes
  • Async version returns (byte[] buffer, int length) tuple for efficient MemoryStream wrapping in CloudFetch
  • Buffer remains valid after MemoryStream disposal: Dispose does not free the underlying array, and the GC keeps it alive as long as it is referenced
  • Maintains cancellation token support in async path
  • No performance impact - purely refactoring and cleanup
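On the consuming side, the `(byte[] buffer, int length)` tuple lets CloudFetch wrap exactly the valid region of a possibly oversized buffer in a `MemoryStream` without copying. A self-contained sketch of that pattern; the `Decompress` helper below is a hypothetical stub, not the driver's actual call site:

```csharp
using System;
using System.IO;

// Stand-in for DecompressLz4Async: the real method inflates LZ4 data into a
// backing buffer that may be larger than the decompressed payload, which is
// why both the buffer and the valid length are returned.
static (byte[] Buffer, int Length) Decompress(byte[] compressed)
{
    byte[] backing = new byte[16];              // oversized backing buffer
    Array.Copy(compressed, backing, compressed.Length);
    return (backing, compressed.Length);
}

byte[] payload = { 1, 2, 3 };
var (buffer, length) = Decompress(payload);

// Wrap only the valid region, read-only, with no copy.
using var stream = new MemoryStream(buffer, 0, length, writable: false);
Console.WriteLine(stream.Length); // 3
```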

Testing

  • Existing unit tests pass
  • No functional changes to decompression logic
  • Telemetry output remains consistent

🤖 Generated with Claude Code

…gic and improve resource disposal

This commit consolidates LZ4 decompression code paths and ensures proper resource cleanup across both CloudFetch and non-CloudFetch readers.

**Changes:**
- Add configurable buffer size parameter to `Lz4Utilities.DecompressLz4()`
- Add async variant `DecompressLz4Async()` for CloudFetch pipeline
- Update CloudFetchDownloader to use shared `Lz4Utilities.DecompressLz4Async()`
- Add proper `using` statements for MemoryStream disposal

**Benefits:**
- Reduces code duplication (~12 lines consolidated in CloudFetchDownloader)
- Both code paths now share the same decompression implementation
- Improves memory hygiene with explicit MemoryStream disposal
- Maintains cancellation token support in async path
- Preserves existing telemetry and performance characteristics

**Technical Details:**
- Default buffer size remains 80KB (81920 bytes)
- Async version returns `(byte[] buffer, int length)` tuple for efficient MemoryStream wrapping
- Buffer remains valid after MemoryStream disposal: Dispose does not free the underlying array, and the GC keeps it alive as long as it is referenced
- No behavioral changes - purely refactoring and cleanup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…with default buffer size

Add a convenience overload of DecompressLz4Async that uses the default buffer size,
matching the pattern of the synchronous DecompressLz4 method. This improves code
consistency and simplifies call sites.

**Changes:**
- Add `DecompressLz4Async(byte[], CancellationToken)` overload that delegates to the full version with DefaultBufferSize
- Update CloudFetchDownloader to use the simpler overload, removing hardcoded buffer size
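The overload pattern described here is conventional C#: the short form simply delegates to the full version with the default. A minimal sketch, with a stand-in body in place of the real decoder:

```csharp
using System.Threading;
using System.Threading.Tasks;

internal static class Lz4Utilities
{
    public const int DefaultBufferSize = 81920;

    // Convenience overload added by this commit: delegates with the default
    // buffer size, mirroring the synchronous DecompressLz4 signature.
    public static Task<(byte[] Buffer, int Length)> DecompressLz4Async(
        byte[] input, CancellationToken cancellationToken)
        => DecompressLz4Async(input, DefaultBufferSize, cancellationToken);

    // Full version; the body is a stand-in, not the driver's real decoder.
    public static Task<(byte[] Buffer, int Length)> DecompressLz4Async(
        byte[] input, int bufferSize, CancellationToken cancellationToken)
        => Task.FromResult((input, input.Length));
}
```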

**Benefits:**
- API symmetry between sync and async methods
- Cleaner call sites (no need to specify buffer size when default is acceptable)
- Maintains same functionality and performance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@CurtHagenlocher (Contributor)

Take a look at #3650; if you keep the downloaded buffer as a ReadOnlyMemory<byte> then it simplifies the conversion to an ArrowArrayStream and this should result in some small wins for both CPU and memory utilization.
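A minimal sketch of this suggestion: `ReadOnlyMemory<byte>` carries both the buffer and the valid length in one value, so downstream consumers never need a separate length field and slicing is free of copies. (The buffer contents below are illustrative, and the ArrowArrayStream conversion itself is out of scope here.)

```csharp
using System;

byte[] backing = new byte[16];          // oversized decompression buffer
backing[0] = 1; backing[1] = 2; backing[2] = 3;
int validLength = 3;

// One value captures "which bytes are valid" with no copy.
ReadOnlyMemory<byte> payload = backing.AsMemory(0, validLength);

Console.WriteLine(payload.Length);      // 3
Console.WriteLine(payload.Span[2]);     // 3
```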

@CurtHagenlocher CurtHagenlocher merged commit 8a861bb into apache:main Oct 29, 2025
6 checks passed