
Conversation

@eric-wang-1990 (Contributor)

Summary

This PR consolidates LZ4 decompression code paths and ensures proper resource cleanup across both CloudFetch and non-CloudFetch readers in the Databricks C# driver.

Changes

  • Lz4Utilities.cs

    • Add configurable buffer size parameter to DecompressLz4()
    • Add async variant DecompressLz4Async() for CloudFetch pipeline
    • Add proper using statements for MemoryStream disposal
    • Add default buffer size constant (80KB)
  • CloudFetchDownloader.cs

    • Update to use shared Lz4Utilities.DecompressLz4Async()
    • Reduce code duplication (~12 lines consolidated)
    • Improve telemetry with compression ratio calculation
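Putting these changes together, the consolidated helper might look roughly like the sketch below. This is a hedged illustration, not the actual driver code: the method signatures and return types are guesses based on the PR description, and `OpenDecodeStream` is an identity pass-through standing in for the driver's real LZ4 decoder so the snippet runs without external packages.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

internal static class Lz4Utilities
{
    // Default buffer size from the PR: 80KB (81920 bytes).
    public const int DefaultBufferSize = 81920;

    // Stand-in for the driver's LZ4 decode stream (assumption, not the real decoder).
    private static Stream OpenDecodeStream(Stream compressed) => compressed;

    // Synchronous path (non-CloudFetch reader); buffer size is now configurable.
    public static MemoryStream DecompressLz4(byte[] input, int bufferSize = DefaultBufferSize)
    {
        var output = new MemoryStream();
        using (var inputStream = new MemoryStream(input))
        using (var decoded = OpenDecodeStream(inputStream))
        {
            decoded.CopyTo(output, bufferSize);
        }
        output.Position = 0;
        return output;
    }

    // Async variant for the CloudFetch pipeline; returns the backing buffer
    // and its valid length so the caller can wrap it without copying.
    public static async Task<(byte[] Buffer, int Length)> DecompressLz4Async(
        byte[] input, int bufferSize, CancellationToken cancellationToken)
    {
        var output = new MemoryStream();
        using (var inputStream = new MemoryStream(input))
        using (var decoded = OpenDecodeStream(inputStream))
        {
            await decoded.CopyToAsync(output, bufferSize, cancellationToken);
        }
        int length = (int)output.Length;
        byte[] buffer = output.GetBuffer();
        output.Dispose(); // buffer stays valid: Dispose does not free the array
        return (buffer, length);
    }
}
```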

Benefits

  • Code Quality: Both code paths now share the same decompression implementation, reducing duplication
  • Resource Management: Explicit MemoryStream disposal improves memory hygiene (though GC would handle cleanup)
  • Maintainability: Single source of truth for LZ4 decompression logic
  • Consistency: Same error handling and telemetry patterns across both paths

Technical Details

  • Default buffer size remains 80KB (81920 bytes) - no behavioral changes
  • Async version returns (byte[] buffer, int length) tuple for efficient MemoryStream wrapping in CloudFetch
  • Buffer remains valid after MemoryStream disposal: Dispose does not free the underlying array, and the GC keeps it alive as long as it is referenced
  • Maintains cancellation token support in async path
  • No performance impact - purely refactoring and cleanup
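On the consuming side, the `(byte[] buffer, int length)` tuple lets CloudFetch wrap exactly the valid region of a possibly oversized buffer in a `MemoryStream` without copying. A self-contained sketch of that pattern; the `Decompress` helper below is a hypothetical stub, not the driver's actual call site:

```csharp
using System;
using System.IO;

// Stand-in for DecompressLz4Async: the real method inflates LZ4 data into a
// backing buffer that may be larger than the decompressed payload, which is
// why both the buffer and the valid length are returned.
static (byte[] Buffer, int Length) Decompress(byte[] compressed)
{
    byte[] backing = new byte[16];              // oversized backing buffer
    Array.Copy(compressed, backing, compressed.Length);
    return (backing, compressed.Length);
}

byte[] payload = { 1, 2, 3 };
var (buffer, length) = Decompress(payload);

// Wrap only the valid region, read-only, with no copy.
using var stream = new MemoryStream(buffer, 0, length, writable: false);
Console.WriteLine(stream.Length); // 3
```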

Testing

  • Existing unit tests pass
  • No functional changes to decompression logic
  • Telemetry output remains consistent

🤖 Generated with Claude Code

…gic and improve resource disposal

This commit consolidates LZ4 decompression code paths and ensures proper resource cleanup across both CloudFetch and non-CloudFetch readers.

**Changes:**
- Add configurable buffer size parameter to `Lz4Utilities.DecompressLz4()`
- Add async variant `DecompressLz4Async()` for CloudFetch pipeline
- Update CloudFetchDownloader to use shared `Lz4Utilities.DecompressLz4Async()`
- Add proper `using` statements for MemoryStream disposal

**Benefits:**
- Reduces code duplication (~12 lines consolidated in CloudFetchDownloader)
- Both code paths now share the same decompression implementation
- Improves memory hygiene with explicit MemoryStream disposal
- Maintains cancellation token support in async path
- Preserves existing telemetry and performance characteristics

**Technical Details:**
- Default buffer size remains 80KB (81920 bytes)
- Async version returns `(byte[] buffer, int length)` tuple for efficient MemoryStream wrapping
- Buffer remains valid after MemoryStream disposal: Dispose does not free the underlying array, and the GC keeps it alive as long as it is referenced
- No behavioral changes - purely refactoring and cleanup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…with default buffer size

Add a convenience overload of DecompressLz4Async that uses the default buffer size,
matching the pattern of the synchronous DecompressLz4 method. This improves code
consistency and simplifies call sites.

**Changes:**
- Add `DecompressLz4Async(byte[], CancellationToken)` overload that delegates to the full version with DefaultBufferSize
- Update CloudFetchDownloader to use the simpler overload, removing hardcoded buffer size
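The overload pattern described here is conventional C#: the short form simply delegates to the full version with the default. A minimal sketch, with a stand-in body in place of the real decoder:

```csharp
using System.Threading;
using System.Threading.Tasks;

internal static class Lz4Utilities
{
    public const int DefaultBufferSize = 81920;

    // Convenience overload added by this commit: delegates with the default
    // buffer size, mirroring the synchronous DecompressLz4 signature.
    public static Task<(byte[] Buffer, int Length)> DecompressLz4Async(
        byte[] input, CancellationToken cancellationToken)
        => DecompressLz4Async(input, DefaultBufferSize, cancellationToken);

    // Full version; the body is a stand-in, not the driver's real decoder.
    public static Task<(byte[] Buffer, int Length)> DecompressLz4Async(
        byte[] input, int bufferSize, CancellationToken cancellationToken)
        => Task.FromResult((input, input.Length));
}
```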

**Benefits:**
- API symmetry between sync and async methods
- Cleaner call sites (no need to specify buffer size when default is acceptable)
- Maintains same functionality and performance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@CurtHagenlocher (Contributor)

Take a look at #3650; if you keep the downloaded buffer as a ReadOnlyMemory<byte> then it simplifies the conversion to an ArrowArrayStream and this should result in some small wins for both CPU and memory utilization.
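A minimal sketch of this suggestion: `ReadOnlyMemory<byte>` carries both the buffer and the valid length in one value, so downstream consumers never need a separate length field and slicing is free of copies. (The buffer contents below are illustrative, and the ArrowArrayStream conversion itself is out of scope here.)

```csharp
using System;

byte[] backing = new byte[16];          // oversized decompression buffer
backing[0] = 1; backing[1] = 2; backing[2] = 3;
int validLength = 3;

// One value captures "which bytes are valid" with no copy.
ReadOnlyMemory<byte> payload = backing.AsMemory(0, validLength);

Console.WriteLine(payload.Length);      // 3
Console.WriteLine(payload.Span[2]);     // 3
```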

@CurtHagenlocher CurtHagenlocher merged commit 8a861bb into apache:main Oct 29, 2025
6 checks passed