Conversation

@CurtHagenlocher (Contributor) commented Oct 29, 2025

For illustration purposes.


```diff
  /// <inheritdoc />
- public void SetCompleted(Stream dataStream, long size)
+ public void SetCompleted(ReadOnlyMemory<byte> data, long size)
```
@CurtHagenlocher (Contributor, Author), Oct 29, 2025
With this change, we probably don't need to take size separately anymore. That said, if any of these uncompressed buffers are more than 2 GB in size, things are going to break in general due to CLR limitations. That's a much larger topic.
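As a rough illustration of the point above, a minimal sketch of a size-free overload. The type and member names here are hypothetical, not the driver's actual API; the idea is just that `ReadOnlyMemory<byte>` carries its own length:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical result holder, for illustration only.
internal sealed class DownloadResult
{
    private readonly TaskCompletionSource<bool> _completion = new();
    private ReadOnlyMemory<byte> _data;

    public ReadOnlyMemory<byte> Data => _data;

    // No separate size parameter: the buffer knows its own length.
    // Note that data.Length is an int, so buffers over 2 GB are ruled
    // out by CLR array limits anyway, as discussed above.
    public void SetCompleted(ReadOnlyMemory<byte> data)
    {
        _data = data;
        _completion.TrySetResult(true);
    }
}
```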

Contributor
For CloudFetch, each file should be around 20 MB uncompressed.
Are you going to check this in, BTW?

@CurtHagenlocher (Contributor, Author)
Boy, I sure got the size thing wrong. Anyhow, I need to find the time to benchmark this to confirm that the numbers go in the right direction. I've created a new PR as #3652.

eric-wang-1990 and others added 5 commits October 29, 2025 13:42
…gic and improve resource disposal (apache#3649)

## Summary
This PR consolidates LZ4 decompression code paths and ensures proper
resource cleanup across both CloudFetch and non-CloudFetch readers in
the Databricks C# driver.

## Changes
- **Lz4Utilities.cs**
  - Add configurable buffer size parameter to `DecompressLz4()`
  - Add async variant `DecompressLz4Async()` for the CloudFetch pipeline (sketched below)
  - Add proper `using` statements for MemoryStream disposal
  - Add default buffer size constant (80KB)

- **CloudFetchDownloader.cs**
  - Update to use shared `Lz4Utilities.DecompressLz4Async()`
  - Reduce code duplication (~12 lines consolidated)
  - Improve telemetry with compression ratio calculation
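
For reference, a minimal sketch of the shared async path, assuming the K4os.Compression.LZ4 package's `LZ4Stream.Decode` API; names and signatures here are illustrative, not the driver's actual code:

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using K4os.Compression.LZ4.Streams;

internal static class Lz4UtilitiesSketch
{
    // Matches the described default of 80 KB (81920 bytes).
    private const int DefaultBufferSize = 81920;

    // Decompress an LZ4-framed payload and return the backing buffer plus
    // the count of valid bytes, so the caller can wrap it in a MemoryStream
    // without another copy.
    public static async Task<(byte[] Buffer, int Length)> DecompressLz4Async(
        byte[] compressed,
        int bufferSize = DefaultBufferSize,
        CancellationToken cancellationToken = default)
    {
        using var input = new MemoryStream(compressed, writable: false);
        using var decoder = LZ4Stream.Decode(input);
        using var output = new MemoryStream();
        await decoder.CopyToAsync(output, bufferSize, cancellationToken)
                     .ConfigureAwait(false);
        // GetBuffer() avoids a copy; the array stays valid after disposal.
        return (output.GetBuffer(), (int)output.Length);
    }
}
```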

## Benefits
- **Code Quality**: Both code paths now share the same decompression
implementation, reducing duplication
- **Resource Management**: Explicit MemoryStream disposal improves
memory hygiene (though GC would handle cleanup)
- **Maintainability**: Single source of truth for LZ4 decompression
logic
- **Consistency**: Same error handling and telemetry patterns across
both paths

## Technical Details
- Default buffer size remains 80KB (81920 bytes) - no behavioral changes
- Async version returns a `(byte[] buffer, int length)` tuple for
efficient MemoryStream wrapping in CloudFetch (usage sketched after this list)
- Buffer remains valid after MemoryStream disposal because the backing
array is returned directly; disposing the stream does not free it
- Maintains cancellation token support in async path
- No performance impact - purely refactoring and cleanup
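
A possible call site on the CloudFetch side, reusing the hypothetical sketch above; `compressedBytes` and `cancellationToken` stand in for the downloader's local state:

```csharp
// Wrap the decompressed bytes without an extra copy; the returned buffer
// remains valid after the producing MemoryStream has been disposed.
var (buffer, length) = await Lz4UtilitiesSketch.DecompressLz4Async(
    compressedBytes, cancellationToken: cancellationToken);
using var dataStream = new MemoryStream(buffer, 0, length, writable: false);
```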

## Testing
- Existing unit tests pass
- No functional changes to decompression logic
- Telemetry output remains consistent

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <[email protected]>

Expands Arrow support to include the latest version, 57.

Also, the minor version of DataFusion specified in the lock file has
been updated.

Supersedes apache#3634.
@CurtHagenlocher (Contributor, Author)
Bah, I botched the rebase yet again. Will resubmit.
