chore(csharp/src/Drivers/Databricks): Remove one Stream from the download equation #3650
Conversation
      /// <inheritdoc />
-     public void SetCompleted(Stream dataStream, long size)
+     public void SetCompleted(ReadOnlyMemory<byte> data, long size)
With this change, we probably don't need to take size separately anymore. That said, if any of these uncompressed buffers are more than 2 GB in size, things are going to break in general due to CLR limitations. That's a much larger topic.
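A minimal sketch of what this suggestion would look like, assuming a result holder shaped like the one in the diff above (the class name and field here are hypothetical; only `SetCompleted` and its parameter types come from the PR):

```csharp
using System;

// Hypothetical result holder; only SetCompleted is drawn from the diff above.
internal sealed class DownloadResult
{
    private ReadOnlyMemory<byte> _data;

    public void SetCompleted(ReadOnlyMemory<byte> data)
    {
        // data.Length already carries the size, so the old `long size`
        // parameter becomes redundant. Length is an int, which lines up
        // with the ~2 GB per-object ceiling the CLR imposes anyway.
        _data = data;
    }

    public long Size => _data.Length;
}
```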
For CloudFetch, each file should be around 20 MB uncompressed.
Are you going to check this in, BTW?
Boy, I sure got the size thing wrong. Anyhow, I need to find the time to benchmark this to confirm that the numbers go in the right direction. I've created a new PR as #3652.
…gic and improve resource disposal (apache#3649)

## Summary

This PR consolidates LZ4 decompression code paths and ensures proper resource cleanup across both CloudFetch and non-CloudFetch readers in the Databricks C# driver.

## Changes

- **Lz4Utilities.cs**
  - Add configurable buffer size parameter to `DecompressLz4()`
  - Add async variant `DecompressLz4Async()` for CloudFetch pipeline
  - Add proper `using` statements for MemoryStream disposal
  - Add default buffer size constant (80KB)
- **CloudFetchDownloader.cs**
  - Update to use shared `Lz4Utilities.DecompressLz4Async()`
  - Reduce code duplication (~12 lines consolidated)
  - Improve telemetry with compression ratio calculation

## Benefits

- **Code Quality**: Both code paths now share the same decompression implementation, reducing duplication
- **Resource Management**: Explicit MemoryStream disposal improves memory hygiene (though GC would handle cleanup)
- **Maintainability**: Single source of truth for LZ4 decompression logic
- **Consistency**: Same error handling and telemetry patterns across both paths

## Technical Details

- Default buffer size remains 80KB (81920 bytes) - no behavioral changes
- Async version returns `(byte[] buffer, int length)` tuple for efficient MemoryStream wrapping in CloudFetch
- Buffer validity preserved after MemoryStream disposal via reference counting
- Maintains cancellation token support in async path
- No performance impact - purely refactoring and cleanup

## Testing

- Existing unit tests pass
- No functional changes to decompression logic
- Telemetry output remains consistent

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <[email protected]>
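For illustration, a hedged sketch of what a `DecompressLz4Async()` like the one described above might look like. The method name, the `(byte[] buffer, int length)` tuple shape, and the 80 KB default come from the PR text; the choice of the K4os.Compression.LZ4.Streams package and everything else here is an assumption, not the driver's actual code:

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using K4os.Compression.LZ4.Streams; // assumed LZ4 package; the real driver may differ

internal static class Lz4UtilitiesSketch
{
    // Default buffer size from the PR description: 80 KB (81920 bytes).
    public const int DefaultBufferSize = 81920;

    // Decompresses an LZ4 frame stream and returns the raw buffer plus the
    // count of valid bytes, so the caller can wrap it without copying.
    // Hypothetical signature modeled on the PR text.
    public static async Task<(byte[] Buffer, int Length)> DecompressLz4Async(
        Stream compressed,
        int bufferSize = DefaultBufferSize,
        CancellationToken cancellationToken = default)
    {
        using var decoder = LZ4Stream.Decode(compressed, leaveOpen: true);
        using var output = new MemoryStream();
        await decoder.CopyToAsync(output, bufferSize, cancellationToken).ConfigureAwait(false);

        // GetBuffer() hands back the underlying array; it remains usable after
        // the MemoryStream is disposed because it is an ordinary managed array,
        // not an unmanaged resource.
        return (output.GetBuffer(), (int)output.Length);
    }
}
```

Returning the raw buffer with its length is what would let the CloudFetch path build `new MemoryStream(buffer, 0, length, writable: false)` without an extra copy, though the actual wrapping code is not shown in this thread.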
Expands support for arrow to include the latest version, 57. Also, the minor version of datafusion specified in the lock file has been updated. Supersedes apache#3634.
Bah, I botched the rebase yet again. Will resubmit.
For illustration purposes.