fix(csharp/src/Drivers/Databricks): Implement FIFO memory acquisition to prevent starvation in CloudFetch #3756
Conversation
311b880 to
f0b53bb
If the memory counter is released as soon as the chunk is downloaded, then what's to prevent all of memory being consumed by downloaded chunks if there's a slow reader that's still working its way through chunks at the beginning? This seems to be acting more as a download-parallelism limiter now than a memory-usage limiter.
Fixes a memory starvation issue where newer downloads could win memory allocation over older waiting downloads, causing indefinite delays.

Changes:

1. Move memory acquisition from `DownloadFileAsync` to the main loop for FIFO ordering
   - Prevents memory starvation where newer downloads win over older waiting downloads
   - Memory is now acquired sequentially before spawning download tasks
2. Increase the default memory buffer from 100MB to 200MB
   - Allows more parallel downloads without hitting memory limits
   - Better performance for large result sets

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
8afca15 to
fd1db09
jadewang-db
left a comment
Can we add some tests?
- Add mock setup for ActivityTrace to fix a NullReferenceException
- Test with maxParallelDownloads=3 to simulate realistic contention
- Track the actual download start order via an HTTP handler mock
- Verify FIFO ordering at three points: memory acquisition, download start, and resultQueue
- Inspect resultQueue directly instead of consuming it via GetNextDownloadedFileAsync

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
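The test strategy above records per-file events at three points and asserts each stream is in queue order. A language-agnostic sketch of that check, in Python with a hypothetical event trace standing in for the C# mocks:

```python
# Sketch of the FIFO assertion: collect (event_kind, file_id) pairs as
# the pipeline runs, then verify every event stream (memory acquisition,
# download start, result enqueue) is strictly increasing by file id.
from collections import defaultdict

def assert_fifo(events):
    streams = defaultdict(list)
    for kind, file_id in events:
        streams[kind].append(file_id)
    for kind, ids in streams.items():
        assert ids == sorted(ids), f"{kind} events out of FIFO order: {ids}"

# A well-ordered trace passes silently...
assert_fifo([
    ("acquire", 1), ("start", 1), ("acquire", 2), ("start", 2),
    ("result", 1), ("acquire", 3), ("start", 3), ("result", 2), ("result", 3),
])

# ...while a trace where file 3 starts before file 2 fails.
try:
    assert_fifo([("start", 1), ("start", 3), ("start", 2)])
except AssertionError as e:
    print(e)  # start events out of FIFO order: [1, 3, 2]
```

Note that the streams may legitimately interleave with one another (a result for file 1 can arrive after file 3 acquires memory); only ordering *within* each stream is asserted.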
CurtHagenlocher
left a comment
Thanks! Assuming I understand this change correctly, this seems a far better approach!
…in CloudFetch (#56)

This is a cherry-pick of apache/arrow-adbc#3756.

Fixes a memory starvation issue in the CloudFetch download pipeline where newer downloads could win memory allocation over older waiting downloads, causing indefinite delays.

## Problem

The CloudFetch downloader was acquiring memory inside parallel download tasks with a polling-based wait (10ms intervals). This non-FIFO behavior caused memory starvation:

1. Multiple download tasks acquire semaphore slots and start polling for memory.
2. All tasks wake up simultaneously every 10ms to check memory availability.
3. When memory becomes available, any waiting task can acquire it (non-deterministic).
4. Newer downloads (e.g., files 5, 7, 8) can repeatedly win the race over older downloads (e.g., file 4).
5. File 4 never gets memory despite waiting the longest → **indefinite starvation**.

Example scenario with 200MB memory, 50MB files, 3 parallel downloads:

- Files 1, 2, 3 download (150MB used).
- File 1 completes and releases 50MB.
- Files 4, 5, 6 are all waiting for memory.
- File 5 wins the race and acquires the 50MB.
- File 2 completes; file 7 wins the race.
- File 3 completes; file 8 wins the race.
- File 4 never gets memory because files 5, 7, 8 keep winning.

## Solution

Move memory acquisition from inside the parallel download tasks to the main sequential loop. This ensures FIFO ordering:

1. The main loop acquires memory sequentially for each download.
2. Only after memory is acquired does the download task start.
3. Downloads are guaranteed to get memory in the order they were queued.
4. No starvation is possible.

Additionally, the default memory buffer was increased from 100MB to 200MB to allow more parallel downloads without hitting memory limits.
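The starvation scenario (200MB budget, 50MB files, 3 parallel downloads) can be shown with a toy simulation. This is an illustrative model, not the driver's code: the old polling race is represented by its worst case, in which the newest waiter always wins a freed slot, while FIFO acquisition hands the slot to the oldest waiter:

```python
# Toy model: files 1-3 are in flight; each completion frees 50MB, and
# `pick` decides which waiting file wins the freed memory.
def simulate(pick, completions=3):
    running = [1, 2, 3]              # files 1-3 hold the 150MB in flight
    waiting = [4, 5, 6]              # files already queued for memory
    started, next_file = [], 7
    for _ in range(completions):     # each completion frees 50MB
        running.pop(0)               # oldest running file finishes
        winner = pick(waiting)       # who gets the freed memory?
        waiting.remove(winner)
        running.append(winner)
        started.append(winner)
        waiting.append(next_file)    # a new download joins the queue
        next_file += 1
    return started, waiting

racy_started, racy_waiting = simulate(pick=lambda w: w[-1])  # newest wins
fifo_started, fifo_waiting = simulate(pick=lambda w: w[0])   # oldest wins

print(racy_started)  # [6, 7, 8] -- file 4 is still stuck waiting
print(fifo_started)  # [4, 5, 6] -- strict queue order, no starvation
```

In the racy run, file 4 remains in the waiting list no matter how many completions occur, which is the indefinite starvation described above; the FIFO policy serves it first.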
## Changes

- **CloudFetchDownloader.cs**: moved `AcquireMemoryAsync` from inside `DownloadFileAsync` to the main loop (lines 278-280), ensuring FIFO ordering before spawning download tasks.
- **CloudFetchDownloadManager.cs**: increased `DefaultMemoryBufferSizeMB` from 100 to 200 for better performance on large result sets.

## Testing

- Manually verified on a fast VM with Power BI Desktop.
- Existing CloudFetchDownloader tests still pass.
- FIFO ordering prevents the starvation scenario.

## TODO

The memory buffer design has a flaw: it only accounts for the compressed size of each file, not the decompressed size. There are a couple of options to address this, but this fix lands first because the starvation issue can actually block customers.

---------

Co-authored-by: Claude <[email protected]>