Skip to content

Conversation

@eric-wang-1990
Copy link
Contributor

@eric-wang-1990 eric-wang-1990 commented Nov 25, 2025

Fixes memory starvation issue in CloudFetch download pipeline where newer downloads could win memory allocation over older waiting downloads, causing indefinite delays.

Problem

The CloudFetch downloader was acquiring memory inside parallel download tasks with a polling-based wait (10ms intervals). This non-FIFO behavior caused memory starvation:

  1. Multiple download tasks acquire semaphore slots and start polling for memory
  2. All tasks wake up simultaneously every 10ms to check memory availability
  3. When memory becomes available, any waiting task could acquire it (non-deterministic)
  4. Newer downloads (e.g., file 5, 7, 8) could repeatedly win the race over older downloads (e.g., file 4)
  5. File 4 never gets memory despite waiting the longest → indefinite starvation

Example scenario with 200MB memory, 50MB files, 3 parallel downloads:

  • Files 1, 2, 3 download (150MB used)
  • File 1 completes and releases 50MB
  • Files 4, 5, 6 are all waiting for memory
  • File 5 wins the race and acquires the 50MB
  • File 2 completes, file 7 wins the race
  • File 3 completes, file 8 wins the race
  • File 4 never gets memory because files 5, 7, 8 keep winning

Solution

Move memory acquisition from inside parallel download tasks to the main sequential loop. This ensures FIFO ordering:

  1. Main loop acquires memory sequentially for each download
  2. Only after memory is acquired does the download task start
  3. Downloads are guaranteed to get memory in the order they were queued
  4. No starvation possible

Additionally, increased default memory buffer from 100MB to 200MB to allow more parallel downloads without hitting memory limits.

Changes

  • CloudFetchDownloader.cs:
    • Moved AcquireMemoryAsync from inside DownloadFileAsync to main loop (line 278-280)
    • Ensures FIFO ordering before spawning download tasks
  • CloudFetchDownloadManager.cs:
    • Increased DefaultMemoryBufferSizeMB from 100 to 200
    • Better performance for large result sets

Testing

  • Manually verified on fast VM with Power BI Desktop
  • Existing CloudFetchDownloader tests still pass
  • FIFO ordering prevents the starvation scenario

TODO

The memory buffer design have a flaw that it only captures the compressed size, but does not cover decompressed size. There are a couple of options we can take but I wanna land this first since this one actually can get blocked for customer.

@github-actions github-actions bot added this to the ADBC Libraries 22 milestone Nov 25, 2025
@eric-wang-1990 eric-wang-1990 force-pushed the fix-cloudfetch-memory-deadlock branch from 311b880 to f0b53bb Compare November 25, 2025 10:49
@CurtHagenlocher CurtHagenlocher changed the title fix(csharp/databricks): Release CloudFetch memory immediately after download completes fix(csharp/src/Drivers/Databricks): Release CloudFetch memory immediately after download completes Nov 25, 2025
@CurtHagenlocher
Copy link
Contributor

CurtHagenlocher commented Nov 25, 2025

If the memory counter is released as soon as the chunk is downloaded, then what's to prevent all of memory being consumed by downloaded chunks if there's a slow reader that's still working its way through chunks at the beginning? This seems to be acting more of a download-parallelism limiter now than a memory-usage limiter.

@eric-wang-1990 eric-wang-1990 changed the title fix(csharp/src/Drivers/Databricks): Release CloudFetch memory immediately after download completes [On hold]fix(csharp/src/Drivers/Databricks): Release CloudFetch memory immediately after download completes Nov 25, 2025
@eric-wang-1990 eric-wang-1990 changed the title [On hold]fix(csharp/src/Drivers/Databricks): Release CloudFetch memory immediately after download completes fix(csharp/src/Drivers/Databricks): Implement FIFO memory acquisition to prevent starvation in CloudFetch Nov 26, 2025
Fixes memory starvation issue where newer downloads could win memory
allocation over older waiting downloads, causing indefinite delays.

Changes:
1. Move memory acquisition from DownloadFileAsync to main loop for FIFO ordering
   - Prevents memory starvation where newer downloads win over older waiting downloads
   - Memory is now acquired sequentially before spawning download tasks

2. Increase default memory buffer from 100MB to 200MB
   - Allows more parallel downloads without hitting memory limits
   - Better performance for large result sets

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@eric-wang-1990 eric-wang-1990 force-pushed the fix-cloudfetch-memory-deadlock branch from 8afca15 to fd1db09 Compare November 26, 2025 01:56
Copy link
Contributor

@jadewang-db jadewang-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some test ?

- Add mock setup for ActivityTrace to fix NullReferenceException
- Test with maxParallelDownloads=3 to simulate realistic contention
- Track actual download start order via HTTP handler mock
- Verify FIFO ordering at three points: memory acquisition, download start, and resultQueue
- Inspect resultQueue directly instead of consuming via GetNextDownloadedFileAsync

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Assuming I understand this change correctly, this seems a far better approach!

@CurtHagenlocher CurtHagenlocher merged commit 87c49e9 into apache:main Dec 2, 2025
6 checks passed
eric-wang-1990 added a commit to adbc-drivers/databricks that referenced this pull request Dec 10, 2025
…in CloudFetch (#56)

This is cherry pick of this PR:
apache/arrow-adbc#3756

Fixes memory starvation issue in CloudFetch download pipeline where
newer downloads could win memory allocation over older waiting
downloads, causing indefinite delays.

## Problem
The CloudFetch downloader was acquiring memory inside parallel download
tasks with a polling-based wait (10ms intervals). This non-FIFO behavior
caused memory starvation:

1. Multiple download tasks acquire semaphore slots and start polling for
memory
2. All tasks wake up simultaneously every 10ms to check memory
availability
3. When memory becomes available, any waiting task could acquire it
(non-deterministic)
4. Newer downloads (e.g., file 5, 7, 8) could repeatedly win the race
over older downloads (e.g., file 4)
5. File 4 never gets memory despite waiting the longest → **indefinite
starvation**

Example scenario with 200MB memory, 50MB files, 3 parallel downloads:
- Files 1, 2, 3 download (150MB used)
- File 1 completes and releases 50MB
- Files 4, 5, 6 are all waiting for memory
- File 5 wins the race and acquires the 50MB
- File 2 completes, file 7 wins the race
- File 3 completes, file 8 wins the race
- File 4 never gets memory because files 5, 7, 8 keep winning

## Solution
Move memory acquisition from inside parallel download tasks to the main
sequential loop. This ensures FIFO ordering:

1. Main loop acquires memory sequentially for each download
2. Only after memory is acquired does the download task start
3. Downloads are guaranteed to get memory in the order they were queued
4. No starvation possible

Additionally, increased default memory buffer from 100MB to 200MB to
allow more parallel downloads without hitting memory limits.

## Changes
- **CloudFetchDownloader.cs**: 
- Moved `AcquireMemoryAsync` from inside `DownloadFileAsync` to main
loop (line 278-280)
  - Ensures FIFO ordering before spawning download tasks
- **CloudFetchDownloadManager.cs**: 
  - Increased `DefaultMemoryBufferSizeMB` from 100 to 200
  - Better performance for large result sets

## Testing
- Manually verified on fast VM with Power BI Desktop
- Existing CloudFetchDownloader tests still pass
- FIFO ordering prevents the starvation scenario

## TODO
The memory buffer design have a flaw that it only captures the
compressed size, but does not cover decompressed size. There are a
couple of options we can take but I wanna land this first since this one
actually can get blocked for customer.

---------

Co-authored-by: Claude <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants