Skip to content

Conversation

@eric-wang-1990
Copy link
Collaborator

This is cherry pick of this PR: apache/arrow-adbc#3756

Fixes memory starvation issue in CloudFetch download pipeline where newer downloads could win memory allocation over older waiting downloads, causing indefinite delays.

Problem

The CloudFetch downloader was acquiring memory inside parallel download tasks with a polling-based wait (10ms intervals). This non-FIFO behavior caused memory starvation:

  1. Multiple download tasks acquire semaphore slots and start polling for memory
  2. All tasks wake up simultaneously every 10ms to check memory availability
  3. When memory becomes available, any waiting task could acquire it (non-deterministic)
  4. Newer downloads (e.g., file 5, 7, 8) could repeatedly win the race over older downloads (e.g., file 4)
  5. File 4 never gets memory despite waiting the longest → indefinite starvation

Example scenario with 200MB memory, 50MB files, 3 parallel downloads:

  • Files 1, 2, 3 download (150MB used)
  • File 1 completes and releases 50MB
  • Files 4, 5, 6 are all waiting for memory
  • File 5 wins the race and acquires the 50MB
  • File 2 completes, file 7 wins the race
  • File 3 completes, file 8 wins the race
  • File 4 never gets memory because files 5, 7, 8 keep winning

Solution

Move memory acquisition from inside parallel download tasks to the main sequential loop. This ensures FIFO ordering:

  1. Main loop acquires memory sequentially for each download
  2. Only after memory is acquired does the download task start
  3. Downloads are guaranteed to get memory in the order they were queued
  4. No starvation possible

Additionally, increased default memory buffer from 100MB to 200MB to allow more parallel downloads without hitting memory limits.

Changes

  • CloudFetchDownloader.cs:
    • Moved AcquireMemoryAsync from inside DownloadFileAsync to main loop (line 278-280)
    • Ensures FIFO ordering before spawning download tasks
  • CloudFetchDownloadManager.cs:
    • Increased DefaultMemoryBufferSizeMB from 100 to 200
    • Better performance for large result sets

Testing

  • Manually verified on fast VM with Power BI Desktop
  • Existing CloudFetchDownloader tests still pass
  • FIFO ordering prevents the starvation scenario

TODO

The memory buffer design have a flaw that it only captures the compressed size, but does not cover decompressed size. There are a couple of options we can take but I wanna land this first since this one actually can get blocked for customer.

…acquisition

This PR fixes a memory deadlock issue in CloudFetch by ensuring FIFO
(First-In-First-Out) ordering of memory acquisition.

## Problem
Previously, memory was acquired inside DownloadFileAsync() after the download
task was spawned. This caused downloads to compete for memory in an unpredictable
order, potentially leading to deadlocks when memory is limited.

## Solution
1. Move memory acquisition to before spawning the download task
   - Memory is now acquired in the sequential main loop (line 283-285)
   - This ensures FIFO ordering based on when files arrive in the download queue
   - Removed memory acquisition from inside DownloadFileAsync() (previously line 395)

2. Increase default memory buffer from 100MB to 200MB
   - Reduces likelihood of memory contention
   - Better accommodates parallel downloads

3. Add comprehensive FIFO test
   - Verifies memory acquisition order
   - Verifies download start order (proves file 3 doesn't start before file 2)
   - Verifies resultQueue ordering

## Test Results
- New FIFO test passes successfully
- Validates proper ordering at three verification points

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@eric-wang-1990 eric-wang-1990 changed the title fix(csharp/src/Drivers/Databricks): Implement FIFO memory acquisition to prevent starvation in CloudFetch fix(csharp): Implement FIFO memory acquisition to prevent starvation in CloudFetch Nov 26, 2025
@eric-wang-1990 eric-wang-1990 changed the title fix(csharp): Implement FIFO memory acquisition to prevent starvation in CloudFetch fix(csharp): implement FIFO memory acquisition to prevent starvation in CloudFetch Dec 1, 2025
Resolved conflict in CloudFetchDownloadManager.cs by accepting main branch refactoring that removed protocol-specific fields.
…Result interface

Updated mock setup to use FileUrl, ByteCount, and ExpirationTime properties
instead of the removed Link property after interface refactoring.
@eric-wang-1990 eric-wang-1990 added the benchmark Run performance benchmarks on this PR label Dec 8, 2025
@github-actions
Copy link

github-actions bot commented Dec 8, 2025

📊 CloudFetch Benchmark Results

.NET 8.0
Query Mean (ms) Baseline Diff Rows Columns
catalog_sales 3768.75 - - 1441548 34
customer 1227.52 - - 100000 18
inventory 7583.74 - - 11745000 5
sales(...)tamps [21] 4842.10 - -
store_sales_numeric 4794.30 - - 2880404 16
web_sales 2370.12 - - 719384 34
wide_sales_analysis 15430.69 - - 2880404 54
.NET Framework 4.7.2
Query Mean (ms) Baseline Diff Rows Columns
catalog_sales 4314.55 - - 1441548 34
customer 644.95 - - 100000 18
inventory 45164.12 - - 11745000 5
sales(...)tamps [21] 16340.93 - -
store_sales_numeric 12551.49 - - 2880404 16
web_sales 3206.65 - - 719384 34
wide_sales_analysis 16555.17 - - 2880404 54

🟢 Improvement | 🔴 Regression | ⚪ No change | Baseline from latest main branch run

@eric-wang-1990 eric-wang-1990 removed the benchmark Run performance benchmarks on this PR label Dec 8, 2025
@eric-wang-1990 eric-wang-1990 added the benchmark Run performance benchmarks on this PR label Dec 8, 2025
@github-actions
Copy link

github-actions bot commented Dec 8, 2025

📊 CloudFetch Benchmark Results

.NET 8.0
Query Mean (ms) Baseline Diff Rows Columns
.NET Framework 4.7.2
Query Mean (ms) Baseline Diff Rows Columns

🟢 Improvement | 🔴 Regression | ⚪ No change | Baseline from latest main branch run

@eric-wang-1990 eric-wang-1990 added benchmark Run performance benchmarks on this PR and removed benchmark Run performance benchmarks on this PR labels Dec 9, 2025
@eric-wang-1990 eric-wang-1990 added benchmark Run performance benchmarks on this PR and removed benchmark Run performance benchmarks on this PR labels Dec 9, 2025
@github-actions
Copy link

github-actions bot commented Dec 9, 2025

📊 CloudFetch Benchmark Results

.NET 8.0
Query Mean (ms) Baseline Diff Rows Columns
.NET Framework 4.7.2
Query Mean (ms) Baseline Diff Rows Columns

🟢 Improvement | 🔴 Regression | ⚪ No change | Baseline from latest main branch run

@github-actions
Copy link

github-actions bot commented Dec 9, 2025

📊 CloudFetch Benchmark Results

.NET 8.0
Query Mean (ms) Baseline Diff Rows Columns
.NET Framework 4.7.2
Query Mean (ms) Baseline Diff Rows Columns

🟢 Improvement | 🔴 Regression | ⚪ No change | Baseline from latest main branch run

@eric-wang-1990 eric-wang-1990 added benchmark Run performance benchmarks on this PR and removed benchmark Run performance benchmarks on this PR labels Dec 9, 2025
@eric-wang-1990 eric-wang-1990 added benchmark Run performance benchmarks on this PR and removed benchmark Run performance benchmarks on this PR labels Dec 10, 2025
@eric-wang-1990 eric-wang-1990 added benchmark Run performance benchmarks on this PR and removed benchmark Run performance benchmarks on this PR labels Dec 10, 2025
@eric-wang-1990 eric-wang-1990 removed the benchmark Run performance benchmarks on this PR label Dec 10, 2025
@github-actions
Copy link

📊 CloudFetch Benchmark Results

.NET 8.0
Query Mean (ms) Baseline Diff Rows Columns
catalog_sales 4288.11 2658.09 🔴 +60.0% 1441548 34
customer 1543.11 677.81 🔴 +120.0% 100000 18
inventory 7366.79 6722.08 ⚪ ~0% 11745000 5
sales_with_timestamps 6273.87 9658.16 🟢 -30.0% 2880404 13
store_sales_numeric 5023.56 4500.87 🔴 +10.0% 2880404 16
web_sales 2751.84 1507.29 🔴 +80.0% 719384 34
wide_sales_analysis 15776.66 12598.68 🔴 +20.0% 2880404 54
.NET Framework 4.7.2
Query Mean (ms) Baseline Diff Rows Columns
catalog_sales 5687.05 6128.61 ⚪ ~0% 1441548 34
customer 1365.08 1565.94 🟢 -10.0% 100000 18
inventory 45554.86 45958.73 ⚪ ~0% 11745000 5
sales_with_timestamps 14528.96 15106.52 ⚪ ~0% 2880404 13
store_sales_numeric 13371.33 13574.09 ⚪ ~0% 2880404 16
web_sales 3927.35 4012.90 ⚪ ~0% 719384 34
wide_sales_analysis 18225.52 17864.24 ⚪ ~0% 2880404 54

🟢 Improvement | 🔴 Regression | ⚪ No change | Baseline from latest main branch run

@eric-wang-1990 eric-wang-1990 merged commit 77c5d1d into main Dec 10, 2025
13 checks passed
@eric-wang-1990 eric-wang-1990 deleted the eric/fix-cloudfetch-memory-fifo branch December 10, 2025 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmark Run performance benchmarks on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants