@jadewang-db (Contributor) commented Apr 7, 2025

Add Prefetch Functionality to CloudFetch in Spark ADBC Driver

This PR enhances the CloudFetch feature in the Spark ADBC driver by implementing prefetch functionality, which improves performance by fetching multiple batches of results ahead of time.

Changes

CloudFetchResultFetcher Enhancements

  • Initial Prefetch: When the fetcher starts, it prefetches multiple batches up front, so results are already buffered when the reader first requests them.
  • State Management: Added tracking for current batch offset and size, with proper state reset when starting the fetcher.
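The PR text describes this behavior without showing code. The sketch below is a minimal illustration, written in Python rather than the driver's C#, of one way to combine an initial prefetch with per-run state reset. A producer thread starts fetching as soon as `start()` is called and fills a bounded queue. `start()` also resets the offset, buffer, and end-of-results flag. `PrefetchingFetcher`, `fetch_page`, `batch_size`, and `prefetch_count` are hypothetical names, not the driver's API.

```python
import queue
import threading
from typing import Callable, List, Optional

# Hypothetical names throughout; this is not the ADBC driver's C# API.
# Hypothetical server call: (start_offset, max_links) -> list of result links.
# An empty list means the server has no more results.
FetchPage = Callable[[int, int], List[str]]


class PrefetchingFetcher:
    """Sketch of a background fetcher that keeps up to `prefetch_count` batches buffered."""

    def __init__(self, fetch_page: FetchPage, batch_size: int, prefetch_count: int) -> None:
        self._fetch_page = fetch_page
        self._batch_size = batch_size
        self._prefetch_count = prefetch_count
        self._queue: "queue.Queue[Optional[List[str]]]" = queue.Queue(maxsize=prefetch_count)
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self._done = False

    def start(self) -> None:
        # Reset all per-run state so a restarted fetcher begins again at offset 0.
        self.stop()
        self._stop = threading.Event()
        self._queue = queue.Queue(maxsize=self._prefetch_count)
        self._done = False
        # The producer starts fetching immediately, buffering up to
        # `prefetch_count` batches before anyone asks for data.
        self._thread = threading.Thread(
            target=self._run, args=(self._queue, self._stop), daemon=True
        )
        self._thread.start()

    def stop(self) -> None:
        if self._thread is not None:
            self._stop.set()
            self._thread.join()
            self._thread = None

    def _run(self, q: "queue.Queue[Optional[List[str]]]", stop: threading.Event) -> None:
        offset = 0  # offset of the next batch to request
        while not stop.is_set():
            batch = self._fetch_page(offset, self._batch_size)
            item = batch if batch else None  # None marks the end of the results
            # Put with a timeout so stop() can interrupt a producer blocked on a full buffer.
            while not stop.is_set():
                try:
                    q.put(item, timeout=0.05)
                    break
                except queue.Full:
                    pass
            if item is None:
                return
            offset += len(batch)

    def next_batch(self) -> Optional[List[str]]:
        """Return the next batch of links, or None once all results are consumed."""
        if self._done:
            return None
        batch = self._queue.get()
        if batch is None:
            self._done = True
        return batch
```

The bounded queue is what caps memory use: once `prefetch_count` batches are waiting, the producer blocks until the consumer takes one.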

Interface Updates

  • Added new methods to the ICloudFetchResultFetcher interface:

Testing Infrastructure

  • Created the ITestableHiveServer2Statement interface to facilitate testing
  • Updated tests to account for prefetch behavior
  • Ensured all tests pass with the new prefetch functionality

Benefits

  • Improved Performance: By prefetching multiple batches, data is available sooner, reducing wait times.
  • Better Reliability: Enhanced error handling and state management make the system more robust.
  • More Efficient Resource Usage: Caching result links avoids redundant server requests.
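The PR does not describe how link caching is implemented. The sketch below assumes that CloudFetch result links are time-limited download URLs. Under that assumption, a cache keyed by row offset goes back to the server only when a link is missing or close to expiring. `ResultLink`, `LinkCache`, `fetch_link`, and `refresh_margin` are hypothetical names, and the clock is injectable so that expiry can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ResultLink:
    """Hypothetical download link for one result file; expires_at is in seconds."""
    start_row_offset: int
    url: str
    expires_at: float


class LinkCache:
    """Hypothetical cache: refetch a link only if it is missing or near expiry."""

    def __init__(
        self,
        fetch_link: Callable[[int], ResultLink],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch_link = fetch_link        # assumed server round trip
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._links: Dict[int, ResultLink] = {}
        self.server_requests = 0             # counts calls to fetch_link

    def get(self, offset: int) -> ResultLink:
        link = self._links.get(offset)
        # Treat a link as stale if it expires within refresh_margin seconds.
        if link is None or link.expires_at - self._clock() <= self._refresh_margin:
            link = self._fetch_link(offset)
            self.server_requests += 1
            self._links[offset] = link
        return link
```

Refreshing slightly before expiry, rather than after a failed download, is a design choice in this sketch. It trades an occasional early request for never handing a nearly dead URL to a downloader.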

This implementation maintains backward compatibility while improving CloudFetch performance by fetching result batches before they are needed.

@github-actions github-actions bot added this to the ADBC Libraries 18 milestone Apr 7, 2025
@jadewang-db jadewang-db force-pushed the cloudfetch-pipeline branch 2 times, most recently from 01daf70 to a388213 April 14, 2025 19:49
@CurtHagenlocher (Contributor) left a comment

Thanks! I'm still reviewing this logic but thought I'd give some initial feedback.

Also, please take a look at the linter output and make changes accordingly.

Update DatabricksParameters.cs

address comments

fix linter

rebase to master

refactor to fix unit test

refactor

some code refactoring

refactor

Delete CloudFetchDownloadManagerTest.cs

Initital changes
@CurtHagenlocher (Contributor) left a comment

Thanks! Looks great!

@CurtHagenlocher CurtHagenlocher merged commit 7f3d33b into apache:main Apr 24, 2025
6 checks passed
colin-rogers-dbt pushed a commit to dbt-labs/arrow-adbc that referenced this pull request Jun 10, 2025
…etch in Spark ADBC driver (apache#2678)