@eric-wang-1990
Contributor

The directResults field controls how many rows/bytes can be returned in one Arrow batch.
Before this change, a bug caused the Databricks driver to call the base class SparkConnection, which has maxRows=1000; that is too small.
ODBC can get all results in a single ExecuteStatement call, while ADBC needs one ExecuteStatement call and multiple FetchResults calls, which causes ADBC to be slower on small queries.
For ADBC: [image]
For ODBC: [image]
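To make the round-trip difference concrete, here is a rough Python sketch of how the number of server calls grows with the row cap. It is a toy model, not the driver's code: the function name `round_trips` is hypothetical, it assumes every call (the initial ExecuteStatement and each later FetchResults) returns at most the same number of rows, and it ignores the byte limit entirely.

```python
import math


def round_trips(total_rows: int, max_rows_per_call: int) -> int:
    """Estimate how many server calls are needed to retrieve a result set.

    Toy model (assumption): the first call, ExecuteStatement, carries the
    first batch as direct results, and every further batch costs one
    FetchResults call. Each call returns at most max_rows_per_call rows.
    """
    if max_rows_per_call <= 0:
        raise ValueError("max_rows_per_call must be positive")
    # Even an empty result set costs the initial ExecuteStatement call.
    return max(1, math.ceil(total_rows / max_rows_per_call))


# A 50,000-row result under the old and the new row caps.
old_calls = round_trips(50_000, 1_000)
new_calls = round_trips(50_000, 500_000)
print(f"maxRows=1000: {old_calls} calls, maxRows=500000: {new_calls} call(s)")
```

Under this model a small query that fits within the cap finishes in the single ExecuteStatement call, which matches the ODBC behaviour described above.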
This PR updates DefaultMaxBytes to 10MB, which is the same limit the Databricks backend applies to an Arrow row set.
MaxRows is set to 500K, assuming a minimum column size of 20 bytes.
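The 500K figure appears to follow from dividing the byte limit by the smallest expected row footprint. The sketch below reproduces that arithmetic in Python. It treats 10MB as 10,000,000 bytes and reads the 20 bytes as the minimum size of one row; both are assumptions, and the constant names are illustrative rather than taken from the driver.

```python
# Assumption: "10MB" means 10 * 1000 * 1000 bytes (decimal megabytes).
MAX_BYTES = 10 * 1000 * 1000

# Assumption: the smallest row occupies about 20 bytes (one narrow column).
MIN_ROW_BYTES = 20

# Largest row count that still fits in the byte limit when rows are minimal.
max_rows = MAX_BYTES // MIN_ROW_BYTES
print(max_rows)
```

With wider rows the byte limit is reached before the row limit, so under these assumptions the row cap only matters for very narrow result sets.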

@eric-wang-1990 eric-wang-1990 marked this pull request as ready for review September 26, 2025 20:27
@github-actions github-actions bot added this to the ADBC Libraries 21 milestone Sep 26, 2025
@eric-wang-1990 eric-wang-1990 changed the title from "improve direct query perf" to "fix(csharp/src/Drivers/Databricks): Update DirectResult MaxRows MaxBytes setting" Sep 26, 2025
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Thanks! Can you please fix the linter warnings?

@CurtHagenlocher CurtHagenlocher merged commit f62ac67 into apache:main Sep 26, 2025
7 checks passed