Skip to content

Conversation

@alexguo-db
Copy link
Contributor

  • Add option to set EnableDirectResults, which sends getDirectResults in the Thrift execute statement request
  • If getDirectResults is set in the request, then directResults is set on the response containing initial results (equivalent to the server calling GetOperationStatus, GetResultSetMetadata, FetchResults, and CloseOperation)
  • If directResults is set on the response, don't poll for the operation status
  • We already set getDirectResults on requests for metadata commands, just not the execute statement request

Tested E2E using dotnet test --filter CloudFetchE2ETest

[xUnit.net 00:00:00.11]   Starting:    Apache.Arrow.Adbc.Tests.Drivers.Databricks
[xUnit.net 00:01:27.27]   Finished:    Apache.Arrow.Adbc.Tests.Drivers.Databricks
  Apache.Arrow.Adbc.Tests.Drivers.Databricks test net8.0 succeeded (87.7s)

Test summary: total: 8, failed: 0, succeeded: 8, skipped: 0, duration: 87.7s
Build succeeded in 89.1s

@alexguo-db alexguo-db changed the title feat(csharp/src/Drivers/Databricks) Add option to enable using direct results for queries feat(csharp/src/Drivers/Databricks) Add option to enable using direct results for statements Apr 23, 2025
@github-actions github-actions bot added this to the ADBC Libraries 18 milestone Apr 23, 2025
@alexguo-db alexguo-db force-pushed the alex-guo_data/alexguo-support-direct-results branch from c076812 to 56cd239 Compare April 23, 2025 00:36
@alexguo-db alexguo-db changed the title feat(csharp/src/Drivers/Databricks) Add option to enable using direct results for statements feat(csharp/src/Drivers/Databricks): Add option to enable using direct results for statements Apr 23, 2025
Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! We need to mollify the nullability checker, and I've added a few suggestions for fairly superficial changes.

{
// The initial response has result data so we don't need to poll
metadata = DirectResults.ResultSetMetadata;
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else {
}
else
{


if (!string.IsNullOrEmpty(DirectResults.OperationStatus?.DisplayMessage))
{
throw new HiveServer2Exception(DirectResults.OperationStatus.DisplayMessage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new HiveServer2Exception(DirectResults.OperationStatus.DisplayMessage)
throw new HiveServer2Exception(DirectResults.OperationStatus!.DisplayMessage)

I think the null-forgiveness operator is required here because the nullability analysis doesn't know about string.IsNullOrEmpty.

statement.QueryTimeout = QueryTimeoutSeconds;
}

public TSparkDirectResults? DirectResults { get; set; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public TSparkDirectResults? DirectResults { get; set; }
protected TSparkDirectResults? DirectResults { get; set; }

private bool _applySSPWithQueries = false;
private bool _enableDirectResults = true;

internal static TSparkGetDirectResults defaultGetDirectResults = new(){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
internal static TSparkGetDirectResults defaultGetDirectResults = new(){
internal static TSparkGetDirectResults defaultGetDirectResults = new()
{

/// Checks if direct results are available.
/// </summary>
/// <returns>True if direct results are available and contain result data, false otherwise.</returns>
public bool HasDirectResults()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public bool HasDirectResults()
public bool HasDirectResults => DirectResults?.ResultSet != null && DirectResults?.ResultSetMetadata != null;

(This is a good candidate to be a property instead of a method.)

this.statement = statement;
this.schema = schema;
this.isLz4Compressed = isLz4Compressed;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, maybe start a new direct result reader. make sure doing one thing in one class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not, after a 2nd look at the code

@alexguo-db
Copy link
Contributor Author

@CurtHagenlocher
I believe the feedback is addressed now, but please wait for #2678 before merging

@CurtHagenlocher
Copy link
Contributor

@CurtHagenlocher I believe the feedback is addressed now, but please wait for #2678 before merging

Now that #2678 has been checked in there's a merge conflict that will need to be resolved.

if (_statement.HasDirectResults && _statement.DirectResults?.ResultSet?.Results?.ResultLinks?.Count > 0)
{
// Yield execution so the download queue doesn't get blocked before downloader is started
await Task.Yield();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this line, there's a deadlock because the _downloadQueue is full, and there's no queue consumer since the downloader hasn't started yet

return baseHandler;
}

protected internal override bool AreResultsAvailableDirectly() => _enableDirectResults;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good place for a property:

Suggested change
protected internal override bool AreResultsAvailableDirectly() => _enableDirectResults;
protected internal override bool AreResultsAvailableDirectly => _enableDirectResults;

Sorry; I missed this in my previous review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overriding AreResultsAvailableDirectly in HiveServer2Connection/SparkConnection, where it is a function. Should I update the field in both parent classes to be a property as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please!

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again! Sorry; I missed another method that should be a property in my previous review. Looks like we're good after that change.

get { return _directResults; }
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra blank line

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@CurtHagenlocher CurtHagenlocher merged commit 1038658 into apache:main Apr 28, 2025
6 checks passed
colin-rogers-dbt pushed a commit to dbt-labs/arrow-adbc that referenced this pull request Jun 10, 2025
…t results for statements (apache#2737)

- Add option to set EnableDirectResults, which sends getDirectResults in
the Thrift execute statement request
- If getDirectResults is set in the request, then directResults is set
on the response containing initial results (equivalent to the server
calling GetOperationStatus, GetResultSetMetadata, FetchResults, and
CloseOperation)
- If directResults is set on the response, don't poll for the operation
status
- We already set getDirectResults on requests for metadata commands,
just not the execute statement request

Tested E2E using `dotnet test --filter CloudFetchE2ETest`

```
[xUnit.net 00:00:00.11]   Starting:    Apache.Arrow.Adbc.Tests.Drivers.Databricks
[xUnit.net 00:01:27.27]   Finished:    Apache.Arrow.Adbc.Tests.Drivers.Databricks
  Apache.Arrow.Adbc.Tests.Drivers.Databricks test net8.0 succeeded (87.7s)

Test summary: total: 8, failed: 0, succeeded: 8, skipped: 0, duration: 87.7s
Build succeeded in 89.1s
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants