-
Notifications
You must be signed in to change notification settings - Fork 173
feat(csharp/src/Drivers/Databricks): Optimize GetColumnsExtendedAsync via DESC TABLE EXTENDED #2953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…xtended to implement GetColumnsExtendedAsync
81b326f to
7d83b75
Compare
49b3f6e to
8c73c0b
Compare
| * limitations under the License. | ||
| */ | ||
|
|
||
| using Apache.Arrow.Adbc.Drivers.Databricks.Result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mocify/improve the existing GetColumnExtended tests to verify that:
With the flag on/off, they get same number of columns, same schema, and same data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated it to support test for both flag on/off as below, they are almost same except for the IS_NULLABLE on primary key.
[InlineData("all_column_types", "Resources/create_table_all_types.sql", "Resources/result_get_column_extended_all_types.json", true, new[] { "PK_IS_NULLABLE:NO" })]
[InlineData("all_column_types", "Resources/create_table_all_types.sql", "Resources/result_get_column_extended_all_types.json", false, new[] { "PK_IS_NULLABLE:YES" })]
public async Task CanGetColumnsExtended(string tableName,string createTableSqlLocation, string resultLocation, bool useDescTableExtended, string[] ?extraPlaceholdsInResult = null)4d4af63 to
4a0d05a
Compare
4a0d05a to
bb93562
Compare
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good! I've just got a few nits about formatting and casing of method names.
csharp/src/Drivers/Databricks/Result/DescTableExtendedResult.cs
Outdated
Show resolved
Hide resolved
csharp/src/Drivers/Databricks/Result/DescTableExtendedResult.cs
Outdated
Show resolved
Hide resolved
csharp/src/Drivers/Databricks/Result/DescTableExtendedResult.cs
Outdated
Show resolved
Hide resolved
csharp/src/Drivers/Databricks/Result/DescTableExtendedResult.cs
Outdated
Show resolved
Hide resolved
ba704ef to
aa7a763
Compare
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The changes look good but there's a merge conflict. Could you rebase on top of the latest sourced?
Done. |
eric-wang-1990
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Optimize GetColumnsExtendedAsync in Databricks via
DESC TABLE EXTENDEDMotivation
Currently,
HiveServer2Statement.GetColumnsExtendedAsyncwill make 3 thrift callsGetColumns,GetPrimaryKeysandGetCrossReferenceto get all the information and then join them into one result set. Now Databricks introduces a SQLDESC TABLE EXTENDED <table> AS JSONthat will return all of these information in one query, this can save 2 extra roundtrips and improve the performance.Description
Override
HiveServer2Statement.GetColumnsExtendedAsyncinDatabricksStatementby executing single SQLDESC TABLE EXTENDED <table> AS JSONto get all the required column info forGetColumnsExtendedAsyncthen combine and join these info into the result. As this SQLDESC TABLE EXTENDED <table> AS JSONis only available in Databricks Runtime 16.2 or above, it will check theServerProtocolVersion. if it does not meet the requirement, it will fallback the implementation of base class.Change
DescTableExtendedResultthat represents the result ofDESC TABLE EXTENDED <table> AS JSON(see format here). It alsotable_constraintsto the structured primary key and foreign key constraintsDataType,ColumnSize,FullTypeNamewhich are calculated from the column type and type specific propertiesHiveServer2Statementto protected so that they can be used/overridden in subclassDatabricksStatementPrimaryKeyFieldsForeignKeyFieldsPrimaryKeyPrefixForeignKeyPrefixGetColumnsExtendedAsyncCreateEmptyExtendedColumnsResultDatabricksStatementwith the changes belowGetColumnsExtendedAsyncinDatabricksStatement, it executes queryDESC TABLE EXTENDED <table> AS JSONto get all the required info and then json them into the QueryResultGetColumnsAsyncto a reusable methodCreateColumnMetadataSchemaGetColumnsAsyncto a reusable methodCreateColumnMetadataEmptyArrayadbc.databricks.use_desc_table_extendedCanUseDescTableExtendedinDatabricksConnection,DatabricksStatementwill call it to decide if it will overrideGetColumnsExtendedAsyncStatementTest:CanGetColumnsExtendedTesting
DescTableExtendedResultTestto cover all the deserialization and parsing cases from raw result ofDESC TABLE EXTENDED <table> AS JSONColumnsExtendedrelevant test cases inDatabricks/StatementTest