feat(csharp/src/Drivers/Apache): Implement GetColumnsExtended metadata for Databricks #2766
CurtHagenlocher
left a comment
Thanks! Please pass the allFields parameter directly and not by reference and consider adding the comments about the results being in a single batch.
…sync method properly return Task
CurtHagenlocher
left a comment
Thanks! I've made a few suggestions for improvements, and the code needs to pass the whitespace linter.
CurtHagenlocher
left a comment
Thanks!
Description:
This PR adds a new metadata API, GetColumnsExtended, to the Apache Hive2 driver. This consolidated metadata query combines column information with primary key and foreign key relationships, so clients can retrieve complete column metadata in a single call.
Changes:
- New Metadata Command: Added GetColumnsExtended to the list of supported metadata commands
- Consolidated Query Implementation: The new method retrieves and combines data from:
  - GetColumns - Basic column metadata
  - GetPrimaryKeys - Primary key information
  - GetCrossReference - Foreign key relationships
- Schema Enhancement: Added prefixed fields to the schema:
  - PK_COLUMN_NAME, PK_KEY_SEQ for primary key information
  - FK_PKCOLUMN_NAME, FK_PKTABLE_CAT, FK_PKTABLE_SCHEM, FK_PKTABLE_NAME, FK_FKCOLUMN_NAME for foreign key information
- Relationship Mapping: Each column is matched with its corresponding PK/FK data (if any)
- Unified Result Set: All data is combined into a single Arrow RecordBatch
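The relationship mapping described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the driver's C# code. The function name `extend_columns` and the sample rows are hypothetical. It assumes the source rows use JDBC-style field names (`COLUMN_NAME`, `KEY_SEQ`, `PKTABLE_CAT`, and so on), so that each extended field is the source field with a `PK_` or `FK_` prefix. Plain dictionaries stand in for the Arrow RecordBatch.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Extended field names as listed in the PR description.
PK_FIELDS = ["PK_COLUMN_NAME", "PK_KEY_SEQ"]
FK_FIELDS = [
    "FK_PKCOLUMN_NAME",
    "FK_PKTABLE_CAT",
    "FK_PKTABLE_SCHEM",
    "FK_PKTABLE_NAME",
    "FK_FKCOLUMN_NAME",
]


def extend_columns(columns: List[Row], primary_keys: List[Row],
                   cross_refs: List[Row]) -> List[Row]:
    """Attach PK/FK fields to each column row (hypothetical helper)."""
    # Index key rows by the column they describe. If a column appears
    # more than once, this sketch keeps the first match; how the real
    # driver handles that case is not stated in the PR description.
    pk_by_col: Dict[str, Row] = {}
    for pk in primary_keys:
        pk_by_col.setdefault(pk["COLUMN_NAME"], pk)
    fk_by_col: Dict[str, Row] = {}
    for fk in cross_refs:
        fk_by_col.setdefault(fk["FKCOLUMN_NAME"], fk)

    result: List[Row] = []
    for col in columns:
        row = dict(col)  # copy so the input rows are not mutated
        pk = pk_by_col.get(col["COLUMN_NAME"])
        fk = fk_by_col.get(col["COLUMN_NAME"])
        # Assumption: strip the prefix to get the source field name.
        for name in PK_FIELDS:
            row[name] = pk[name[len("PK_"):]] if pk else None
        for name in FK_FIELDS:
            row[name] = fk[name[len("FK_"):]] if fk else None
        result.append(row)
    return result


# Made-up sample data, for illustration only.
columns = [
    {"COLUMN_NAME": "id", "TYPE_NAME": "INT"},
    {"COLUMN_NAME": "customer_id", "TYPE_NAME": "INT"},
    {"COLUMN_NAME": "note", "TYPE_NAME": "STRING"},
]
primary_keys = [{"COLUMN_NAME": "id", "KEY_SEQ": 1}]
cross_refs = [{
    "PKCOLUMN_NAME": "id",
    "PKTABLE_CAT": "main",
    "PKTABLE_SCHEM": "sales",
    "PKTABLE_NAME": "customers",
    "FKCOLUMN_NAME": "customer_id",
}]
rows = extend_columns(columns, primary_keys, cross_refs)
```

Columns with no key relationship (here `note`) keep their base metadata and get `None` in every prefixed field, which matches the "if any" wording above.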
Benefits:
- Reduced API calls: Clients can fetch complete column information with 1 call instead of 3
- Simplified client code: No need to manually join metadata from multiple queries
- Complete column context: Get column type information along with its relationships
- Better performance: Reduces network round-trips for metadata operations
Testing:
- Added tests in StatementTests.cs to verify that the extended fields are correctly populated
- Tested with tables containing primary and foreign keys to ensure correctness
Sample return for a foreign key column:
TODO:
Depending on the runtime version, switch to using DescribeTableExtended.