
@eric-wang-1990 eric-wang-1990 commented May 1, 2025

Description:
This PR adds a new metadata API GetColumnsExtended to the Apache Hive2 driver. This consolidated metadata query combines column information with primary key and foreign key relationships, allowing clients to retrieve complete column metadata in a single call.

Changes:
- New Metadata Command: Added GetColumnsExtended to the list of supported metadata commands
- Consolidated Query Implementation: The new method retrieves and combines data from:
  - GetColumns - Basic column metadata
  - GetPrimaryKeys - Primary key information
  - GetCrossReference - Foreign key relationships
- Schema Enhancement: Added prefixed fields to the schema:
  - PK_COLUMN_NAME, PK_KEY_SEQ for primary key information
  - FK_PKCOLUMN_NAME, FK_PKTABLE_CAT, FK_PKTABLE_SCHEM, FK_PKTABLE_NAME, FK_FKCOLUMN_NAME for foreign key information
- Relationship Mapping: Each column is matched with its corresponding PK/FK data (if any)
- Unified Result Set: All data is combined into a single Arrow RecordBatch
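
For readers who want a feel for the relationship mapping described above, here is a minimal Python sketch of the join. It is not the driver's C# implementation: the function name `combine_columns_extended`, the use of plain dictionaries instead of an Arrow RecordBatch, and the `trip_id` and `fare` rows are all assumptions made for illustration. The input key names (`COLUMN_NAME`, `KEY_SEQ`, `PKTABLE_CAT`, and so on) follow the usual JDBC-style metadata columns, and matching on the column name is assumed from the sample output rather than confirmed by the PR.

```python
# Illustrative sketch only; the real driver is written in C# and returns Arrow data.
PK_FIELDS = ("COLUMN_NAME", "KEY_SEQ")
FK_FIELDS = ("PKCOLUMN_NAME", "PKTABLE_CAT", "PKTABLE_SCHEM",
             "PKTABLE_NAME", "FKCOLUMN_NAME")


def combine_columns_extended(columns, primary_keys, foreign_keys):
    """Attach PK_/FK_ prefixed fields to each column row (None when no match)."""
    # Index key metadata by the column it applies to (assumed match key).
    # Simplification: a column that appears in several keys keeps only the last row.
    pk_by_column = {row["COLUMN_NAME"]: row for row in primary_keys}
    fk_by_column = {row["FKCOLUMN_NAME"]: row for row in foreign_keys}

    combined = []
    for column in columns:
        name = column["COLUMN_NAME"]
        pk = pk_by_column.get(name, {})
        fk = fk_by_column.get(name, {})
        extended = dict(column)  # copy so the input rows are left untouched
        for field in PK_FIELDS:
            extended["PK_" + field] = pk.get(field)
        for field in FK_FIELDS:
            extended["FK_" + field] = fk.get(field)
        combined.append(extended)
    return combined


# Made-up inputs standing in for the three separate metadata results.
columns = [
    {"COLUMN_NAME": "trip_id", "TYPE_NAME": "BIGINT"},
    {"COLUMN_NAME": "DOLocationID", "TYPE_NAME": "BIGINT"},
    {"COLUMN_NAME": "fare", "TYPE_NAME": "DOUBLE"},
]
primary_keys = [{"COLUMN_NAME": "trip_id", "KEY_SEQ": 1}]
foreign_keys = [{
    "PKTABLE_CAT": "powerbi",
    "PKTABLE_SCHEM": "default",
    "PKTABLE_NAME": "taxi_zone_lookup",
    "PKCOLUMN_NAME": "LocationID",
    "FKCOLUMN_NAME": "DOLocationID",
}]

rows = combine_columns_extended(columns, primary_keys, foreign_keys)
```

Every output row carries the same set of PK_ and FK_ fields, so columns without a key relationship simply hold nulls, which mirrors the shape of the sample return in this PR.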

Benefits:
- Reduced API calls: Clients can fetch complete column information with 1 call instead of 3
- Simplified client code: No need to manually join metadata from multiple queries
- Complete column context: Get column type information along with its relationships
- Better performance: Reduces network round-trips for metadata operations

Testing:
- Added tests in StatementTests.cs to verify that the extended fields are correctly populated
- Tested with tables containing primary and foreign keys to ensure correctness
Sample return for a foreign key column:

  TABLE_CAT: powerbi
  TABLE_SCHEM: default
  TABLE_NAME: nyc_taxi_tripdata
  COLUMN_NAME: DOLocationID
  DATA_TYPE: -5
  TYPE_NAME: BIGINT
  COLUMN_SIZE: 8
  BUFFER_LENGTH: null
  DECIMAL_DIGITS: 0
  NUM_PREC_RADIX: 10
  NULLABLE: 1
  REMARKS: 
  COLUMN_DEF: null
  SQL_DATA_TYPE: null
  SQL_DATETIME_SUB: null
  CHAR_OCTET_LENGTH: null
  ORDINAL_POSITION: 7
  IS_NULLABLE: YES
  SCOPE_CATALOG: null
  SCOPE_SCHEMA: null
  SCOPE_TABLE: null
  SOURCE_DATA_TYPE: null
  IS_AUTO_INCREMENT: NO
  BASE_TYPE_NAME: BIGINT
  PK_COLUMN_NAME: null
  FK_PKCOLUMN_NAME: LocationID
  FK_PKTABLE_CAT: powerbi
  FK_PKTABLE_SCHEM: default
  FK_PKTABLE_NAME: taxi_zone_lookup
  FK_FKCOLUMN_NAME: DOLocationID
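
As a rough illustration of how a client might consume a row like the one above, the following Python sketch resolves the referenced column from the FK_ fields. The helper name `foreign_key_target` and the dictionary representation of the row are assumptions for this example; an actual client would read these values from the returned Arrow data.

```python
def foreign_key_target(row):
    """Return the referenced column as 'catalog.schema.table.column', or None."""
    if row.get("FK_PKCOLUMN_NAME") is None:
        return None  # the column does not take part in a foreign key
    return ".".join([
        row["FK_PKTABLE_CAT"],
        row["FK_PKTABLE_SCHEM"],
        row["FK_PKTABLE_NAME"],
        row["FK_PKCOLUMN_NAME"],
    ])


# Subset of the sample row shown in this PR, represented as a dictionary.
sample_row = {
    "TABLE_NAME": "nyc_taxi_tripdata",
    "COLUMN_NAME": "DOLocationID",
    "PK_COLUMN_NAME": None,
    "FK_PKCOLUMN_NAME": "LocationID",
    "FK_PKTABLE_CAT": "powerbi",
    "FK_PKTABLE_SCHEM": "default",
    "FK_PKTABLE_NAME": "taxi_zone_lookup",
    "FK_FKCOLUMN_NAME": "DOLocationID",
}

target = foreign_key_target(sample_row)
print(target)  # powerbi.default.taxi_zone_lookup.LocationID
```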

TODO:
Switch to DescribeTableExtended based on the runtime version.

@github-actions github-actions bot added this to the ADBC Libraries 18 milestone May 1, 2025
@eric-wang-1990 eric-wang-1990 changed the title from "Implement GetColumnsExtended for Hive2 driver" to "Implement GetColumnsExtended metadata for Databricks" May 1, 2025
@CurtHagenlocher CurtHagenlocher changed the title from "Implement GetColumnsExtended metadata for Databricks" to "feat(csharp/src/Drivers/Apache): Implement GetColumnsExtended metadata for Databricks" May 1, 2025
@CurtHagenlocher CurtHagenlocher left a comment

Thanks! Please pass the allFields parameter directly rather than by reference, and consider adding comments noting that the results are returned in a single batch.

@CurtHagenlocher CurtHagenlocher left a comment

Thanks! I've made a few suggestions for improvements, and the code needs to pass the whitespace linter.

@CurtHagenlocher CurtHagenlocher left a comment

Thanks!

@CurtHagenlocher CurtHagenlocher merged commit 22dccca into apache:main May 8, 2025
7 checks passed