

@eric-wang-1990
Contributor

Arrow ADBC: Primary Key and Foreign Key Metadata Optimization

Description

This PR adds support for optimizing Primary Key and Foreign Key metadata queries in the C# Databricks ADBC driver. It introduces a new connection parameter adbc.databricks.enable_pk_fk that allows users to control whether the driver should make PK/FK metadata calls to the server or return empty results for improved performance.
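The following is a minimal sketch of how a driver might read this option from the connection properties. It assumes that a missing key means the documented default of true and that values are parsed with .NET's bool.TryParse, so "true" and "false" are accepted in any casing. The class name PkFkOptions and the decision to reject unparseable values with an exception are illustrative assumptions, not the driver's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the driver's real API): reads the
// adbc.databricks.enable_pk_fk connection property.
public static class PkFkOptions
{
    public const string EnablePkFkKey = "adbc.databricks.enable_pk_fk";

    public static bool ReadEnablePkFk(IReadOnlyDictionary<string, string> properties)
    {
        // Key absent: fall back to the documented default of true.
        if (!properties.TryGetValue(EnablePkFkKey, out var raw))
        {
            return true;
        }

        // bool.TryParse is case-insensitive ("TRUE", "false", ...).
        if (bool.TryParse(raw, out bool enabled))
        {
            return enabled;
        }

        // Assumption: an unparseable value is treated as a configuration error
        // rather than silently ignored.
        throw new ArgumentException(
            $"Invalid value '{raw}' for {EnablePkFkKey}; expected 'true' or 'false'.");
    }
}
```

A caller would pass the same property dictionary it uses to open the connection. Setting "adbc.databricks.enable_pk_fk" to "false" there turns the PK/FK server calls off.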

Background

Primary Key and Foreign Key metadata queries can be expensive, particularly in Databricks environments, where some catalogs do not fully support them. This implementation optimizes these operations by:

  1. Allowing users to disable PK/FK metadata calls entirely via configuration
  2. Automatically returning empty results for legacy catalogs (SPARK, hive_metastore) where PK/FK metadata is not supported
  3. Ensuring that empty results maintain schema compatibility with real metadata responses
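The first two rules can be sketched as a single predicate. This is a hypothetical stand-in for the ShouldReturnEmptyPkFkResult method mentioned in the testing section. The static signature and the case-insensitive catalog comparison are assumptions; the real method may read the setting and catalog name from connection state rather than taking them as parameters.

```csharp
using System;

// Hypothetical sketch of the short-circuit decision; not the driver's code.
public static class PkFkShortCircuit
{
    // Legacy catalogs named in the PR. Case-insensitive matching is an assumption.
    private static readonly string[] LegacyCatalogs = { "SPARK", "hive_metastore" };

    public static bool ShouldReturnEmptyPkFkResult(bool enablePkFk, string catalogName)
    {
        // Rule 1: PK/FK metadata disabled via configuration, so skip the server call.
        if (!enablePkFk)
        {
            return true;
        }

        // Rule 2: legacy catalogs do not support PK/FK metadata, so skip the call.
        foreach (string legacy in LegacyCatalogs)
        {
            if (string.Equals(catalogName, legacy, StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }

        // Otherwise the driver makes the real metadata call.
        return false;
    }
}
```

Rule 3 (schema compatibility) is not part of this decision. It constrains what the empty result looks like once the predicate returns true.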

Proposed Changes

  • Add new connection parameter adbc.databricks.enable_pk_fk to control PK/FK metadata behavior (default: true)
  • Implement special handling for legacy catalogs (SPARK, hive_metastore) to return empty results without server calls
  • Modify method visibility in base classes to allow proper overriding in derived classes
  • Add comprehensive test coverage for the new functionality
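To illustrate why base-class visibility had to change, the sketch below shows the override pattern. It assumes the base metadata method is made virtual and visible to subclasses. MetadataStatementBase, DatabricksStyleStatement, and the simulated server call are hypothetical names. The ServerCalls counter exists only so that the short-circuit can be observed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class: the metadata method must be virtual and visible to
// subclasses so a derived statement can intercept it.
public class MetadataStatementBase
{
    // Counts simulated server round-trips.
    public int ServerCalls { get; private set; }

    // Stand-in for a real server call; returns fake rows.
    protected virtual IReadOnlyList<string> GetPrimaryKeys(string catalog)
    {
        ServerCalls++;
        return new[] { $"{catalog}.pk_column" };
    }

    public IReadOnlyList<string> ExecuteGetPrimaryKeys(string catalog) => GetPrimaryKeys(catalog);
}

// Hypothetical derived class that applies the PK/FK short-circuit.
public class DatabricksStyleStatement : MetadataStatementBase
{
    private readonly bool _enablePkFk;

    public DatabricksStyleStatement(bool enablePkFk) => _enablePkFk = enablePkFk;

    protected override IReadOnlyList<string> GetPrimaryKeys(string catalog)
    {
        bool legacy = string.Equals(catalog, "SPARK", StringComparison.OrdinalIgnoreCase)
            || string.Equals(catalog, "hive_metastore", StringComparison.OrdinalIgnoreCase);

        // Return an empty result without touching the server.
        if (!_enablePkFk || legacy)
        {
            return Array.Empty<string>();
        }

        return base.GetPrimaryKeys(catalog);
    }
}
```

Placing the check in the override keeps the base class free of Databricks-specific catalog knowledge. The derived class only calls into the base when a real round-trip is wanted.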

How is this tested?

Added unit tests that verify:

  1. The correct behavior of the ShouldReturnEmptyPkFkResult method with various combinations of settings
  2. Schema compatibility between empty results and real metadata responses
  3. Proper handling of different catalog scenarios

These tests ensure that the optimization works correctly while maintaining compatibility with client applications that expect consistent schema structures.
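The schema-compatibility check can be sketched without Arrow by modeling each column as a (name, type, nullable) triple. This hypothetical, library-free version stands in for comparing Apache Arrow schemas. ColumnSpec, MetadataResult, and SchemaCheck are illustrative names, and the type strings are placeholders rather than Arrow type objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical column description; records give value equality for free.
public sealed record ColumnSpec(string Name, string ArrowType, bool Nullable);

public sealed class MetadataResult
{
    public IReadOnlyList<ColumnSpec> Schema { get; }
    public int RowCount { get; }

    public MetadataResult(IReadOnlyList<ColumnSpec> schema, int rowCount)
    {
        Schema = schema;
        RowCount = rowCount;
    }

    // An empty result keeps the full schema but carries zero rows.
    public static MetadataResult Empty(IReadOnlyList<ColumnSpec> schema) => new(schema, 0);
}

public static class SchemaCheck
{
    // Compatible means the same columns in the same order, with matching
    // names, types, and nullability.
    public static bool AreCompatible(MetadataResult a, MetadataResult b) =>
        a.Schema.SequenceEqual(b.Schema);
}
```

For example, suppose you build a primary-key schema from the four columns visible in the review excerpt below (TABLE_SCHEM, TABLE_NAME, COLUMN_NAME, KEQ_SEQ). A populated result and MetadataResult.Empty of that same schema compare as compatible. Renaming KEQ_SEQ to KEY_SEQ makes them incompatible.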

github-actions bot added this to the ADBC Libraries 19 milestone on May 28, 2025
@eric-wang-1990 changed the title from "feat(csharp/Drivers/Databricks): Primary Key and Foreign Key Metadata Optimization" to "feat(csharp/Drivers/Databricks):Primary Key and Foreign Key Metadata Optimization" on May 28, 2025
@CurtHagenlocher changed the title from "feat(csharp/Drivers/Databricks):Primary Key and Foreign Key Metadata Optimization" to "feat(csharp/src/Drivers/Databricks): Primary Key and Foreign Key Metadata Optimization" on May 28, 2025
Contributor

@CurtHagenlocher left a comment

Thanks! Please resolve the whitespace issue identified by the linter, and see my additional comment.

new Field("TABLE_SCHEM", StringType.Default, true),
new Field("TABLE_NAME", StringType.Default, true),
new Field("COLUMN_NAME", StringType.Default, true),
new Field("KEQ_SEQ", Int32Type.Default, true),
Contributor

Should this be "KEY_SEQ"? If so, it also needs to be changed on line 448.

Contributor Author

Nope, this should be KEQ.

Contributor

So the primary key column is named KEQ_SEQ and the foreign key column is named KEY_SEQ? Weird.

Contributor Author

Yes... that is weird, but that is what is returned from Thrift.
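Because the two result sets spell the key-sequence column differently, a client that looks up columns by name has to choose the name according to the result kind. The following is a minimal sketch that assumes name-based lookup. KeyMetadataKind and KeySequenceColumn are hypothetical names. The spellings come from this thread: KEQ_SEQ for primary keys and KEY_SEQ for foreign keys, as returned by Thrift.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical discriminator for the two metadata result kinds.
public enum KeyMetadataKind
{
    PrimaryKeys,
    ForeignKeys,
}

public static class KeySequenceColumn
{
    // Column name per result kind, following the spellings confirmed above.
    public static string NameFor(KeyMetadataKind kind) => kind switch
    {
        KeyMetadataKind.PrimaryKeys => "KEQ_SEQ", // spelling returned by Thrift
        KeyMetadataKind.ForeignKeys => "KEY_SEQ",
        _ => throw new ArgumentOutOfRangeException(nameof(kind)),
    };

    // Returns the ordinal of the key-sequence column, or -1 if it is absent.
    public static int IndexIn(IReadOnlyList<string> columnNames, KeyMetadataKind kind)
    {
        string expected = NameFor(kind);
        for (int i = 0; i < columnNames.Count; i++)
        {
            if (string.Equals(columnNames[i], expected, StringComparison.Ordinal))
            {
                return i;
            }
        }
        return -1;
    }
}
```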

@CurtHagenlocher merged commit aae84d2 into apache:main on May 28, 2025
7 checks passed