Skip to content

Conversation

@toddmeng-db
Copy link
Contributor

@toddmeng-db toddmeng-db commented May 30, 2025

This PR adds support for edge cases that the ODBC driver supported. It also allows users to specify a default schema without a default catalog (relying on the default catalog set by backend server)

Default schema fallback

If spark protocol version is too low, initial namespace will not be respected. This PR sets USE <schema> as a backup immediately after TOpenSessionResp. SET CATALOG <catalog> is not implemented since it is introduced at the same time as InitialNamespace.

Additionally, schema can now be provided in the OpenSessionReq without Catalog.

This is relevant for setting default schemas for dbr < 10.4. It means that we can now provide a default schema for pre-unity-catalog

Testing

To test this add to DATABRICKS_TEST_CONFIG_FILE:

// no catalog
"db_schema": "default_schema",

And use a cluster running dbr < 10.4, run OlderDBRVersion_ShouldSetSchemaViaUseStatement

Default catalog legacy compatibility

If default catalog or default schema is provided in the TOpenSessionResp, we want subsequent getTables calls to use the default namespace. This enables powerbi to act as if it is pre-UC, since it will automatically make all getTables requests with default catalog, which is often hive_metastore.

If no schema is provided in OpenSessionReq, only default catalog will be returned. This means whether or not metadata queries have a default schema provided is dependent on if "db_schema" is provided

So, in powerbi, when only a default schema is provided but not a default catalog, it will automatically restrict itself to the default catalog.

Testing

Added MetadataQuery_ShouldUseResponseNamespace

@toddmeng-db toddmeng-db changed the title Default catalogs edge cases feat(csharp/src/Drivers/Databricks): Default catalogs edge cases May 30, 2025

public DatabricksStatement(DatabricksConnection connection)
: base(connection)
{
Copy link
Contributor Author

@toddmeng-db toddmeng-db May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-wang-1990 I think it makes more sense to do the schema filtering here, instead of in powerbi*. Do you think this makes sense?
edit: nevermind, there is no default schema provided in OpenSession response namespace if not specified by client

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I think this PR prevents us from making blanket metadata queries with no catalog and schema (like GetTables). Is that acceptable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this PR prevent GetTables metadata query?
This would just try to replace null with initial namespace value but should not affect any existing metadata queries

Copy link
Contributor Author

@toddmeng-db toddmeng-db Jun 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant, is that with this PR, it is impossible to make GetTables call with no catalog attached to the request, since OpenSessionResp seems to typically have a default catalog in the response (even if not user provided). Seems like this behavior happens in odbc too.

So if I wanted to fetch all tables in the metastore, I couldn't use GetTables after this PR. And the way this PR is written, same for schemas, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline: user can use % in metadata call to override and list all

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/namespace-fallback branch 5 times, most recently from 269da71 to c19c65e Compare May 30, 2025 22:21

private async Task SetSchema(string schemaName) {
using var statement = new DatabricksStatement(this);
statement.SqlQuery = $"USE {schemaName}";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be use database xxx?

Copy link
Contributor Author

@toddmeng-db toddmeng-db Jun 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

odbc does use <xxx>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the identifier here be escaped?

It's my vague understanding that the behavior of USE depends on whether or not there is a current catalog. Will there always be one when we get here or is my understanding wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toddmeng-db Let's make another ticket to handle the escape, all the metadata call does not escape now but should be doing that.
@CurtHagenlocher Regarding whether there is a current catalog, this will only be called on legacy DBR where there is no catalog concept, in such scenarios we should be able to just call use schema.

Copy link
Contributor

@CurtHagenlocher CurtHagenlocher Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is now being quoted. Is it possible to use a backtick in the identifier and if so does it need to be doubled when quoted? e.g. should a literal name of ab`cd be represented as `ab``cd` or is such a name not possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do what @eric-wang-1990 mentioned and cut a ticket to handle escape

}
if (SchemaName == null)
{
SchemaName = defaultNamespace.SchemaName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this expected? Does simba fill schema with default schema as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to check. The server will not return a default schema by default, though. So this only affects queries were the user has set the schema. Which seems reasonable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested - odbc does not do this. Should I remove?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you confirm this? In the metadata call the schema name can be either null or %, if it's % then the driver will not do anything, it will only replace if it's null. What metadata query did you see for simba?

Copy link
Contributor Author

@toddmeng-db toddmeng-db Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the call directly to odbc (not via powerbi). If both catalog and schema are provided, schema is still % by default

Listing tables 'catalog : peco, schemaPattern : %, tableTypes : null, tableName : %'

{
await base.HandleOpenSessionResponse(session);
if (session != null)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are already here, can you also override _enableMultipleCatalogSupport with correct value from the response?

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/namespace-fallback branch from 0548dfd to 652e82b Compare June 2, 2025 16:57
@toddmeng-db toddmeng-db marked this pull request as ready for review June 2, 2025 17:23
@github-actions github-actions bot added this to the ADBC Libraries 19 milestone Jun 2, 2025
{
_defaultNamespace = new TNamespace
{
CatalogName = defaultCatalog,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only set ifcatalog/schema is not null?

Copy link
Contributor Author

@toddmeng-db toddmeng-db Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that odbc allows for setting catalog to whitespace; but I also think this will make the query broken, right? since there are no whitespace catalogs/schemas?:
Listing tables 'catalog : , schemaPattern : %, tableTypes : null, tableName : %'

{
_defaultNamespace = session.InitialNamespace;
}
else if (_defaultNamespace != null && !string.IsNullOrEmpty(_defaultNamespace.SchemaName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add some more comments around this else clause, basically this only happens for when there is no initialNamespace response, which means it's an old DBR + Also user pass in a default schema name.
We are not doing it for catalog since multipleCatalog is needed for specifying catalogs, and that is the same protocol version with which we support for initialNamespace, which means if the response does not contain initialNamespace means we do not support specifying catalog at all.

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/namespace-fallback branch from 652e82b to 8a05e12 Compare June 2, 2025 23:11
Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A few minor questions/comments; feel free to push back.

// set the catalog name for legacy compatibility
// TODO: use catalog and schema fields in hiveserver2 connection instad of DefaultNamespace so we don't need to cast
var defaultNamespace = ((DatabricksConnection)Connection).DefaultNamespace;
if (defaultNamespace != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move opening brace to new line

Basic (username and password) authentication is not supported at this time.

Optional default catalog and default schema can be set for the session with `adbc.connection.catalog` and `adbc.connection.db_schema` (catalog must be set if default schema is provided).
Optional default catalog and default schema can be set for the session with `adbc.connection.catalog` and `adbc.connection.db_schema`. The default catalog and schema will be used for subsequent metadata calls unless user specified different catalog/schema to use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider wrapping onto a second line

}
}

private async Task SetSchema(string schemaName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move opening brace to new line


private async Task SetSchema(string schemaName) {
using var statement = new DatabricksStatement(this);
statement.SqlQuery = $"USE {schemaName}";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the identifier here be escaped?

It's my vague understanding that the behavior of USE depends on whether or not there is a current catalog. Will there always be one when we get here or is my understanding wrong?

@toddmeng-db
Copy link
Contributor Author

toddmeng-db commented Jun 3, 2025

It's my vague understanding that the behavior of USE depends on whether or not there is a current catalog

Thats a good point - here's the docs. UC comes with a default catalog that does not need to be set explicitly, so the call should work. Pre-UC, USE is also still appropriate

For some workspaces that were enabled for Unity Catalog automatically, the workspace catalog was set as the default catalog. See Automatic enablement of Unity Catalog.
For all other workspaces, the hive_metastore catalog was set as the default catalog.

@toddmeng-db toddmeng-db force-pushed the toddmeng-db/namespace-fallback branch from db76c16 to 6d5ff2c Compare June 3, 2025 06:43
@toddmeng-db toddmeng-db force-pushed the toddmeng-db/namespace-fallback branch from 6d5ff2c to 83b9ba3 Compare June 3, 2025 06:49
@CurtHagenlocher CurtHagenlocher merged commit db253b1 into apache:main Jun 3, 2025
8 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants