Skip to content

Conversation

@toddmeng-db
Copy link
Contributor

@toddmeng-db toddmeng-db commented Jun 5, 2025

Changes

This PR makes changes to be consistent with odbc behavior

ODBC behavior:

  • Does not attach default schema (from OpenSessionResp) in subsequent queries. This PR changes to be like ODBC here, just to not break anything.
  • If catalog == "SPARK", do not set catalog in initial namespace; let server return default catalog

ODBC does the following when EnableMultipleCatalogsSupport=0

  • Do not set catalog in the initial namespace.
  • Do not use the OpenSessionResp catalog for subsequent queries (including metadata queries)

ODBC Driver Behavior Testing

Connection String Catalog/Schema EnableMultipleCatalogs=False EnableMultipleCatalogs=True
Only default catalog = x OpenSession(null, “default”)
GetColumns(null, null, null, null)
OpenSession(x, “default”)
GetColumns(x, null, null, null)
Only default schema = x OpenSession(null, x)
GetColumns(null, null, null, null)
OpenSession(null, x)
GetColumns(hive_metastore, null, null, null)
Both catalog + schema x, y OpenSession(null, y)
GetColumns(null, null, null, null)
OpenSession(x, y)
GetColumns(x, null, null, null)
Only default Catalog “SPARK”, schema = null OpenSessionReq(null, “default”)
GetColumns(null, null, null, null)
OpenSessionReq(null, “default”)
GetColumns(hive_metastore, null, null, null)
Both with default catalog “SPARK”, schema = x OpenSessionReq: null Catalog, x schema
GetColumns(null, null, null, null)
OpenSessionReq(null, x)
GetColumns(hive_metastore, null, null, null)

Testing

Adds an extensive grid test to validate different edge cases during creation

TODO

  • Make initial_namespace.schema "default" as default behavior
  • CatalogName + Foreign CatlogName
  • Add testing for old dbr (dbr 7.3, 9.1)

@toddmeng-db toddmeng-db changed the title Fix multiple catalogs with default database feat(csharp/src/Drivers/Databricks): Fix multiple catalogs with default database Jun 5, 2025
@toddmeng-db toddmeng-db changed the title feat(csharp/src/Drivers/Databricks): Fix multiple catalogs with default database feat(csharp/src/Drivers/Databricks): Multiple catalogs with default database Jun 5, 2025
@toddmeng-db toddmeng-db force-pushed the todd-meng_db/multiple-catalogs-default-database branch from 50ae107 to 7c511cd Compare June 5, 2025 00:17
@toddmeng-db toddmeng-db marked this pull request as ready for review June 5, 2025 00:35
@github-actions github-actions bot added this to the ADBC Libraries 19 milestone Jun 5, 2025
@toddmeng-db toddmeng-db marked this pull request as draft June 5, 2025 00:37
@toddmeng-db toddmeng-db force-pushed the todd-meng_db/multiple-catalogs-default-database branch 2 times, most recently from 9b86054 to d28726c Compare June 5, 2025 06:22
Properties.TryGetValue(AdbcOptions.Connection.CurrentDbSchema, out defaultSchema);

// SPARK is a special catalog name - rely on server to return a default catalog if there is one
if (defaultCatalog.Equals("SPARK", StringComparison.OrdinalIgnoreCase))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reuse this:

private void HandleSparkCatalog()

@toddmeng-db toddmeng-db force-pushed the todd-meng_db/multiple-catalogs-default-database branch 3 times, most recently from 300f0d0 to c415169 Compare June 5, 2025 23:35
{
SchemaName = defaultNamespace.SchemaName;
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ODBC does not use schema for subsequent queries, removing it here to be consistent

Properties.TryGetValue(AdbcOptions.Connection.CurrentDbSchema, out defaultSchema);

// SPARK is a special catalog name - rely on server to return a default catalog if there is one
defaultCatalog = HandleSparkCatalog(defaultCatalog);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: How would some use pass in SPARK as initial catalog? They are passing SPARK in PowerBI UI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this question-and-answer. Someone can pass SPARK as the initial catalog but then it gets ignored? And this is the correct behavior?

Copy link
Contributor Author

@toddmeng-db toddmeng-db Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maintains backward compatibility with older workspaces, where the Hive metastore was accessed via the spark catalog name.
In newer DBR versions with Unity Catalog, the default catalog is typically hive_metastore.
Passing null here allows the runtime to fall back to the workspace-defined default catalog for the session.

Let me add a comment to be clearer. Does the above make sense?

if (CatalogName == null)
if (CatalogName == null && connection.EnableMultipleCatalogSupport)
{
CatalogName = defaultNamespace.CatalogName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I realize that for GetCrossReference, it has a CatalogName and ForeignCatalog name, in our case where we will only assign foreign catalog/schema/table to the call, and in those case the catalogName is null and we should not override.
I've tested it, it seems working fine even we assign the catalogName where the actual parentTable is in another catalog. Let's add a TODO here just noting that.

@toddmeng-db toddmeng-db marked this pull request as ready for review June 6, 2025 00:27
Copy link
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have one question and one nitpick.

}
}

internal static String? HandleSparkCatalog(String? CatalogName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
internal static String? HandleSparkCatalog(String? CatalogName)
internal static string? HandleSparkCatalog(string? catalogName)

That is, use the type alias string instead of the type name String and use a lower-case argument name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, good catch thanks!

Properties.TryGetValue(AdbcOptions.Connection.CurrentDbSchema, out defaultSchema);

// SPARK is a special catalog name - rely on server to return a default catalog if there is one
defaultCatalog = HandleSparkCatalog(defaultCatalog);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this question-and-answer. Someone can pass SPARK as the initial catalog but then it gets ignored? And this is the correct behavior?

@toddmeng-db toddmeng-db force-pushed the todd-meng_db/multiple-catalogs-default-database branch from c8ea6e6 to 3bf027d Compare June 6, 2025 23:13
@CurtHagenlocher CurtHagenlocher merged commit 95391d0 into apache:main Jun 7, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants