-
Notifications
You must be signed in to change notification settings - Fork 173
feat(csharp/src/Drivers/Databricks): Default catalogs edge cases #2896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(csharp/src/Drivers/Databricks): Default catalogs edge cases #2896
Conversation
|
|
||
| public DatabricksStatement(DatabricksConnection connection) | ||
| : base(connection) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eric-wang-1990 I think it makes more sense to do the schema filtering here, instead of in powerbi*. Do you think this makes sense?
edit: nevermind, there is no default schema provided in OpenSession response namespace if not specified by client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I think this PR prevents us from making blanket metadata queries with no catalog and schema (like GetTables). Is that acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this PR prevent GetTables metadata query?
This would just try to replace null with initial namespace value but should not affect any existing metadata queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant, is that with this PR, it is impossible to make GetTables call with no catalog attached to the request, since OpenSessionResp seems to typically have a default catalog in the response (even if not user provided). Seems like this behavior happens in odbc too.
So if I wanted to fetch all tables in the metastore, I couldn't use GetTables after this PR. And the way this PR is written, same for schemas, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline: user can use % in metadata call to override and list all
269da71 to
c19c65e
Compare
|
|
||
| private async Task SetSchema(string schemaName) { | ||
| using var statement = new DatabricksStatement(this); | ||
| statement.SqlQuery = $"USE {schemaName}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be use database xxx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
odbc does use <xxx>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the identifier here be escaped?
It's my vague understanding that the behavior of USE depends on whether or not there is a current catalog. Will there always be one when we get here or is my understanding wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toddmeng-db Let's make another ticket to handle the escape, all the metadata call does not escape now but should be doing that.
@CurtHagenlocher Regarding whether there is a current catalog, this will only be called on legacy DBR where there is no catalog concept, in such scenarios we should be able to just call use schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is now being quoted. Is it possible to use a backtick in the identifier and if so does it need to be doubled when quoted? e.g. should a literal name of ab`cd be represented as `ab``cd` or is such a name not possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do what @eric-wang-1990 mentioned and cut a ticket to handle escape
| } | ||
| if (SchemaName == null) | ||
| { | ||
| SchemaName = defaultNamespace.SchemaName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this expected? Does simba fill schema with default schema as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to check. The server will not return a default schema by default, though. So this only affects queries were the user has set the schema. Which seems reasonable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just tested - odbc does not do this. Should I remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you confirm this? In the metadata call the schema name can be either null or %, if it's % then the driver will not do anything, it will only replace if it's null. What metadata query did you see for simba?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the call directly to odbc (not via powerbi). If both catalog and schema are provided, schema is still % by default
Listing tables 'catalog : peco, schemaPattern : %, tableTypes : null, tableName : %'
| { | ||
| await base.HandleOpenSessionResponse(session); | ||
| if (session != null) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are already here, can you also override _enableMultipleCatalogSupport with correct value from the response?
0548dfd to
652e82b
Compare
| { | ||
| _defaultNamespace = new TNamespace | ||
| { | ||
| CatalogName = defaultCatalog, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we only set ifcatalog/schema is not null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that odbc allows for setting catalog to whitespace; but I also think this will make the query broken, right? since there are no whitespace catalogs/schemas?:
Listing tables 'catalog : , schemaPattern : %, tableTypes : null, tableName : %'
| { | ||
| _defaultNamespace = session.InitialNamespace; | ||
| } | ||
| else if (_defaultNamespace != null && !string.IsNullOrEmpty(_defaultNamespace.SchemaName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add some more comments around this else clause, basically this only happens for when there is no initialNamespace response, which means it's an old DBR + Also user pass in a default schema name.
We are not doing it for catalog since multipleCatalog is needed for specifying catalogs, and that is the same protocol version with which we support for initialNamespace, which means if the response does not contain initialNamespace means we do not support specifying catalog at all.
652e82b to
8a05e12
Compare
CurtHagenlocher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! A few minor questions/comments; feel free to push back.
| // set the catalog name for legacy compatibility | ||
| // TODO: use catalog and schema fields in hiveserver2 connection instad of DefaultNamespace so we don't need to cast | ||
| var defaultNamespace = ((DatabricksConnection)Connection).DefaultNamespace; | ||
| if (defaultNamespace != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move opening brace to new line
| Basic (username and password) authentication is not supported at this time. | ||
|
|
||
| Optional default catalog and default schema can be set for the session with `adbc.connection.catalog` and `adbc.connection.db_schema` (catalog must be set if default schema is provided). | ||
| Optional default catalog and default schema can be set for the session with `adbc.connection.catalog` and `adbc.connection.db_schema`. The default catalog and schema will be used for subsequent metadata calls unless user specified different catalog/schema to use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider wrapping onto a second line
| } | ||
| } | ||
|
|
||
| private async Task SetSchema(string schemaName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move opening brace to new line
|
|
||
| private async Task SetSchema(string schemaName) { | ||
| using var statement = new DatabricksStatement(this); | ||
| statement.SqlQuery = $"USE {schemaName}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the identifier here be escaped?
It's my vague understanding that the behavior of USE depends on whether or not there is a current catalog. Will there always be one when we get here or is my understanding wrong?
Thats a good point - here's the docs. UC comes with a default catalog that does not need to be set explicitly, so the call should work. Pre-UC, USE is also still appropriate
|
db76c16 to
6d5ff2c
Compare
6d5ff2c to
83b9ba3
Compare
This PR adds support for edge cases that the ODBC driver supported. It also allows users to specify a default schema without a default catalog (relying on the default catalog set by backend server)
Default schema fallback
If spark protocol version is too low, initial namespace will not be respected. This PR sets
USE <schema>as a backup immediately afterTOpenSessionResp.SET CATALOG <catalog>is not implemented since it is introduced at the same time as InitialNamespace.Additionally, schema can now be provided in the OpenSessionReq without Catalog.
This is relevant for setting default schemas for dbr < 10.4. It means that we can now provide a default schema for pre-unity-catalog
Testing
To test this add to DATABRICKS_TEST_CONFIG_FILE:
And use a cluster running dbr < 10.4, run
OlderDBRVersion_ShouldSetSchemaViaUseStatementDefault catalog legacy compatibility
If default catalog or default schema is provided in the
TOpenSessionResp, we want subsequent getTables calls to use the default namespace. This enables powerbi to act as if it is pre-UC, since it will automatically make all getTables requests with default catalog, which is oftenhive_metastore.If no schema is provided in OpenSessionReq, only default catalog will be returned. This means whether or not metadata queries have a default schema provided is dependent on if "db_schema" is provided
So, in powerbi, when only a default schema is provided but not a default catalog, it will automatically restrict itself to the default catalog.
Testing
Added
MetadataQuery_ShouldUseResponseNamespace