Skip to content

Fixed escaping SQL object names#836

Merged
nfx merged 4 commits intomainfrom
fix/issue_833_escaping
Jan 26, 2024
Merged

Fixed escaping SQL object names#836
nfx merged 4 commits intomainfrom
fix/issue_833_escaping

Conversation

@larsgeorge-db
Copy link
Copy Markdown
Contributor

Changes

Implements handling of special characters in SQL names. Optionally escapes the catalog, schema, or table names with backticks for SQL queries.

Linked issues

Resolves #833

Tests

  • added unit tests

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 24, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (ae54f76) 85.61% compared to head (f112427) 85.66%.
Report is 1 commits behind head on main.

Files Patch % Lines
...rc/databricks/labs/ucx/hive_metastore/locations.py 66.66% 0 Missing and 1 partial ⚠️
src/databricks/labs/ucx/hive_metastore/udfs.py 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #836      +/-   ##
==========================================
+ Coverage   85.61%   85.66%   +0.05%     
==========================================
  Files          41       42       +1     
  Lines        5212     5232      +20     
  Branches      950      953       +3     
==========================================
+ Hits         4462     4482      +20     
  Misses        536      536              
  Partials      214      214              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@dmoore247 dmoore247 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work!
Please consider testing unicode, kanji (Multi-language support, and embedded space chars).

Comment thread tests/unit/framework/test_crawlers.py Outdated
assert "`a-b`.`c`.`d`" == cb.escape("a-b.c.d", False)
assert "`a`.`b-c`.`d`" == cb.escape("a.b-c.d", False)
assert "`a`.`b`.`c-d`" == cb.escape("a.b.c-d", False)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing!!!
Please consider testing unicode, kanji, embedded space characters.

This documentation isn't much help... https://docs.databricks.com/en/sql/language-manual/sql-ref-names.html#names

@nfx nfx changed the title Fixed escaping issues with object names. Fixed escaping SQL object names Jan 24, 2024
"""
return f"{self._catalog}.{self._schema}.{self._table}"

@staticmethod
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a function of it and call it escape_sql_identifier, it doesn't really belong to this class, as it turns out from number of CrawlerBase. prefixes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense, I was about to ask that. Where do we put functions? A new Python source in the root of the project? Or some utils subdir?

@daniel-martinez-db
Copy link
Copy Markdown

I have a pattern like this:
my_catalog.my-database.mytable_namev1.0 which should be escaped as
my_catalog.my-database.mytable_name_v1.0
Can that test be included please?

@larsgeorge-db
Copy link
Copy Markdown
Contributor Author

larsgeorge-db commented Jan 25, 2024

I have a pattern like this: my_catalog.my-database.mytable_namev1.0 which should be escaped as my_catalog.my-database.mytable_name_v1.0 Can that test be included please?

How would you handle this: cat.alog.schema.table? Is your naming assuming only the last part can have dots in it? Please clarify. Also, see the above linked Databricks Naming convention, it states dots are NOT allowed in names.

@daniel-martinez-db
Copy link
Copy Markdown

I have a pattern like this: my_catalog.my-database.mytable_namev1.0 which should be escaped as my_catalog.my-database.mytable_name_v1.0 Can that test be included please?

How would you handle this: cat.alog.schema.table? Is your naming assuming only the last part can have dots in it? Please clarify. Also, see the above linked Databricks Naming convention, it states dots are NOT allowed in names.

Thanks. The patterns I'm working with only has dots in the table name, although I think we can escape all of them (maybe not in the test, but in the actual code). It's true that DB does not allow for dots, but other catalogs do; I'm facing this issue as I'm crawling an external hive metastore which allows for dots. Thank you!

@larsgeorge-db larsgeorge-db self-assigned this Jan 25, 2024
@larsgeorge-db larsgeorge-db added this pull request to the merge queue Jan 26, 2024
@nfx nfx removed this pull request from the merge queue due to a manual request Jan 26, 2024
@nfx nfx merged commit bf75b50 into main Jan 26, 2024
@nfx nfx deleted the fix/issue_833_escaping branch January 26, 2024 16:07
nfx added a commit that referenced this pull request Jan 26, 2024
* Added `databricks labs ucx alias` command to create a view of tables from one schema/catalog in another schema/catalog ([#837](#837)).
* Added `databricks labs ucx save-aws-iam-profiles` command to scan instance profiles identify AWS S3 access and save a CSV with permissions ([#817](#817)).
* Added total view counts in the assessment dashboard ([#834](#834)).
* Cleaned up `assess_jobs` and `assess_clusters` tasks in the `assessment` workflow to improve testing and reduce redundancy.([#825](#825)).
* Added documentation for the assessment report ([#806](#806)).
* Fixed escaping for SQL object names ([#836](#836)).

Dependency updates:

 * Updated databricks-sdk requirement from ~=0.17.0 to ~=0.18.0 ([#832](#832)).
@nfx nfx mentioned this pull request Jan 26, 2024
nfx added a commit that referenced this pull request Jan 26, 2024
* Added `databricks labs ucx alias` command to create a view of tables
from one schema/catalog in another schema/catalog
([#837](#837)).
* Added `databricks labs ucx save-aws-iam-profiles` command to scan
instance profiles identify AWS S3 access and save a CSV with permissions
([#817](#817)).
* Added total view counts in the assessment dashboard
([#834](#834)).
* Cleaned up `assess_jobs` and `assess_clusters` tasks in the
`assessment` workflow to improve testing and reduce
redundancy.([#825](#825)).
* Added documentation for the assessment report
([#806](#806)).
* Fixed escaping for SQL object names
([#836](#836)).

Dependency updates:

* Updated databricks-sdk requirement from ~=0.17.0 to ~=0.18.0
([#832](#832)).
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
* Added `databricks labs ucx alias` command to create a view of tables
from one schema/catalog in another schema/catalog
([#837](#837)).
* Added `databricks labs ucx save-aws-iam-profiles` command to scan
instance profiles identify AWS S3 access and save a CSV with permissions
([#817](#817)).
* Added total view counts in the assessment dashboard
([#834](#834)).
* Cleaned up `assess_jobs` and `assess_clusters` tasks in the
`assessment` workflow to improve testing and reduce
redundancy.([#825](#825)).
* Added documentation for the assessment report
([#806](#806)).
* Fixed escaping for SQL object names
([#836](#836)).

Dependency updates:

* Updated databricks-sdk requirement from ~=0.17.0 to ~=0.18.0
([#832](#832)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: Function crawl_grants blows up when databases have special characters

4 participants