refactor(sql): deprecate and remove hierarchical usage of schema#8655
refactor(sql): deprecate and remove hierarchical usage of schema#8655gforsyth merged 10 commits intoibis-project:mainfrom
Conversation
|
ACTION NEEDED Ibis follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
ibis/util.py
Outdated
| def _warn_schema(): | ||
| warn_deprecated( | ||
| name="schema", | ||
| as_of="9.0", | ||
| removed_in="10.0", | ||
| instead="Use the `database` kwarg with one of the following patterns:" | ||
| '\ndatabase="database"' | ||
| '\ndatabase=("catalog", "database")' | ||
| '\ndatabase="catalog.database"', | ||
| # TODO: add option for namespace object | ||
| ) |
There was a problem hiding this comment.
I'm going to move this into the deprecation class helper mixin, but after I squash down everything else because that rebase might be a bit gnarly.
| dbs = "SHOW SCHEMAS" | ||
|
|
||
| with self._safe_raw_sql(dbs) as cur: | ||
| databases = list(map(itemgetter(0), cur)) | ||
|
|
||
| return self._filter_with_like(databases, like) |
There was a problem hiding this comment.
Risingwave doesn't support information_schema.schemata but this works.
| # Include temporary tables only if no database has been explicitly specified | ||
| # to avoid temp tables showing up in all calls to `list_tables` | ||
| if db == "public": | ||
| out += self._fetch_temp_tables() | ||
|
|
||
| return self._filter_with_like(map(itemgetter(0), out), like) | ||
|
|
||
| def list_databases(self, like=None) -> list[str]: | ||
| def _fetch_temp_tables(self): | ||
| # postgres temporary tables are stored in a separate schema | ||
| # so we need to independently grab them and return them along with | ||
| # the existing results | ||
|
|
||
| sql = ( | ||
| sg.select("table_name") | ||
| .from_(sg.table("tables", db="information_schema")) | ||
| .distinct() | ||
| .where(C.table_type.eq("LOCAL TEMPORARY")) | ||
| .sql(self.dialect) | ||
| ) | ||
|
|
||
| with self._safe_raw_sql(sql) as cur: | ||
| out = cur.fetchall() | ||
|
|
||
| return out |
There was a problem hiding this comment.
both risingwave and postgres were returning all tables in all databases (formerly schema) when any list_tables call was made.
I fixed that to only return tables in self.current_database but that leaves out temp tables, so I added this helper to also return temp tables (which live in their own set of postgres-schema that have unpredictable names).
| raise ValueError( | ||
| "Using both the `schema` and `database` kwargs is not supported. " | ||
| "`schema` is deprecated and will be removed in Ibis 10.0" | ||
| "\nUse the `database` kwarg with one of the following patterns:" | ||
| '\ndatabase="database"' | ||
| '\ndatabase=("catalog", "database")' | ||
| '\ndatabase="catalog.database"', | ||
| ) |
There was a problem hiding this comment.
This is the odd duck out, as we had a schema kwarg in list_tables but no database kwarg, so this isn't a breaking change.
b55b64f to
ab2d621
Compare
ab2d621 to
596e9fc
Compare
9d86b41 to
80e225f
Compare
|
This is ready for a look (I think). I've left the various backend-specific commits unsquashed for the moment as I think that might make it easier to review, but if folks would rather all the groups of changes were visible together I can squash it down. Upon writing that, I think I'm going to squash it, should make reviewing the blocks of changes a bit easier. |
This is part of a larger refactor where we are removing all usage of the word `schema` in its hierarchical sense. Ibis will adhere to the following convention moving forward: * `schema`: a mapping of column names to datatypes * `database`: a collection of tables * `catalog`: a collection of databases These terms are mapped accordingly onto the specific backend terminology. This change to `list_tables` is not a breaking change. `schema` is deprecated and will warn on usage, but all existing code should remain functional. chore(schema): add mixin to help with schema deprecation warnings refactor(trino): deprecate schema in list_tables refactor(postgres): deprecate schema in list_tables refactor(oracle): deprecate schema in list_tables refactor(bigquery): deprecate schema in list_tables refactor(mysql): deprecate schema in list_tables refactor(duckdb): deprecate schema in list_tables refactor(mssql): deprecate schema in list_tables refactor(snowflake): deprecate schema in list_tables docs(list_tables): add backend-specific docstrings for list_tables fix(postgres): handle temp tables in list_tables
80e225f to
89e7d7c
Compare
BREAKING CHANGE: We are removing the word `schema` in its hierarchical sense. We use `database` to mean a collection of tables. The behavior of all `*_database` methods now applies only to collections of tables and never to collections of `database` (formerly `schema`) * `CanListDatabases` abstract methods now all refer to collections of tables. * `CanCreateDatabases` abstract methods now all refer to collections of tables. * `list_databases` now takes a kwarg `catalog`. * `create_database` now takes a kwarg `catalog`. * `drop_database` now takes a kwarg `catalog`. * `current_database` now refers to the current collection of tables. * `CanCreateSchema` is deprecated and `create_schema`, `drop_schema`, `list_schemas`, and `current_schema` are deprecated and redirect to the corresponding method/property ending in `database`. * We add a `CanListCatalog` and `CanCreateCatalog` that can list and create collections of `database`, respectively. The new methods are `list_catalogs`, `create_catalog`, `drop_catalog`, * There is a new `current_catalog` property. refactor(duckdb): catalog database switchover refactor(snowflake): catalog database switchover refactor(trino): catalog database switchover refactor(postgres): catalog database switchover refactor(oracle): catalog database switchover refactor(mssql): catalog database switchover refactor(bigquery): catalog database switchover refactor(clickhouse): catalog database switchover refactor(datafusion): catalog database switchover refactor(mysql): catalog database switchover refactor(exasol): catalog database switchover refactor(risingwave): catalog database switchover test(list_databases): add test for list_database contents
This is not a breaking change as `get_schema` was introduced during the-epic-split refactor and has not yet been released.
`schema` kwarg is deprecated in all `Backend.table` usage. Wherever it was present, we now have handling to warn if `schema` is passed, and to encourage users to switch usage to the tuple of `(catalog, database)`.
This is an internals only change, but changes `database` to `catalog` and changes `schema` to `database` for our internal `Namespace` op.
The `schema` kwarg here was introduced during the-epic-split, so not deprecating.
Deprecate the use of `schema` as a hierarchical term in all `*_view` DDL methods. Added a helper mixin to try to avoid continued repetition of the warning code and the repeated parsing of the `database` + `schema` combinations. refactor(pyspark): remove schema kwarg from *_view This would be a breaking change but currently this functionality is broken anyway. In any case, PySpark only has support for a single level of table hierarchy, so it should just be `database` as an option here. refactor(sqlite): deprecate schema in `*_view` ddl refactor(bigquery): deprecate schema in `*_view` ddl methods
`schema` as a hierarchical term is deprecated in `insert`. It will be removed Ibis 10.0. refactor(bigquery): deprecate `schema` kwarg in insert refactor(snowflake): deprecate `schema` kwarg in insert
89e7d7c to
d605837
Compare
|
|
||
| # Include temporary tables only if no database has been explicitly specified | ||
| # to avoid temp tables showing up in all calls to `list_tables` | ||
| if db == "public": |
There was a problem hiding this comment.
Interesting, this is a bit of an edge case.
On the one hand, temporary tables are always visible to the user who created them. On the other, the intent of asking for tables in a specific schema would seem to preclude showing them.
What you've done here seems fine to me.
| def _to_catalog_db_tuple(self, table_loc: sge.Table): | ||
| if table_loc is None or table_loc == (None, None): | ||
| return None, None | ||
|
|
||
| if (sg_cat := table_loc.args["catalog"]) is not None: | ||
| sg_cat.args["quoted"] = False | ||
| sg_cat = sg_cat.sql(self.name) | ||
| if (sg_db := table_loc.args["db"]) is not None: | ||
| sg_db.args["quoted"] = False | ||
| sg_db = sg_db.sql(self.name) | ||
|
|
||
| return sg_cat, sg_db | ||
|
|
||
| def _to_sqlglot_table(self, database): | ||
| if database is None: | ||
| return None | ||
| elif isinstance(database, tuple): | ||
| if len(database) > 2: | ||
| raise ValueError( | ||
| "Only database hierarchies of two or fewer levels are supported." | ||
| "\nYou can specify ('catalog', 'database')." | ||
| ) | ||
| elif len(database) == 2: | ||
| catalog, database = database | ||
| elif len(database) == 1: | ||
| database = database[0] | ||
| catalog = None | ||
| else: | ||
| raise ValueError( | ||
| f"Malformed database tuple {database} provided" | ||
| "\nPlease specify one of:" | ||
| '\n("catalog", "database")' | ||
| '\n("database",)' | ||
| ) | ||
| database = sg.exp.Table( | ||
| catalog=sg.to_identifier(catalog, quoted=self.compiler.quoted), | ||
| db=sg.to_identifier(database, quoted=self.compiler.quoted), | ||
| ) | ||
| elif isinstance(database, str): | ||
| # There is no definition of a sqlglot catalog.database hierarchy outside | ||
| # of the standard table expression. | ||
| # sqlglot parsing of the string will assume that it's a Table | ||
| # so we unpack the arguments into a new sqlglot object, switching | ||
| # table (this) -> database (db) and database (db) -> catalog | ||
| table = sg.parse_one(database, into=sg.exp.Table, dialect=self.dialect) | ||
| if table.args["catalog"] is not None: | ||
| raise exc.IbisInputError( | ||
| f"Overspecified table hierarchy provided: `{table.sql(self.dialect)}`" | ||
| ) | ||
| catalog = table.args["db"] | ||
| db = table.args["this"] | ||
| database = sg.exp.Table(catalog=catalog, db=db) | ||
| else: | ||
| raise ValueError("oops") | ||
|
|
||
| return database |
There was a problem hiding this comment.
yeah, me too, but I think we'll get some mileage out of it. This can also help with future usage patterns like con.table("catalog.database.table")
cpcloud
left a comment
There was a problem hiding this comment.
LGTM.
Epic work, thanks for tackling this.
|
xref #6331 |
No more
schemaThis is part of a larger refactor where we are removing all usage of the
word
schemain its hierarchical sense.Ibis will adhere to the following convention moving forward:
schema: a mapping of column names to datatypesdatabase: a collection of tablescatalog: a collection of databasesThere are a bunch of commits here that I'm currently keeping separate so I can add fixup changes to the appropriate backend as needed.
I plan to squash all of these into their corresponding top-level commits before we rebase.
I've listed out the commit messages below for those top-level commits (each one corresponding to the changes to, e..g
list_tablesorcreate_schema, etc.).refactor(list_tables): deprecate schema keyword
This change to
list_tablesis not a breaking change.schemaisdeprecated and will warn on usage, but all existing code should remain
functional.
refactor(ddl): deprecate all *_schema methods
BREAKING CHANGE: We are removing the word
schemain its hierarchicalsense. We use
databaseto mean a collection of tables. The behavior ofall
*_databasemethods now applies only to collections of tables andnever to collections of
database(formerlyschema)CanListDatabasesabstract methods now all refer tocollections of tables.
CanCreateDatabasesabstract methods now all refer tocollections of tables.
list_databasesnow takes a kwargcatalog.create_databasenow takes a kwargcatalog.drop_databasenow takes a kwargcatalog.current_databasenow refers to the current collection of tables.CanCreateSchemais deprecated andcreate_schema,drop_schema,list_schemas, andcurrent_schemaare deprecated and redirect to thecorresponding method/property ending in
database.CanListCatalogandCanCreateCatalogthat can list andcreate collections of
database, respectively.The new methods are
list_catalogs,create_catalog,drop_catalog,current_catalogproperty.refactor(get_schema): remove schema kwarg, add catalog, kw-only
This is not a breaking change as
get_schemawas introduced duringthe-epic-split refactor and has not yet been released.
refactor(table): deprecate schema
schemakwarg is deprecated in allBackend.tableusage. Wherever itwas present, we now have handling to warn if
schemais passed, and toencourage users to switch usage to the tuple of
(catalog, database).refactor(ops): use catalog and database kwargs for Namespace op
This is an internals only change, but changes
databasetocatalogand changes
schematodatabasefor our internalNamespaceop.refactor(table ddl): remove hierarchical schema from *_table methods
The
schemakwarg here was introduced during the-epic-split, so not deprecating.refactor(sql): deprecate schema in _view ddl methods
Deprecate the use of
schemaas a hierarchical term in all*_viewDDLmethods.
Added a helper mixin to try to avoid continued repetition of the warning
code and the repeated parsing of the
database+schemacombinations.refactor(sql): deprecate schema kwarg in insert
schemaas a hierarchical term is deprecated ininsert.It will be removed Ibis 10.0.
refactor(ddl): deprecate schema keyword in truncate_table
A few followups
Not for this PR, which is already too big
I misseddoneinsert, need to do deprecateschemathere, too.schemakwarg fromconnectanddo_connect(although I'm not sure this is necessary)schema?Fixes #8477
Fixes #8380
Fixes #8491
xref: #8253
xref: #6821
xref: #6489