refactor(language): proposal to remove all usage of schema to refer to hierarchy #8477
Description
Schema as a word in databases is fraught with dual meanings and is
generally horrible. We do not use it consistently in Ibis, even
sometimes using different meanings for the same method but on different
backends.
I propose a complete elimination of the hierarchical "schema" that Postgres and others have saddled us with.
Glossary
For the purposes of this issue and also moving forward in Ibis:
- database: a collection of tables
- catalog: a collection of databases
- schema: a mapping of column names to dtypes and NOTHING ELSE
The database kwarg in Ibis
Post refactor, database can always be one of:
- string of "database"
- dotted string of "catalog.database" (although this might error if you
  pass a "catalog.database" to a backend that only has one level of
  hierarchy)
- tuple of ("catalog", "database")
- ibis namespace object
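As a rough illustration of the accepted forms, here is a minimal sketch that normalizes a database argument into a (catalog, database) pair. The helper name normalize_database is hypothetical and not part of Ibis. The sketch splits naively on "." (so quoted identifiers containing dots are not handled) and leaves out the ibis namespace object.

```python
from __future__ import annotations


def normalize_database(database: str | tuple[str, ...]) -> tuple[str | None, str]:
    """Return (catalog, database). Hypothetical helper, not an Ibis API."""
    if isinstance(database, str):
        # Assumption: a plain split on "." is good enough for illustration.
        parts = tuple(database.split("."))
    else:
        parts = tuple(database)

    if len(parts) == 1:
        # Only a database was given; the catalog is left unspecified.
        return None, parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(
        f"expected 'database' or 'catalog.database', got {database!r}"
    )
```

Under these assumptions, "catalog.database" and ("catalog", "database") both normalize to the same pair, and a backend with only one level of hierarchy could raise when the catalog part is not None.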
Places where we currently use hierarchical schema and removal proposal
Backend.list_tables
Current: list_tables that only takes schema as kwarg
- mysql
- postgres
- oracle
Future:
These 3 are relatively easy: we can add database, deprecate schema, and
have schema emit a warning when it is passed
Current: list_tables that takes both database and schema as kwargs
- trino
- bigquery
- duckdb
- snowflake
- mssql
Current behavior:
- user only passes schema, we assume current database
- user only passes database, we assume current schema (duckdb,
  but this seems like a bug and no one would do this)
- user only passes database, we assume information_schema as
  schema (mssql)
- user only passes database, we error (trino, bigquery, snowflake)
Future:
- if user only passes deprecated schema, warn, treat as new database
- if user only passes database, treat as new database. This is
  technically a hard break that we can't warn users about (in code).
  This would only impact (if anyone) users of the mssql backend and maybe DuckDB (but I doubt it)
- if user passes both old database and old schema, warn, treat as
  "catalog.database"
Backend.table
Current: Backend.table that takes schema and database
- Default SQL backend behavior
- bigquery currently errors if database only
Future:
Deprecate schema, warn if schema is passed, make kw-only
In ibis 8, we were creating an ops.Namespace, passing that to
_get_sqla_table and only using the schema attribute of
ops.Namespace so I don't think anyone was using only database in a
functional way even if it wasn't explicitly erroring there.
(sqlalchemy only accepts a schema kwarg when defining a
sqlalchemy table, this is one of the reasons for the current mess.)
While this is a breaking change, I don't believe it will break anyone's
code without warning.
Current: Backend.table that takes schema as mapping
- polars takes a _schema kwarg that is unused
- pandas
- dask
For dask and pandas you can use schema to override the mapping
schema used to create a table, sort of similar to using ibis.table?
Future:
Deprecate schema keyword, offer no replacement for this weird and
inconsistent functionality.
Possible: add the database kwarg for API consistency and no-op if it
gets used (or error?)
Backend.get_schema
This is new since TES and we can rename it to get_database (or
something else)
Backend.list_schemas
We deprecate this, point users at list_databases
Backend.list_databases
Current: was undefined or was returning nothing (because backend has no catalog support)
Future:
This will now return databases, and we add a catalog kwarg so you can
list databases in a given catalog
Current: returns catalog
Future:
This will just break. This is unfortunate, but I'm cautiously optimistic
that no one is using this programmatically.
Other changes and possibly additional steps in later versions
- Add Backend.list_catalogs, which behaves like the old version of
  list_databases
After we remove schema (next major version after deprecating),
we can consider adding an additional catalog kwarg to several of the
above methods. We will still continue to allow all the database
behaviors listed above and we add error handling for if someone
specifies catalog and also provides a dotted path as database (we
have this now for overspecifying database.schema).
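A minimal sketch of what that overspecification check might look like, assuming a possible future catalog kwarg. The function name split_location and the error messages are hypothetical, and the naive split on "." ignores quoted identifiers.

```python
def split_location(database, catalog=None):
    """Return (catalog, database), rejecting overspecified input.

    Hypothetical helper, not an Ibis API.
    """
    if isinstance(database, str):
        parts = database.split(".")
    else:
        parts = list(database)

    if len(parts) > 2:
        raise ValueError(f"too many levels of hierarchy in {database!r}")
    if len(parts) == 2:
        if catalog is not None:
            # The catalog was given twice: once as a kwarg, once in the path.
            raise ValueError(
                "overspecified: pass either `catalog` or a dotted "
                "'catalog.database', not both"
            )
        return parts[0], parts[1]
    return catalog, parts[0]
```

With this shape, split_location("db", catalog="cat") and split_location("cat.db") describe the same location, while combining a dotted path with catalog raises.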