Introduce new schema introspection API#7153
Conversation
a12e62f to
d18986a
Compare
# Conflicts: # UPGRADE.md
d18986a to
75be0ff
Compare
75be0ff to
68038b8
Compare
greg0ire
left a comment
There was a problem hiding this comment.
I don't know what's most impressive between the patch or its description… great job!
| * Unlike other metadata row types, this one contains a full Column object. The reason is that most of the properties | ||
| * the column carries belong to its data type, not the column itself. Therefore, it is more practical to interpret | ||
| * type-specific properties during introspection within a metadata provider and instantiate a complete Column object | ||
| * right away instead defining an abstraction to represent them. |
There was a problem hiding this comment.
| * right away instead defining an abstraction to represent them. | |
| * right away instead of defining an abstraction to represent them. |
| ## Deprecated `AbstractSchemaManager` methods | ||
|
|
||
| The following `AbstractSchemaManager` methods have been deprecated: | ||
| 1. `listDatabases()` - use `introspectDatabaseNames()` instead. |
There was a problem hiding this comment.
For methods that appear in the docs, like in this example:
dbal/docs/en/reference/schema-manager.rst
Lines 28 to 36 in 2ce1218
should the corresponding docs be removed, or should a note be added about them being deprecated in favor of another?
There was a problem hiding this comment.
Yeah, we can replace them with the recommended ones. I recalled that I need to update the docs like 5 times but still forgot 😓
There was a problem hiding this comment.
Speaking of documentation, I learned from a person who professionally writes documentation that wrapping lines in RST/AsciiDoc representing technical documentation is an anti-pattern.
The argument is that, let's say the max line width is 120 characters and you have a paragraph consisting of 1000 characters, so it spans 9 lines. If you add a couple of words in the beginning of the paragraph, it will shift all following words and will require reformatting the entire paragraph and all 9 lines making the diff unreadable.
@greg0ire, @SenseException what do you think about not wrapping lines? On the one hand, it will add quite some horizontal scrolling, but on the other hand, the markdown will not require manual formatting during editing.
There was a problem hiding this comment.
I'd say the diff only becomes unreadable if it's a line diff, but at least with GitHub, this will result in a big diff with a smaller diff based on word diff inside. Also, if I open the source for technical docs on a wide display, I don't want to have to read a 1000 character long line. If we are that concerned about reformatting, we should IMO consider https://sembr.org/ instead. But I don't think it is that much of a concern.
There was a problem hiding this comment.
Yeah, not a concern. I'm fine with what we have right now.
d6dc2b0 to
11873bf
Compare
| Parameters containing identifiers passed to the SchemaManager | ||
| methods are *NOT* quoted automatically! Identifier quoting is | ||
| really difficult to do manually in a consistent way across | ||
| different databases. You have to manually quote the identifiers | ||
| when you accept data from user or other sources not under your | ||
| control. |
There was a problem hiding this comment.
This is no longer the case 😎
There was a problem hiding this comment.
Now I realize we need a separate article (and maybe an additional blog post) that explains the meaning of quoted/unquoted identifiers (i.e. that both are safe to use), the API design around them, etc. But I'll postpone this until the hackathon.
docs/en/reference/schema-manager.rst
Outdated
| --------------- | ||
|
|
||
| Retrieve an array of databases on the configured connection: | ||
| Retrieve a list of the names of available database: |
There was a problem hiding this comment.
| Retrieve a list of the names of available database: | |
| Retrieve a list of the names of available databases: |
11873bf to
75f2cb1
Compare
This is a logical continuation of the effort started in #5268 – it moved some of the introspection SQL statements from platform classes to schema managers. This PR consolidates the logic of building the queries for schema introspection and processing their results, thus providing a cleaner and more robust API.
Problems with the current API
Wrong abstractions
The current API is better than before #5268, but it's still designed around wrong abstractions and thus declares very non-obvious contracts. It makes implementations brittle and hard to statically analyze:
AbstractSchemaManager::select*()methods like the following:dbal/src/Schema/AbstractSchemaManager.php
Lines 344 to 350 in aa6b906
AbstractSchemaManager::_getPortable*Definition()which accepts an associative array representing a row of the result produced by the first method and is supposed to process it and return a schema object. For example:dbal/src/Schema/AbstractSchemaManager.php
Lines 826 to 833 in aa6b906
It would be okay-ish, if the schema manager implementation for each database platform implemented its "select" and "get portable definition" methods, which would assume a certain shape of row per platform. But in reality, there are shared implementations of the "get portable definition" method. For example:
dbal/src/Schema/AbstractSchemaManager.php
Lines 1361 to 1368 in 903d441
This way, the shared implementation enforces the shape of the result on all implementations. As a result, implementations have to implement some "translation" of their "native" shape of the result into the shape expected by the shared implementation. For example:
dbal/src/Schema/SQLServerSchemaManager.php
Lines 370 to 381 in aa6b906
We are lucky to have built a great integration test suite, but the code itself is very hard to reason about.
array_change_key_case($row, CASE_LOWER)all over the placeThe reason for that is that the connection used by
AbstractSchemaManagermay use aPortability\Middleware. In this case, there is no guarantee that the array keys of fetched rows will be in the same case as in the query (even if the corresponding column names are quoted), so these calls toarray_change_key_case()compensate for that.These calls aren't used consistently across the codebase (e.g. in
MySQLSchemaManager). As a result, there may be bugs reproducible in some portability modes.Proposed design
classDiagram class AbstractSchemaManager AbstractSchemaManager : +introspectTables() interface SchemaProvider SchemaProvider : +getAllTables() SchemaProvider <-- AbstractSchemaManager SchemaProvider <|-- IntrospectingSchemaProvider class IntrospectingSchemaProvider IntrospectingSchemaProvider : +getAllTables() IntrospectingSchemaProvider : -getColumnsForAllTables() IntrospectingSchemaProvider : -getIndexesForAllTables() interface MetadataProvider MetadataProvider: +getTableColumnsForAllTables() MetadataProvider: +getIndexColumnsForAllTables() MetadataProvider <-- IntrospectingSchemaProvider class MySQLMetadataProvider MetadataProvider <|-- MySQLMetadataProvider class PostgreSQLMetadataProvider MetadataProvider <|-- PostgreSQLMetadataProvider class IndexColumnMetadataProcessor IndexColumnMetadataProcessor <-- IntrospectingSchemaProvider class ForeignKeyConstraintColumnMetadataProcessor ForeignKeyConstraintColumnMetadataProcessor <-- IntrospectingSchemaProviderThe
AbstractSchemaManagerclassThe schema manager remains the user-facing API for schema introspection but no longer contains any implementation. In addition to that, its only responsibility is to translate table names from the user-input format (e.g.
OptionallyQualifiedNameor a unquoted/quoted pair of schema/table name) into the actual schema/table name for the underlying schema provider.This transformation includes:
users→public.userson Postgres).Rejected ideas
Originally I thought that users could interact with the schema provider directly. Similar to
Connection::createSchemaManager()they could useConnection::createSchemaProvider(). This would provide a nice CQRS-style separation of APIs. However, there is theAbstractSchemaManager::migrateSchema()which internally both introspects the schema and applies the changes, so schema introspection cannot be completely moved out ofAbstractSchemaManager.We could sacrifice
AbstractSchemaManager::migrateSchema()(which is just a convenience methods) and separate the APIs cleanly, but I don't know what other side effects it would have.The
SchemaProviderinterfaceThe schema provider provides access to the elements of the underlying database schema in the form of schema objects (e.g.
Column) or their collections. With this approach, the way in which a given implementation obtains the information about the underlying schema become an implementation detail.The
IntrospectingSchemaProviderclassThis is the only out-of-the box
SchemaProviderimplementation. It is responsible for providing the schema by introspecting the underlying database. Internally, it interacts with aMetadataProvider. It requests metadata from the metadata provider and builds schema objects from them.Additionally, it's responsible for unifying the shape of the schema introspection results between the platforms that do and do not support schemas. If the underlying platform supports schemas, and the schema name of an object matches the current schema, the schema name will be omitted to match the behavior of the platforms that do not support schemas. For example, on PostgresSQL, a table
public.userswill be introspected asusers.The
MetadataProviderinterfaceMetadataProvideris a lower-level interface compared toSchemaProvider. It provides access to the metadata objects, which are the analogs of thearray<string, mixed>arrays used by current schema introspection API. Unlike them, they are well-defined and strictly typed.Each
AbstractPlatformimplementation is now responsible for instantiating a metadata provider:dbal/src/Platforms/AbstractPlatform.php
Lines 2451 to 2454 in 903d441
With the new API, most of the time, the
SELECTstatement and the usage of the selected columns in the PHP code are located in the same method. For example:dbal/src/Platforms/MySQL/MySQLMetadataProvider.php
Lines 84 to 90 in d95f0ad
This allows fetching the result using
iterateNumeric()or eveniterateColumn()and eliminates the need forarray_change_key_case().Access to the objects scoped to a table (e.g. table columns) is usually provided in the form of two methods:
dbal/src/Schema/Metadata/MetadataProvider.php
Line 62 in 7ea5a70
dbal/src/Schema/Metadata/MetadataProvider.php
Line 80 in 7ea5a70
This way, depending on whether the consumer needs to introspect the schema of a single table or of all tables in the database, the corresponding method is invoked. Under the hood, these methods usually share an implementation.
MetadataProvidermethods return collections in the form ofiterable<T>(usually, aGenerator<T>), notarray<T>. This is not because we want to optimize memory usage. This is because it's more convenient to iterate the query result usingforeachandyeildrows w/o buffering them in an array. The metadata elements aren't supposed to be accessed by index – only iterated and processed one by one.iterable<T>works just fine for that.Metadata processors
Metadata processors generalize the logic of converting one or many metadata rows into one or more corresponding schema object (e.g. the metadata about foreign key constraint columns into a list of foreign key constraints). This logic is agnostic of the database platform, so there is one implementation per object type (e.g.
ForeignKeyConstraintColumnMetadataProcessor).The exception is table columns. Unlike other metadata row types which contain scalar values,
TableColumnMetadataRowcontains a fullColumnobject. The reason is that most of the properties the column carries belong to its data type, not the column itself. Therefore, it is more practical to interpret type-specific properties during introspection within a metadata provider and instantiate a completeColumnobject right away instead defining an abstraction to represent them.What else has improved?
AbstractSchemaManager::SCHEMA_NAME_COLUMNandAbstractSchemaManager::TABLE_NAME_COLUMN) will be gone.list<T>, notarray<lower-case-name, T>. This will allow to eliminate the conflicts between objects with the names that differ only in case. This makes the API more consistent with Deprecate using keys of table-scoped object collections as names #7102 and other similar changes.Known limitations
Unlike the current API, the new one doesn't allow introspection of databases other than the current one. The reasons are:
$databaseNameparameter.Deprecations and upgrade path
list<Object>s()are being deprecated in favor of the correspondingintrospect<Object>s(). The methods that used to return a list of strings as names (e.g.listDatabases()) now return a list of quoted names.getTable<Something>()that accept the table name as a god-knows-how formatted string, are being deprecated in favor of a set ofintrospectTable<Something>(By(Unqualified|Qualified)Name)?(). This way, the user can be explicit about the semantics of the string or can pass the name as object.The exception is
introspectTable(string $name). There's no way to provide an upgrade path in the form ofintrospectTable(OptionallyQualified $name)because this name is already taken. I expect that most users will use theBy(Unqualified|Qualified)Nameversions of this method. The signature will be modified in 5.0.x.Modified tests
Most of the calls to the deprecated APIs in the tests have been replaced with the use of the recommended ones. There are a few exceptions where a given deprecated method is still used in some tests:
listTableColumns()is used by a test that covers the deprecatedjsonbcolumn option.listTableIndexes()is used by the tests that expect the PK constraint to be returned as well (also deprecated).introspectTable()is used by the tests that:OptionallyQualifiedNametable name. Using one of the recommended methods would require breaking it down into the schema name and unqualified name. This won't be a problem in 5.0.x.Index::getColumns(),ForeignKeyConstraint::getLocalColumns()and other similar methods. With the new API, the methods return names surrounded by quotes, which breaks the test. They all are deprecated.Remaining known issues
Unlike announced in #7030 (comment), the migration to the new API alone won't address the problem of introspecting object names containing special characters (e.g. dots). The reason is that, even though the names will be introspected as quoted,
AbstractAsset::_setName()will ignore that and parse the dot as a special character:dbal/src/Schema/AbstractAsset.php
Lines 128 to 129 in aa6b906
This issue should be already addressed in 5.0.x for most if not all the object types.
Future plans
Get rid of
@internalplatform-specific introspection-specific methods in platform classes:dbal/src/Platforms/AbstractMySQLPlatform.php
Line 898 in 1041f02
dbal/src/Platforms/PostgreSQLPlatform.php
Line 704 in cfa4a04
The query used to introspect the schema should be part of the corresponding metadata provider implementation. In general, supporting platform versions by extending classes is not the best approach (I can elaborate on that separately).
Introduce an interface for
getDoctrineTypeMapping(). This way, combined with the previous step, we can eliminate the dependency ofMetadataProviderimplementations onAbstractPlatform.Introduce a better schema filtering API. The current API is error-prone: it filters tables by a vaguely formatted string (
<schema-name>.<table-name>), so if the input contains a dot, it cannot tell whether the dot is a separator or part of the schema or table name. Additionally, filtering sequences probably never worked (the entireSequenceobject is passed to the filter instead of the name).