Introduce new schema introspection API by morozov · Pull Request #7153 · doctrine/dbal

morozov · 2025-09-16T00:49:31Z

This is a logical continuation of the effort started in #5268 – it moved some of the introspection SQL statements from platform classes to schema managers. This PR consolidates the logic of building the queries for schema introspection and processing their results, thus providing a cleaner and more robust API.

Problems with the current API

Wrong abstractions

The current API is better than before #5268, but it's still designed around wrong abstractions and thus declares very non-obvious contracts. It makes implementations brittle and hard to statically analyze:

There are AbstractSchemaManager::select*() methods like the following:

dbal/src/Schema/AbstractSchemaManager.php

Lines 344 to 350 in aa6b906

    
               /** 
        
                * Selects definitions of table columns in the specified database. If the table name is specified, narrows down 
        
                * the selection to this table. 
        
                * 
        
                * @throws Exception 
        
                */ 
        
               abstract protected function selectTableColumns(string $databaseName, ?string $tableName = null): Result;

The contract of this interface implies that each implementation will produce an SQL result with a certain set of columns, but the exact shape of the result isn't described anywhere.

As a counterpart to most of such methods, there is an AbstractSchemaManager::_getPortable*Definition() which accepts an associative array representing a row of the result produced by the first method and is supposed to process it and return a schema object. For example:

dbal/src/Schema/AbstractSchemaManager.php

Lines 826 to 833 in aa6b906

    
               /** 
        
                * Gets Table Column Definition. 
        
                * 
        
                * @param array<string, mixed> $tableColumn 
        
                * 
        
                * @throws TypesException 
        
                */ 
        
               abstract protected function _getPortableTableColumnDefinition(array $tableColumn): Column;

It would be okay-ish, if the schema manager implementation for each database platform implemented its "select" and "get portable definition" methods, which would assume a certain shape of row per platform. But in reality, there are shared implementations of the "get portable definition" method. For example:

dbal/src/Schema/AbstractSchemaManager.php

Lines 1361 to 1368 in 903d441

    
           protected function _getPortableTableIndexesList(array $rows, string $tableName): array 
        
           { 
        
               $result = []; 
        
               foreach ($rows as $row) { 
        
                   $indexName = $keyName = $row['key_name']; 
        
                   if ($row['primary']) { 
        
                       $keyName = 'primary'; 
        
                   }

This way, the shared implementation enforces the shape of the result on all implementations. As a result, implementations have to implement some "translation" of their "native" shape of the result into the shape expected by the shared implementation. For example:

dbal/src/Schema/SQLServerSchemaManager.php

Lines 370 to 381 in aa6b906

    
                         SELECT 
        
                                  scm.name AS schema_name, 
        
                                  tbl.name AS table_name, 
        
                                  idx.name AS key_name, 
        
                                  col.name AS column_name, 
        
                                  ~idx.is_unique AS non_unique, 
        
                                  idx.is_primary_key AS [primary], 
        
                                  CASE idx.type 
        
                                      WHEN '1' THEN 'clustered' 
        
                                      WHEN '2' THEN 'nonclustered' 
        
                                      ELSE NULL 
        
                                  END AS flags

We are lucky to have built a great integration test suite, but the code itself is very hard to reason about.

`array_change_key_case($row, CASE_LOWER)` all over the place

The reason for that is that the connection used by AbstractSchemaManager may use a Portability\Middleware. In this case, there is no guarantee that the array keys of fetched rows will be in the same case as in the query (even if the corresponding column names are quoted), so these calls to array_change_key_case() compensate for that.

These calls aren't used consistently across the codebase (e.g. in MySQLSchemaManager). As a result, there may be bugs reproducible in some portability modes.

Proposed design

classDiagram
    class AbstractSchemaManager
    AbstractSchemaManager : +introspectTables()

    interface SchemaProvider
    SchemaProvider : +getAllTables()
    SchemaProvider <-- AbstractSchemaManager

    SchemaProvider <|-- IntrospectingSchemaProvider

    class IntrospectingSchemaProvider
    IntrospectingSchemaProvider : +getAllTables()
    IntrospectingSchemaProvider : -getColumnsForAllTables()
    IntrospectingSchemaProvider : -getIndexesForAllTables()

    interface MetadataProvider
    MetadataProvider: +getTableColumnsForAllTables()
    MetadataProvider: +getIndexColumnsForAllTables()
    MetadataProvider <-- IntrospectingSchemaProvider

    class MySQLMetadataProvider
    MetadataProvider <|-- MySQLMetadataProvider

    class PostgreSQLMetadataProvider
    MetadataProvider <|-- PostgreSQLMetadataProvider

    class IndexColumnMetadataProcessor
    IndexColumnMetadataProcessor <-- IntrospectingSchemaProvider

    class ForeignKeyConstraintColumnMetadataProcessor
    ForeignKeyConstraintColumnMetadataProcessor <-- IntrospectingSchemaProvider

The `AbstractSchemaManager` class

The schema manager remains the user-facing API for schema introspection but no longer contains any implementation. In addition to that, its only responsibility is to translate table names from the user-input format (e.g. OptionallyQualifiedName or a unquoted/quoted pair of schema/table name) into the actual schema/table name for the underlying schema provider.

This transformation includes:

Folding of the identifiers according to the semantics of the underlying platform (e.g. unquoted identifiers are upper-cased on Oracle and Db2).
Applying the default schema name (e.g. users → public.users on Postgres).

Rejected ideas

Originally I thought that users could interact with the schema provider directly. Similar to Connection::createSchemaManager() they could use Connection::createSchemaProvider(). This would provide a nice CQRS-style separation of APIs. However, there is the AbstractSchemaManager::migrateSchema() which internally both introspects the schema and applies the changes, so schema introspection cannot be completely moved out of AbstractSchemaManager.

We could sacrifice AbstractSchemaManager::migrateSchema() (which is just a convenience methods) and separate the APIs cleanly, but I don't know what other side effects it would have.

The `SchemaProvider` interface

The schema provider provides access to the elements of the underlying database schema in the form of schema objects (e.g. Column) or their collections. With this approach, the way in which a given implementation obtains the information about the underlying schema become an implementation detail.

The `IntrospectingSchemaProvider` class

This is the only out-of-the box SchemaProvider implementation. It is responsible for providing the schema by introspecting the underlying database. Internally, it interacts with a MetadataProvider. It requests metadata from the metadata provider and builds schema objects from them.

Additionally, it's responsible for unifying the shape of the schema introspection results between the platforms that do and do not support schemas. If the underlying platform supports schemas, and the schema name of an object matches the current schema, the schema name will be omitted to match the behavior of the platforms that do not support schemas. For example, on PostgresSQL, a table public.users will be introspected as users.

The `MetadataProvider` interface

MetadataProvider is a lower-level interface compared to SchemaProvider. It provides access to the metadata objects, which are the analogs of the array<string, mixed> arrays used by current schema introspection API. Unlike them, they are well-defined and strictly typed.

Each AbstractPlatform implementation is now responsible for instantiating a metadata provider:

dbal/src/Platforms/AbstractPlatform.php

Lines 2451 to 2454 in 903d441

    
           public function createMetadataProvider(Connection $connection): MetadataProvider 
        
           { 
        
               throw NotSupported::new(__METHOD__); 
        
           }

With the new API, most of the time, the SELECT statement and the usage of the selected columns in the PHP code are located in the same method. For example:

dbal/src/Platforms/MySQL/MySQLMetadataProvider.php

Lines 84 to 90 in d95f0ad

    
                   $sql = <<<'SQL' 
        
                   SELECT SCHEMA_NAME 
        
                   FROM information_schema.SCHEMATA 
        
                   ORDER BY SCHEMA_NAME 
        
                   SQL; 
        
                   foreach ($this->connection->iterateColumn($sql) as $databaseName) {

This allows fetching the result using iterateNumeric() or even iterateColumn() and eliminates the need for array_change_key_case().

Access to the objects scoped to a table (e.g. table columns) is usually provided in the form of two methods:

dbal/src/Schema/Metadata/MetadataProvider.php

Line 62 in 7ea5a70

public function getTableColumnsForAllTables(): iterable;

dbal/src/Schema/Metadata/MetadataProvider.php

Line 80 in 7ea5a70

    
           public function getTableColumnsForTable(?string $schemaName, string $tableName): iterable;

This way, depending on whether the consumer needs to introspect the schema of a single table or of all tables in the database, the corresponding method is invoked. Under the hood, these methods usually share an implementation.

MetadataProvider methods return collections in the form of iterable<T> (usually, a Generator<T>), not array<T>. This is not because we want to optimize memory usage. This is because it's more convenient to iterate the query result usingforeach and yeild rows w/o buffering them in an array. The metadata elements aren't supposed to be accessed by index – only iterated and processed one by one. iterable<T> works just fine for that.

Metadata processors

Metadata processors generalize the logic of converting one or many metadata rows into one or more corresponding schema object (e.g. the metadata about foreign key constraint columns into a list of foreign key constraints). This logic is agnostic of the database platform, so there is one implementation per object type (e.g. ForeignKeyConstraintColumnMetadataProcessor).

The exception is table columns. Unlike other metadata row types which contain scalar values, TableColumnMetadataRow contains a full Column object. The reason is that most of the properties the column carries belong to its data type, not the column itself. Therefore, it is more practical to interpret type-specific properties during introspection within a metadata provider and instantiate a complete Column object right away instead defining an abstraction to represent them.

What else has improved?

All schema introspection queries now consistently use prepared statements.
System tables and schemas that don't need to be exposed to the user are consistently handled in each platform's metadata provider.
The reserved column names for schema introspection queries (AbstractSchemaManager::SCHEMA_NAME_COLUMN and AbstractSchemaManager::TABLE_NAME_COLUMN) will be gone.
The collections of objects returned by the new schema manager methods are list<T>, not array<lower-case-name, T>. This will allow to eliminate the conflicts between objects with the names that differ only in case. This makes the API more consistent with Deprecate using keys of table-scoped object collections as names #7102 and other similar changes.

Known limitations

Unlike the current API, the new one doesn't allow introspection of databases other than the current one. The reasons are:

This is not portable. The definition of a "database" varies significanly per supported platform. The PostgreSQL, SQL Server and SQLite schema managers currently just ignore the $databaseName parameter.
This is not tested.
Schema modification is supported only withing the current database on all platforms.

Deprecations and upgrade path

Methods named as list<Object>s() are being deprecated in favor of the corresponding introspect<Object>s(). The methods that used to return a list of strings as names (e.g. listDatabases()) now return a list of quoted names.
Methods named as getTable<Something>() that accept the table name as a god-knows-how formatted string, are being deprecated in favor of a set of introspectTable<Something>(By(Unqualified|Qualified)Name)?(). This way, the user can be explicit about the semantics of the string or can pass the name as object.

The exception is introspectTable(string $name). There's no way to provide an upgrade path in the form of introspectTable(OptionallyQualified $name) because this name is already taken. I expect that most users will use the By(Unqualified|Qualified)Name versions of this method. The signature will be modified in 5.0.x.

Modified tests

Most of the calls to the deprecated APIs in the tests have been replaced with the use of the recommended ones. There are a few exceptions where a given deprecated method is still used in some tests:

listTableColumns() is used by a test that covers the deprecated jsonb column option.
listTableIndexes() is used by the tests that expect the PK constraint to be returned as well (also deprecated).
introspectTable() is used by the tests that:
1. Use a variable OptionallyQualifiedName table name. Using one of the recommended methods would require breaking it down into the schema name and unqualified name. This won't be a problem in 5.0.x.
2. Perform assertions using Index::getColumns(), ForeignKeyConstraint::getLocalColumns() and other similar methods. With the new API, the methods return names surrounded by quotes, which breaks the test. They all are deprecated.
3. Break for other reasons because the names are introspected as quoted (all failures would be expected, but most of them will be compensated by other changes in 5.0.x, so the end result won't change).

Remaining known issues

Unlike announced in #7030 (comment), the migration to the new API alone won't address the problem of introspecting object names containing special characters (e.g. dots). The reason is that, even though the names will be introspected as quoted, AbstractAsset::_setName() will ignore that and parse the dot as a special character:

dbal/src/Schema/AbstractAsset.php

Lines 128 to 129 in aa6b906

    
           if (str_contains($name, '.')) { 
        
               $parts            = explode('.', $name);

This issue should be already addressed in 5.0.x for most if not all the object types.

Future plans

Get rid of @internal platform-specific introspection-specific methods in platform classes:

dbal/src/Platforms/AbstractMySQLPlatform.php

Line 898 in 1041f02

public function fetchTableOptionsByTable(bool $includeTableName): string

dbal/src/Platforms/PostgreSQLPlatform.php

Line 704 in cfa4a04

public function getDefaultColumnValueSQLSnippet(): string

The query used to introspect the schema should be part of the corresponding metadata provider implementation. In general, supporting platform versions by extending classes is not the best approach (I can elaborate on that separately).
Introduce an interface for getDoctrineTypeMapping(). This way, combined with the previous step, we can eliminate the dependency of MetadataProvider implementations on AbstractPlatform.
Introduce a better schema filtering API. The current API is error-prone: it filters tables by a vaguely formatted string (<schema-name>.<table-name>), so if the input contains a dot, it cannot tell whether the dot is a separator or part of the schema or table name. Additionally, filtering sequences probably never worked (the entire Sequence object is passed to the filter instead of the name).

# Conflicts: # UPGRADE.md

greg0ire

I don't know what's most impressive between the patch or its description… great job!

greg0ire · 2025-09-22T06:34:03Z

src/Schema/Metadata/TableColumnMetadataRow.php

+ * Unlike other metadata row types, this one contains a full Column object. The reason is that most of the properties
+ * the column carries belong to its data type, not the column itself. Therefore, it is more practical to interpret
+ * type-specific properties during introspection within a metadata provider and instantiate a complete Column object
+ * right away instead defining an abstraction to represent them.


Suggested change

* right away instead defining an abstraction to represent them.

* right away instead of defining an abstraction to represent them.

greg0ire · 2025-09-22T06:41:56Z

UPGRADE.md

+## Deprecated `AbstractSchemaManager` methods
+
+The following `AbstractSchemaManager` methods have been deprecated:
+1. `listDatabases()` - use `introspectDatabaseNames()` instead.


For methods that appear in the docs, like in this example:

dbal/docs/en/reference/schema-manager.rst

Lines 28 to 36 in 2ce1218

listDatabases()

---------------

Retrieve an array of databases on the configured connection:

.. code-block:: php

<?php

$databases = $sm->listDatabases();

should the corresponding docs be removed, or should a note be added about them being deprecated in favor of another?

Yeah, we can replace them with the recommended ones. I recalled that I need to update the docs like 5 times but still forgot 😓

Speaking of documentation, I learned from a person who professionally writes documentation that wrapping lines in RST/AsciiDoc representing technical documentation is an anti-pattern.

The argument is that, let's say the max line width is 120 characters and you have a paragraph consisting of 1000 characters, so it spans 9 lines. If you add a couple of words in the beginning of the paragraph, it will shift all following words and will require reformatting the entire paragraph and all 9 lines making the diff unreadable.

@greg0ire, @SenseException what do you think about not wrapping lines? On the one hand, it will add quite some horizontal scrolling, but on the other hand, the markdown will not require manual formatting during editing.

I'd say the diff only becomes unreadable if it's a line diff, but at least with GitHub, this will result in a big diff with a smaller diff based on word diff inside. Also, if I open the source for technical docs on a wide display, I don't want to have to read a 1000 character long line. If we are that concerned about reformatting, we should IMO consider https://sembr.org/ instead. But I don't think it is that much of a concern.

Yeah, not a concern. I'm fine with what we have right now.

morozov · 2025-09-23T00:50:24Z

docs/en/reference/schema-manager.rst

-    Parameters containing identifiers passed to the SchemaManager
-    methods are *NOT* quoted automatically! Identifier quoting is
-    really difficult to do manually in a consistent way across
-    different databases. You have to manually quote the identifiers
-    when you accept data from user or other sources not under your
-    control.


This is no longer the case 😎

Now I realize we need a separate article (and maybe an additional blog post) that explains the meaning of quoted/unquoted identifiers (i.e. that both are safe to use), the API design around them, etc. But I'll postpone this until the hackathon.

greg0ire · 2025-09-23T06:13:04Z

docs/en/reference/schema-manager.rst

---------------
-
-Retrieve an array of databases on the configured connection:
+Retrieve a list of the names of available database:


Suggested change

Retrieve a list of the names of available database:

Retrieve a list of the names of available databases:

greg0ire

LGTM after fixing the typo

morozov added Improvement Schema Introspection Deprecation labels Sep 16, 2025

morozov force-pushed the schema-introspection branch 7 times, most recently from a12e62f to d18986a Compare September 19, 2025 01:13

Mark PostgreSQLPlatform::getDefaultColumnValueSQLSnippet() as internal

1ffe4bb

# Conflicts: # UPGRADE.md

morozov force-pushed the schema-introspection branch from d18986a to 75be0ff Compare September 19, 2025 01:36

Introduce equals() in Name implementations

28e5d7f

morozov force-pushed the schema-introspection branch from 75be0ff to 68038b8 Compare September 20, 2025 00:51

morozov marked this pull request as ready for review September 20, 2025 01:10

morozov requested review from derrabus and greg0ire September 20, 2025 01:10

greg0ire previously approved these changes Sep 22, 2025

View reviewed changes

morozov dismissed greg0ire’s stale review via d6dc2b0 September 22, 2025 15:55

morozov added 2 commits September 22, 2025 08:56

Introduce new schema introspection API

9a2121a

Use named arguments with some metadata row constructors

adb8953

morozov force-pushed the schema-introspection branch from d6dc2b0 to 11873bf Compare September 22, 2025 15:56

morozov commented Sep 23, 2025

View reviewed changes

greg0ire reviewed Sep 23, 2025

View reviewed changes

greg0ire previously approved these changes Sep 23, 2025

View reviewed changes

Update schema manager documentation

75f2cb1

morozov dismissed greg0ire’s stale review via 75f2cb1 September 23, 2025 12:50

morozov force-pushed the schema-introspection branch from 11873bf to 75f2cb1 Compare September 23, 2025 12:50

morozov merged commit 1eccb57 into doctrine:4.4.x Sep 23, 2025
97 checks passed

morozov deleted the schema-introspection branch September 23, 2025 13:01

This was referenced Sep 23, 2025

Remove deprecated schema introspection API #7160

Merged

Apply asset filtering to schema name retrieval #7164

Open

Remove Schema\Identifier #7168

Merged

derrabus added this to the 4.4.0 milestone Oct 9, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comments

Introduce new schema introspection API#7153

Introduce new schema introspection API#7153
morozov merged 5 commits intodoctrine:4.4.xfrom
morozov:schema-introspection

morozov commented Sep 16, 2025 •

edited

Loading

Uh oh!

greg0ire left a comment

Uh oh!

greg0ire Sep 22, 2025

Uh oh!

greg0ire Sep 22, 2025

Uh oh!

morozov Sep 22, 2025

Uh oh!

morozov Sep 22, 2025

Uh oh!

greg0ire Sep 22, 2025

Uh oh!

morozov Sep 23, 2025

Uh oh!

morozov Sep 23, 2025

Uh oh!

morozov Sep 23, 2025 •

edited

Loading

Uh oh!

greg0ire Sep 23, 2025

Uh oh!

greg0ire left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	/**
	* Selects definitions of table columns in the specified database. If the table name is specified, narrows down
	* the selection to this table.
	*
	* @throws Exception
	*/
	abstract protected function selectTableColumns(string $databaseName, ?string $tableName = null): Result;

	/**
	* Gets Table Column Definition.
	*
	* @param array<string, mixed> $tableColumn
	*
	* @throws TypesException
	*/
	abstract protected function _getPortableTableColumnDefinition(array $tableColumn): Column;

	protected function _getPortableTableIndexesList(array $rows, string $tableName): array
	{
	$result = [];
	foreach ($rows as $row) {
	$indexName = $keyName = $row['key_name'];
	if ($row['primary']) {
	$keyName = 'primary';
	}

	SELECT
	scm.name AS schema_name,
	tbl.name AS table_name,
	idx.name AS key_name,
	col.name AS column_name,
	~idx.is_unique AS non_unique,
	idx.is_primary_key AS [primary],
	CASE idx.type
	WHEN '1' THEN 'clustered'
	WHEN '2' THEN 'nonclustered'
	ELSE NULL
	END AS flags

	public function createMetadataProvider(Connection $connection): MetadataProvider
	{
	throw NotSupported::new(__METHOD__);
	}

	$sql = <<<'SQL'
	SELECT SCHEMA_NAME
	FROM information_schema.SCHEMATA
	ORDER BY SCHEMA_NAME
	SQL;

	foreach ($this->connection->iterateColumn($sql) as $databaseName) {

	if (str_contains($name, '.')) {
	$parts = explode('.', $name);

	* right away instead defining an abstraction to represent them.
	* right away instead of defining an abstraction to represent them.

	listDatabases()
	---------------

	Retrieve an array of databases on the configured connection:

	.. code-block:: php

	<?php
	$databases = $sm->listDatabases();

	Retrieve a list of the names of available database:
	Retrieve a list of the names of available databases:

Uh oh!

Comments

Conversation

morozov commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problems with the current API

Wrong abstractions

array_change_key_case($row, CASE_LOWER) all over the place

Proposed design

The AbstractSchemaManager class

Rejected ideas

The SchemaProvider interface

The IntrospectingSchemaProvider class

The MetadataProvider interface

Metadata processors

What else has improved?

Known limitations

Deprecations and upgrade path

Modified tests

Remaining known issues

Future plans

Uh oh!

greg0ire left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

morozov Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

greg0ire left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

morozov commented Sep 16, 2025 •

edited

Loading

`array_change_key_case($row, CASE_LOWER)` all over the place

The `AbstractSchemaManager` class

The `SchemaProvider` interface

The `IntrospectingSchemaProvider` class

The `MetadataProvider` interface

morozov Sep 23, 2025 •

edited

Loading