Skip to content

Conversation

@andimarek
Copy link
Member

@andimarek andimarek commented Oct 11, 2022

A new schema diffing

This PR adds a new Schema diffing class providing a comprehensive and detailed capability to compare two GraphQL schemas.

A GraphQL schema diff is expressed as a list of minimal edit operations transforming the old schema into a new one.
This is based on the the graph edit distance definition.

The algorithm implemented here is based on the following two papers (both papers are from the same authors. The first one is newer, but the older one is more detailed in some aspects)

  1. Accelerating Graph Similarity Search via Efficient GED Computation.

  2. Efficient Graph Edit Distance Computation and Verification via Anchor-aware Lower Bound Estimation

The algorithm is a modified version of "AStar-BMao". It is adapted to directed graphs as a GraphQL schema is most naturally represented as directed graph (vs the undirected graphs used in the papers).

The main class is SchemaDiffing returning a list of graph edit operations (insert/delete/change vertex, insert/delete/change edge).

The second main class is EditOperationAnalyzer which analyzes a list of edit operation to produce a more high level list of
schema changes. (This is still wip).

Graph edit distance

Maybe this is obvious, but a GraphQL schema itself is a graph (in the CS sense). While there are different ways how to model a schema s graph for the this application the model looks like this:
schema-graph

See the SchemaGraph class for how it is implemented.

The problem how to get from one schema to another is equivalent to finding the graph edit distance. The graph edit distance is the minimal amount of edit operations to go from one graph to another. Here (and in the referenced papers) we only consider insert/delete/change of vertices and insert/delete/change of edges and every change has the same "value" as the other.

The problem is that the general task of finding the graph edit distance is NP-hard. The papers above present the current state of art in finding a somehow efficient algorithm for the general problem. This algorithm is called AStar-BMao.

The new SchemaDiffing classes implements AStar-BMao with two major modification: first the type of edit operations we allow is restricted, because we have a "typed graph". (See the image above). For example we don't want to change a Scalar vertex into a Object vertex. We do this to speed up the diffing and in order to make the diffing outcome more semantically meaningful. The second change is that the graph is a directed compared to the undirected ones in the paper.

The outcome of the SchemaDiffing is a list of low level graph operations describing how to get from one schema to another. For example:

type Query {
  hello: String
} 

vs

 type Query {
   hello2: String
} 

will result in one change vertex operation, renaming the field from hello to hello2.

Edit operation analyzer

While the low level edit operations are correct, they are not really great for users to understand what actually changed in a schema. The EditOperationAnalyzer class aims to solve that by interpreting the list of graph edit operations in a GraphQL sense and describing the schema changes as a list of Union/Interface/... changes.

On top of that we can add some kind of breaking change detection that will mark certain changes as breaking/questionable etc. This will need to be configurable in some way to accomodate the different API specific needs.

A blog post describing breaking changes, based on this work here, is planned.

TODO

  • Cleanup
  • Finish the edit operation analyzer

@andimarek andimarek changed the title Adding new schema diffing capability (WIP) Adding new schema diffing capability (First step) Nov 29, 2022
@andimarek
Copy link
Member Author

This PR is not ready, but because of the size I am merging it now. All classes are marked internal and not yet ready for usage.

The high level diff analyzer is not completely ready and some polish/cleanup is missing.

@andimarek andimarek merged commit 8e43b08 into master Nov 29, 2022
@andimarek andimarek added the not release related changes which are not released (for example unit tests or docs) label Nov 29, 2022
@andimarek andimarek changed the title Adding new schema diffing capability (First step) Adding new schema diffing capability Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

not release related changes which are not released (for example unit tests or docs)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants