-
Notifications
You must be signed in to change notification settings - Fork 11
Description
In #94, we wanted to upgrade to PostgreSQL 9.6, as this is the latest and greatest version. We were unable to, as we set an index on xid, a system column, which is no longer allowed in PG 9.6:
Disallow creation of indexes on system columns, except for
OIDcolumns (David Rowley)Such indexes were never considered supported, and would very possibly misbehave since the system might change the system-column fields of a tuple without updating indexes. However, previously there were no error checks to prevent them from being created.
Although they've been stable and worked for us for many years, it seems they might not be a great idea.
We should think through how to do this properly. Other systems for user-space replication (i.e: not PostgreSQL-to-PostgreSQL WAL-level replication) seem to use TRIGGERs, but there might be other options. Ideally, what we'd want is:
- Idempotent. This means that we can get some reference (which we can store to disk) to the state "now", and diff that against a previous state and get the set of changed elements between these two states.
- Immutable. Once written, a state cannot be amended to include new elements. This is the problem with timestamps - we have no way to ensure that all the changes before a particular timestamp are visible to the replicator's transaction.
- In a commit order. Commits in PostgreSQL happen in some order which is not necessarily strict linear order but which allows for multiple orders, each of which satisfies relational integrity at the point of commit. In order that 3rd parties can apply diffs without needing to reorder them, or relax constraints, we need to be able to extract changes in one of the many orders which satisfy that.
Additional items that we may want to take the opportunity to include:
- Changeset replication. This is currently handled by a different tool, but would be nicer if all this stuff was in the same place.
- Redactions. Currently these aren't included in diff feeds, but this would be an opportunity to add them.
- Distributed operation. State files are quite small, and could be stored in Zookeeper, etcd or something similar. This would (in combination with the Idempotent/Immutable properties) mean that replication could be robustly run across multiple sites.
How do we do replication efficiently, retaining all the properties above, without an index on xid?