Skip to content

Need a new way of doing diff replication #154

@zerebubuth

Description

@zerebubuth

In #94, we wanted to upgrade to PostgreSQL 9.6, as this is the latest and greatest version. We were unable to, as we set an index on xid, a system column, which is no longer allowed in PG 9.6:

Disallow creation of indexes on system columns, except for OID columns (David Rowley)

Such indexes were never considered supported, and would very possibly misbehave since the system might change the system-column fields of a tuple without updating indexes. However, previously there were no error checks to prevent them from being created.

Although they've been stable and worked for us for many years, it seems they might not be a great idea.

We should think through how to do this properly. Other systems for user-space replication (i.e: not PostgreSQL-to-PostgreSQL WAL-level replication) seem to use TRIGGERs, but there might be other options. Ideally, what we'd want is:

  1. Idempotent. This means that we can get some reference (which we can store to disk) to the state "now", and diff that against a previous state and get the set of changed elements between these two states.
  2. Immutable. Once written, a state cannot be amended to include new elements. This is the problem with timestamps - we have no way to ensure that all the changes before a particular timestamp are visible to the replicator's transaction.
  3. In a commit order. Commits in PostgreSQL happen in some order which is not necessarily strict linear order but which allows for multiple orders, each of which satisfies relational integrity at the point of commit. In order that 3rd parties can apply diffs without needing to reorder them, or relax constraints, we need to be able to extract changes in one of the many orders which satisfy that.

Additional items that we may want to take the opportunity to include:

  • Changeset replication. This is currently handled by a different tool, but would be nicer if all this stuff was in the same place.
  • Redactions. Currently these aren't included in diff feeds, but this would be an opportunity to add them.
  • Distributed operation. State files are quite small, and could be stored in Zookeeper, etcd or something similar. This would (in combination with the Idempotent/Immutable properties) mean that replication could be robustly run across multiple sites.

How do we do replication efficiently, retaining all the properties above, without an index on xid?

Metadata

Metadata

Assignees

No one assigned

    Labels

    service:apiissues related to the API

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions