Versioning of AggregateFunction states. #12552

@alexey-milovidov

Use case

Sometimes we have to change the serialization format of AggregateFunction states because of bugs or inefficiencies. Such changes should not break backward compatibility.

Currently we change the format only in exceptional cases:

When the aggregate function was released only recently and is rarely used. Even then we have to put a warning about backward compatibility in the changelog.

Sometimes an incompatible change also slips in by mistake, without anyone noticing.

Proposal

The methods IAggregateFunction::serialize and IAggregateFunction::deserialize will take an additional argument with the version. These methods have to support serialization and deserialization for all known versions.
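A minimal sketch of what version-aware serialization could look like, written as standalone C++ rather than against the real IAggregateFunction interface. The AvgState struct, the free functions serialize and deserialize, and both wire layouts are assumptions made for illustration: version 0 writes two fixed 8-byte integers, and a hypothetical version 1 writes the count as a variable-length integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical state of an avg-like aggregate function.
struct AvgState
{
    uint64_t sum = 0;
    uint64_t count = 0;
};

// Append a 64-bit value as 8 little-endian bytes.
static void writeFixed(std::string & out, uint64_t x)
{
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<char>((x >> (8 * i)) & 0xFF));
}

static uint64_t readFixed(const std::string & in, size_t & pos)
{
    if (pos > in.size() || in.size() - pos < 8)
        throw std::runtime_error("Unexpected end of data");
    uint64_t x = 0;
    for (int i = 0; i < 8; ++i)
        x |= static_cast<uint64_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
    pos += 8;
    return x;
}

// Append a value as a variable-length integer, 7 bits per byte.
static void writeVarUInt(std::string & out, uint64_t x)
{
    while (x >= 0x80)
    {
        out.push_back(static_cast<char>((x & 0x7F) | 0x80));
        x >>= 7;
    }
    out.push_back(static_cast<char>(x));
}

static uint64_t readVarUInt(const std::string & in, size_t & pos)
{
    uint64_t x = 0;
    for (int shift = 0; shift < 64; shift += 7)
    {
        if (pos >= in.size())
            throw std::runtime_error("Unexpected end of data");
        uint64_t byte = static_cast<unsigned char>(in[pos++]);
        x |= (byte & 0x7F) << shift;
        if (!(byte & 0x80))
            return x;
    }
    throw std::runtime_error("Malformed variable-length integer");
}

// Both functions must keep supporting every version ever released.
void serialize(const AvgState & state, std::string & out, size_t version)
{
    switch (version)
    {
        case 0: // original layout (assumed): fixed sum, fixed count
            writeFixed(out, state.sum);
            writeFixed(out, state.count);
            break;
        case 1: // hypothetical newer layout: fixed sum, varint count
            writeFixed(out, state.sum);
            writeVarUInt(out, state.count);
            break;
        default:
            throw std::invalid_argument("Unknown state version");
    }
}

AvgState deserialize(const std::string & in, size_t version)
{
    AvgState state;
    size_t pos = 0;
    switch (version)
    {
        case 0:
            state.sum = readFixed(in, pos);
            state.count = readFixed(in, pos);
            break;
        case 1:
            state.sum = readFixed(in, pos);
            state.count = readVarUInt(in, pos);
            break;
        default:
            throw std::invalid_argument("Unknown state version");
    }
    return state;
}
```

The point of the sketch is that the version is passed in by the caller (ultimately taken from the data type), not stored inside the serialized state, so old data without any version marker can still be read as version 0.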

When a user creates a data type AggregateFunction(...), e.g. AggregateFunction(avg, UInt64), we rewrite it by adding the most recent version as the first parameter, e.g. AggregateFunction(v1, avg, UInt64).

The user will see the data type with the version number in SHOW CREATE TABLE, DESCRIBE TABLE, etc.

When a data type AggregateFunction(...) without a version is already present in a table definition or in a serialization format (Native), version 0 is assumed implicitly.
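The naming rules above can be sketched as a small parser and formatter for the type name. This is only an illustration under stated assumptions: the names AggregateTypeName, parseTypeName, formatTypeName and normalizeOnCreate are hypothetical, the version token is assumed to be written exactly as `v` followed by digits, and the sketch does not address what happens if an aggregate function is itself named like a version token.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

struct AggregateTypeName
{
    size_t version = 0;    // 0 means no version is written in the type name
    std::string arguments; // e.g. "avg, UInt64"
};

// Recognize a token of the form v<digits>, e.g. "v1".
static bool parseVersionToken(const std::string & token, size_t & version)
{
    if (token.size() < 2 || token[0] != 'v')
        return false;
    size_t value = 0;
    for (size_t i = 1; i < token.size(); ++i)
    {
        if (!std::isdigit(static_cast<unsigned char>(token[i])))
            return false;
        value = value * 10 + static_cast<size_t>(token[i] - '0');
    }
    version = value;
    return true;
}

// A type name without a version token is treated as version 0.
AggregateTypeName parseTypeName(const std::string & name)
{
    const std::string prefix = "AggregateFunction(";
    if (name.size() <= prefix.size()
        || name.compare(0, prefix.size(), prefix) != 0
        || name.back() != ')')
        throw std::invalid_argument("Not an AggregateFunction type name");

    AggregateTypeName result;
    const std::string args = name.substr(prefix.size(), name.size() - prefix.size() - 1);

    const size_t comma = args.find(',');
    size_t version = 0;
    if (comma != std::string::npos && parseVersionToken(args.substr(0, comma), version))
    {
        const size_t rest = args.find_first_not_of(' ', comma + 1);
        result.version = version;
        result.arguments = (rest == std::string::npos) ? "" : args.substr(rest);
    }
    else
        result.arguments = args;
    return result;
}

// Version 0 is never printed, so old type names round-trip unchanged.
std::string formatTypeName(const AggregateTypeName & type)
{
    if (type.version == 0)
        return "AggregateFunction(" + type.arguments + ")";
    return "AggregateFunction(v" + std::to_string(type.version) + ", " + type.arguments + ")";
}

// On table creation, a type without a version gets the most recent one.
std::string normalizeOnCreate(const std::string & name, size_t latest_version)
{
    AggregateTypeName type = parseTypeName(name);
    if (type.version == 0)
        type.version = latest_version;
    return formatTypeName(type);
}
```

With a latest version of 1, normalizeOnCreate turns AggregateFunction(avg, UInt64) into AggregateFunction(v1, avg, UInt64), while parseTypeName on an existing definition without a version reports version 0.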

When sending data to a client over the native protocol, the revision of the client is taken into account. IAggregateFunction should have a method that determines the maximum supported version for a given client revision, and the version of the AggregateFunction data type is adjusted accordingly before sending. If the AggregateFunction data type ends up with version zero, the version is not printed in the data type name.
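One way the revision lookup could work is a per-function table that records the client revision in which each state version first appeared. The sketch below is an assumption, not the proposed interface: VersionIntroduction, maxSupportedVersion and versionToSend are hypothetical names, and the rule that the server never sends a version higher than the column's own version is an interpretation of the paragraph above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry per state version: the first client revision that understands it.
struct VersionIntroduction
{
    uint64_t min_client_revision;
    size_t version;
};

// Highest state version that a client of the given revision can read.
// Version 0 is always supported, so it is the fallback.
size_t maxSupportedVersion(const std::vector<VersionIntroduction> & history, uint64_t client_revision)
{
    size_t result = 0;
    for (const auto & entry : history)
        if (client_revision >= entry.min_client_revision)
            result = std::max(result, entry.version);
    return result;
}

// Version actually used on the wire: never newer than the column's own
// version and never newer than what the client supports.
size_t versionToSend(
    size_t column_version,
    const std::vector<VersionIntroduction> & history,
    uint64_t client_revision)
{
    return std::min(column_version, maxSupportedVersion(history, client_revision));
}
```

For example, with a made-up history where version 1 appears at revision 54440 and version 2 at revision 54450, a client at revision 54445 would receive a version 2 column as version 1, and a client older than 54440 would receive version 0, which is then printed without a version in the type name.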

Scenarios

  1. The server sends data to an old client. Should work seamlessly.
  2. The server sends data to an old client that is actually another server that initiated a distributed query. Should work seamlessly.
  3. We have a table with columns of AggregateFunction data type; then we upgrade the server and continue to read from and write to that table. Should work seamlessly.
  4. We have a table with columns of AggregateFunction data type; then we upgrade the server and continue to read from and write to that table. Then we downgrade the server and continue to read from and write to that table. Should work seamlessly.
  5. We have created a dump in a format with data types (TSVWithNamesAndTypes, CSVWithNamesAndTypes, etc.) on an old server and then upload it to a new server. Should work seamlessly.
  6. We have created a dump in a format without data types (like TSV, RowBinary) on an old server and then upload it to a new server. This may require the user to explicitly specify the version in the data type when creating the table.
