Backups #13953

@alexey-milovidov

Description

Details of what's proposed here: #8841

Requirements:

  • physical backup for almost all cases - backup of binary data without deserialization;
  • allow backing up any kind of table;
  • allow backing up a subset of partitions of MergeTree tables;
  • allow backing up whole databases or all server data;
  • support for incremental backups;
  • support for tables on remote storage;
  • support for tables with symlinks or hardlinks;
  • backup and restore of replicated tables;
  • backups are performed in streaming fashion, so +100% space is not required;
  • support for parallel writing of backup data to destination storage in multiple streams;
  • the backup process should not consume too much memory while running;
  • incremental backups should be optimized so that reading the whole data is not necessary if checksums can be quickly compared;
  • consistent snapshot whenever possible (for MergeTree tables), best effort when not possible;
  • support for different storage options via VFS;
  • restore with partial replace (single partition, subset of tables);
  • backups should also be useful for distribution of public datasets;
  • changing of db/table/replica path on restore;
  • allow restoring into a different kind of database or table in some cases (Atomic database to Ordinary; ReplicatedMergeTree to MergeTree);
  • support for backup and restore of a cluster.

Out of scope:

  • point in time recovery;
  • restore is not necessarily atomic; the client may see partially restored data unless the server is closed for connections.

Implementation proposal:

Backup is represented by a set of blobs and metadata.

Backup mechanism is implemented with the following:

  • add a method to iterate over a snapshot of a database in the form of entries consisting of blobs;
  • add an interface to consume these entries and write a backup;
  • add a method to read a backup and iterate over its entries;
  • add methods on catalog/database/table to consume backup entries and restore the data.

More details:

IStorage, IDatabase and DatabaseCatalog should provide a backup method returning an IBackupIterator.
IBackupIterator acquires a (consistent if possible, but not necessarily) snapshot of a table, database, or catalog and provides a method to iterate through it.
Iterating over the backup iterator yields BackupEntry structs that contain:

  • virtual path: a relative path if the data is stored on a local filesystem, or something similar otherwise; it's enough to identify this entry;
  • optional ReadBuffer to read the data (blob); it should also hold the corresponding data, so we can read it;
  • optional checksum: already available, or empty if it must be calculated from the file data (readily available checksums are useful for skipping reads during incremental backups);
  • file metadata needed for tables: whether it's a symlink or hardlink, etc.
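The entry and iterator described above could be sketched as follows. This is a minimal illustrative sketch, not the actual implementation: BackupEntry and IBackupIterator are named in the proposal, but the fields, method signatures, and the VectorBackupIterator helper are assumptions (a real ReadBuffer is stood in for by an owned string).

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of one entry yielded by a backup iterator.
struct BackupEntry
{
    std::string virtual_path;           // relative path (or similar) identifying the entry
    std::shared_ptr<std::string> data;  // stand-in for ReadBuffer; also owns the data
    std::optional<uint64_t> checksum;   // empty => must be computed from the blob
    bool is_symlink = false;            // file metadata needed on restore
};

// Iterator over a (best-effort consistent) snapshot of a table/database/catalog.
class IBackupIterator
{
public:
    virtual ~IBackupIterator() = default;
    virtual bool next(BackupEntry & out) = 0;  // returns false when exhausted
};

// Trivial in-memory implementation for illustration only.
class VectorBackupIterator : public IBackupIterator
{
public:
    explicit VectorBackupIterator(std::vector<BackupEntry> entries_)
        : entries(std::move(entries_)) {}

    bool next(BackupEntry & out) override
    {
        if (pos >= entries.size())
            return false;
        out = entries[pos++];
        return true;
    }

private:
    std::vector<BackupEntry> entries;
    size_t pos = 0;
};
```

A consumer would call next() in a loop, reading each blob as it goes; entries with an empty checksum would have one computed while the blob is streamed.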

Backups can be written to various destinations with IBackupWriter. IBackupWriter consumes BackupEntry objects to write blobs somewhere and build the metadata info. It is also responsible for skipping blobs during incremental backups.

Backups can be read back in the form of an IBackupIterator.
IStorage, IDatabase and DatabaseCatalog should provide a restore method taking an IBackupIterator and various options (partial restore, restore under different names, ...).
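The writer side could look like the sketch below. IBackupWriter is named in the proposal; the simplified BackupEntry, the write/metadata signatures, and the InMemoryBackupWriter are illustrative assumptions (a real writer would target a disk, S3, etc., and record richer metadata than blob sizes).

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified entry for this sketch: just a path and its blob contents.
struct BackupEntry
{
    std::string virtual_path;
    std::string blob;
};

// Hypothetical writer interface: consumes entries, stores blobs, builds metadata.
class IBackupWriter
{
public:
    virtual ~IBackupWriter() = default;
    virtual void write(const BackupEntry & entry) = 0;
    virtual const std::map<std::string, size_t> & metadata() const = 0;
};

// In-memory destination for illustration only.
class InMemoryBackupWriter : public IBackupWriter
{
public:
    void write(const BackupEntry & entry) override
    {
        blobs[entry.virtual_path] = entry.blob;          // "upload" the blob
        meta[entry.virtual_path] = entry.blob.size();    // record path -> size
    }

    const std::map<std::string, size_t> & metadata() const override { return meta; }

private:
    std::map<std::string, std::string> blobs;
    std::map<std::string, size_t> meta;
};
```

Driving a backup is then just pumping entries from the iterator into the writer, which keeps the two sides decoupled: any table engine can feed any destination.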

Examples:

physical backup for almost all cases - backup of binary data without deserialization
backing up any kind of table

The table may provide a ReadBuffer to read some file directly (MergeTree, Log tables) or to generate a stream of serialized data (Memory table).
The only requirement for this stream (blob) is that it allows restoring the data. It's even possible for a table to provide a logical dump as a blob, but that is not the intent.

A physical backup allows restoring the same set of data parts. Data parts don't have to be re-merged after restore, in contrast to a table dump.

Tables that don't store data should not provide any blobs; for example, a backup of a table with the URL or S3 engine will be shallow.
In contrast, if a MergeTree table is stored on S3 (in development), the backup will contain blobs of data identical to those produced when the data is stored locally.

The process of physical backup is similar to simply copying the files... but is more generic to account for all the details.

There are different possible variants for backing up Distributed tables, but in the first implementation it will be a shallow backup.

allow backing up a subset of partitions of MergeTree tables
allow backing up whole databases or all server data

The BACKUP query should allow these options.

When we back up tables, the metadata (.sql) files will be enumerated by the BackupIterator of the Database.
The BackupIterator of a Database will also iterate over the BackupIterators of its tables.

support for incremental backups

The BACKUP query will allow specifying the previous backup (its location and total checksum).
The BackupWriter will write information about the previous backup into the metadata of the current backup, check the contents of the previous backup, and skip writing blobs that are identical.
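The skip decision can be reduced to one small check. This is a hypothetical sketch: the ChecksumMap type and the needs_write helper are illustrative names, assuming the base (previous) backup's metadata maps each virtual path to a blob checksum.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// path -> checksum, as recorded in the previous backup's metadata (illustrative).
using ChecksumMap = std::map<std::string, uint64_t>;

// A blob must be written into the new backup only if the previous backup has no
// entry for this path, or has one with a different checksum. Otherwise the new
// backup's metadata just references the blob already stored in the base backup.
bool needs_write(const ChecksumMap & base_backup, const std::string & path, uint64_t checksum)
{
    auto it = base_backup.find(path);
    return it == base_backup.end() || it->second != checksum;
}
```

Because the comparison only needs the checksum, blobs whose checksums are already known (as for MergeTree parts) never have to be read at all during an incremental backup.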

support for tables on remote storage

Tables on remote storage will simply read the remote data to provide blobs for the backup.
Ideally, the contents of a backup of a MergeTree table should be independent of the VFS used, so the table can be restored as a table on another VFS.

support for tables with symlinks or hardlinks

Just write the link info into the backup entry and skip writing the blobs.

backup and restore of replicated tables

Simply choose an arbitrary replica for backup. On restore, use commands that will replicate and replace the restored data.

backups are performed in streaming fashion, so +100% space is not required
backup process should not consume too much memory while performed

The BackupIterator only acquires a snapshot of the data. Blobs are enumerated and read in a streaming fashion.

support for parallel writing of backup data to destination storage in multiple streams

It's OK to get multiple entries from a BackupIterator and read/write their blobs in parallel, so a previously returned entry is not invalidated by further iteration.
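Because entries stay valid after the iterator advances, several workers can process them concurrently. The sketch below is a hypothetical illustration of that pattern (the Entry struct and parallel_total_bytes are invented names): one shared atomic index hands out entries, and each worker "processes" its entry independently — here, just summing blob sizes in place of writing them to the destination.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

struct Entry
{
    std::string path;
    std::string blob;
};

// Process the entries with num_threads workers; each worker claims the next
// index via an atomic counter, so no entry is processed twice and no locking
// around the entries themselves is needed (they remain valid concurrently).
size_t parallel_total_bytes(const std::vector<Entry> & entries, unsigned num_threads)
{
    std::atomic<size_t> total{0};
    std::atomic<size_t> next{0};

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
        workers.emplace_back([&]
        {
            for (size_t i = next.fetch_add(1); i < entries.size(); i = next.fetch_add(1))
                total += entries[i].blob.size();   // stand-in for "write blob to destination"
        });

    for (auto & w : workers)
        w.join();
    return total;
}
```

In a real writer the loop body would stream one blob to the destination storage, giving multiple parallel upload streams from a single iterator.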

incremental backups should be optimized so that reading the whole data is not necessary if checksums can be quickly compared

We have to store checksums of blobs in the backup. When we do an incremental backup, we have to compare checksums.
For MergeTree tables we already have checksums in most cases, so we can compare them without reading the blobs.

consistent snapshot whenever possible (for MergeTree tables), best effort when not possible

When backing up multiple tables, it's not currently possible to ensure that their states correspond to a single point in time.
Backups will be partially consistent for now, but improvements are possible.

support for different storage options via VFS

The BACKUP query should allow providing a path on the filesystem, along with a disk for VFS or a URL-like interface.

backups should also be useful for distribution of public datasets

Use case: quickly obtain well-known datasets from a public S3 URL for testing.
A backup will be restored as physically the same set of partitions, which is good for performance testing.

restore with partial replace (single partition, subset of tables);
changing of db/table/replica path on restore;

The RESTORE query should have these options.

allow restoring into a different kind of database or table in some cases (Atomic database to Ordinary; ReplicatedMergeTree to MergeTree)

A backup of an Atomic database may mimic the structure of an Ordinary database, so the backups will look identical and can be restored into a different kind of database.
It's also important to allow restoring, e.g., ReplicatedMergeTree over S3 into MergeTree on a local filesystem.

support for backup and restore of a cluster

It should be out of scope in the first implementation, but it should be possible by:

  • iterating over all servers of a cluster;
  • processing only metadata for Distributed tables;
  • putting info about shard/replica number into backup.
