Skip to content

ArthurHeitmann/zst_blocks_format

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Zst Blocks format

The zst blocks format is a simple format for storing rows of data in a compressed format. Compression is done using zstandard (zstd).

If you're just interested in using the CLI or the python scripts, see here.

Format

A file is made up of a series of blocks. Each block is independent and can be decoded without any information from other blocks.

struct ZstCompressedBlock {
	uint32 compressed_size;
	byte compressed_data[compressed_size];
};

The decompressed data has the following format:

struct ZstBlock {
	uint32 row_count;
	ZstRowInfo row_infos[row_count];
	byte rows_data[];
};

struct ZstRowInfo {
	uint32 offset;
	uint32 size;
};

The offset is relative to the start of the rows_data field.

All fields are little endian.

Comparison to plain zstd

Pros

  • Can decode individual rows without decoding the entire file. For individual reads this can be much faster than decoding the entire file.
  • New data can be appended to the end of the file without having to decode or re-encode the entire file.
  • More memory efficient, since the entire file does not need to be decoded into memory.

Cons

  • Slightly larger file size than plain zstd. This is because each block has a header and the data is not compressed as well as a single zstd stream.
  • Slower to decode the entire file. This is because each block has to be decoded individually.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published