Parquet partition size issue, how to "recompress" #6427

@mediamana

Description

To reproduce

CREATE TABLE 'test2' ( id SYMBOL, dateTime TIMESTAMP) timestamp(dateTime) PARTITION BY HOUR WAL;

INSERT INTO 'test2' select * from (
  SELECT
    cast(rnd_int(0, 50000, 0) as string),
    rnd_timestamp(
      to_timestamp('2025-11-11T00:00:00', 'yyyy-MM-ddTHH:mm:ss'),
      to_timestamp('2025-11-11T23:59:59', 'yyyy-MM-ddTHH:mm:ss'),
      0
    )
  FROM long_sequence(10000000)
);

alter table test2 convert partition to parquet where true;

INSERT INTO 'test2' select * from (
  SELECT
    cast(rnd_int(0, 50000, 0) as string),
    rnd_timestamp(
      to_timestamp('2025-11-11T00:00:00', 'yyyy-MM-ddTHH:mm:ss'),
      to_timestamp('2025-11-11T23:59:59', 'yyyy-MM-ddTHH:mm:ss'),
      0
    )
  FROM long_sequence(10000000)
);

At this point the partitions are bigger than they should be: 21.8 MiB per partition in my example.

If I convert them to native and then back to Parquet:

alter table test2 convert partition to native where true;
alter table test2 convert partition to parquet where true;

They go down to 4.4 MiB per partition.

QuestDB version:

9.2.0

OS, in case of Docker specify Docker and the Host OS:

Debian

File System, in case of Docker specify Host File System:

ext4

Full Name:

Maximilien Wiktorowski

Affiliation:

Echoes

Have you followed Linux, MacOs kernel configuration steps to increase Maximum open files and Maximum virtual memory areas limit?

  • Yes, I have

Additional context

Is there a way to avoid the intermediate native conversion and just "recompress" the Parquet partition?
In my real use case the Parquet partitions are 3 GiB; converting them to native expands them to 15 GiB before they go back down to 700 MiB.
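Until an in-place recompress exists, one way to cap the temporary disk blow-up might be to round-trip one partition at a time rather than all at once, by narrowing the `WHERE` clause on the designated timestamp. This is an untested sketch, and the timestamp bounds below are placeholders for one HOUR partition of my real table:

```sql
-- Round-trip a single HOUR partition, so only that partition
-- is expanded to native format on disk at any given moment.
alter table test2 convert partition to native
  where dateTime >= '2025-11-11T00:00:00' and dateTime < '2025-11-11T01:00:00';
alter table test2 convert partition to parquet
  where dateTime >= '2025-11-11T00:00:00' and dateTime < '2025-11-11T01:00:00';
```

This keeps the peak extra disk usage at roughly one native partition's size instead of the whole table's, at the cost of issuing one pair of statements per partition.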
