Parquet partition size issue, how to "recompress" #6427
Description
To reproduce
CREATE TABLE 'test2' ( id SYMBOL, dateTime TIMESTAMP) timestamp(dateTime) PARTITION BY HOUR WAL;
INSERT INTO 'test2' select * from (
SELECT
cast(rnd_int(0, 50000, 0) as string),
rnd_timestamp(
to_timestamp('2025-11-11T00:00:00', 'yyyy-MM-ddTHH:mm:ss'),
to_timestamp('2025-11-11T23:59:59', 'yyyy-MM-ddTHH:mm:ss'),
0
)
FROM long_sequence(10000000)
);
alter table test2 convert partition to parquet where true;
INSERT INTO 'test2' select * from (
SELECT
cast(rnd_int(0, 50000, 0) as string),
rnd_timestamp(
to_timestamp('2025-11-11T00:00:00', 'yyyy-MM-ddTHH:mm:ss'),
to_timestamp('2025-11-11T23:59:59', 'yyyy-MM-ddTHH:mm:ss'),
0
)
FROM long_sequence(10000000)
);
At this point, after the second insert into the already-converted partitions, the partitions are larger than they could be.
In my example, they are 21.8MiB per partition.
If I convert them to native and then to Parquet again:
alter table test2 convert partition to native where true;
alter table test2 convert partition to parquet where true;
They go down to 4.4MiB per partition.
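To compare sizes before and after each conversion, a query along these lines can be used. This is only a sketch: it assumes the built-in table_partitions() function exposes the name, numRows and diskSizeHuman columns in this version, and that the reported size also covers partitions stored as Parquet.

```sql
-- Sketch: list per-partition row counts and on-disk sizes for test2.
-- Assumption: table_partitions() exposes name, numRows and diskSizeHuman,
-- and diskSizeHuman reflects the Parquet file size for converted partitions.
SELECT name, numRows, diskSizeHuman
FROM table_partitions('test2')
ORDER BY name;
```

Running it after the second insert and again after the native/Parquet round trip should show the difference described above.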
QuestDB version:
9.2.0
OS, in case of Docker specify Docker and the Host OS:
Debian
File System, in case of Docker specify Host File System:
ext4
Full Name:
Maximilien Wiktorowski
Affiliation:
Echoes
Have you followed the Linux and macOS kernel configuration steps to increase the maximum open files and maximum virtual memory areas limits?
- Yes, I have
Additional context
Is there a way to avoid the intermediate native conversion and just "recompress" the Parquet partitions?
In my real use case, the Parquet partitions are 3GiB each. Converting them to native expands them to 15GiB before they go back down to 700MiB as Parquet.
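For reference, the round trip can presumably be limited to one partition at a time, so that only a single partition is expanded to native at any moment. This is a sketch, assuming the LIST form of CONVERT PARTITION is accepted for both directions; '2025-11-11T00' is a placeholder partition name and must be replaced with a name that actually exists in the table. It does not avoid the native conversion, it only bounds the temporary disk usage.

```sql
-- Sketch: round-trip a single hourly partition instead of the whole table.
-- Assumption: CONVERT PARTITION accepts a LIST of partition names.
-- '2025-11-11T00' is a placeholder; use a real partition name from the table.
ALTER TABLE test2 CONVERT PARTITION TO NATIVE LIST '2025-11-11T00';
ALTER TABLE test2 CONVERT PARTITION TO PARQUET LIST '2025-11-11T00';
```

Repeating the pair of statements per partition keeps the peak extra disk usage close to the native size of one partition rather than of all of them.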