data corruption on delta+gorilla+lz4 codec pipeline #45195

@mosinnik

Describe what's wrong

I have a dataset from sensors whose values were corrupted when stored in a column using codec (Delta, Gorilla, LZ4).

Does it reproduce on recent release?

Checked only on:

+----------------+----------------------------------------+
|name            |value                                   |
+----------------+----------------------------------------+
|VERSION_FULL    |ClickHouse 22.10.1.1877                 |
|VERSION_DESCRIBE|v22.10.1.1877-testing                   |
|VERSION_INTEGER |22010001                                |
|SYSTEM          |Linux                                   |
|VERSION_GITHASH |98ab5a3c189232ea2a3dddb9d2be7196ae8b3434|
|VERSION_REVISION|54467                                   |
+----------------+----------------------------------------+

Enable crash reporting

No errors in server logs.

How to reproduce

create table bug_gor_lz
(
    master_serial String,
    sensor_serial String,
    type          String,
    datetime      DateTime codec (Delta, LZ4),
    value Nullable(Decimal(15, 5)) default NULL,
    value_bug Nullable(Decimal(15, 5)) default NULL codec (Delta, Gorilla, LZ4)
)
    engine = ReplacingMergeTree PARTITION BY toYYYYMM(datetime)
        ORDER BY (master_serial, sensor_serial, type, datetime)
        SETTINGS index_granularity = 8192;

Import the data from a TabSeparated file.
The exported file has the correct values in the value column and the corrupted values in the value_bug column. Check the differences:

select * from bug_gor_lz
where value <> value_bug
limit 10;

For example, the query currently returns:

+-------------+-------------+-----+-------------------+-------+---------------------+
|master_serial|sensor_serial|type |datetime           |value  |value_bug            |
+-------------+-------------+-----+-------------------+-------+---------------------+
|70160        |10000175HIT  |press|2022-06-14 06:07:02|0.00000|44483799547993.14956 |
|70160        |10000175HIT  |press|2022-06-14 06:07:03|0.00000|-36626579495765.90833|
|70160        |10000175HIT  |press|2022-06-14 06:07:04|0.00000|66730889016872.82706 |
|70160        |10000175HIT  |press|2022-06-14 06:07:05|0.00000|-14377895735025.95563|
|70160        |10000175HIT  |press|2022-06-14 06:07:06|0.00000|88981167069473.05496 |
|70160        |10000175HIT  |press|2022-06-01 04:32:54|0.00000|0.64536              |
|70160        |10000175HIT  |press|2022-06-01 04:32:55|0.00000|1.29072              |
|70160        |10000175HIT  |press|2022-06-01 04:32:56|0.00000|1.93608              |
|70160        |10000175HIT  |press|2022-06-01 04:32:57|0.00000|2.58144              |
|70160        |10000175HIT  |press|2022-06-01 04:32:58|0.00000|3.22680              |
+-------------+-------------+-----+-------------------+-------+---------------------+

To reproduce, create a new table with the structure above and refill value_bug from value.
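
The refill step could look roughly like the sketch below. The table name bug_gor_lz_repro is hypothetical (it does not appear in the report), and the sketch assumes the original bug_gor_lz table is already loaded from the TabSeparated file. The column definitions are copied from the structure above. Whether a given dataset triggers the corruption is not guaranteed.

```sql
-- Hypothetical table name; column definitions are copied from the report.
create table bug_gor_lz_repro
(
    master_serial String,
    sensor_serial String,
    type          String,
    datetime      DateTime codec (Delta, LZ4),
    value Nullable(Decimal(15, 5)) default NULL,
    value_bug Nullable(Decimal(15, 5)) default NULL codec (Delta, Gorilla, LZ4)
)
    engine = ReplacingMergeTree PARTITION BY toYYYYMM(datetime)
        ORDER BY (master_serial, sensor_serial, type, datetime)
        SETTINGS index_granularity = 8192;

-- Refill: write the known-good value into both columns,
-- so the only difference between them is the codec pipeline.
insert into bug_gor_lz_repro
    (master_serial, sensor_serial, type, datetime, value, value_bug)
select master_serial, sensor_serial, type, datetime, value, value
from bug_gor_lz;

-- A non-zero count means value_bug did not survive the write/read round trip.
select count()
from bug_gor_lz_repro
where value <> value_bug;
```

Because both columns receive the same input, any row returned by the diff query from the report points at the (Delta, Gorilla, LZ4) codec chain and not at the import.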

Expected behavior

Values in value and value_bug should be the same.

Error message and/or stacktrace

no

Additional context

no

Labels: bug, comp-codecs, minor