Skip to content

Do not check bit flips for big buffers (since the size can be corrupted)#18852

Merged
alexey-milovidov merged 2 commits intoClickHouse:masterfrom
azat:bit-flip-check
Jan 8, 2021
Merged

Do not check bit flips for big buffers (since the size can be corrupted)#18852
alexey-milovidov merged 2 commits intoClickHouse:masterfrom
azat:bit-flip-check

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Jan 7, 2021

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jan 7, 2021
for (size_t bit_pos = 0; bit_pos < size * 8; ++bit_pos)
/// If size is too huge, then this may be caused by corruption.
/// And anyway this is pretty heavy, so avoid burning too much CPU here.
if (size < 1<<20)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should take about a hour for one megabyte.

Compressed block of size 1 MiB is not huge but normal (for 1 MiB of uncompressable data).

Did you encounter this issue in practice?
Maybe also add a check that compressed size is not too bigger (more than 2 times) than uncompressed size (and do this check also in compression)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should take about a hour for one megabyte.

Something like this (approx)?

WITH
    1024 * 1024 AS size,
    8 * size AS bits,
    ((5.3 * 1024) * 1024) * 1024 AS speed_of_cityhash64
SELECT ((bits * size) / (speed_of_cityhash64 / 2)) / 3600

┌─divide(divide(multiply(bits, size), divide(speed_of_cityhash64, 2)), 60)─┐
│                                                       0.8050314465408805 │
└──────────────────────────────────────────────────────────────────────────┘

Corrupting file to exactly one megabyte is hard but possible.
1mb was chosen because it matches with the max_compress_block_size.
But we can use something like min_compress_block_size*2, it will take ~1min

Did you encounter this issue in practice?

Yes, in the #18853 (there is a note that it is likely will require this PR)
And you can see that the test failed with timeout, while w/ this patch they will pass

Maybe also add a check that compressed size is not too bigger (more than 2 times) than uncompressed size (and do this check also in compression)?

Hm interesting, yes this should be done (although 2x maybe too much, also I guess it need some padding), and also it should check that those size are not exceeded file size but this will require some refactoring.

@alexey-milovidov alexey-milovidov self-assigned this Jan 8, 2021
@alexey-milovidov alexey-milovidov merged commit 9c6d945 into ClickHouse:master Jan 8, 2021
@azat azat deleted the bit-flip-check branch January 8, 2021 22:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants