Skip to content

Better handling of values too large for VarInt encoding#48628

Merged
rschu1ze merged 2 commits intomasterfrom
rs/var_uint_max
Apr 13, 2023
Merged

Better handling of values too large for VarInt encoding#48628
rschu1ze merged 2 commits intomasterfrom
rs/var_uint_max

Conversation

@rschu1ze
Copy link
Copy Markdown
Member

@rschu1ze rschu1ze commented Apr 11, 2023

PR #48154 introduced a sanity check in the form of a debug assertion that the input values for VarInt encoding are not too big. Such values should be exceptionally rare in practice but the AST fuzzer managed to trigger the assertion regardless.

The strategy to deal with such values until now was to bypass the check by limiting the value to the maximum allowed value (see #48412). Because a new AST Fuzzer failure appeared (#48497) and there may be more failures in future, this PR changes the sanity check from an assert to an exception which the AST Fuzzer ignores.

@azat

Fixes: #48497

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

PR #48154 introduced a sanity check in the form of a debug assertion
that the input values for VarInt encoding are not too big. Such values
should be exceptionally rare in practice but the AST fuzzer managed to
trigger the assertion regardless. The strategy to deal with such values
until now was to bypass the check by limiting the value to the maximum
allowed value (see #48412). Because a new AST Fuzzer failure appeared
(#48497) and there may be more failures in future, this PR changes the
sanity check from an assert to an exception.

Fixes: #48497
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Apr 11, 2023
@azat
Copy link
Copy Markdown
Member

azat commented Apr 11, 2023

@rschu1ze make sense, but I'm slightly worrying about ignoring them.

Maybe instead better to fix all the occurrences? But in a slightly different manner then before, instead of writeVarUIntOverflow introduce writeVarUIntThrow and use it there.

I guess that throwing from the writeVarUInt itself may slow down serialization, though I haven't checked this.

@rschu1ze
Copy link
Copy Markdown
Member Author

@rschu1ze make sense, but I'm slightly worrying about ignoring them.

Maybe instead better to fix all the occurrences? But in a slightly different manner then before, instead of writeVarUIntOverflow introduce writeVarUIntThrow and use it there.

I guess that throwing from the writeVarUInt itself may slow down serialization, though I haven't checked this.

I would say no test, not even the stress tests, except AST Fuzzer is able to generate invalid (too big) inputs. Yes, we are currently ignoring invalid inputs - at least in non-Debug builds - but besides that we don't even know which of the many callers are a problem.

I nevertheless like your idea. I'd add two twists: 1. use [[unlikely]] on the check to keep the overhead minimal, 2. make the checking version the default one (it would validate the input independently of the build type) and add "Unchecked" versions which can be used with trusted inputs (e.g. static version numbers, etc.). But allow me to do that separately, this will need a few iterations.

Copy link
Copy Markdown
Member

@pufit pufit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides @azat concerns, LGTM

@rschu1ze rschu1ze merged commit 22c3b10 into master Apr 13, 2023
@rschu1ze rschu1ze deleted the rs/var_uint_max branch April 13, 2023 08:17
rschu1ze added a commit that referenced this pull request Apr 13, 2023
This is follow-up for the discussion in #48628. The sanity check against
VAR_UINT_MAX is now unconditional (independent of the build type). Let's
see how that affects performance ... I'll probably add "Unchecked"
overloads later on (to avoid clutter they are avoided for now)
rschu1ze added a commit to rschu1ze/ClickHouse that referenced this pull request Jul 6, 2023
Resolves: ClickHouse#51486

Until now, it was illegal to encode 64-bit (unsigned) integers with
MSB=1, i.e. values > (1ULL<<63) - 1, as var-int. In more detail, the
var-int code used by ClickHouse server and client spent at most 9 bytes
per value such that 9 * 7 = 63 bits could be encoded. Some 3rd party
clients (e.g. Rust clickhouse-rs) had the same limitation, whereas other
clients understand the full range (Python clickhouse-driver).

PRs ClickHouse#47608 and ClickHouse#48628 added sanity checks as asserts or exceptions
during var-int encoding on the server side. This was considered okay as
such huge integers so far occurred only during testing (usually fuzzing)
but not in practice.

Issue ClickHouse#51486 is a new fuzzing issue where the exception thrown from the
sanity check led to a half-baked progress packet and as a result, a
logical error / server crash.

The only fix which is not another bandaid is to allow the full range in
var-int coding. Clients will have to allow the full range too, a note
will be added to the changelog. (the alternative was to create another
protocol version but as var-int is used all over the place this was
considered infeasible)

Review note: this is the relevant commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: 'x <= VAR_UINT_MAX'

4 participants