-
-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Fix HTTP startline validation #16022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix HTTP startline validation #16022
Conversation
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes netty#16020
codec-http/src/main/java/io/netty/handler/codec/http/HttpUtil.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Jonas Konrad <[email protected]>
|
As for performance, previously we had: And in this PR: 😑 |
|
Yea, not ideal and I'm sure this can still be optimized, it should be a relatively common task... But let's get the fix out first. |
|
|
||
| private static long charMask(CharSequence token, int i) { | ||
| char c = token.charAt(i); | ||
| return c < 64 ? 1L << c : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using s lookup table for this or
long mask = ((c - 64) >> 63);
return mask & (1L << c);
Assuming c to be >= 0.
I made it by phone so could be wrong but I think you got the idea; mask should use the msb to create a full 0s or 1s mask to produce the actual value. In this way there won't be any comparison.
A lookup table could be used similarly by computing an index to lookup into a table with N + 1 elements
| charMask(token, i + 1) | | ||
| charMask(token, i + 2) | | ||
| charMask(token, i + 3); | ||
| if ((chars & ILLEGAL_REQUEST_LINE_TOKEN_OCTET_MASK) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure batching would pay off here: validations are a strange affair!
If we assume is rare to return false, computing the 4 ors and one and with the illegal request line is still not ideal!
Just writing straight comparison chart per char (no need to or and batch) should perform 4 more ands (which are bad) and 4 comparisons, but will make the loads independent..in short comparison outcomes are processed even before CPU does fetch and decode and keeping the 4 char loads independent would make the pipeline able to keep on speculating and processing more chars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could payoff only If we could use VarHandle and read from a byte[] 4 chars in parallel into a long, saving 3 loads to happen.
Clearly charMask should use the branchless version I explained and be made batchy as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In short, I would try just the simple loop case here (using the branchless code I have provided)
|
@franz1981 I tried your suggestions but I'm not finding them profitable on my system. Batching and using the conditional remains the fastest approach so far. |
|
With the masking from #16022 (comment) With conditional in charMask but no batching: With both suggestions together: |
|
I will check on my Ryzen next week but my guess is that you haven't enough random samples w a distribution of chars below/above 64. If the validation is really stressed, branch mispredict should be rather dominant when happen, in my XP |
|
Indeed HttpRequestEncoderInsertBenchmark is just way too predictable to represent a valid benchmark (if we assume in the real world chars really can be both below/above 64 - which separate numbers from letters, for example) |
|
Let me pull it in as it is for now to fix the regression. We can work on perf improvements as a followup |
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 --------- Co-authored-by: Jonas Konrad <[email protected]>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 --------- Co-authored-by: Jonas Konrad <[email protected]>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 --------- Co-authored-by: Jonas Konrad <[email protected]>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 --------- Co-authored-by: Jonas Konrad <[email protected]>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 --------- Co-authored-by: Jonas Konrad <[email protected]>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 Co-authored-by: Chris Vest <[email protected]> Co-authored-by: Jonas Konrad <[email protected]>
Motivation: The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs. Modification: Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted. Result: Fixes #16020 Co-authored-by: Chris Vest <[email protected]> Co-authored-by: Jonas Konrad <[email protected]>
Motivation:
The code assumed that oversized bit shifting would result in zeroing out values due to overflow. However, the Java Language Specification instead says that shifts effectively only consider the lower six bits of the shift amount, resulting in modular-arithmetic shifts. The consequence is that, for instance, shifing by the capital letter 'M' produces the same bit mask as carriage-return '\r', which is an illegal character in an HTTP start line. This incorrectly rejected valid URIs.
Modification:
Make the shifting conditional and only use it on character values less than or equal to 64 (the Long bit size). Also add tests to check that valid URLs are accepted.
Result:
Fixes #16020