-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[thrift-remodel] Add custom PageLocation decoder to speed up decoding of page indexes
#8190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[thrift-remodel] Add custom PageLocation decoder to speed up decoding of page indexes
#8190
Conversation
|
From #8160 benchmark results (old is the original thrift, new is the reworked decoder) With these changes the reading of the page index is now: Note that the decoding of the column index is ~2/3 of the decoding time, so the speed up in the offset index decoding is considerable. Is this worth the added complexity is the question. No changes to the |
PageLocation decoder for to speed up decoding of page indexesPageLocation decoder to speed up decoding of page indexes
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @etseidl -- I think this type of code is acceptable assuming:
- Bad input won't cause undefined behavior (I think it would result in an error)
I also think this kind of special case might be more palatable if we also included the more general decoder as a fallback, something like
let original_input = prot.clone(); // save original location in input -- is this possible?
match try_parse_fast(prot)
Ok(page_location) => Ok(page_location), // successful parse, yay!
Err(_) => {
// ignore and try to parse with slow path
try_parse_slow_but_general(original_input)
}
}| pub first_row_index: i64, | ||
| } | ||
|
|
||
| // Note: this will fail if the fields are either out of order, or if a suboptimal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another alternative might be to have a fallback, so that if we encountered an out of order field, it could discard the work so far, and fall back to the more general, but slower implementation 🤔
| let first_row_index = prot.read_i64()?; | ||
|
|
||
| // This loop slows things down a bit, but it's an acceptible price to allow | ||
| // forwards compatibility. We could instead assert the next field is Stop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise, if we are really focused on speed, we could do a fast path without a loop, and then add a fallback that handled it correctly
It would of course add complexity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pr definitely needs more benchmarking to justify it. On a xeon the speed up isn't as dramatic, so i might shelve this one for now. I do like the idea of keeping both paths, especially with the push decoder on the horizon.
|
I still need to test this more, but I followed your suggestion to fall back on error, and along the way did a custom implementation of all of |
Given the original description, does that mean that the orignal time was 1.75 ms (thus this PR is like a 3X improvement)?
|
The 1.75ms is the time for the decoder currently in main. The first phase of the remodel got that down to 700us, and this PR gets the time down to 585us (so a further 16% improvement). When combined with the column index changes in #8191 this benchmark is down to under 290us. |
|
Going to commit this and move on to writes, etc. We can always revert this later if we don't like it. |
|

Which issue does this PR close?
Note: this targets a feature branch, not main
We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.
Rationale for this change
Add a custom parser for
PageLocationas the decoding of this struct is one of several hot spots.What changes are included in this PR?
This adds a faster means of obtaining the struct field ids to
ThriftCompactInputProtocol. For a small struct (3 fields) with all of them required, we can save a good bit of time bypassingThriftCompactInputProtocol::read_field_beginwhich is very general and can handle out-of-order fields, among other things. By adding a new functionread_field_header, we can avoid the costly branching that occurs when calculating the new field id (as well as special handling needed for boolean fields). Field validation is then handled on the consuming side while decoding thePageLocationstruct.Note that to obtain the speed up seen, we need to assume the fields will always be in order, and the field ids will all be encoded as field deltas. This is probably a fairly safe assumption, but there does exist the possibility of custom thrift writers that use absolute field ids. If we encounter such a writer in the wild, this change will need to be reverted.
Are these changes tested?
These changes should be covered by existing changes.
Are there any user-facing changes?
None beyond the changes in this branch.