Improve MultipartReader performance #51426
Merged
There are 3 distinct changes here; I'll show benchmark numbers for each of them below, as well as a summary at the end of this comment. Originally, this change was going to replace our Boyer-Moore string matching algorithm with IndexOf, as described in #49223. But while testing that out and writing microbenchmarks, I noticed some other changes that could significantly improve perf regardless of Boyer-Moore vs. IndexOf. Switching to IndexOf turned out to be more involved than anticipated, so it will be a follow-up item: it does show additional perf improvements on top of this PR, but requires more work.
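For context, the follow-up idea has roughly the shape below. This is only a sketch with made-up names (`BoundarySearchSketch`, `FindBoundary`), not the actual `MultipartReaderStream` code: the span-based `IndexOf` overload stands in for the hand-rolled Boyer-Moore skip-table scan.

```csharp
using System;

static class BoundarySearchSketch
{
    // Illustrative only: ReadOnlySpan<byte>.IndexOf(ReadOnlySpan<byte>) is
    // vectorized on modern runtimes, which is why it can compete with (or beat)
    // a hand-rolled Boyer-Moore scan for typical multipart boundary lengths.
    public static int FindBoundary(ReadOnlySpan<byte> buffer, ReadOnlySpan<byte> boundaryBytes)
        => buffer.IndexOf(boundaryBytes);
}
```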
The first commit adds a loop to line processing: previously, every iteration would check whether there were items in the buffer and make a function call into the core loop, whereas now the buffered data is scanned in a single tight loop.
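A minimal sketch of that pattern, assuming a hypothetical `ScanForLineEnd` helper rather than the real buffered-stream internals:

```csharp
using System;

static class LineScanSketch
{
    // Scan the already-buffered bytes in one tight loop instead of re-checking
    // the buffer and calling into a core routine once per byte.
    // Returns the index just past '\n' if a full line is buffered, or -1.
    public static int ScanForLineEnd(ReadOnlySpan<byte> buffered)
    {
        for (int i = 0; i < buffered.Length; i++)
        {
            if (buffered[i] == (byte)'\n')
            {
                return i + 1;
            }
        }
        return -1; // need more data; the caller refills the buffer and retries
    }
}
```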
Original perf
First commit perf
The second commit builds on the first. Since we're now looping over the buffered data, we can wait until the end of the loop and write the whole span of data to the MemoryStream, instead of writing one byte at a time.
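A sketch of that idea, with hypothetical names (`buffered`, `consumed`, `destination`) standing in for the real stream internals:

```csharp
using System;
using System.IO;

static class BulkWriteSketch
{
    // One Write of 'consumed' bytes replaces 'consumed' individual WriteByte calls
    // made inside the per-byte loop.
    public static void CopyScannedBytes(ReadOnlySpan<byte> buffered, int consumed, MemoryStream destination)
    {
        destination.Write(buffered.Slice(0, consumed));
    }
}
```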
Second commit perf
The third commit adds another scenario to the microbenchmarks: a 10-million-byte payload. It also improves the performance of reading the section by using a 4 KB buffer instead of the default buffer size of 1 (in specific cases).
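The sketch below shows why the buffer size matters when draining a large section; `SectionCopySketch`, `CopySectionAsync`, and `ReadBufferSize` are hypothetical names, not the actual code path touched by this PR.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class SectionCopySketch
{
    private const int ReadBufferSize = 4096; // instead of issuing 1-byte reads

    // Drains a section stream into a destination using a 4 KB read buffer,
    // so a 10-million-byte payload takes thousands of reads rather than millions.
    public static async Task CopySectionAsync(Stream section, Stream destination)
    {
        var buffer = new byte[ReadBufferSize];
        int read;
        while ((read = await section.ReadAsync(buffer)) > 0)
        {
            await destination.WriteAsync(buffer.AsMemory(0, read));
        }
    }
}
```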
Third commit perf
And because the third commit added a new scenario, I reran the benchmarks from the second commit for that scenario as well.
Large read perf w/second commit
In summary, perf improved by 25-70% for the normal scenarios and by over 100x in the 10-million-byte scenario (when application code was reading the section).