Override QueryIter::fold to port Query::for_each perf gains to select Iterator combinators#6773
Conversation
|
Oh, man, this hadn't happened yet? It came up on Discord back in Feb 2021; I guess I should have opened an issue. As an FYI, there's a chance that implementing |
|
Finally was able to address the performance issues, so I'm taking this out of draft. Benchmark results: Details |
alice-i-cecile
left a comment
There was a problem hiding this comment.
Great safety comments. I think the unsafe is absolutely worth it to close this gap. Agreed on the deprecation.
|
@InBetweenNames, I'd appreciate your review on this :) |
joseph-gio
left a comment
There was a problem hiding this comment.
Very nice work, it's good to see this get improved upon. I especially like how the folding behavior was split into two separate functions for fold_table and fold_archetype -- it feels much cleaner this way.
Co-authored-by: JoJoJet <[email protected]>
cart
left a comment
There was a problem hiding this comment.
I'm on board for this change and the code looks good to me. Finally deprecating for_each!
|
Yeah this does make sense to me. The #6161 PR can be adapted to use the new style. |
Co-authored-by: Joseph <[email protected]>
|
@james7132, unless I hear otherwise, I'm going to merge this tomorrow :) |
|
As a final sanity check, the codegen of this PR seems to show no tangible difference from what is in |
|
Changelog needs updating btw. Also, this is the first I'm hearing of it, but using for_each() instead for |
|
@james7132 sorry, you lost the coin flip on the merge conflicts :( Merge it yourself when you're done <3 |
|
@JMS55 Roughly, But it's easy to come up with examples where the Query iterator is non-trivial and the work to be done is simple, and thus using |
3db7137 to
541f4c5
Compare
Objective
After #6547,
Query::for_eachhas been capable of automatic vectorization on certain queries, which is seeing a notable (>50% CPU time improvements) for iteration. However,Query::for_eachisn't idiomatic Rust, and lacks the flexibility of iterator combinators.Ideally,
Query::iterand friends should be able to achieve the same results. However, this does seem to blocked upstream (rust-lang/rust#104914) by Rust's loop optimizations.Solution
This is an intermediate solution and refactor. This moves the
Query::for_eachimplementation onto theIterator::foldimplementation forQueryIterinstead. This should result in the same automatic vectorization optimization on allIteratorfunctions that internally use fold, includingIterator::for_each,Iterator::count, etc.With this, it should close the gap between the two completely. Internally, this PR changes
Query::for_eachto usequery.iter().for_each(..)instead of the duplicated implementation.Separately, the duplicate implementations of internal iteration (i.e.
Query::par_for_each) now use portions of the currentQuery::for_eachimplementation factored out into their own functions.This also massively cleans up our internal fragmentation of internal iteration options, deduplicating the iteration code used in
for_eachandpar_iter().for_each().Changelog
Changed:
Query::for_each,Query::for_each_mut,Query::for_each, andQuery::for_each_muthave been moved toQueryIter'sIterator::for_eachimplementation, and still retains their performance improvements over normal iteration. These APIs are deprecated in 0.13 and will be removed in 0.14.