Minor: Document SIMD rationale and tips#6554
Conversation
arrow/CONTRIBUTING.md
Outdated
| ### Usage if SIMD / Auto vectorization | ||
|
|
||
| This create does not use SIMD intrinsics (e.g. [`std::simd`] directly, but | ||
| instead relies on LLVM's auto-vectorization. |
There was a problem hiding this comment.
"... on the compiler's ..." ?
(in fact, vectorization could be applied on Rust MIR level, before LLVM?)
There was a problem hiding this comment.
Ill confess it is a while since i dug into rustc, but I would have thought MIR to be to high level to effectively perform auto-vectorisation which is extremely ISA specific, the best it could do would be to use LLVMs vector types, but general heiristics for doing this would be hard
There was a problem hiding this comment.
I changed the docs to say "the Rust compilers auto-vectorization" as I think that is the high level description of what is going on
In this context, I think the use of llvm is an "implementation detail" (albliet an important one) about how that auto-vectorization is accomplished.
arrow/CONTRIBUTING.md
Outdated
|
|
||
| SIMD intrinsics are difficult to maintain and can be difficult to reason about. | ||
| The auto-vectorizer in LLVM is quite good and often produces better code than | ||
| hand-written manual uses of SIMD. In fact, this crate used to to have a fair |
arrow/CONTRIBUTING.md
Outdated
| The auto-vectorizer in LLVM is quite good and often produces better code than | ||
| hand-written manual uses of SIMD. In fact, this crate used to to have a fair | ||
| amount of manual SIMD, and over time we've removed it as the auto-vectorized | ||
| code was faster. |
There was a problem hiding this comment.
I rephrased the sentence to hopefully be clearer now
"In fact, this crate used to contain several manual SIMD implementations, which were removed after discovering the auto-vectorized code was faster."
arrow/CONTRIBUTING.md
Outdated
| LLVM is relatively good at vectorizing vertical operations provided: | ||
|
|
||
| 1. No conditionals within the loop body | ||
| 2. Not too much inlining , as the vectorizer gives up if the code is too complex |
arrow/CONTRIBUTING.md
Outdated
|
|
||
| 1. No conditionals within the loop body | ||
| 2. Not too much inlining , as the vectorizer gives up if the code is too complex | ||
| 3. No bitwise horizontal reductions or masking |
There was a problem hiding this comment.
is "bitwise horizontal reductions" an obvious term?
There was a problem hiding this comment.
It is a class of SIMD operations, I think if people don't know to what this refers, they probably aren't the audience for this
There was a problem hiding this comment.
Thanks @tustvold , i see your point.
OTOH, SIMD is widely known term and people may come to read this doc out of sheer interest how we think about simdizing the code. The term stands out from the rest of the text as less understood and https://www.google.com/search?q=bitwise+horizontal+reductions doesn't bring an obvious definition.
There was a problem hiding this comment.
There was a problem hiding this comment.
Perhaps we could link to https://rust-lang.github.io/packed_simd/perf-guide/vert-hor-ops.html
TIL: That is a nice description
I reworded this item to
- No [horizontal reductions] or data dependencies
arrow/CONTRIBUTING.md
Outdated
| 1. No conditionals within the loop body | ||
| 2. Not too much inlining , as the vectorizer gives up if the code is too complex | ||
| 3. No bitwise horizontal reductions or masking | ||
| 4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` `RUSTFLAGS` flag) |
There was a problem hiding this comment.
Prefer passive voice. "SIMD instructions are enabled in the target ISA"
There was a problem hiding this comment.
Changed to "Suitable SIMD instructions available in the target ISA (e.g. target-cpu RUSTFLAGS flag)"
arrow/CONTRIBUTING.md
Outdated
| support many SIMD instructions. See the Performance Tips section at the | ||
| end of <https://crates.io/crates/arrow> | ||
|
|
||
| To ensure your code is fully vectorized, we recommend getting familiar with |
arrow/CONTRIBUTING.md
Outdated
| end of <https://crates.io/crates/arrow> | ||
|
|
||
| To ensure your code is fully vectorized, we recommend getting familiar with | ||
| tools like <https://rust.godbolt.org/> (again being sure to set `RUSTFLAGS`) and |
There was a problem hiding this comment.
again being sure to set
RUSTFLAGS
requires to set RUSTFLAGS properly
arrow/CONTRIBUTING.md
Outdated
| LLVM is relatively good at vectorizing vertical operations provided: | ||
|
|
||
| 1. No conditionals within the loop body | ||
| 2. Not too much inlining , as the vectorizer gives up if the code is too complex |
There was a problem hiding this comment.
| 2. Not too much inlining , as the vectorizer gives up if the code is too complex | |
| 2. Not too much inlining necessary, as the vectorizer gives up if the code is too complex |
There was a problem hiding this comment.
I think this changes the meaning, which is that over zealous use of inline can break the vectorizer
There was a problem hiding this comment.
Ah ok, the phrasing was not clear to me. Maybe use "inlining hints" then?
There was a problem hiding this comment.
Changed it to be "Not too much #[inline]"
There was a problem hiding this comment.
That also changes the meaning, as we have to use #[inline(never)] in various places to actively stop the compiler from inlining things
There was a problem hiding this comment.
🤔
How about "not too much inlining (judicious use of #[inline] and #[inline(never)] as the vectorizer gives up if the code is too complex)
There was a problem hiding this comment.
I'd move the bracket to "not too much inlining (judicious use of #[inline] and #[inline(never)]) as the vectorizer gives up if the code is too complex" but sounds good to me
|
Starting to incorporate comments |
Co-authored-by: Ed Seidl <[email protected]> Co-authored-by: Piotr Findeisen <[email protected]>
|
I think it is looking pretty good now -- rendered version: https://github.com/alamb/arrow-rs/blob/alamb/simd_docs/arrow/CONTRIBUTING.md |
|
Thanks everyone -- I am happy to update this test further as well as part of some follow on PRs |
Which issue does this PR close?
Closes #.
Rationale for this change
@tustvold wrote up some great tips / rationale on apache/datafusion#12821 (comment) that I thought would be good to add in the docs of this repo
What changes are included in this PR?
Add documentation on the rationale for not using manual SIMD, as well as tips/tricks to get the code to properly vectorize.
See rendered version here: https://github.com/alamb/arrow-rs/blob/alamb/simd_docs/arrow/CONTRIBUTING.md
Are there any user-facing changes?
Just docs