Vectorization related tasks

This page is a TODO list for tasks related to the GCC vectorizer.

"vectorizer" meta bug
Replace greedy loop SLP discovery with one based on merging nodes, starting from a single-lane SLP graph that matches the SSA graph
- Do single-lane SLP build when analyzing stmts to be vectorized
Delay vector type assignment to SLP node analysis (vectorizable_*), compute the set of candidate vector types, and decide on the vector size by evaluating different sets of working combinations
Complete load/store permutation lowering for loop vectorization
Make the vectorization factor support fractional poly-ints to implement re-rolling of loops
Remove if-conversion, replacing it with masking or on-the-fly if-conversion
Generate code directly from SLP instead of copying the scalar loop and replacing stmts
Move pattern detection from stmts to SLP
Make patterns cancelable
Make x86 gather and scatter use the internal function instead of the builtins representation
Split more vectorizable_* into analysis and code generation, store analysis data instead of recomputing it
Code generate unvectorizable (single-lane) SLP instances by duplicating the scalar code, which implements partial loop vectorization; when nothing is vectorized this amounts to unrolling + interleaving (plus costing)
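As a rough illustration of the "unrolling + interleaving" effect mentioned in the last item, the hand-written sketch below shows a scalar loop next to a version unrolled by two in which the copies of each statement are grouped together. It is not GCC output; the function names and the unroll factor of 2 are assumptions made for the example, and `restrict` is used so that reordering the loads and stores is valid.

```c
#include <stddef.h>

/* Original scalar loop: each iteration executes S1, then S2. */
void scale_add(float *restrict a, const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float t = b[i] * 2.0f;   /* S1 */
        a[i] = a[i] + t;         /* S2 */
    }
}

/* Hypothetical result of duplicating the scalar statements for two
   iterations and interleaving them: both copies of S1 come first,
   then both copies of S2.  A scalar epilogue handles an odd n. */
void scale_add_unrolled(float *restrict a, const float *restrict b, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        float t0 = b[i] * 2.0f;       /* S1, iteration i     */
        float t1 = b[i + 1] * 2.0f;   /* S1, iteration i + 1 */
        a[i] = a[i] + t0;             /* S2, iteration i     */
        a[i + 1] = a[i + 1] + t1;     /* S2, iteration i + 1 */
    }
    for (; i < n; i++)                /* epilogue */
        a[i] = a[i] + b[i] * 2.0f;
}
```

Both functions compute the same result; the second only changes the order in which independent scalar statements are issued.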

Specific PRs

Old content below

Here is the summary of the Loop-Optimizations BOF that took place at the 2007 GCC summit.

Todo:

missed-optimization PRs in Bugzilla
SLP group size relaxation: vectorize only a subset of interleaved stores, or split large groups into subgroups if necessary (PR 49955).
Support minimum/maximum location pattern (PR 31067, 50374).
Enable the cost model by default (currently enabled only on x86).
Interleaved stores with gaps: support interleaved stores to non-contiguous memory locations (i.e. with gaps). Related PRs: PR18438, PR19049.
Interleaving improvements: extend interleaving support to more forms of strided accesses (e.g. non power-of-2 strides).
Support certain operations on data types that are not directly supported by a target but for which vectorization is still possible. For example, support data movements and bitwise operations on 64-bit data types for AltiVec. (TODO: check if this is still needed).
Vectorize using instructions that operate on a sequence of bytes in memory, i.e. instructions whose semantics correspond to a loop in C code (such as those available on S390).
Improve debug information (mostly line-number information) for code created by the vectorizer (see http://gcc.gnu.org/ml/gcc-patches/2005-02/msg00197.html). (TODO: check if this is still needed).
Reuse generic loop peeling utilities in the vectorizer where possible (see http://gcc.gnu.org/ml/gcc-patches/2005-02/msg00165.html).
Data Dependence enhancements:
- PR32378 ("siv not implemented")
- ignore forward dependences
- interchange stmts to reverse backward dependences (PR32806).
Loop-number-of-iterations enhancements:
- make the gimplifier create COND_EXPR (Zdenek has an initial patch).
Look into vectorizing Fortran COMMON block arrays better.
Look into AltiVec-specific problems (PR32107).
Loop-aware SLP:
- Non-isomorphic computations: the current implementation does not address the case in which the group size (GS) is greater than the vector size (VS) and not all the elements of the group are defined by isomorphic computations, but there exists a subgroup of VS elements that are defined by isomorphic computations. Currently we attempt to construct the SLP tree from the entire group, and therefore fail and terminate. However, the analysis could continue if the implementation were extended to explore subgroups of size VS of the SLP group under consideration.
- Allow shifts with different scalar arguments, when the statements that are grouped into the same vector statement have the same argument.
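The two loop-aware SLP items above can be pictured with the following sketch. It assumes a store group of size GS = 8 and a vector size VS = 4 (for example four 32-bit lanes); the function names are hypothetical and the code only shows the shape of the input loops, not what the vectorizer currently does with them.

```c
#include <stddef.h>
#include <stdint.h>

/* Non-isomorphic group: lanes 0-3 are isomorphic additions, lanes 4-7
   use different operations.  An SLP tree built from all 8 lanes fails,
   but the subgroup formed by lanes 0-3 is isomorphic on its own. */
void partly_isomorphic(uint32_t *restrict out, const uint32_t *restrict in,
                       uint32_t k, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[8 * i + 0] = in[8 * i + 0] + k;
        out[8 * i + 1] = in[8 * i + 1] + k;
        out[8 * i + 2] = in[8 * i + 2] + k;
        out[8 * i + 3] = in[8 * i + 3] + k;
        out[8 * i + 4] = in[8 * i + 4] * k;
        out[8 * i + 5] = in[8 * i + 5] - k;
        out[8 * i + 6] = in[8 * i + 6] ^ k;
        out[8 * i + 7] = in[8 * i + 7];
    }
}

/* Shifts with different scalar arguments: the group uses two shift
   amounts, but the four statements that would land in the same vector
   statement (lanes 0-3, lanes 4-7) share one amount each.
   s0 and s1 must be smaller than 32. */
void grouped_shifts(uint32_t *restrict out, const uint32_t *restrict in,
                    unsigned s0, unsigned s1, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[8 * i + 0] = in[8 * i + 0] << s0;
        out[8 * i + 1] = in[8 * i + 1] << s0;
        out[8 * i + 2] = in[8 * i + 2] << s0;
        out[8 * i + 3] = in[8 * i + 3] << s0;
        out[8 * i + 4] = in[8 * i + 4] << s1;
        out[8 * i + 5] = in[8 * i + 5] << s1;
        out[8 * i + 6] = in[8 * i + 6] << s1;
        out[8 * i + 7] = in[8 * i + 7] << s1;
    }
}
```

With a different vector size the split into subgroups would change accordingly; the 4-lane split is only the assumption used here.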