ColumnVector: re-enable AVX512_VBMI/AVX512_VBMI2 optimized filter and index #41765
@alexey-milovidov |
@alexey-milovidov @nickitat |
How much data past the end of the array could we write? Say data_size == 1: we have 15 bytes of padding on the right (because elements are 1 byte each) and we write 64 bytes at a time. That looks like a potential segfault on the other 48 bytes. Am I missing something?
Quick answer: data access will not cross the end of the array.
For Container & data, we need to load it into a register. If data_size == 1, we use a masked load (L425-427) to make sure we do not read across the boundary, so only 1 byte is loaded:
/// one single mask load for table size <= 64
__mmask64 last_mask = MASK64 >> (64 - data_size);
__m512i table1 = _mm512_maskz_loadu_epi8(last_mask, data_pos);
For Container & res_data, we need to store into memory. Say limit == 1: we use a masked store (L461-467) to make sure we do not write across the boundary, so only 1 byte is stored to res_data:
/// tail handling
if (limit > limit64)
{
    __mmask64 tail_mask = MASK64 >> (limit64 + 64 - limit);
    __m512i vidx = _mm512_maskz_loadu_epi8(tail_mask, indexes_pos + pos);
    __m512i out = _mm512_permutexvar_epi8(vidx, table1);
    _mm512_mask_storeu_epi8(res_pos + pos, tail_mask, out);
}
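To make the masking argument easier to check without AVX-512 hardware, here is a minimal scalar sketch of the same logic. It is not the code from this PR: `lowBitsMask`, `indexBlockEmulated` and `indexEmulated` are hypothetical names, the table is assumed to hold at most 64 one-byte elements, and each "lane" is modelled by one loop iteration. Lanes whose mask bit is clear are neither read nor written, which is the property the masked load and store rely on. The early return is an assumption of the sketch as well; it keeps the shift count below 64, since shifting a 64-bit value by 64 is undefined behaviour in C++.

```cpp
#include <cstddef>
#include <cstdint>

constexpr uint64_t MASK64 = 0xffffffffffffffffULL;

/// Low `n` bits set, valid for 1 <= n <= 64.
/// n == 0 would shift by 64, which is undefined behaviour, so callers must exclude it.
inline uint64_t lowBitsMask(size_t n)
{
    return MASK64 >> (64 - n);
}

/// Scalar model of one 64-lane step: masked index load, byte permute, masked store.
inline void indexBlockEmulated(const uint8_t (&table)[64], const uint8_t * indexes, uint8_t * res, uint64_t mask)
{
    for (size_t lane = 0; lane < 64; ++lane)
    {
        if (!((mask >> lane) & 1))
            continue; /// masked-off lane: indexes[lane] is not read, res[lane] is not written
        res[lane] = table[indexes[lane] & 63]; /// the byte permute uses the low 6 bits of each index
    }
}

/// Scalar model of the "table size <= 64" path (hypothetical helper, 1-byte elements only).
inline void indexEmulated(const uint8_t * data, size_t data_size, const uint8_t * indexes, uint8_t * res, size_t limit)
{
    if (limit == 0 || data_size == 0 || data_size > 64)
        return; /// nothing to do, or outside the scope of this sketch

    /// Zeroing masked load of the table: only the first data_size bytes are touched.
    uint8_t table[64] = {};
    const uint64_t last_mask = lowBitsMask(data_size);
    for (size_t lane = 0; lane < 64; ++lane)
        if ((last_mask >> lane) & 1)
            table[lane] = data[lane];

    /// Full 64-lane blocks.
    const size_t limit64 = limit & ~size_t{63};
    size_t pos = 0;
    for (; pos < limit64; pos += 64)
        indexBlockEmulated(table, indexes + pos, res + pos, MASK64);

    /// Tail handling: the shift count is in [1, 63] here.
    if (limit > limit64)
    {
        const uint64_t tail_mask = MASK64 >> (limit64 + 64 - limit);
        indexBlockEmulated(table, indexes + pos, res + pos, tail_mask);
    }
}
```

With data_size == 1 and limit == 1, both masks have only bit 0 set, so exactly one byte of data is read and exactly one byte of res_data is written, regardless of how much padding follows the arrays.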
This PR re-enables the AVX512_VBMI optimized index and the AVX512_VBMI2 optimized filter.
It also fixes #41745 and #41751. When limit == 0, we should just return. Otherwise, it will hit:
__mmask64 last_mask = MASK64 >> (64 - data_size);
data_pos.
Changelog category (leave one):