Skip to content

ColumnVector: re-enable AVX512_VBMI/AVX512_VBMI2 optimized filter and index#41765

Merged
nickitat merged 5 commits intoClickHouse:masterfrom
guowangy:vectorIndexVBMI-heap-fix
Oct 24, 2022
Merged

ColumnVector: re-enable AVX512_VBMI/AVX512_VBMI2 optimized filter and index#41765
nickitat merged 5 commits intoClickHouse:masterfrom
guowangy:vectorIndexVBMI-heap-fix

Conversation

@guowangy
Copy link
Copy Markdown
Contributor

@guowangy guowangy commented Sep 26, 2022

This PR re-enable AVX512_VBMI optimized index and AVX512_VBMI2 optimized filter.

Also Fixed #41745, fixed #41751. When limit == 0, we should just return. Otherwise, it will meet:

  1. undefined behavior for right shift with 64: __mmask64 last_mask = MASK64 >> (64 - data_size);
  2. heap overflow when loading table from data_pos.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Sep 26, 2022
@nickitat nickitat added the can be tested Allows running workflows for external contributors label Sep 26, 2022
@alexey-milovidov
Copy link
Copy Markdown
Member

Need to also undo these two PRs in this PR: #41797 #41752

@alexey-milovidov
Copy link
Copy Markdown
Member

@nickitat is on vacation till the end of the week, but he will help next week.

@guowangy sorry for somewhat chaotic reverts :) But we actually need this optimization, it is really great!
Let's make it to production!

@guowangy
Copy link
Copy Markdown
Contributor Author

guowangy commented Oct 8, 2022

@alexey-milovidov
I was also on vacation :) Sorry for late response.
I will continue to work on this PR later.

@guowangy guowangy force-pushed the vectorIndexVBMI-heap-fix branch from 908da56 to e2efb24 Compare October 8, 2022 05:20
@guowangy
Copy link
Copy Markdown
Contributor Author

@alexey-milovidov @nickitat
PR is ready, would you please help to review.
Thanks

@guowangy guowangy changed the title vectorIndexImpl: Fix heap buffer overflow when limit == 0 ColumnVector: re-enable AVX512_VBMI/AVX512_VBMI2 optimized filter and index Oct 20, 2022
}
}
}
);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how much data past the end of the array we could write? say, data_size == 1, we have 15 bytes of padding to the right (because elements have size of 1-byte) and we write 64 bytes at a time. looks like potential segfault on these other 48 bytes, am I missing smth?

Copy link
Copy Markdown
Contributor Author

@guowangy guowangy Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick answer: data access will not across the end of array.

For Container & data, we need to load it into register. If the data_size == 1, we use mask load (L425-427), to make sure we will not read across boundary, only 1 bytes loaded:

        /// one single mask load for table size <= 64
        __mmask64 last_mask = MASK64 >> (64 - data_size);
        __m512i table1 = _mm512_maskz_loadu_epi8(last_mask, data_pos);

For Container & res_data, we need to store into memory. Let's say if limit == 1, we use mask store (L461-467) to make sure not to write across boundary, here only 1 bytes stored to res_data:

        /// tail handling
        if (limit > limit64)
        {
            __mmask64 tail_mask = MASK64 >> (limit64 + 64 - limit);
            __m512i vidx = _mm512_maskz_loadu_epi8(tail_mask, indexes_pos + pos);
            __m512i out = _mm512_permutexvar_epi8(vidx, table1);
            _mm512_mask_storeu_epi8(res_pos + pos, tail_mask, out);
        }

@nickitat nickitat self-assigned this Oct 20, 2022
@nickitat
Copy link
Copy Markdown
Member

Stateless tests (debug) [2/3] - #42629

@nickitat nickitat merged commit 4e294b9 into ClickHouse:master Oct 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segmentation fault during merge TargetSpecific::AVX512VBMI::vectorIndexImpl: undefined-behavior + heap-buffer-overflow

4 participants