optimize utf32 validation on icelake #872

Merged
lemire merged 1 commit into master from yagiz/optimize-utf32-validation on Dec 6, 2025

Member

@anonrig anonrig commented Dec 5, 2025

Local benchmarks

  | Test Pattern | Haswell (AVX2) | Icelake (Optimized AVX-512) | Speedup |
  |--------------|----------------|-----------------------------|---------|
  | ASCII range  | 69.2 GB/s      | 80.5 GB/s                   | +16%    |
  | Full Unicode | 69.9 GB/s      | 122.0 GB/s                  | +74%    |
  | BMP range    | 70.0 GB/s      | 121.5 GB/s                  | +74%    |
  | Sequential   | 69.3 GB/s      | 121.0 GB/s                  | +75%    |

@anonrig anonrig requested a review from lemire December 5, 2025 02:38
@anonrig anonrig changed the title from "optimize utf32 validation" to "optimize utf32 validation on icelake" Dec 5, 2025

- while (buf < end - 16) {
+ // Optimized: Process 32 values (2x 512-bit) per iteration for better throughput
+ while (buf + 32 <= end) {
Collaborator

I think this has to be written as `end - buf >= 32` to avoid UB (not introduced in this PR).

}

// Handle remaining 16-31 values
if (buf + 16 <= end) {
Collaborator

same thing here
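For illustration, here is a minimal scalar sketch of the loop shape under discussion (32 values per iteration, then one optional block of 16, then a tail), written with the `end - buf >= N` form suggested in the review. The names `block_is_valid` and `validate_utf32_sketch` are hypothetical, and the scalar helper merely stands in for the AVX-512 checks; this is not the code from the PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a vectorized block check.
// A UTF-32 value is valid if it is at most 0x10FFFF and is not a surrogate.
static bool block_is_valid(const char32_t *p, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    if (p[i] > 0x10FFFF || (p[i] >= 0xD800 && p[i] <= 0xDFFF)) {
      return false;
    }
  }
  return true;
}

// Hypothetical driver showing the 32 / 16 / tail structure.
bool validate_utf32_sketch(const char32_t *buf, std::size_t len) {
  const char32_t *end = buf + len; // one past the last element: always allowed

  // Main loop: 32 values per iteration. end - buf is the number of values
  // left, so no pointer beyond end is ever formed.
  while (end - buf >= 32) {
    if (!block_is_valid(buf, 32)) { return false; }
    buf += 32;
  }

  // After the main loop fewer than 32 values remain, so at most one
  // block of 16 fits.
  if (end - buf >= 16) {
    if (!block_is_valid(buf, 16)) { return false; }
    buf += 16;
  }

  // Tail: the remaining 0 to 15 values.
  return block_is_valid(buf, static_cast<std::size_t>(end - buf));
}
```

In a vectorized version, the two block checks would be replaced by SIMD loads and comparisons; the bounds tests would stay the same.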

Member

@lemire lemire left a comment

Do consider @pauldreik's idea.

The reason it is bad to compute `mypointer + 32` is that the resulting pointer could land beyond the end of the array.

In C and C++, pointer arithmetic is only defined within the bounds of a single object or array. Specifically: If mypointer points to an element inside an array (or to a standalone object), you are allowed to perform pointer arithmetic as long as the resulting pointer still points into the same array or one past the last element of that array.

(Last paragraph is AI generated.)

I keep messing this up.
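As a small illustration of the rule described above, the check can be phrased as a pointer difference instead of a pointer sum. The helper name `has_at_least` is hypothetical and is only a sketch of the idea; it assumes both pointers refer to the same array and that `buf` does not exceed `end`.

```cpp
#include <cstddef>

// Hypothetical helper: true when at least n elements remain in [buf, end).
// Assumes buf and end point into the same array (or one past its end)
// and that buf <= end.
bool has_at_least(const char32_t *buf, const char32_t *end, std::ptrdiff_t n) {
  // end - buf is well defined for two pointers into the same array.
  // The alternative, buf + n <= end, must first form buf + n, which is
  // undefined behavior when fewer than n elements remain.
  return end - buf >= n;
}
```

With this shape, a loop condition such as `while (has_at_least(buf, end, 32))` never constructs an out-of-range pointer, even when the input is shorter than one block.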

Member

lemire commented Dec 6, 2025

I'll merge and fix it.

@lemire lemire merged commit bb74a19 into master Dec 6, 2025
69 of 70 checks passed
3 participants