ARROW-9131: [C++] Faster ascii_lower and ascii_upper. by maartenbreddels · Pull Request #7434 · apache/arrow

maartenbreddels · 2020-06-15T09:29:01Z

Following up on #7418, I tried and benchmarked a different way to implement:

ascii_lower
ascii_upper

Before (lower is similar):

--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    4922843 ns      4918961 ns           10 bytes_per_second=3.1457G/s items_per_second=213.17M/s

After:

--------------------------------------------------
Benchmark           Time           CPU Iterations
--------------------------------------------------
AsciiUpper_median    1391272 ns      1390014 ns           10 bytes_per_second=11.132G/s items_per_second=754.363M/s

This is roughly a 3.5x speedup by the medians above (on an AMD machine).

Using http://quick-bench.com/JaDErmVCY23Z1tu6YZns_KBt0qU I found 4.6x speedup for clang 9, 6.4x for GCC 9.2.

Also, the test is expanded a bit to include a non-ASCII codepoint, to make explicit that it is fine to
upper- or lower-case a UTF-8 string this way. The non-overlapping encoding of UTF-8 makes this OK: the bytes
of a multi-byte sequence never fall in the ASCII range (see section 2.5 of Unicode Standard Core Specification v13.0).
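
To illustrate the idea described above, here is a minimal sketch of a range-check-based upper-casing loop. It is not the code from this PR: the names `AsciiUpperCodeUnit`, `AsciiUpperBuffer`, and `AsciiUpperString` are hypothetical, and whether a given compiler vectorizes the loop depends on its version and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Arrow's actual function): upper-case one UTF-8 code unit.
// Only bytes in ['a', 'z'] are changed. Every byte of a multi-byte UTF-8
// sequence is >= 0x80, so it can never match the range and passes through.
inline std::uint8_t AsciiUpperCodeUnit(std::uint8_t code_unit) {
  return (code_unit >= 'a' && code_unit <= 'z')
             ? static_cast<std::uint8_t>(code_unit - ('a' - 'A'))
             : code_unit;
}

// Apply the conditional expression byte by byte over a contiguous buffer.
// A simple loop like this is a candidate for compiler auto-vectorization.
inline void AsciiUpperBuffer(const std::uint8_t* input, std::size_t length,
                             std::uint8_t* output) {
  assert(length == 0 || (input != nullptr && output != nullptr));
  for (std::size_t i = 0; i < length; ++i) {
    output[i] = AsciiUpperCodeUnit(input[i]);
  }
}

// Convenience wrapper for std::string, assumed here only for illustration.
inline std::string AsciiUpperString(const std::string& text) {
  std::string out(text.size(), '\0');
  AsciiUpperBuffer(reinterpret_cast<const std::uint8_t*>(text.data()), text.size(),
                   reinterpret_cast<std::uint8_t*>(&out[0]));
  return out;
}
```

Calling `AsciiUpperString` on a UTF-8 string containing "é" (bytes 0xC3 0xA9) leaves those two bytes unchanged while upper-casing the surrounding ASCII letters.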

github-actions · 2020-06-15T09:32:11Z

https://issues.apache.org/jira/browse/ARROW-9131

pitrou

This is neat. It appears that both gcc and clang auto-vectorize the conditional expression.

pitrou · 2020-06-15T10:59:11Z

cpp/src/arrow/compute/kernels/scalar_string.cc

+    const uint8_t utf8_code_unit = *input++;
+    // Code units in the range [a-z] can only be an encoding of an ascii
+    // character/codepoint, not the 2nd, 3rd or 4th code unit (byte) of an different
+    // codepoint. This guaranteed by non-overal design of the unicode standard. (see


"non-overlap"

maartenbreddels · 2020-06-15T11:37:24Z

I also thought that we could do a bit check instead of the range check, e.g. (code_unit & 0b11100000) == 0b01100000, but that would also transform the backtick, for instance (binary value 0b01100000).
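
A small sketch of why the bit check is too broad, assuming C++14 for the binary literals. The function names are hypothetical and only serve to compare the two predicates over all 256 byte values.

```cpp
#include <cassert>
#include <cstdint>

// Bit test from the comment above: true for every byte in 0x60..0x7F.
inline bool MatchesBitCheck(std::uint8_t code_unit) {
  return (code_unit & 0b11100000) == 0b01100000;
}

// Range test: true only for 'a'..'z' (0x61..0x7A).
inline bool MatchesRangeCheck(std::uint8_t code_unit) {
  return code_unit >= 'a' && code_unit <= 'z';
}

// Count the byte values on which the two predicates disagree.
inline int CountDisagreements() {
  int count = 0;
  for (int value = 0; value < 256; ++value) {
    const auto byte = static_cast<std::uint8_t>(value);
    if (MatchesBitCheck(byte) != MatchesRangeCheck(byte)) ++count;
  }
  return count;
}

// The backtick (0x60) passes the bit test but is not a lowercase letter.
inline void CheckBacktick() {
  assert(MatchesBitCheck('`') && !MatchesRangeCheck('`'));
}
```

The two predicates disagree on six byte values: the backtick, `{`, `|`, `}`, `~`, and DEL (0x7F). A bit-check-based transform would alter all of them.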

The generated code does indeed look vectorized. I didn't look into the details of the code generated by clang and GCC; their performance seems to differ a bit, so we might be able to squeeze out a bit more if we want. Happy to look into that later (and create a new issue), but I'd rather spend my time on other functions now.

pitrou · 2020-06-15T11:58:30Z

It's ok, there's no need to further optimize those functions.

pitrou

+1

pitrou · 2020-06-15T12:30:33Z

Travis-CI failure is unrelated.

pitrou · 2020-06-15T12:32:10Z

Thank you @maartenbreddels !

maartenbreddels · 2020-06-15T13:41:15Z

Thanks, let me know if my workflow is ok, or if I can make things go smoother.

PS: I am looking for a document describing the kernel design. I see these two cases (the if (batch[0].kind() == Datum::ARRAY) { branch and the else clause), but I am not sure I fully understand them, or where this is described, if it is described anywhere.

PS2: Are there alternative channels for quick/small questions, or is this fine?

pitrou · 2020-06-15T13:46:10Z

I think the current approach (implement kernels one-by-one) is reasonable and manageable for us (and for you as well I hope).

I don't think there's much documentation for now around the kernel design. Basically, kernels should usually be able to process two kinds of inputs (represented as Datums): arrays and scalars. That said, arrays are the dominant case, so if we leave scalars unimplemented in a given kernel, that's not an urgent problem.
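
As a rough illustration of the array-versus-scalar dispatch described above, here is a toy sketch using `std::variant` (C++17). It deliberately does not use Arrow's `Datum` or kernel API: `ToyScalar`, `ToyArray`, `ToyDatum`, `UpperOne`, and `ToyAsciiUpperKernel` are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Toy stand-ins, NOT Arrow types: a datum is either one string or many.
using ToyScalar = std::string;
using ToyArray = std::vector<std::string>;
using ToyDatum = std::variant<ToyScalar, ToyArray>;

// Upper-case the ASCII letters of a single value.
inline std::string UpperOne(const std::string& value) {
  std::string out = value;
  for (char& c : out) {
    if (c >= 'a' && c <= 'z') c = static_cast<char>(c - ('a' - 'A'));
  }
  return out;
}

// One kernel entry point with two branches, mirroring the if/else shape
// mentioned in the discussion: arrays first, scalars otherwise.
inline ToyDatum ToyAsciiUpperKernel(const ToyDatum& input) {
  if (std::holds_alternative<ToyArray>(input)) {
    // Array case: the dominant path, processes every element.
    const ToyArray& values = std::get<ToyArray>(input);
    ToyArray out;
    out.reserve(values.size());
    for (const std::string& value : values) out.push_back(UpperOne(value));
    return out;
  }
  // Scalar case: a single value in, a single value out.
  assert(std::holds_alternative<ToyScalar>(input));
  return UpperOne(std::get<ToyScalar>(input));
}
```

The result holds the same alternative as the input, so callers can pass either kind of datum through the same function.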

For quick development questions, we have a public chat instance at https://ursalabs.zulipchat.com/ - just register and you can chat with the team. The main channel there is the "dev" channel, and Zulip allows you to create subtopics - don't hesitate to use those!

maartenbreddels · 2020-06-15T13:48:40Z

I think the current approach (implement kernels one-by-one) is reasonable and manageable for us (and for you as well I hope).

No, this is fine.

That said, arrays are the dominant case, so if we leave scalars unimplemented in a given kernel, that's not an urgent problem.

👍

wesm · 2020-06-15T13:54:54Z

PS2: Are there alternative channels for quick/small questions, or is this fine?

Can we use [email protected] so everyone can see the discussion and it's searchable via Google later?

PS: I am looking for a document describing the kernel design.

Keep in mind that the new kernels framework is only a few weeks old (7ad49ee), so developer documentation is a bit behind; don't hesitate to ask questions.

maartenbreddels · 2020-06-15T14:00:31Z

Can we use [email protected] so everyone can see the discussion and it's searchable via Google later?

I'm ok with that if not considered too noisy (like small cmake questions etc).

wesm · 2020-06-15T14:10:06Z

It shouldn't be a problem.

maartenbreddels added 3 commits June 15, 2020 11:43

reword using proper nomenclature

b3b4b56

lint/format

e26e5e0

format/linting test

f51e86f

pitrou reviewed Jun 15, 2020

fix typo

6309da7

pitrou approved these changes Jun 15, 2020

pitrou closed this in d98b9c5 Jun 15, 2020

maartenbreddels deleted the ARROW-9131 branch June 15, 2020 13:32

asfimport mentioned this pull request Jul 15, 2020

[C++] Faster ascii_lower and ascii_upper #25242

Closed
