Skip to content

Conversation

@rmuir
Copy link
Member

@rmuir rmuir commented Oct 12, 2023

Following up to @dweiss work, this gives us the same benchmarks as https://github.com/rmuir/vectorbench, just without the code duplication and maintenance hassle.

Each method is simply invoked with and without --add-modules=jdk.incubator.vector, so we can easily compare the performance of both the scalar and vector implementations.

Same results we were getting before, with less code. I also like that it tests the "real" codepath through lucene, we just call VectorUtil methods with different JVM args.

Benchmark                                   (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryCosineScalar        1024  thrpt    5   0.763 ± 0.012  ops/us
VectorUtilBenchmark.binaryCosineVector        1024  thrpt    5   3.487 ± 0.080  ops/us
VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt    5   1.404 ± 0.032  ops/us
VectorUtilBenchmark.binaryDotProductVector    1024  thrpt    5   7.120 ± 0.133  ops/us
VectorUtilBenchmark.binarySquareScalar        1024  thrpt    5   1.109 ± 0.898  ops/us
VectorUtilBenchmark.binarySquareVector        1024  thrpt    5   5.938 ± 1.490  ops/us
VectorUtilBenchmark.floatCosineScalar         1024  thrpt    5   0.612 ± 0.071  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt    5   5.953 ± 0.118  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt    5   1.926 ± 0.060  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt    5  11.865 ± 1.057  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt    5   1.386 ± 0.065  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt    5   9.689 ± 0.102  ops/us

@rmuir rmuir requested a review from dweiss October 12, 2023 23:22
@rmuir
Copy link
Member Author

rmuir commented Oct 12, 2023

You can still tell what's happening too due to the log messages. When each benchmark runs, you see a single message:

xxxScalar() methods:

WARNING: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.

xxxVector() methods:

INFO: Java vector incubator API enabled; uses preferredBitSize=256

@uschindler
Copy link
Contributor

uschindler commented Oct 13, 2023

I have not yet checked the benchmakr fraework at all (earlier PR), but the idea here is cool:

jvmArgsPrepend = {"--add-modules=jdk.incubator.vector"})

so it runs with different comd line automatically by just the JMH annotation. yipee.!

@ChrisHegarty
Copy link
Contributor

This looks great. I'll checkout the branch and play a little with, then report back.

Copy link
Contributor

@uschindler uschindler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the cleanup of documentation as suggested by Dawid would be good. Otherwise looks fine.

Have not played around, will report on windows with my laptop.

@dweiss is it enough to have RUNTIME_JAVA_HOME set when using Gradle to run benchmark with also preview versions not yet supported by Gradle?

@rmuir
Copy link
Member Author

rmuir commented Oct 13, 2023

I also removed some of the hacky self-tests the benchmarks were doing (comparing results across different impls). The reason is: when experimenting, you can easily validate the correctness with gradlew test now that it sits under the same umbrella. It was a hack due to the cost of copy-paste between two different git repositories and stuff like that. I figure: let's keep the benchmarks simple and focused on benchmarking and stable results.

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! And will really help with future maintenance in this area. Thank you.

@rmuir
Copy link
Member Author

rmuir commented Oct 13, 2023

I updated the docs, please have another look.

@uschindler
Copy link
Contributor

Hi Robert,
could you remove this exclusion:
https://github.com/dweiss/lucene/blob/e2e1b569fba1593ff4c3381f3494f552db2b4c74/lucene/benchmark-jmh/build.gradle#L37-L38

It is no longer needed, as the benchmark classes only call public Lucene methods.

@uschindler
Copy link
Contributor

@dweiss is it enough to have RUNTIME_JAVA_HOME set when using Gradle to run benchmark with also preview versions not yet supported by Gradle?

I noticed you need to run it on your own. We could mabye add some runner task to gradle, like ./gradlew :lucene:benchmark-jmh:run, which runs with the main gradle or alternative JDK as configured in Gradle.

@rmuir
Copy link
Member Author

rmuir commented Oct 13, 2023

Hi Robert, could you remove this exclusion: https://github.com/dweiss/lucene/blob/e2e1b569fba1593ff4c3381f3494f552db2b4c74/lucene/benchmark-jmh/build.gradle#L37-L38

It is no longer needed, as the benchmark classes only call public Lucene methods.

I don't see this in my branch...

@uschindler
Copy link
Contributor

uschindler commented Oct 13, 2023

Hi Robert, could you remove this exclusion: https://github.com/dweiss/lucene/blob/e2e1b569fba1593ff4c3381f3494f552db2b4c74/lucene/benchmark-jmh/build.gradle#L37-L38
It is no longer needed, as the benchmark classes only call public Lucene methods.

I don't see this in my branch...

It is no longer on main, too.

Remove the "failOnMissingClasses = false" a few lines above instead. Dawid changed it later.

@uschindler
Copy link
Contributor

Sorry for the chaotic comments, the failOnMissingClasses = false line can be removed.

I checked it with roberts branch:

> Task :lucene:benchmark-jmh:forbiddenApisMain
Caching disabled for task ':lucene:benchmark-jmh:forbiddenApisMain' because:
  Build cache is disabled
Task ':lucene:benchmark-jmh:forbiddenApisMain' is not up-to-date because:
  Executed with '--rerun-tasks'.
Signature file applied: defaults.all.txt
Signature file applied: defaults.lucene.txt
Reading bundled API signatures: jdk-unsafe-17
Reading bundled API signatures: jdk-deprecated-17
Reading bundled API signatures: jdk-non-portable
Reading bundled API signatures: jdk-reflection
Reading bundled API signatures: jdk-system-out
Reading API signatures: C:\Users\Uwe Schindler\Projects\lucene\lucene\gradle\validation\forbidden-apis\defaults.logging.txt
Reading API signatures: C:\Users\Uwe Schindler\Projects\lucene\lucene\gradle\validation\forbidden-apis\defaults.all.txt
Reading API signatures: C:\Users\Uwe Schindler\Projects\lucene\lucene\gradle\validation\forbidden-apis\defaults.lucene.txt
Loading classes to check...
Scanning classes for violations...
Scanned 3 class file(s) for forbidden API invocations (in 0.23s), 0 error(s).
Resolve mutations for :lucene:benchmark-jmh:compileTestJava (Thread[included builds,5,main]) started.
:lucene:benchmark-jmh:compileTestJava (Thread[included builds,5,main]) started.

@uschindler
Copy link
Contributor

I just committed the change.

@dweiss
Copy link
Contributor

dweiss commented Oct 13, 2023

./gradlew :lucene:benchmark-jmh:run, which runs with the main gradle or alternative JDK as configured in Gradle.

I mentioned this on the original issue - I don't think forking a JMH subprocess from within gradle makes much sense as it may affect the results... and realistically, you'll want to test it out with multiple JVMs in parallel, I don't see much benefit here.

Copy link
Contributor

@uschindler uschindler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a bug! If I run the benchmark it always prints that vector module is used.

The reason for that is that in module descriptor it "requires jdk.incubator.vector". Therefor it enables it automaticlly in module mode (when you run with command line printed after "gradlew :...:assemble").

The jdk.incubator.vector must be removed from the module-info.java file.

@uschindler
Copy link
Contributor

Let me fix...

…ys enables vectors. With that change it works.
@uschindler
Copy link
Contributor

Fixed. If you start the benchamrk now in module mode with the command line provided by "assemble" it works correct:

> Task :lucene:benchmark-jmh:assemble

JMH benchmarks compiled. Run them with:

java -jar lucene\benchmark-jmh\build\benchmarks\lucene-benchmark-jmh-10.0.0-SNAPSHOT.jar

or

java --module-path lucene\benchmark-jmh\build\benchmarks --module org.apache.lucene.benchmark.jmh

JMH options you can use with the above:

  -h      displays verbose help for all options
  -l      list available benchmarks
  -lp     list benchmarks that pass the filter and their parameters
  regexp  execute all benchmark containing regexp

@dweiss
Copy link
Contributor

dweiss commented Oct 13, 2023

is it enough to have RUNTIME_JAVA_HOME set when using Gradle to run benchmark with also preview versions not yet supported by Gradle?

The code will compile with the provided RUNTIME_JAVA_HOME but I intentionally left out the "launcher" - leaving just instructions on the console. If somebody uses JMH, they can point at whatever JVM they wish. I think using gradle to launch these measurement subprocesses is too crude.

@dweiss
Copy link
Contributor

dweiss commented Oct 13, 2023

The jdk.incubator.vector must be removed from the module-info.java file.

This was intentional, otherwise the example benchmark wouldn't compile... But since the vectors-specific code is removed now, I think removing this dependency is ok as well.

@uschindler
Copy link
Contributor

The jdk.incubator.vector must be removed from the module-info.java file.

This was intentional, otherwise the example benchmark wouldn't compile... But since the vectors-specific code is removed now, I think removing this dependency is ok as well.

The problem was: if you run the benchmark in module mode, the requirement added the module without respecting if the extra runtime parameter was given.

I committed the change. It works now, benchmark is running.

@rmuir
Copy link
Member Author

rmuir commented Oct 13, 2023

Yeah, i think its better not to compile benchmarks with incubator/preview etc? I think benchmarks should just call lucene methods, and lucene code should do the right thing with handling incubator/preview?

@uschindler
Copy link
Contributor

I ran all benchmarks in module mode (second line of assemble output) on my AVX-256 laptop:

Prozessor: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz, 1992 MHz, 4 Kern(e), 8 logische(r) Prozessor(en)

C:\Users\Uwe Schindler\Projects\lucene\lucene>"C:\Program Files\Java\jdk-21\bin\java"  --module-path lucene\benchmark-jmh\build\benchmarks --module org.apache.lucene.benchmark.jmh
[...]
# Benchmark: org.apache.lucene.benchmark.jmh.VectorUtilBenchmark.floatSquareScalar
# Parameters: (size = 1024)

# Run progress: 90,63% complete, ETA 00:03:42
# Fork: 1 of 1
# Warmup Iteration   1: Okt. 13, 2023 6:31:40 PM org.apache.lucene.internal.vectorization.VectorizationProvider lookup
WARNUNG: Java vector incubator module is not readable. For optimal vector performance, pass '--add-modules jdk.incubator.vector' to enable Vector API.
[...]
# Benchmark: org.apache.lucene.benchmark.jmh.VectorUtilBenchmark.floatSquareVector
# Parameters: (size = 1024)

# Run progress: 98,96% complete, ETA 00:00:24
# Fork: 1 of 1
WARNING: Using incubator modules: jdk.incubator.vector
# Warmup Iteration   1: Okt. 13, 2023 6:34:58 PM org.apache.lucene.internal.vectorization.PanamaVectorizationProvider <init>
INFORMATION: Java vector incubator API enabled; uses preferredBitSize=256
[...]

Benchmark                                   (size)   Mode  Cnt    Score    Error   Units
VectorUtilBenchmark.binaryCosineScalar           1  thrpt    5  111,963 ± 55,067  ops/us
VectorUtilBenchmark.binaryCosineScalar         128  thrpt    5    6,607 ±  0,530  ops/us
VectorUtilBenchmark.binaryCosineScalar         207  thrpt    5    4,297 ±  0,268  ops/us
VectorUtilBenchmark.binaryCosineScalar         256  thrpt    5    3,591 ±  0,098  ops/us
VectorUtilBenchmark.binaryCosineScalar         300  thrpt    5    2,831 ±  0,761  ops/us
VectorUtilBenchmark.binaryCosineScalar         512  thrpt    5    1,749 ±  0,261  ops/us
VectorUtilBenchmark.binaryCosineScalar         702  thrpt    5    1,272 ±  0,439  ops/us
VectorUtilBenchmark.binaryCosineScalar        1024  thrpt    5    0,846 ±  0,212  ops/us
VectorUtilBenchmark.binaryCosineVector           1  thrpt    5  116,594 ± 19,379  ops/us
VectorUtilBenchmark.binaryCosineVector         128  thrpt    5   23,696 ±  0,971  ops/us
VectorUtilBenchmark.binaryCosineVector         207  thrpt    5   15,562 ±  1,261  ops/us
VectorUtilBenchmark.binaryCosineVector         256  thrpt    5   15,580 ±  0,818  ops/us
VectorUtilBenchmark.binaryCosineVector         300  thrpt    5   10,589 ±  9,402  ops/us
VectorUtilBenchmark.binaryCosineVector         512  thrpt    5    8,864 ±  1,360  ops/us
VectorUtilBenchmark.binaryCosineVector         702  thrpt    5    5,632 ±  0,152  ops/us
VectorUtilBenchmark.binaryCosineVector        1024  thrpt    5    4,033 ±  0,966  ops/us
VectorUtilBenchmark.binaryDotProductScalar       1  thrpt    5  276,530 ± 38,515  ops/us
VectorUtilBenchmark.binaryDotProductScalar     128  thrpt    5   13,190 ±  0,303  ops/us
VectorUtilBenchmark.binaryDotProductScalar     207  thrpt    5    8,590 ±  0,332  ops/us
VectorUtilBenchmark.binaryDotProductScalar     256  thrpt    5    6,982 ±  0,256  ops/us
VectorUtilBenchmark.binaryDotProductScalar     300  thrpt    5    6,007 ±  0,243  ops/us
VectorUtilBenchmark.binaryDotProductScalar     512  thrpt    5    3,463 ±  0,433  ops/us
VectorUtilBenchmark.binaryDotProductScalar     702  thrpt    5    2,602 ±  0,063  ops/us
VectorUtilBenchmark.binaryDotProductScalar    1024  thrpt    5    1,755 ±  0,073  ops/us
VectorUtilBenchmark.binaryDotProductVector       1  thrpt    5  154,801 ± 45,755  ops/us
VectorUtilBenchmark.binaryDotProductVector     128  thrpt    5   50,450 ± 10,559  ops/us
VectorUtilBenchmark.binaryDotProductVector     207  thrpt    5   30,656 ±  1,151  ops/us
VectorUtilBenchmark.binaryDotProductVector     256  thrpt    5   30,256 ±  1,618  ops/us
VectorUtilBenchmark.binaryDotProductVector     300  thrpt    5   23,890 ±  6,478  ops/us
VectorUtilBenchmark.binaryDotProductVector     512  thrpt    5   16,696 ±  0,571  ops/us
VectorUtilBenchmark.binaryDotProductVector     702  thrpt    5   11,718 ±  0,265  ops/us
VectorUtilBenchmark.binaryDotProductVector    1024  thrpt    5    8,760 ±  0,194  ops/us
VectorUtilBenchmark.binarySquareScalar           1  thrpt    5  251,177 ± 83,185  ops/us
VectorUtilBenchmark.binarySquareScalar         128  thrpt    5   11,902 ±  1,279  ops/us
VectorUtilBenchmark.binarySquareScalar         207  thrpt    5    7,244 ±  2,344  ops/us
VectorUtilBenchmark.binarySquareScalar         256  thrpt    5    5,975 ±  1,489  ops/us
VectorUtilBenchmark.binarySquareScalar         300  thrpt    5    5,089 ±  0,309  ops/us
VectorUtilBenchmark.binarySquareScalar         512  thrpt    5    3,139 ±  0,205  ops/us
VectorUtilBenchmark.binarySquareScalar         702  thrpt    5    2,325 ±  0,200  ops/us
VectorUtilBenchmark.binarySquareScalar        1024  thrpt    5    1,586 ±  0,032  ops/us
VectorUtilBenchmark.binarySquareVector           1  thrpt    5  179,243 ± 12,767  ops/us
VectorUtilBenchmark.binarySquareVector         128  thrpt    5   41,748 ±  1,302  ops/us
VectorUtilBenchmark.binarySquareVector         207  thrpt    5   25,865 ±  0,939  ops/us
VectorUtilBenchmark.binarySquareVector         256  thrpt    5   25,354 ±  1,070  ops/us
VectorUtilBenchmark.binarySquareVector         300  thrpt    5   20,371 ±  0,653  ops/us
VectorUtilBenchmark.binarySquareVector         512  thrpt    5   14,283 ±  0,631  ops/us
VectorUtilBenchmark.binarySquareVector         702  thrpt    5    9,980 ±  0,344  ops/us
VectorUtilBenchmark.binarySquareVector        1024  thrpt    5    6,684 ±  3,338  ops/us
VectorUtilBenchmark.floatCosineScalar            1  thrpt    5  190,660 ±  5,937  ops/us
VectorUtilBenchmark.floatCosineScalar          128  thrpt    5    7,029 ±  0,202  ops/us
VectorUtilBenchmark.floatCosineScalar          207  thrpt    5    4,424 ±  0,116  ops/us
VectorUtilBenchmark.floatCosineScalar          256  thrpt    5    3,473 ±  1,401  ops/us
VectorUtilBenchmark.floatCosineScalar          300  thrpt    5    3,144 ±  0,048  ops/us
VectorUtilBenchmark.floatCosineScalar          512  thrpt    5    1,653 ±  0,030  ops/us
VectorUtilBenchmark.floatCosineScalar          702  thrpt    5    1,210 ±  0,037  ops/us
VectorUtilBenchmark.floatCosineScalar         1024  thrpt    5    0,795 ±  0,049  ops/us
VectorUtilBenchmark.floatCosineVector            1  thrpt    5  132,462 ±  7,447  ops/us
VectorUtilBenchmark.floatCosineVector          128  thrpt    5   26,174 ±  0,621  ops/us
VectorUtilBenchmark.floatCosineVector          207  thrpt    5   15,948 ±  2,942  ops/us
VectorUtilBenchmark.floatCosineVector          256  thrpt    5   17,445 ±  2,915  ops/us
VectorUtilBenchmark.floatCosineVector          300  thrpt    5   14,293 ±  1,994  ops/us
VectorUtilBenchmark.floatCosineVector          512  thrpt    5   11,711 ±  0,523  ops/us
VectorUtilBenchmark.floatCosineVector          702  thrpt    5    8,415 ±  0,228  ops/us
VectorUtilBenchmark.floatCosineVector         1024  thrpt    5    6,859 ±  0,244  ops/us
VectorUtilBenchmark.floatDotProductScalar        1  thrpt    5  211,648 ±  9,064  ops/us
VectorUtilBenchmark.floatDotProductScalar      128  thrpt    5   16,845 ±  4,619  ops/us
VectorUtilBenchmark.floatDotProductScalar      207  thrpt    5   11,864 ±  0,254  ops/us
VectorUtilBenchmark.floatDotProductScalar      256  thrpt    5    9,471 ±  0,368  ops/us
VectorUtilBenchmark.floatDotProductScalar      300  thrpt    5    8,323 ±  0,262  ops/us
VectorUtilBenchmark.floatDotProductScalar      512  thrpt    5    4,850 ±  0,147  ops/us
VectorUtilBenchmark.floatDotProductScalar      702  thrpt    5    3,622 ±  0,182  ops/us
VectorUtilBenchmark.floatDotProductScalar     1024  thrpt    5    2,412 ±  0,188  ops/us
VectorUtilBenchmark.floatDotProductVector        1  thrpt    5  186,701 ±  9,767  ops/us
VectorUtilBenchmark.floatDotProductVector      128  thrpt    5   53,626 ± 12,555  ops/us
VectorUtilBenchmark.floatDotProductVector      207  thrpt    5   31,024 ±  0,692  ops/us
VectorUtilBenchmark.floatDotProductVector      256  thrpt    5   38,246 ±  0,693  ops/us
VectorUtilBenchmark.floatDotProductVector      300  thrpt    5   26,674 ±  0,843  ops/us
VectorUtilBenchmark.floatDotProductVector      512  thrpt    5   22,742 ±  0,608  ops/us
VectorUtilBenchmark.floatDotProductVector      702  thrpt    5   15,759 ±  0,346  ops/us
VectorUtilBenchmark.floatDotProductVector     1024  thrpt    5   13,512 ±  0,549  ops/us
VectorUtilBenchmark.floatSquareScalar            1  thrpt    5  306,048 ±  3,798  ops/us
VectorUtilBenchmark.floatSquareScalar          128  thrpt    5   13,377 ±  0,446  ops/us
VectorUtilBenchmark.floatSquareScalar          207  thrpt    5    7,939 ±  0,447  ops/us
VectorUtilBenchmark.floatSquareScalar          256  thrpt    5    6,760 ±  0,396  ops/us
VectorUtilBenchmark.floatSquareScalar          300  thrpt    5    5,415 ±  0,220  ops/us
VectorUtilBenchmark.floatSquareScalar          512  thrpt    5    3,261 ±  0,956  ops/us
VectorUtilBenchmark.floatSquareScalar          702  thrpt    5    2,394 ±  0,072  ops/us
VectorUtilBenchmark.floatSquareScalar         1024  thrpt    5    1,686 ±  0,043  ops/us
VectorUtilBenchmark.floatSquareVector            1  thrpt    5  182,811 ± 27,177  ops/us
VectorUtilBenchmark.floatSquareVector          128  thrpt    5   53,693 ±  2,607  ops/us
VectorUtilBenchmark.floatSquareVector          207  thrpt    5   26,490 ±  0,620  ops/us
VectorUtilBenchmark.floatSquareVector          256  thrpt    5   29,780 ±  2,791  ops/us
VectorUtilBenchmark.floatSquareVector          300  thrpt    5   22,809 ±  0,417  ops/us
VectorUtilBenchmark.floatSquareVector          512  thrpt    5   18,750 ±  1,414  ops/us
VectorUtilBenchmark.floatSquareVector          702  thrpt    5   13,374 ±  0,513  ops/us
VectorUtilBenchmark.floatSquareVector         1024  thrpt    5   11,284 ±  0,261  ops/us

@uschindler
Copy link
Contributor

Yeah, i think its better not to compile benchmarks with incubator/preview etc? I think benchmarks should just call lucene methods, and lucene code should do the right thing with handling incubator/preview?

If we backport this PR it would fail to compile and Jenkins would be unhappy, too.

@dweiss
Copy link
Contributor

dweiss commented Oct 13, 2023

The problem was: if you run the benchmark in module mode, the requirement added the module without respecting if the extra runtime parameter was given.

Yep, I understand. Did it also happen when it was started with -jar ("classpath" mode)? I don't think it should, right?

Yeah, i think its better not to compile benchmarks with incubator/preview etc?

Hey, I just took Rob's example, that's it - had to add that module to compile it! :)

@uschindler
Copy link
Contributor

Did it also happen when it was started with -jar ("classpath" mode)? I don't think it should, right?

Naaah, that was the most confusing part.

@uschindler
Copy link
Contributor

uschindler commented Oct 13, 2023

@rmuir I merged main branch here, so you can retest the better logging of #12676

@rmuir
Copy link
Member Author

rmuir commented Oct 13, 2023

The logging works correctly for my use-case (accidentally running java 17). Scalar method has no warnings, Vector method prints: WARNING: Java vector incubator module was enabled by command line flags, but your Java version is too old: 17

@rmuir rmuir merged commit a4ff129 into apache:main Oct 14, 2023
rmuir added a commit to rmuir/lucene that referenced this pull request Nov 13, 2023
* migrate all vectorbench methods to lucene

Co-authored-by: Uwe Schindler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants