This repository was archived by the owner on Jan 23, 2023. It is now read-only.


@saucecontrol
Member

There are 2 places in the Brotli encoder where GCC intrinsics for TZCNT and BSR are used. I have added the MSVC equivalents so we don't incur a performance penalty compared to GCC (and compatible) builds. I submitted a matching PR over in the Google repo, which looks like it will be picked up.
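For readers unfamiliar with these intrinsics, here is a minimal sketch of the general pattern: select a compiler-specific bit-scan intrinsic and fall back to a portable loop elsewhere. This is not the code from the PR. The function names `ctz64` and `bsr32` are hypothetical, and the exact operand widths used in Brotli are an assumption.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>  /* _BitScanForward64, _BitScanReverse */
#endif

/* Hypothetical helper: number of trailing zero bits in a nonzero 64-bit
 * value, which is what TZCNT/BSF compute. */
static inline uint32_t ctz64(uint64_t x) {
  assert(x != 0); /* the intrinsic and builtin results are undefined for 0 */
#if defined(_MSC_VER) && defined(_M_X64)
  unsigned long index;
  _BitScanForward64(&index, x);
  return (uint32_t)index;
#elif defined(__GNUC__) || defined(__clang__)
  return (uint32_t)__builtin_ctzll(x);
#else
  /* Portable fallback: shift right until the lowest bit is set. */
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
#endif
}

/* Hypothetical helper: index of the highest set bit in a nonzero 32-bit
 * value, which is what BSR computes. */
static inline uint32_t bsr32(uint32_t x) {
  assert(x != 0);
#if defined(_MSC_VER)
  unsigned long index;
  _BitScanReverse(&index, x);
  return (uint32_t)index;
#elif defined(__GNUC__) || defined(__clang__)
  /* clz counts leading zeros, so 31 - clz is the highest set bit index. */
  return 31u ^ (uint32_t)__builtin_clz(x);
#else
  /* Portable fallback: count how many times the value can be halved. */
  uint32_t n = 0;
  while (x >>= 1) {
    ++n;
  }
  return n;
#endif
}
```

Without a compiler-specific branch, an MSVC build would take the loop fallback, which is the kind of penalty described above.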

I noticed a slight performance regression between 2.1 and the 3.0 previews, which had been updated to Brotli v1.0.5. This change makes up that perf difference and more.

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17134.556 (1803/April2018Update/Redstone4)
Intel Xeon CPU E3-1505M v6 3.00GHz, 1 CPU, 8 logical and 4 physical cores
Frequency=2929685 Hz, Resolution=341.3336 ns, Timer=TSC
.NET Core SDK=3.0.100-preview-010184
  [Host]        : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT
  2.1.7         : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT
  3.0-preview-2 : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT
  PR            : .NET Core 24973de7-9f09-4c96-a4b3-c4463fc81b91 (CoreCLR 4.6.27403.0, CoreFX 4.7.19.7311), 64bit RyuJIT

Jit=RyuJit  Platform=X64  Runtime=Core  
| Method | Job | Toolchain | Compressed Size | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---:|---:|---:|---:|---:|---:|
| Alice29_q0_w12 | 2.1.7 | Default | 78217 | 1.488 ms | 0.0237 ms | 0.0222 ms | 1.00 | 0.00 |
| Alice29_q0_w12 | 3.0-preview-2 | Default | 78217 | 1.499 ms | 0.0266 ms | 0.0249 ms | 1.01 | 0.00 |
| Alice29_q0_w12 | PR | CoreRun | 78217 | 1.279 ms | 0.0149 ms | 0.0139 ms | 0.86 | 0.00 |
| Alice29_q5_w16 | 2.1.7 | Default | 53089 | 7.308 ms | 0.0709 ms | 0.0663 ms | 1.00 | 0.00 |
| Alice29_q5_w16 | 3.0-preview-2 | Default | 53089 | 7.808 ms | 0.0862 ms | 0.0719 ms | 1.07 | 0.00 |
| Alice29_q5_w16 | PR | CoreRun | 53089 | 6.061 ms | 0.0455 ms | 0.0404 ms | 0.83 | 0.01 |
| Alice29_q11_w24 | 2.1.7 | Default | 46487 | 275.866 ms | 2.6656 ms | 2.4934 ms | 1.00 | 0.00 |
| Alice29_q11_w24 | 3.0-preview-2 | Default | 46487 | 286.640 ms | 2.5965 ms | 2.4288 ms | 1.04 | 0.00 |
| Alice29_q11_w24 | PR | CoreRun | 46487 | 260.619 ms | 2.2509 ms | 2.4288 ms | 0.94 | 0.00 |
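
As a reading aid, the Ratio column is consistent with dividing each job's mean by the 2.1.7 baseline mean for the same method. The sketch below reproduces that arithmetic. The helper names `ratio_to_baseline` and `percent_saved` are hypothetical, and this is an approximation of what the column shows, not BenchmarkDotNet's actual implementation.

```c
#include <assert.h>

/* Hypothetical helper: a job's mean time divided by the baseline mean.
 * Values below 1.0 mean the job is faster than the baseline. */
static double ratio_to_baseline(double mean_ms, double baseline_ms) {
  assert(baseline_ms > 0.0);
  return mean_ms / baseline_ms;
}

/* Hypothetical helper: percentage of time saved relative to the baseline
 * (positive means faster, negative means slower). */
static double percent_saved(double mean_ms, double baseline_ms) {
  return (1.0 - ratio_to_baseline(mean_ms, baseline_ms)) * 100.0;
}
```

For example, `ratio_to_baseline(1.279, 1.488)` is about 0.86, matching the PR row for `Alice29_q0_w12`, and `percent_saved(260.619, 275.866)` is roughly 5.5 for `Alice29_q11_w24`.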

@saucecontrol
Member Author

cc @ahsonkhan

Full compress perf test results, v1.0.7 base vs this PR (edited for space)

 System.IO.Compression.Brotli.PerformanceTests.dll                      |  Average | STDEV.S |      Min |      Max
:----------------------------------------------------------------------:| --------:| -------:| --------:| --------:
-  Compress_Canterbury_WithoutState("alice29.txt", Fastest)             |    1.219 |   0.146 |    1.107 |    1.868
+  Compress_Canterbury_WithoutState("alice29.txt", Fastest)             |    0.905 |   0.092 |    0.842 |    1.462
-  Compress_Canterbury_WithoutState("alice29.txt", NoCompression)       |    0.968 |   0.139 |    0.862 |    1.549
+  Compress_Canterbury_WithoutState("alice29.txt", NoCompression)       |    0.600 |   0.030 |    0.584 |    0.773
-  Compress_Canterbury_WithoutState("alice29.txt", Optimal)             |  269.169 |   8.673 |  261.588 |  311.027
+  Compress_Canterbury_WithoutState("alice29.txt", Optimal)             |  243.610 |   6.038 |  235.890 |  275.083
-  Compress_Canterbury_WithoutState("asyoulik.txt", Fastest)            |    1.056 |   0.168 |    0.917 |    1.940
+  Compress_Canterbury_WithoutState("asyoulik.txt", Fastest)            |    0.844 |   0.164 |    0.731 |    1.678
-  Compress_Canterbury_WithoutState("asyoulik.txt", NoCompression)      |    0.875 |   0.214 |    0.743 |    2.532
+  Compress_Canterbury_WithoutState("asyoulik.txt", NoCompression)      |    0.548 |   0.044 |    0.513 |    0.840
-  Compress_Canterbury_WithoutState("asyoulik.txt", Optimal)            |  209.849 |   3.078 |  204.861 |  216.708
+  Compress_Canterbury_WithoutState("asyoulik.txt", Optimal)            |  192.353 |   2.988 |  187.355 |  202.614
-  Compress_Canterbury_WithoutState("cp.html", Fastest)                 |    0.170 |   0.018 |    0.158 |    0.286
+  Compress_Canterbury_WithoutState("cp.html", Fastest)                 |    0.120 |   0.014 |    0.112 |    0.189
-  Compress_Canterbury_WithoutState("cp.html", NoCompression)           |    0.142 |   0.037 |    0.125 |    0.347
+  Compress_Canterbury_WithoutState("cp.html", NoCompression)           |    0.097 |   0.003 |    0.094 |    0.113
-  Compress_Canterbury_WithoutState("cp.html", Optimal)                 |   36.413 |   1.355 |   34.136 |   40.683
+  Compress_Canterbury_WithoutState("cp.html", Optimal)                 |   34.616 |   4.462 |   31.215 |   56.121
-  Compress_Canterbury_WithoutState("fields.c", Fastest)                |    0.082 |   0.020 |    0.070 |    0.157
+  Compress_Canterbury_WithoutState("fields.c", Fastest)                |    0.049 |   0.007 |    0.044 |    0.088
-  Compress_Canterbury_WithoutState("fields.c", NoCompression)          |    0.059 |   0.006 |    0.054 |    0.090
+  Compress_Canterbury_WithoutState("fields.c", NoCompression)          |    0.039 |   0.002 |    0.038 |    0.051
-  Compress_Canterbury_WithoutState("fields.c", Optimal)                |   17.749 |   0.879 |   16.289 |   22.570
+  Compress_Canterbury_WithoutState("fields.c", Optimal)                |   16.045 |   0.905 |   14.840 |   20.842
-  Compress_Canterbury_WithoutState("grammar.lsp", Fastest)             |    0.026 |   0.006 |    0.024 |    0.072
+  Compress_Canterbury_WithoutState("grammar.lsp", Fastest)             |    0.024 |   0.009 |    0.020 |    0.083
-  Compress_Canterbury_WithoutState("grammar.lsp", NoCompression)       |    0.019 |   0.002 |    0.017 |    0.031
+  Compress_Canterbury_WithoutState("grammar.lsp", NoCompression)       |    0.018 |   0.002 |    0.016 |    0.025
-  Compress_Canterbury_WithoutState("grammar.lsp", Optimal)             |    6.770 |   0.614 |    6.038 |    9.409
+  Compress_Canterbury_WithoutState("grammar.lsp", Optimal)             |    6.109 |   0.454 |    5.405 |    7.606
-  Compress_Canterbury_WithoutState("kennedy.xls", Fastest)             |    4.408 |   0.570 |    3.887 |    6.367
+  Compress_Canterbury_WithoutState("kennedy.xls", Fastest)             |    3.622 |   0.387 |    3.228 |    4.853
-  Compress_Canterbury_WithoutState("kennedy.xls", NoCompression)       |    2.867 |   0.201 |    2.664 |    3.680
+  Compress_Canterbury_WithoutState("kennedy.xls", NoCompression)       |    2.371 |   0.190 |    2.157 |    3.565
-  Compress_Canterbury_WithoutState("kennedy.xls", Optimal)             | 2666.949 |  39.429 | 2638.025 | 2723.019
+  Compress_Canterbury_WithoutState("kennedy.xls", Optimal)             | 2551.466 |   3.404 | 2549.538 | 2556.553
-  Compress_Canterbury_WithoutState("lcet10.txt", Fastest)              |    3.683 |   0.391 |    3.325 |    5.012
+  Compress_Canterbury_WithoutState("lcet10.txt", Fastest)              |    2.829 |   0.270 |    2.508 |    3.618
-  Compress_Canterbury_WithoutState("lcet10.txt", NoCompression)        |    2.661 |   0.234 |    2.413 |    3.569
+  Compress_Canterbury_WithoutState("lcet10.txt", NoCompression)        |    2.018 |   2.072 |    1.510 |   19.780
-  Compress_Canterbury_WithoutState("lcet10.txt", Optimal)              |  837.574 |  16.980 |  824.885 |  880.225
+  Compress_Canterbury_WithoutState("lcet10.txt", Optimal)              |  752.714 |   8.551 |  742.157 |  768.464
-  Compress_Canterbury_WithoutState("plrabn12.txt", Fastest)            |    4.419 |   0.436 |    3.969 |    6.265
+  Compress_Canterbury_WithoutState("plrabn12.txt", Fastest)            |    3.575 |   0.459 |    3.092 |    5.088
-  Compress_Canterbury_WithoutState("plrabn12.txt", NoCompression)      |    3.517 |   0.357 |    3.106 |    4.685
+  Compress_Canterbury_WithoutState("plrabn12.txt", NoCompression)      |    2.225 |   0.147 |    2.086 |    2.984
-  Compress_Canterbury_WithoutState("plrabn12.txt", Optimal)            |  906.004 |   7.378 |  897.405 |  917.548
+  Compress_Canterbury_WithoutState("plrabn12.txt", Optimal)            |  828.987 |  14.865 |  814.300 |  866.763
-  Compress_Canterbury_WithoutState("ptt5", Fastest)                    |    1.199 |   0.132 |    1.075 |    1.703
+  Compress_Canterbury_WithoutState("ptt5", Fastest)                    |    0.944 |   0.109 |    0.836 |    1.341
-  Compress_Canterbury_WithoutState("ptt5", NoCompression)              |    1.072 |   0.124 |    0.977 |    1.764
+  Compress_Canterbury_WithoutState("ptt5", NoCompression)              |    0.780 |   0.109 |    0.706 |    1.596
-  Compress_Canterbury_WithoutState("ptt5", Optimal)                    | 1355.112 |  11.336 | 1339.058 | 1371.698
+  Compress_Canterbury_WithoutState("ptt5", Optimal)                    | 1052.047 |   5.641 | 1043.890 | 1058.504
-  Compress_Canterbury_WithoutState("sum", Fastest)                     |    0.269 |   0.075 |    0.201 |    0.510
+  Compress_Canterbury_WithoutState("sum", Fastest)                     |    0.222 |   0.068 |    0.162 |    0.427
-  Compress_Canterbury_WithoutState("sum", NoCompression)               |    0.174 |   0.031 |    0.152 |    0.335
+  Compress_Canterbury_WithoutState("sum", NoCompression)               |    0.123 |   0.012 |    0.116 |    0.182
-  Compress_Canterbury_WithoutState("sum", Optimal)                     |   62.296 |   2.100 |   58.130 |   74.047
+  Compress_Canterbury_WithoutState("sum", Optimal)                     |   58.627 |   5.462 |   53.762 |   84.067
-  Compress_Canterbury_WithoutState("TestDocument.doc", Fastest)        |    0.101 |   0.028 |    0.087 |    0.252
+  Compress_Canterbury_WithoutState("TestDocument.doc", Fastest)        |    0.080 |   0.020 |    0.071 |    0.214
-  Compress_Canterbury_WithoutState("TestDocument.doc", NoCompression)  |    0.068 |   0.015 |    0.058 |    0.166
+  Compress_Canterbury_WithoutState("TestDocument.doc", NoCompression)  |    0.053 |   0.007 |    0.048 |    0.093
-  Compress_Canterbury_WithoutState("TestDocument.doc", Optimal)        |   37.074 |   3.662 |   33.985 |   68.344
+  Compress_Canterbury_WithoutState("TestDocument.doc", Optimal)        |   32.475 |   1.099 |   30.288 |   36.389
-  Compress_Canterbury_WithoutState("TestDocument.docx", Fastest)       |    0.072 |   0.012 |    0.066 |    0.135
+  Compress_Canterbury_WithoutState("TestDocument.docx", Fastest)       |    0.065 |   0.009 |    0.061 |    0.126
-  Compress_Canterbury_WithoutState("TestDocument.docx", NoCompression) |    0.068 |   0.007 |    0.063 |    0.109
+  Compress_Canterbury_WithoutState("TestDocument.docx", NoCompression) |    0.073 |   0.016 |    0.062 |    0.160
-  Compress_Canterbury_WithoutState("TestDocument.docx", Optimal)       |   37.430 |   1.468 |   34.866 |   44.357
+  Compress_Canterbury_WithoutState("TestDocument.docx", Optimal)       |   36.162 |   1.009 |   34.349 |   41.145
-  Compress_Canterbury_WithoutState("TestDocument.pdf", Fastest)        |    0.425 |   0.109 |    0.370 |    1.241
+  Compress_Canterbury_WithoutState("TestDocument.pdf", Fastest)        |    0.392 |   0.043 |    0.371 |    0.646
-  Compress_Canterbury_WithoutState("TestDocument.pdf", NoCompression)  |    0.322 |   0.034 |    0.301 |    0.492
+  Compress_Canterbury_WithoutState("TestDocument.pdf", NoCompression)  |    0.313 |   0.030 |    0.297 |    0.484
-  Compress_Canterbury_WithoutState("TestDocument.pdf", Optimal)        |  566.702 |   3.411 |  561.637 |  573.785
+  Compress_Canterbury_WithoutState("TestDocument.pdf", Optimal)        |  578.972 |  17.271 |  564.081 |  644.136
-  Compress_Canterbury_WithoutState("TestDocument.txt", Fastest)        |    0.095 |   0.020 |    0.082 |    0.206
+  Compress_Canterbury_WithoutState("TestDocument.txt", Fastest)        |    0.089 |   0.023 |    0.075 |    0.203
-  Compress_Canterbury_WithoutState("TestDocument.txt", NoCompression)  |    0.026 |   0.008 |    0.023 |    0.095
+  Compress_Canterbury_WithoutState("TestDocument.txt", NoCompression)  |    0.026 |   0.007 |    0.025 |    0.096
-  Compress_Canterbury_WithoutState("TestDocument.txt", Optimal)        |    4.081 |   0.335 |    3.657 |    5.216
+  Compress_Canterbury_WithoutState("TestDocument.txt", Optimal)        |    3.888 |   0.402 |    3.358 |    5.408
-  Compress_Canterbury_WithoutState("xargs.1", Fastest)                 |    0.032 |   0.004 |    0.030 |    0.054
+  Compress_Canterbury_WithoutState("xargs.1", Fastest)                 |    0.026 |   0.003 |    0.024 |    0.049
-  Compress_Canterbury_WithoutState("xargs.1", NoCompression)           |    0.022 |   0.003 |    0.020 |    0.042
+  Compress_Canterbury_WithoutState("xargs.1", NoCompression)           |    0.022 |   0.009 |    0.019 |    0.072
-  Compress_Canterbury_WithoutState("xargs.1", Optimal)                 |    7.794 |   0.720 |    6.799 |   11.297
+  Compress_Canterbury_WithoutState("xargs.1", Optimal)                 |    7.094 |   0.579 |    6.454 |   10.696
-  Compress_Canterbury_WithState("alice29.txt", Fastest)                |    1.272 |   0.196 |    1.085 |    2.001
+  Compress_Canterbury_WithState("alice29.txt", Fastest)                |    0.976 |   0.170 |    0.822 |    1.588
-  Compress_Canterbury_WithState("alice29.txt", NoCompression)          |    1.012 |   0.207 |    0.866 |    2.055
+  Compress_Canterbury_WithState("alice29.txt", NoCompression)          |    0.607 |   0.032 |    0.594 |    0.869
-  Compress_Canterbury_WithState("alice29.txt", Optimal)                |  266.630 |   3.853 |  258.220 |  275.258
+  Compress_Canterbury_WithState("alice29.txt", Optimal)                |  241.458 |   4.078 |  234.457 |  252.477
-  Compress_Canterbury_WithState("asyoulik.txt", Fastest)               |    1.065 |   0.202 |    0.924 |    2.048
+  Compress_Canterbury_WithState("asyoulik.txt", Fastest)               |    0.789 |   0.083 |    0.745 |    1.228
-  Compress_Canterbury_WithState("asyoulik.txt", NoCompression)         |    0.884 |   0.134 |    0.757 |    1.317
+  Compress_Canterbury_WithState("asyoulik.txt", NoCompression)         |    0.562 |   0.065 |    0.508 |    0.910
-  Compress_Canterbury_WithState("asyoulik.txt", Optimal)               |  210.875 |   2.871 |  205.030 |  218.782
+  Compress_Canterbury_WithState("asyoulik.txt", Optimal)               |  191.385 |   2.517 |  186.355 |  197.966
-  Compress_Canterbury_WithState("cp.html", Fastest)                    |    0.195 |   0.038 |    0.161 |    0.327
+  Compress_Canterbury_WithState("cp.html", Fastest)                    |    0.125 |   0.028 |    0.112 |    0.275
-  Compress_Canterbury_WithState("cp.html", NoCompression)              |    0.147 |   0.020 |    0.140 |    0.267
+  Compress_Canterbury_WithState("cp.html", NoCompression)              |    0.111 |   0.005 |    0.106 |    0.133
-  Compress_Canterbury_WithState("cp.html", Optimal)                    |   36.039 |   1.416 |   33.062 |   41.167
+  Compress_Canterbury_WithState("cp.html", Optimal)                    |   32.322 |   1.029 |   30.369 |   35.513
-  Compress_Canterbury_WithState("fields.c", Fastest)                   |    0.076 |   0.010 |    0.073 |    0.157
+  Compress_Canterbury_WithState("fields.c", Fastest)                   |    0.057 |   0.010 |    0.048 |    0.100
-  Compress_Canterbury_WithState("fields.c", NoCompression)             |    0.071 |   0.008 |    0.065 |    0.116
+  Compress_Canterbury_WithState("fields.c", NoCompression)             |    0.061 |   0.019 |    0.046 |    0.116
-  Compress_Canterbury_WithState("fields.c", Optimal)                   |   17.252 |   0.772 |   15.661 |   20.173
+  Compress_Canterbury_WithState("fields.c", Optimal)                   |   15.555 |   0.780 |   14.352 |   19.407
-  Compress_Canterbury_WithState("grammar.lsp", Fastest)                |    0.029 |   0.007 |    0.024 |    0.073
+  Compress_Canterbury_WithState("grammar.lsp", Fastest)                |    0.022 |   0.005 |    0.020 |    0.059
-  Compress_Canterbury_WithState("grammar.lsp", NoCompression)          |    0.029 |   0.020 |    0.023 |    0.212
+  Compress_Canterbury_WithState("grammar.lsp", NoCompression)          |    0.025 |   0.007 |    0.022 |    0.066
-  Compress_Canterbury_WithState("grammar.lsp", Optimal)                |    6.827 |   0.601 |    5.940 |    8.774
+  Compress_Canterbury_WithState("grammar.lsp", Optimal)                |    6.169 |   0.557 |    5.438 |    7.779
-  Compress_Canterbury_WithState("kennedy.xls", Fastest)                |    4.086 |   0.446 |    3.662 |    5.758
+  Compress_Canterbury_WithState("kennedy.xls", Fastest)                |    3.673 |   0.411 |    3.201 |    5.142
-  Compress_Canterbury_WithState("kennedy.xls", NoCompression)          |    3.051 |   0.443 |    2.627 |    4.761
+  Compress_Canterbury_WithState("kennedy.xls", NoCompression)          |    2.574 |   0.543 |    2.250 |    6.299
-  Compress_Canterbury_WithState("kennedy.xls", Optimal)                | 2663.118 |   5.475 | 2655.548 | 2667.177
+  Compress_Canterbury_WithState("kennedy.xls", Optimal)                | 2570.798 |  16.426 | 2551.147 | 2584.402
-  Compress_Canterbury_WithState("lcet10.txt", Fastest)                 |    3.403 |   0.375 |    2.980 |    4.818
+  Compress_Canterbury_WithState("lcet10.txt", Fastest)                 |    2.541 |   0.290 |    2.256 |    3.949
-  Compress_Canterbury_WithState("lcet10.txt", NoCompression)           |    2.695 |   0.308 |    2.389 |    3.787
+  Compress_Canterbury_WithState("lcet10.txt", NoCompression)           |    1.699 |   0.196 |    1.524 |    2.625
-  Compress_Canterbury_WithState("lcet10.txt", Optimal)                 |  841.948 |  21.321 |  827.811 |  908.402
+  Compress_Canterbury_WithState("lcet10.txt", Optimal)                 |  745.196 |   5.925 |  738.894 |  758.802
-  Compress_Canterbury_WithState("plrabn12.txt", Fastest)               |    4.439 |   0.403 |    3.923 |    5.569
+  Compress_Canterbury_WithState("plrabn12.txt", Fastest)               |    3.556 |   0.403 |    3.091 |    5.205
-  Compress_Canterbury_WithState("plrabn12.txt", NoCompression)         |    3.405 |   0.312 |    3.096 |    4.843
+  Compress_Canterbury_WithState("plrabn12.txt", NoCompression)         |    2.531 |   1.016 |    2.052 |    8.305
-  Compress_Canterbury_WithState("plrabn12.txt", Optimal)               |  902.867 |   9.481 |  892.144 |  926.019
+  Compress_Canterbury_WithState("plrabn12.txt", Optimal)               |  822.698 |   8.306 |  809.662 |  845.065
-  Compress_Canterbury_WithState("ptt5", Fastest)                       |    1.202 |   0.145 |    1.087 |    1.767
+  Compress_Canterbury_WithState("ptt5", Fastest)                       |    1.259 |   0.691 |    0.883 |    3.881
-  Compress_Canterbury_WithState("ptt5", NoCompression)                 |    1.108 |   0.162 |    0.959 |    1.750
+  Compress_Canterbury_WithState("ptt5", NoCompression)                 |    0.767 |   0.054 |    0.730 |    1.126
-  Compress_Canterbury_WithState("ptt5", Optimal)                       | 1354.923 |   8.380 | 1343.445 | 1368.249
+  Compress_Canterbury_WithState("ptt5", Optimal)                       | 1058.934 |  27.511 | 1041.584 | 1133.862
-  Compress_Canterbury_WithState("sum", Fastest)                        |    0.227 |   0.048 |    0.202 |    0.526
+  Compress_Canterbury_WithState("sum", Fastest)                        |    0.179 |   0.028 |    0.166 |    0.312
-  Compress_Canterbury_WithState("sum", NoCompression)                  |    0.182 |   0.025 |    0.170 |    0.371
+  Compress_Canterbury_WithState("sum", NoCompression)                  |    0.135 |   0.010 |    0.126 |    0.176
-  Compress_Canterbury_WithState("sum", Optimal)                        |   62.217 |   3.787 |   58.357 |   86.804
+  Compress_Canterbury_WithState("sum", Optimal)                        |   55.420 |   1.424 |   52.720 |   61.378
-  Compress_Canterbury_WithState("TestDocument.doc", Fastest)           |    0.101 |   0.033 |    0.086 |    0.309
+  Compress_Canterbury_WithState("TestDocument.doc", Fastest)           |    0.080 |   0.027 |    0.072 |    0.311
-  Compress_Canterbury_WithState("TestDocument.doc", NoCompression)     |    0.081 |   0.014 |    0.071 |    0.132
+  Compress_Canterbury_WithState("TestDocument.doc", NoCompression)     |    0.059 |   0.003 |    0.057 |    0.074
-  Compress_Canterbury_WithState("TestDocument.doc", Optimal)           |   37.342 |   3.309 |   34.386 |   55.877
+  Compress_Canterbury_WithState("TestDocument.doc", Optimal)           |   32.642 |   1.220 |   30.462 |   37.163
-  Compress_Canterbury_WithState("TestDocument.docx", Fastest)          |    0.074 |   0.015 |    0.068 |    0.181
+  Compress_Canterbury_WithState("TestDocument.docx", Fastest)          |    0.062 |   0.003 |    0.060 |    0.076
-  Compress_Canterbury_WithState("TestDocument.docx", NoCompression)    |    0.075 |   0.005 |    0.072 |    0.110
+  Compress_Canterbury_WithState("TestDocument.docx", NoCompression)    |    0.086 |   0.026 |    0.070 |    0.232
-  Compress_Canterbury_WithState("TestDocument.docx", Optimal)          |   36.927 |   1.076 |   34.656 |   39.801
+  Compress_Canterbury_WithState("TestDocument.docx", Optimal)          |   35.890 |   1.166 |   33.864 |   39.971
-  Compress_Canterbury_WithState("TestDocument.pdf", Fastest)           |    0.416 |   0.100 |    0.381 |    1.154
+  Compress_Canterbury_WithState("TestDocument.pdf", Fastest)           |    0.474 |   0.255 |    0.357 |    2.208
-  Compress_Canterbury_WithState("TestDocument.pdf", NoCompression)     |    0.337 |   0.052 |    0.299 |    0.601
+  Compress_Canterbury_WithState("TestDocument.pdf", NoCompression)     |    0.317 |   0.026 |    0.299 |    0.471
-  Compress_Canterbury_WithState("TestDocument.pdf", Optimal)           |  567.716 |   7.671 |  556.144 |  592.784
+  Compress_Canterbury_WithState("TestDocument.pdf", Optimal)           |  580.094 |  19.408 |  568.391 |  651.445
-  Compress_Canterbury_WithState("TestDocument.txt", Fastest)           |    0.029 |   0.037 |    0.018 |    0.347
+  Compress_Canterbury_WithState("TestDocument.txt", Fastest)           |    0.021 |   0.010 |    0.016 |    0.073
-  Compress_Canterbury_WithState("TestDocument.txt", NoCompression)     |    0.034 |   0.009 |    0.028 |    0.079
+  Compress_Canterbury_WithState("TestDocument.txt", NoCompression)     |    0.037 |   0.018 |    0.030 |    0.151
-  Compress_Canterbury_WithState("TestDocument.txt", Optimal)           |    3.936 |   0.562 |    3.296 |    5.983
+  Compress_Canterbury_WithState("TestDocument.txt", Optimal)           |    4.032 |   0.834 |    3.150 |    6.454
-  Compress_Canterbury_WithState("xargs.1", Fastest)                    |    0.034 |   0.004 |    0.030 |    0.049
+  Compress_Canterbury_WithState("xargs.1", Fastest)                    |    0.027 |   0.003 |    0.025 |    0.041
-  Compress_Canterbury_WithState("xargs.1", NoCompression)              |    0.034 |   0.008 |    0.027 |    0.074
+  Compress_Canterbury_WithState("xargs.1", NoCompression)              |    0.027 |   0.003 |    0.025 |    0.045
-  Compress_Canterbury_WithState("xargs.1", Optimal)                    |    7.487 |   0.589 |    6.542 |    9.754
+  Compress_Canterbury_WithState("xargs.1", Optimal)                    |    6.745 |   0.490 |    5.998 |    7.881
-  Compress_Canterbury("alice29.txt", Fastest)                          |    1.235 |   0.137 |    1.130 |    1.944
+  Compress_Canterbury("alice29.txt", Fastest)                          |    1.039 |   0.148 |    0.927 |    1.620
-  Compress_Canterbury("alice29.txt", NoCompression)                    |    1.018 |   0.129 |    0.901 |    1.752
+  Compress_Canterbury("alice29.txt", NoCompression)                    |    0.671 |   0.048 |    0.635 |    0.939
-  Compress_Canterbury("alice29.txt", Optimal)                          |  270.635 |  11.905 |  258.136 |  333.321
+  Compress_Canterbury("alice29.txt", Optimal)                          |  245.969 |   7.757 |  231.059 |  265.503
-  Compress_Canterbury("asyoulik.txt", Fastest)                         |    1.096 |   0.195 |    0.956 |    2.518
+  Compress_Canterbury("asyoulik.txt", Fastest)                         |    0.880 |   0.145 |    0.776 |    1.422
-  Compress_Canterbury("asyoulik.txt", NoCompression)                   |    0.890 |   0.120 |    0.785 |    1.448
+  Compress_Canterbury("asyoulik.txt", NoCompression)                   |    0.597 |   0.034 |    0.579 |    0.795
-  Compress_Canterbury("asyoulik.txt", Optimal)                         |  211.859 |   4.600 |  203.336 |  227.932
+  Compress_Canterbury("asyoulik.txt", Optimal)                         |  192.562 |   7.051 |  184.170 |  217.607
-  Compress_Canterbury("cp.html", Fastest)                              |    0.183 |   0.058 |    0.165 |    0.628
+  Compress_Canterbury("cp.html", Fastest)                              |    0.168 |   0.073 |    0.117 |    0.827
-  Compress_Canterbury("cp.html", NoCompression)                        |    0.168 |   0.054 |    0.144 |    0.618
+  Compress_Canterbury("cp.html", NoCompression)                        |    0.135 |   0.052 |    0.117 |    0.612
-  Compress_Canterbury("cp.html", Optimal)                              |   36.090 |   1.396 |   33.851 |   42.423
+  Compress_Canterbury("cp.html", Optimal)                              |   32.115 |   2.783 |   30.041 |   51.049
-  Compress_Canterbury("fields.c", Fastest)                             |    0.086 |   0.014 |    0.076 |    0.145
+  Compress_Canterbury("fields.c", Fastest)                             |    0.085 |   0.021 |    0.052 |    0.191
-  Compress_Canterbury("fields.c", NoCompression)                       |    0.083 |   0.014 |    0.072 |    0.143
+  Compress_Canterbury("fields.c", NoCompression)                       |    0.059 |   0.008 |    0.051 |    0.086
-  Compress_Canterbury("fields.c", Optimal)                             |   17.243 |   0.900 |   15.720 |   20.332
+  Compress_Canterbury("fields.c", Optimal)                             |   15.710 |   0.816 |   14.618 |   19.892
-  Compress_Canterbury("grammar.lsp", Fastest)                          |    0.041 |   0.055 |    0.027 |    0.579
+  Compress_Canterbury("grammar.lsp", Fastest)                          |    0.040 |   0.068 |    0.023 |    0.683
-  Compress_Canterbury("grammar.lsp", NoCompression)                    |    0.029 |   0.005 |    0.025 |    0.054
+  Compress_Canterbury("grammar.lsp", NoCompression)                    |    0.027 |   0.004 |    0.024 |    0.051
-  Compress_Canterbury("grammar.lsp", Optimal)                          |    6.664 |   0.460 |    5.954 |    8.447
+  Compress_Canterbury("grammar.lsp", Optimal)                          |    6.377 |   0.652 |    5.811 |   10.402
-  Compress_Canterbury("kennedy.xls", Fastest)                          |    4.218 |   0.923 |    3.652 |   12.475
+  Compress_Canterbury("kennedy.xls", Fastest)                          |    3.829 |   0.470 |    3.321 |    5.960
-  Compress_Canterbury("kennedy.xls", NoCompression)                    |    3.036 |   0.362 |    2.639 |    4.199
+  Compress_Canterbury("kennedy.xls", NoCompression)                    |    2.530 |   0.303 |    2.325 |    4.520
-  Compress_Canterbury("kennedy.xls", Optimal)                          | 2654.240 |  21.585 | 2636.523 | 2682.780
+  Compress_Canterbury("kennedy.xls", Optimal)                          | 2585.241 |  12.884 | 2574.880 | 2601.915
-  Compress_Canterbury("lcet10.txt", Fastest)                           |    3.430 |   0.338 |    3.055 |    4.876
+  Compress_Canterbury("lcet10.txt", Fastest)                           |    2.571 |   0.148 |    2.435 |    3.371
-  Compress_Canterbury("lcet10.txt", NoCompression)                     |    2.738 |   0.277 |    2.400 |    3.730
+  Compress_Canterbury("lcet10.txt", NoCompression)                     |    1.716 |   0.143 |    1.633 |    2.600
-  Compress_Canterbury("lcet10.txt", Optimal)                           |  833.566 |   7.028 |  822.675 |  845.543
+  Compress_Canterbury("lcet10.txt", Optimal)                           |  758.690 |  14.802 |  730.686 |  789.723
-  Compress_Canterbury("plrabn12.txt", Fastest)                         |    4.477 |   0.428 |    3.928 |    6.103
+  Compress_Canterbury("plrabn12.txt", Fastest)                         |    3.447 |   0.202 |    3.196 |    4.251
-  Compress_Canterbury("plrabn12.txt", NoCompression)                   |    3.572 |   0.363 |    3.107 |    4.795
+  Compress_Canterbury("plrabn12.txt", NoCompression)                   |    2.322 |   0.163 |    2.194 |    3.235
-  Compress_Canterbury("plrabn12.txt", Optimal)                         |  907.135 |   7.493 |  893.440 |  921.431
+  Compress_Canterbury("plrabn12.txt", Optimal)                         |  834.578 |  14.729 |  816.497 |  863.284
-  Compress_Canterbury("ptt5", Fastest)                                 |    1.266 |   0.202 |    1.093 |    2.178
+  Compress_Canterbury("ptt5", Fastest)                                 |    1.041 |   0.137 |    0.943 |    1.690
-  Compress_Canterbury("ptt5", NoCompression)                           |    1.096 |   0.101 |    1.021 |    1.446
+  Compress_Canterbury("ptt5", NoCompression)                           |    0.811 |   0.074 |    0.756 |    1.118
-  Compress_Canterbury("ptt5", Optimal)                                 | 1345.753 |   8.623 | 1334.310 | 1355.328
+  Compress_Canterbury("ptt5", Optimal)                                 | 1048.621 |  17.723 | 1016.943 | 1075.439
-  Compress_Canterbury("sum", Fastest)                                  |    0.227 |   0.059 |    0.204 |    0.721
+  Compress_Canterbury("sum", Fastest)                                  |    0.186 |   0.051 |    0.171 |    0.641
-  Compress_Canterbury("sum", NoCompression)                            |    0.193 |   0.054 |    0.174 |    0.667
+  Compress_Canterbury("sum", NoCompression)                            |    0.152 |   0.069 |    0.139 |    0.812
-  Compress_Canterbury("sum", Optimal)                                  |   61.731 |   2.182 |   58.163 |   70.890
+  Compress_Canterbury("sum", Optimal)                                  |   57.020 |   3.504 |   52.442 |   71.705
-  Compress_Canterbury("TestDocument.doc", Fastest)                     |    0.098 |   0.013 |    0.091 |    0.195
+  Compress_Canterbury("TestDocument.doc", Fastest)                     |    0.113 |   0.025 |    0.078 |    0.164
-  Compress_Canterbury("TestDocument.doc", NoCompression)               |    0.079 |   0.006 |    0.074 |    0.104
+  Compress_Canterbury("TestDocument.doc", NoCompression)               |    0.069 |   0.018 |    0.061 |    0.238
-  Compress_Canterbury("TestDocument.doc", Optimal)                     |   36.733 |   1.231 |   34.786 |   40.521
+  Compress_Canterbury("TestDocument.doc", Optimal)                     |   32.599 |   2.655 |   30.229 |   44.355
-  Compress_Canterbury("TestDocument.docx", Fastest)                    |    0.076 |   0.007 |    0.070 |    0.117
+  Compress_Canterbury("TestDocument.docx", Fastest)                    |    0.115 |   0.009 |    0.103 |    0.160
-  Compress_Canterbury("TestDocument.docx", NoCompression)              |    0.089 |   0.018 |    0.076 |    0.152
+  Compress_Canterbury("TestDocument.docx", NoCompression)              |    0.082 |   0.009 |    0.076 |    0.127
-  Compress_Canterbury("TestDocument.docx", Optimal)                    |   37.472 |   1.558 |   34.181 |   43.270
+  Compress_Canterbury("TestDocument.docx", Optimal)                    |   36.659 |   2.236 |   33.908 |   46.430
-  Compress_Canterbury("TestDocument.pdf", Fastest)                     |    0.464 |   0.063 |    0.409 |    0.716
+  Compress_Canterbury("TestDocument.pdf", Fastest)                     |    0.495 |   0.082 |    0.415 |    0.843
-  Compress_Canterbury("TestDocument.pdf", NoCompression)               |    0.386 |   0.042 |    0.337 |    0.533
+  Compress_Canterbury("TestDocument.pdf", NoCompression)               |    0.389 |   0.042 |    0.363 |    0.582
-  Compress_Canterbury("TestDocument.pdf", Optimal)                     |  567.276 |   4.635 |  560.703 |  578.078
+  Compress_Canterbury("TestDocument.pdf", Optimal)                     |  575.646 |  13.921 |  557.289 |  608.856
-  Compress_Canterbury("TestDocument.txt", Fastest)                     |    0.038 |   0.030 |    0.022 |    0.285
+  Compress_Canterbury("TestDocument.txt", Fastest)                     |    0.026 |   0.009 |    0.020 |    0.078
-  Compress_Canterbury("TestDocument.txt", NoCompression)               |    0.044 |   0.011 |    0.032 |    0.105
+  Compress_Canterbury("TestDocument.txt", NoCompression)               |    0.041 |   0.009 |    0.034 |    0.073
-  Compress_Canterbury("TestDocument.txt", Optimal)                     |    3.852 |   0.580 |    3.301 |    6.302
+  Compress_Canterbury("TestDocument.txt", Optimal)                     |    3.494 |   0.194 |    3.239 |    4.282
-  Compress_Canterbury("xargs.1", Fastest)                              |    0.040 |   0.010 |    0.033 |    0.082
+  Compress_Canterbury("xargs.1", Fastest)                              |    0.032 |   0.005 |    0.028 |    0.052
-  Compress_Canterbury("xargs.1", NoCompression)                        |    0.039 |   0.045 |    0.029 |    0.475
+  Compress_Canterbury("xargs.1", NoCompression)                        |    0.051 |   0.061 |    0.029 |    0.633
-  Compress_Canterbury("xargs.1", Optimal)                              |    7.464 |   0.516 |    6.646 |    8.969
+  Compress_Canterbury("xargs.1", Optimal)                              |    6.800 |   0.383 |    6.155 |    8.174

@karelz
Member

karelz commented Mar 4, 2019

@buyaa-n @ahsonkhan @ViktorHofer can you please take a look? This PR is 2+ weeks old and has had no code review yet ...

@saucecontrol
Member Author

Rebased on master post v1.0.7 merge

@ahsonkhan

From #35172 (comment)

My only concern with waiting for the MSVC intrinsics change to be merged in the Google repo is that it looks like there is some work in progress for their next version (shared Brotli?) that may take a while. Given the perf gain, I hope you'll consider taking that change before the 3.0 release with the understanding it will be "official" by the time we sync up with the Google repo again post-3.0.

Ideally, the change from this PR wouldn't be required explicitly and we could just update the Brotli version, once available, and keep the sources in sync that way. Otherwise, we have to keep track of these source differences explicitly, which increases the maintenance burden since we effectively end up with a fork that has to be consolidated when the next update becomes available. Maybe it's the right trade-off here given the perf gain, but I am not sure. Do we have some scenario where this perf improvement shows up significantly to help motivate this change?

@stephentoub, what are your thoughts on this?

@stephentoub
Member

@stephentoub, what are your thoughts on this?

I would like for us to avoid having forked code long-term; ideally the only reason we have the code in the repo at all is because we need to build it to ensure it's deployed with .NET Core. My preference thus would be to not diverge at all, and just wait for an updated checkpoint from upstream that includes the intrinsics change. However, if there is a meaningful improvement here (e.g. an answer to @ahsonkhan's question "Do we have some scenario where this perf improvement shows up significantly to help motivate this change?"), I'd be ok accepting this PR for the very short term, but with the understanding that it's temporary and that we will simply overwrite the changes the next time we take a source code copy of Brotli, rather than trying to merge. If at that point the PR to upstream has gone through, great, we'll get the improvements; if it hasn't, well, there must be a reason for it and we'll simply sync up with what's upstream and lose this intrinsics change.

@saucecontrol
Member Author

saucecontrol commented Mar 7, 2019

Do we have some scenario where this perf improvement shows up significantly to help motivate this change?

The scenario would be any place anyone is using System.IO.Compression.Brotli on Windows, including the Response Compression Middleware. As the perf tests show, this PR accounts for a 10-30% improvement, with the bias being toward highly-compressible content (e.g. HTML, CSS, JS/JSON). The perf project here uses the Canterbury Corpus, which was specifically constructed to represent real-world workloads. In fact the only places this PR doesn't show significant improvements in perf are on the [non-Canterbury] .docx and .pdf files, which are not compressible and therefore not likely to be served through a compression filter.

I agree that maintaining this patch long-term should be a non-goal. If it gets clobbered by the next upstream sync, we'd be no worse off than we are now. But hopefully by that time, the matching PR will be merged, meaning we just get the benefits for longer if we take it now.

@stephentoub
Member

cc: @joshfree

@ahsonkhan, can you make a call on this and either review/merge it or close it? Thanks.

@saucecontrol
Member Author

Just noticed that @mjsabby mentioned Bing using the netcore Brotli support here https://devblogs.microsoft.com/dotnet/bing-com-runs-on-net-core-2-1/

If that's still true, that might be your compelling use case.

@danmoseley
Member

@saucecontrol any idea why your upstream PR hasn't been picked up yet? Everyone agrees that would be the ideal outcome.

@saucecontrol
Member Author

I could only speculate, but some background might help.

When I initially submitted the upstream PR over a year ago, the platform/compiler abstraction in the project (platform.h) was going through a redesign, and the structure of my initial changes didn't fit with what the project maintainers were planning. They have since solidified that structure, and I've updated my PR to match, which seems to meet with their approval.

In the meantime, work on the project has stalled a bit, so it's probably just a matter of their getting back into it. That said, it looks to me like the next release on their side might not come any time soon, so it made sense to open a patch PR here instead of waiting.
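
For readers unfamiliar with the change, the compiler-specific wrapper pattern at issue can be sketched roughly as follows. This is a minimal illustration and not the code from either PR: the helper names `sketch_ctz64` and `sketch_log2_floor` are hypothetical, and the preprocessor guards are assumptions about a reasonable layout rather than what Brotli's platform.h actually contains. GCC-compatible compilers use `__builtin_ctzll` and `__builtin_clz`, MSVC uses `_BitScanForward64` and `_BitScanReverse` from `<intrin.h>`, and anything else falls back to a plain loop.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>  /* _BitScanForward64, _BitScanReverse */
#endif

/* Hypothetical helper: index of the lowest set bit (trailing zero count).
   The input must be nonzero; the intrinsics are undefined for zero. */
static inline uint32_t sketch_ctz64(uint64_t x) {
  assert(x != 0);
#if defined(__GNUC__) || defined(__clang__)
  return (uint32_t)__builtin_ctzll(x);
#elif defined(_MSC_VER) && defined(_M_X64)
  unsigned long index;
  _BitScanForward64(&index, x);
  return (uint32_t)index;
#else
  /* Portable fallback: shift until the lowest bit is set. */
  uint32_t n = 0;
  while ((x & 1) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
#endif
}

/* Hypothetical helper: index of the highest set bit, i.e. floor(log2(x)).
   The input must be nonzero. */
static inline uint32_t sketch_log2_floor(uint32_t x) {
  assert(x != 0);
#if defined(__GNUC__) || defined(__clang__)
  /* clz counts leading zeros; XOR with 31 converts that to a bit index. */
  return 31u ^ (uint32_t)__builtin_clz(x);
#elif defined(_MSC_VER)
  unsigned long index;
  _BitScanReverse(&index, x);
  return (uint32_t)index;
#else
  /* Portable fallback: count how many times x can be halved. */
  uint32_t n = 0;
  while (x >>= 1) ++n;
  return n;
#endif
}
```

With these definitions, `sketch_ctz64(8)` evaluates to 3 and `sketch_log2_floor(1000)` evaluates to 9. Without the MSVC branches, a Windows build would take the loop fallback, which is the kind of gap the patch is meant to close.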

@joshfree
Member

I'm in agreement here that we should not fork brotli from our upstream without a super compelling case. While the performance win is super awesome, I'm in favor of pushing on the original upstream PR and then picking up the latest version of brotli from there.

@stephentoub
Member

I'm in agreement here that we should not fork brotli from our upstream without a super compelling case. While the performance win is super awesome, I'm in favor of pushing on the original upstream PR and then picking up the latest version of brotli from there.

Thanks, @joshfree. Given this, I'm going to close this PR.

@saucecontrol, thank you very much for your efforts here. To whatever extent we can influence it, let's try to get your changes integrated upstream, and then we'll happily consume an updated version of the official source.

@karelz added this to the 3.0 milestone Apr 1, 2019
@saucecontrol
Member Author

This patch was finally merged in the upstream repo today. I'd still like to get clrcompression.dll updated with these perf improvements but I am unsure how to proceed.

If you look at the post-v1.0.7 commits in the Brotli repo, you can see references to "Shared-Brotli". This is effectively Brotli 2.0, with a new RFC and a new IANA designation (sbr vs br). My guess is that the next official release might still be quite far off.

Now that these specific changes have been merged upstream, would a revival of this PR in the runtime repo be accepted, or would a sync to current master in the upstream repo be acceptable, or do we need to wait for the next official release tag?

@stephentoub
Member

Now that these specific changes have been merged upstream, would a revival of this PR in the runtime repo be accepted, or would a sync to current master in the upstream repo be acceptable, or do we need to wait for the next official release tag?

@ericstj, opinion?

@ericstj
Member

ericstj commented Mar 27, 2020

If they will release by 5.0, then I can see us taking a change now followed by another when they release before 5.0. If there isn't a planned release I'd feel reluctant about us taking a random snap of their codebase since we don't have a read on the quality and it would mean we'd be applying a higher stability promise than the code owners. Is there a way we can get a 1.0.8 release?

@danmoseley
Member

Put another way, we could find ourselves in a position where we have to roll back the preview we ingested because they didn't release a stable version by (say) Aug.

@saucecontrol
Member Author

I think it's unlikely we'll see a new Brotli release before the cut for 5.0. If there is a release, it's even more unlikely it would be a minor 1.0.8 version, based on the fact that some of the Shared Brotli code has been merged into master already.

If there isn't a planned release I'd feel reluctant about us taking a random snap of their codebase

Because current master includes changes for the next major version, I would agree.

The targeted change in this PR was built on v1.0.7, so it would be safe to apply in isolation, and the risk that it would be clobbered by a future upstream sync has been eliminated now that it's merged there.

If the ruling is that we'd only sync on a release milestone upstream, that does answer my question, though. Just wanted to clarify that since there was discussion about diverging from the Google codebase at one point, which has been partially addressed.
