
Conversation

@vishwakftw
Contributor

SobolEngine is a quasi-random sampler used to sample points evenly distributed in [0, 1]. Here we use direction numbers to generate these samples. The maximum supported dimension for the sampler is 1111.

Documentation has been added, and tests have been added based on @Balandat's references. The implementation is an optimized, tensorized version of @Balandat's Cython implementation provided in #9332.

This closes #9332.

cc: @soumith @Balandat
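
For reference, a minimal usage sketch of the engine this PR adds, based on the `torch.quasirandom.SobolEngine` calls that appear later in this thread (constructor with `dimension`, `scramble`, `seed`, and a `draw(n)` method):

```python
import torch
from torch.quasirandom import SobolEngine

# Unscrambled engine: deterministic low-discrepancy points in [0, 1)^3.
engine = SobolEngine(dimension=3)
print(engine.draw(5))            # tensor of shape (5, 3)

# Scrambled engine with a fixed seed for reproducible randomized sequences.
scrambled = SobolEngine(dimension=3, scramble=True, seed=1234)
print(scrambled.draw(5))
```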


@cpuhrsch
Contributor

cpuhrsch commented Mar 20, 2019

I got the following results in comparison to Cython (on my machine!) using the following changes:

diff --git a/aten/src/ATen/native/SobolEngineOps.cpp b/aten/src/ATen/native/SobolEngineOps.cpp
index e6d32426f..bac74a7c3 100644
--- a/aten/src/ATen/native/SobolEngineOps.cpp
+++ b/aten/src/ATen/native/SobolEngineOps.cpp
@@ -24,15 +24,22 @@ std::tuple<Tensor, Tensor> _sobol_engine_draw(const Tensor& quasi, int64_t n, co
   Tensor result = at::empty({n, dimension}, sobolstate.options().dtype(at::kFloat));

   int64_t l;
-  auto wquasi_accessor = wquasi.accessor<int64_t, 1>();
-  auto sobolstate_accessor = sobolstate.accessor<int64_t, 2>();
-  auto result_accessor = result.accessor<float, 2>();
+  int64_t* wquasi_accessor = wquasi.data<int64_t>();
+  int64_t* sobolstate_accessor = sobolstate.data<int64_t>();
+  assert(wquasi.is_contiguous());
+  assert(sobolstate.is_contiguous());
+  float* result_accessor = result.data<float>();
+  int64_t sobolstate_stride_0 = sobolstate.stride(0);
+  int64_t result_stride_0 = result.stride(0);
   for (int64_t i = 0; i < n; i++, num_generated++) {
     l = rightmost_zero(num_generated);
     for (int64_t j = 0; j < dimension; j++) {
-      wquasi_accessor[j] ^= sobolstate_accessor[j][l];
+      wquasi_accessor[j] ^= sobolstate_accessor[j * sobolstate_stride_0 + l];
+    }
+    for (int64_t j = 0; j < dimension; j++) {
+      result_accessor[i * result_stride_0 + j] = wquasi_accessor[j];
     }
-    result.select(0, i).copy_(wquasi);
+    // result.select(0, i).copy_(wquasi);
   }
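
For context, here is a rough Python sketch of the Gray-code Sobol update that the loop above implements. Variable names follow the diff; `MAXBIT` and the final scaling into [0, 1) are assumptions based on the usual Sobol construction, not a copy of the kernel:

```python
import torch

MAXBIT = 30  # assumed constant; the real value lives in the ATen kernel

def rightmost_zero(n):
    """Index of the rightmost zero bit of n (the Gray-code trick)."""
    z = 0
    while n & 1:
        n >>= 1
        z += 1
    return z

def sobol_draw(wquasi, sobolstate, n, num_generated):
    """XOR the running state with one column of direction numbers per
    sample and record each state as a row of the output."""
    dimension = wquasi.numel()
    result = torch.empty(n, dimension)
    for i in range(n):
        l = rightmost_zero(num_generated)
        wquasi ^= sobolstate[:, l]          # in-place XOR, as in the C++ loop
        result[i] = wquasi                  # copy of the current state
        num_generated += 1
    return result * 2.0 ** -MAXBIT, num_generated  # scale into [0, 1)
```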
$ python test/test_torch.py TestTorch.test_sobolengine_unscrambled_highdim
.
----------------------------------------------------------------------
Ran 1 test in 0.083s

OK
$ python test/test_torch.py TestTorch.test_sobolengine_unscrambled_lowdim
.
----------------------------------------------------------------------
Ran 1 test in 0.042s

OK
$ python test/test_torch.py TestTorch.test_sobolengine_scrambled_lowdim
.
----------------------------------------------------------------------
Ran 1 test in 0.064s

OK
$ python test/test_torch.py TestTorch.test_sobolengine_scrambled_highdim
.
----------------------------------------------------------------------
Ran 1 test in 0.442s

OK
$ numactl --membind 0 --cpubind 0 taskset -c 0 perf stat python ~/tmp/sobol_test.py -d 20 -n 20000 --run-pytorch -r 100
Ran torch Sobol, 100 runs in a total of 0.7056s

 Performance counter stats for 'python ~/tmp/sobol_test.py -d 20 -n 20000 --run-pytorch -r 100':

       1350.881368      task-clock (msec)         #    0.767 CPUs utilized
               298      context-switches          #    0.221 K/sec
                 0      cpu-migrations            #    0.000 K/sec
            19,118      page-faults               #    0.014 M/sec
     2,970,313,434      cycles                    #    2.199 GHz
     4,697,757,385      instructions              #    1.58  insn per cycle
       675,533,253      branches                  #  500.069 M/sec
         8,816,764      branch-misses             #    1.31% of all branches

       1.761854092 seconds time elapsed

$ numactl --membind 0 --cpubind 0 taskset -c 0 python ~/tmp/sobol_test.py -d 20 -n 20000 --run-cython -r 100
Ran Cython Sobol, 100 runs in a total of 0.0976s

Could someone try to reproduce this?
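
The benchmark script itself is not part of the PR, so for anyone trying to reproduce the torch-side numbers, a minimal stand-in for ~/tmp/sobol_test.py might look like the sketch below. The -d/-n/-r flags mirror the invocations above; the Cython side is omitted because its import path depends on your local setup:

```python
import argparse
import time

from torch.quasirandom import SobolEngine

def bench_torch(dim, n, runs):
    """Time `runs` scrambled draws of n points in `dim` dimensions."""
    total = 0.0
    for _ in range(runs):
        engine = SobolEngine(dim, scramble=True)
        start = time.perf_counter()
        engine.draw(n)
        total += time.perf_counter() - start
    return total

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", type=int, default=20, help="dimension")
    parser.add_argument("-n", type=int, default=20000, help="points per draw")
    parser.add_argument("-r", type=int, default=100, help="number of runs")
    args = parser.parse_args()
    total = bench_torch(args.d, args.n, args.r)
    print(f"Ran torch Sobol, {args.r} runs in a total of {total:.4f}s")
```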

@Balandat
Contributor

@cpuhrsch I can confirm this runs much faster now, basically on par with cython:

>> python sobol_test.py -d 20 -n 20000
Ran torch Sobol, 100 runs in a total of 0.0983s
Ran Cython Sobol, 100 runs in a total of 0.0803s

Thanks a lot for digging into this! Looks like ATen just has a ton of overhead here compared to pure C.

@Balandat
Contributor

OK some more testing shows that the results hold up, and that the samples agree (see below).

@vishwakftw, can you incorporate this patch, and also extend the draw method as suggested in #10505 (comment)?

After that I think this is ready to go in!

import torch
from torch.quasirandom import SobolEngine as TorchSobol
from botorch.qmc.normal import SobolEngine

Before changes

tngin = TorchSobol(20, scramble=True)
bngin = SobolEngine(20, scramble=True)
%timeit tngin.draw(10000)
55.2 ms ± 890 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit bngin.draw(10000)
387 µs ± 3.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
bngin = SobolEngine(100, scramble=True)
tngin = TorchSobol(100, scramble=True)
%timeit tngin.draw(200_000)
1.44 s ± 67.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit bngin.draw(200_000)
104 ms ± 815 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using cpuhrsch's pure C implementation

tngin = TorchSobol(20, scramble=True)
bngin = SobolEngine(20, scramble=True)
%timeit tngin.draw(10000)
451 µs ± 8.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit bngin.draw(10000)
384 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
bngin = SobolEngine(100, scramble=True)
tngin = TorchSobol(100, scramble=True)
%timeit tngin.draw(200_000)
80.8 ms ± 771 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit bngin.draw(200_000)
105 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Let's make sure the samples agree

tngin = TorchSobol(1, scramble=False)
bngin = SobolEngine(1, scramble=False)
tngin.draw(5).view(-1)
tensor([0.5000, 0.7500, 0.2500, 0.3750, 0.8750])
bngin.draw(5).flatten()
array([0.5  , 0.75 , 0.25 , 0.375, 0.875])
tngin = TorchSobol(4, scramble=False)
bngin = SobolEngine(4, scramble=False)
tngin.draw(3)
tensor([[0.5000, 0.5000, 0.5000, 0.5000],
        [0.7500, 0.2500, 0.7500, 0.2500],
        [0.2500, 0.7500, 0.2500, 0.7500]])
bngin.draw(3)
array([[0.5 , 0.5 , 0.5 , 0.5 ],
       [0.75, 0.25, 0.75, 0.25],
       [0.25, 0.75, 0.25, 0.75]])
tngin = TorchSobol(4, scramble=True, seed=1234)
bngin = SobolEngine(4, scramble=True, seed=1234)
tngin.draw(3)
tensor([[0.6241, 0.6973, 0.8589, 0.0572],
        [0.8461, 0.0990, 0.6488, 0.6185],
        [0.4506, 0.7502, 0.4871, 0.4383]])
bngin.draw(3)
array([[0.62414032, 0.69727999, 0.85888755, 0.05719134],
       [0.84614956, 0.09904676, 0.64880425, 0.6184572 ],
       [0.45061997, 0.75021988, 0.48705986, 0.4382714 ]])

@vishwakftw
Contributor Author

vishwakftw commented Mar 21, 2019

@cpuhrsch thank you for looking into this, and for providing the benchmark. I was under the impression that TensorAccessor operated on .data directly, which is why I didn't consider a .data-based implementation. Is there some overhead in going through TensorAccessor, then?

I believe the select calls were also a bottleneck.

I am now working on including your changes in the PR.

@Balandat thank you for verifying that this works as expected.

Thank you, Christian Puhrsch.

- Add dtype and out arguments
Contributor

@facebook-github-bot left a comment

@Balandat has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@Balandat left a comment

This looks great. Many thanks for all your work on this. This will be super useful.

@vishwakftw
Contributor Author

@cpuhrsch @zou3519 do you have any comments on this PR that you would like me to address before it gets merged in?

@Balandat
Contributor

Aside: I do have some utilities ready for transforming the Sobol samples into qMC multivariate normal samples (that can all happen from Python). I'll put up a PR for that myself.

Contributor

@facebook-github-bot left a comment

@Balandat has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vishwakftw
Contributor Author

@pytorchbot rebase this please

Contributor

@facebook-github-bot left a comment

@Balandat has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vishwakftw requested a review from zou3519 March 25, 2019 02:17
@vishwakftw
Contributor Author

Test failures are spurious.

@vishwakftw
Contributor Author

@pytorchbot merge this please

@pytorchbot added the merge-this-please label Mar 25, 2019
Contributor

@facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Mar 26, 2019
Summary:
`SobolEngine` is a quasi-random sampler used to sample points evenly between [0,1]. Here we use direction numbers to generate these samples. The maximum supported dimension for the sampler is 1111.

Documentation has been added, tests have been added based on Balandat 's references. The implementation is an optimized / tensor-ized implementation of Balandat 's implementation in Cython as provided in #9332.

This closes #9332 .

cc: soumith Balandat
Pull Request resolved: pytorch/pytorch#10505

Reviewed By: zou3519

Differential Revision: D9330179

Pulled By: ezyang

fbshipit-source-id: 01d5588e765b33b06febe99348f14d1e7fe8e55d
@vishwakftw deleted the sobol-engine branch April 8, 2019 04:59
Balandat added a commit to meta-pytorch/botorch that referenced this pull request Apr 26, 2019
Summary:
Most of this is actually just copying Sobol over. This is meant to be temporary; the goal is to use the torch native version from pytorch/pytorch#10505 once the performance issues have been resolved.

The new content is doing qMC for standard Normal RVs, using the Box-Muller transform.

Reviewed By: bletham

Differential Revision: D13591551

fbshipit-source-id: cd57745ef837b1a88054bb6a55c760b553c144fa
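
For readers curious what the Box-Muller step mentioned in the summary above looks like on top of the torch engine, here is a self-contained sketch. This is only an illustration of the transform, not the botorch code; the padding of odd dimensions and the clamp that guards log(0) are assumptions:

```python
import math
import torch
from torch.quasirandom import SobolEngine

def qmc_standard_normal(n, d, seed=None):
    """Quasi-random standard normal samples via Box-Muller on a scrambled
    Sobol sequence: each pair (u1, u2) maps to
    sqrt(-2 ln u1) * (cos(2*pi*u2), sin(2*pi*u2))."""
    d_uniform = 2 * ((d + 1) // 2)            # pad to an even number of columns
    engine = SobolEngine(d_uniform, scramble=True, seed=seed)
    u = engine.draw(n)
    u1 = u[:, 0::2].clamp_min(1e-10)          # guard against log(0)
    u2 = u[:, 1::2]
    r = torch.sqrt(-2.0 * torch.log(u1))
    theta = 2.0 * math.pi * u2
    z = torch.cat([r * torch.cos(theta), r * torch.sin(theta)], dim=-1)
    return z[:, :d]                           # drop the padding column if d is odd

samples = qmc_standard_normal(10_000, 3, seed=1234)
print(samples.mean(dim=0), samples.std(dim=0))  # roughly 0 and 1 per dimension
```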
Balandat added a commit to meta-pytorch/botorch that referenced this pull request Apr 26, 2019
Summary:
Simplify installation (and building of cython).

For now this requires both `numpy` and `cython` to be available at setup time. We could also ship generated `.c` files in the future (see e.g.
https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#distributing-cython-modules), but that is low-pri given that we want to move to the torch native Sobol engine from pytorch/pytorch#10505 (if that finally makes its way in...)

Reviewed By: bkarrer

Differential Revision: D13592395

fbshipit-source-id: 4bd2c7fada88dfad2bffce7c8b31dae3fca47e1f

Labels

merge-this-please, open source

Development

Successfully merging this pull request may close these issues.

[feature request] Low-discrepancy quasi-random sampler (Sobol sequences)

9 participants