
Conversation

@vishwakftw
Contributor

SobolEngine is a quasi-random sampler used to sample points evenly distributed in [0, 1]. Here we use direction numbers to generate these samples. The maximum supported dimension for the sampler is 1111.

Documentation has been added, and tests have been added based on @Balandat's references. The implementation is an optimized, tensorized version of @Balandat's Cython implementation provided in #9332.

This closes #9332.

cc: @soumith @Balandat
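
For reference, a minimal usage sketch of the engine this PR adds, based on the `torch.quasirandom.SobolEngine` calls that appear later in this thread (constructor with `dimension`, `scramble`, `seed`, and a `draw(n)` method):

```python
import torch
from torch.quasirandom import SobolEngine

# Unscrambled engine: deterministic low-discrepancy points in [0, 1)^3.
engine = SobolEngine(dimension=3)
print(engine.draw(5))            # tensor of shape (5, 3)

# Scrambled engine with a fixed seed for reproducible randomized sequences.
scrambled = SobolEngine(dimension=3, scramble=True, seed=1234)
print(scrambled.draw(5))
```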


@cpuhrsch
Contributor

cpuhrsch commented Mar 20, 2019

I got the following results in comparison to Cython (on my machine!) using the following changes:

diff --git a/aten/src/ATen/native/SobolEngineOps.cpp b/aten/src/ATen/native/SobolEngineOps.cpp
index e6d32426f..bac74a7c3 100644
--- a/aten/src/ATen/native/SobolEngineOps.cpp
+++ b/aten/src/ATen/native/SobolEngineOps.cpp
@@ -24,15 +24,22 @@ std::tuple<Tensor, Tensor> _sobol_engine_draw(const Tensor& quasi, int64_t n, co
   Tensor result = at::empty({n, dimension}, sobolstate.options().dtype(at::kFloat));

   int64_t l;
-  auto wquasi_accessor = wquasi.accessor<int64_t, 1>();
-  auto sobolstate_accessor = sobolstate.accessor<int64_t, 2>();
-  auto result_accessor = result.accessor<float, 2>();
+  int64_t* wquasi_accessor = wquasi.data<int64_t>();
+  int64_t* sobolstate_accessor = sobolstate.data<int64_t>();
+  assert(wquasi.is_contiguous());
+  assert(sobolstate.is_contiguous());
+  float* result_accessor = result.data<float>();
+  int64_t sobolstate_stride_0 = sobolstate.stride(0);
+  int64_t result_stride_0 = result.stride(0);
   for (int64_t i = 0; i < n; i++, num_generated++) {
     l = rightmost_zero(num_generated);
     for (int64_t j = 0; j < dimension; j++) {
-      wquasi_accessor[j] ^= sobolstate_accessor[j][l];
+      wquasi_accessor[j] ^= sobolstate_accessor[j * sobolstate_stride_0 + l];
+    }
+    for (int64_t j = 0; j < dimension; j++) {
+      result_accessor[i * result_stride_0 + j] = wquasi_accessor[j];
     }
-    result.select(0, i).copy_(wquasi);
+    // result.select(0, i).copy_(wquasi);
   }
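
For context, here is a rough Python sketch of the Gray-code Sobol update that the loop above implements. Variable names follow the diff; `MAXBIT` and the final scaling into [0, 1) are assumptions based on the usual Sobol construction, not a copy of the kernel:

```python
import torch

MAXBIT = 30  # assumed constant; the real value lives in the ATen kernel

def rightmost_zero(n):
    """Index of the rightmost zero bit of n (the Gray-code trick)."""
    z = 0
    while n & 1:
        n >>= 1
        z += 1
    return z

def sobol_draw(wquasi, sobolstate, n, num_generated):
    """XOR the running state with one column of direction numbers per
    sample and record each state as a row of the output."""
    dimension = wquasi.numel()
    result = torch.empty(n, dimension)
    for i in range(n):
        l = rightmost_zero(num_generated)
        wquasi ^= sobolstate[:, l]          # in-place XOR, as in the C++ loop
        result[i] = wquasi                  # copy of the current state
        num_generated += 1
    return result * 2.0 ** -MAXBIT, num_generated  # scale into [0, 1)
```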
$ python test/test_torch.py TestTorch.test_sobolengine_unscrambled_highdim
.
----------------------------------------------------------------------
Ran 1 test in 0.083s

OK
$ python test/test_torch.py TestTorch.test_sobolengine_unscrambled_lowdim
.
----------------------------------------------------------------------
Ran 1 test in 0.042s

OK
$ python test/test_torch.py TestTorch.test_sobolengine_scrambled_lowdim
.
----------------------------------------------------------------------
Ran 1 test in 0.064s

OK
$ python test/test_torch.py TestTorch.test_sobolengine_scrambled_highdim
.
----------------------------------------------------------------------
Ran 1 test in 0.442s

OK
$ numactl --membind 0 --cpubind 0 taskset -c 0 perf stat python ~/tmp/sobol_test.py -d 20 -n 20000 --run-pytorch -r 100
Ran torch Sobol, 100 runs in a total of 0.7056s

 Performance counter stats for 'python ~/tmp/sobol_test.py -d 20 -n 20000 --run-pytorch -r 100':

       1350.881368      task-clock (msec)         #    0.767 CPUs utilized
               298      context-switches          #    0.221 K/sec
                 0      cpu-migrations            #    0.000 K/sec
            19,118      page-faults               #    0.014 M/sec
     2,970,313,434      cycles                    #    2.199 GHz
     4,697,757,385      instructions              #    1.58  insn per cycle
       675,533,253      branches                  #  500.069 M/sec
         8,816,764      branch-misses             #    1.31% of all branches

       1.761854092 seconds time elapsed

$ numactl --membind 0 --cpubind 0 taskset -c 0 python ~/tmp/sobol_test.py -d 20 -n 20000 --run-cython -r 100
Ran Cython Sobol, 100 runs in a total of 0.0976s

Could someone try to reproduce this?
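
The benchmark script itself is not part of the PR, so for anyone trying to reproduce the torch-side numbers, a minimal stand-in for ~/tmp/sobol_test.py might look like the sketch below. The -d/-n/-r flags mirror the invocations above; the Cython side is omitted because its import path depends on your local setup:

```python
import argparse
import time

from torch.quasirandom import SobolEngine

def bench_torch(dim, n, runs):
    """Time `runs` scrambled draws of n points in `dim` dimensions."""
    total = 0.0
    for _ in range(runs):
        engine = SobolEngine(dim, scramble=True)
        start = time.perf_counter()
        engine.draw(n)
        total += time.perf_counter() - start
    return total

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", type=int, default=20, help="dimension")
    parser.add_argument("-n", type=int, default=20000, help="points per draw")
    parser.add_argument("-r", type=int, default=100, help="number of runs")
    args = parser.parse_args()
    total = bench_torch(args.d, args.n, args.r)
    print(f"Ran torch Sobol, {args.r} runs in a total of {total:.4f}s")
```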

@Balandat
Contributor

@cpuhrsch I can confirm this runs much faster now, basically on par with cython:

>> python sobol_test.py -d 20 -n 20000
Ran torch Sobol, 100 runs in a total of 0.0983s
Ran Cython Sobol, 100 runs in a total of 0.0803s

Thanks a lot for digging into this! Looks like ATen just has a ton of overhead here compared to pure C.

@Balandat
Contributor

OK some more testing shows that the results hold up, and that the samples agree (see below).

@vishwakftw, can you incorporate this patch, and also extend the draw method as suggested in #10505 (comment)?

After that I think this is ready to go in!

import torch
from torch.quasirandom import SobolEngine as TorchSobol
from botorch.qmc.normal import SobolEngine

Before changes

tngin = TorchSobol(20, scramble=True)
bngin = SobolEngine(20, scramble=True)
%timeit tngin.draw(10000)
55.2 ms ± 890 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit bngin.draw(10000)
387 µs ± 3.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
bngin = SobolEngine(100, scramble=True)
tngin = TorchSobol(100, scramble=True)
%timeit tngin.draw(200_000)
1.44 s ± 67.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit bngin.draw(200_000)
104 ms ± 815 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using cpuhrsch's pure C implementation

tngin = TorchSobol(20, scramble=True)
bngin = SobolEngine(20, scramble=True)
%timeit tngin.draw(10000)
451 µs ± 8.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit bngin.draw(10000)
384 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
bngin = SobolEngine(100, scramble=True)
tngin = TorchSobol(100, scramble=True)
%timeit tngin.draw(200_000)
80.8 ms ± 771 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit bngin.draw(200_000)
105 ms ± 1.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Let's make sure the samples agree

tngin = TorchSobol(1, scramble=False)
bngin = SobolEngine(1, scramble=False)
tngin.draw(5).view(-1)
tensor([0.5000, 0.7500, 0.2500, 0.3750, 0.8750])
bngin.draw(5).flatten()
array([0.5  , 0.75 , 0.25 , 0.375, 0.875])
tngin = TorchSobol(4, scramble=False)
bngin = SobolEngine(4, scramble=False)
tngin.draw(3)
tensor([[0.5000, 0.5000, 0.5000, 0.5000],
        [0.7500, 0.2500, 0.7500, 0.2500],
        [0.2500, 0.7500, 0.2500, 0.7500]])
bngin.draw(3)
array([[0.5 , 0.5 , 0.5 , 0.5 ],
       [0.75, 0.25, 0.75, 0.25],
       [0.25, 0.75, 0.25, 0.75]])
tngin = TorchSobol(4, scramble=True, seed=1234)
bngin = SobolEngine(4, scramble=True, seed=1234)
tngin.draw(3)
tensor([[0.6241, 0.6973, 0.8589, 0.0572],
        [0.8461, 0.0990, 0.6488, 0.6185],
        [0.4506, 0.7502, 0.4871, 0.4383]])
bngin.draw(3)
array([[0.62414032, 0.69727999, 0.85888755, 0.05719134],
       [0.84614956, 0.09904676, 0.64880425, 0.6184572 ],
       [0.45061997, 0.75021988, 0.48705986, 0.4382714 ]])

@vishwakftw
Contributor Author

vishwakftw commented Mar 21, 2019

@cpuhrsch thank you for looking into this, and for providing the benchmark. I was under the impression that TensorAccessor operated on .data directly, which is why I didn't consider a .data-based implementation. Is there some overhead in going through TensorAccessor, then?

I believe the select calls were also a bottleneck.

I am now working on including your changes in the PR.

@Balandat thank you for verifying that this works as expected.

Thank you, Christian Puhrsch.

- Add dtype and out arguments
Contributor

@facebook-github-bot left a comment

@Balandat has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@Balandat left a comment

This looks great. Many thanks for all your work on this. This will be super useful.

@vishwakftw
Contributor Author

@cpuhrsch @zou3519 do you have any comments on this PR that you would like me to address before it gets merged in?

@Balandat
Contributor

Aside: I do have some utilities ready for transforming the Sobol samples into qMC multivariate normal samples (that can all happen from Python). I'll put up a PR for that myself.

Contributor

@facebook-github-bot left a comment

@Balandat has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vishwakftw
Contributor Author

@pytorchbot rebase this please

Contributor

@facebook-github-bot left a comment

@Balandat has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vishwakftw requested a review from zou3519 March 25, 2019 02:17
@vishwakftw
Contributor Author

Test failures are spurious.

@vishwakftw
Contributor Author

@pytorchbot merge this please

@pytorchbot added the merge-this-please label Mar 25, 2019
Contributor

@facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Mar 26, 2019
Summary:
`SobolEngine` is a quasi-random sampler used to sample points evenly between [0,1]. Here we use direction numbers to generate these samples. The maximum supported dimension for the sampler is 1111.

Documentation has been added, tests have been added based on Balandat 's references. The implementation is an optimized / tensor-ized implementation of Balandat 's implementation in Cython as provided in #9332.

This closes #9332 .

cc: soumith Balandat
Pull Request resolved: pytorch/pytorch#10505

Reviewed By: zou3519

Differential Revision: D9330179

Pulled By: ezyang

fbshipit-source-id: 01d5588e765b33b06febe99348f14d1e7fe8e55d
@vishwakftw deleted the sobol-engine branch April 8, 2019 04:59
Balandat added a commit to meta-pytorch/botorch that referenced this pull request Apr 26, 2019
Summary:
Most of this is actually just copying Sobol over. This is meant to be temporary; the goal is to use the torch native version from pytorch/pytorch#10505 once the performance issues have been resolved.

The new content is doing qMC for standard Normal RVs, using the Box-Muller transform.

Reviewed By: bletham

Differential Revision: D13591551

fbshipit-source-id: cd57745ef837b1a88054bb6a55c760b553c144fa
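
For readers curious what the Box-Muller step mentioned in the summary above looks like on top of the torch engine, here is a self-contained sketch. This is only an illustration of the transform, not the botorch code; the padding of odd dimensions and the clamp that guards log(0) are assumptions:

```python
import math
import torch
from torch.quasirandom import SobolEngine

def qmc_standard_normal(n, d, seed=None):
    """Quasi-random standard normal samples via Box-Muller on a scrambled
    Sobol sequence: each pair (u1, u2) maps to
    sqrt(-2 ln u1) * (cos(2*pi*u2), sin(2*pi*u2))."""
    d_uniform = 2 * ((d + 1) // 2)            # pad to an even number of columns
    engine = SobolEngine(d_uniform, scramble=True, seed=seed)
    u = engine.draw(n)
    u1 = u[:, 0::2].clamp_min(1e-10)          # guard against log(0)
    u2 = u[:, 1::2]
    r = torch.sqrt(-2.0 * torch.log(u1))
    theta = 2.0 * math.pi * u2
    z = torch.cat([r * torch.cos(theta), r * torch.sin(theta)], dim=-1)
    return z[:, :d]                           # drop the padding column if d is odd

samples = qmc_standard_normal(10_000, 3, seed=1234)
print(samples.mean(dim=0), samples.std(dim=0))  # roughly 0 and 1 per dimension
```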
Balandat added a commit to meta-pytorch/botorch that referenced this pull request Apr 26, 2019
Summary:
Simplify installation (and building of cython).

For now this requires both `numpy` and `cython` to be available at setup time. We could also ship generated `.c` files in the future (see e.g.
https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#distributing-cython-modules), but that is low-pri given that we want to move to the torch native Sobol engine from pytorch/pytorch#10505 (if that finally makes its way in...)

Reviewed By: bkarrer

Differential Revision: D13592395

fbshipit-source-id: 4bd2c7fada88dfad2bffce7c8b31dae3fca47e1f

Labels

merge-this-please, open source

Development

Successfully merging this pull request may close these issues.

[feature request] Low-discrepancy quasi-random sampler (Sobol sequences)

9 participants