
@syed-ahmed (Collaborator) commented May 17, 2019

Stack from ghstack:

Differential Revision: D15454050

Effective Bandwidth Benchmark

Float Type

Before:

random, size, elements 65536 forward 5.106925964355469e-06 bandwidth (GB/s) 51.331075059570495
random, size, elements 131072 forward 5.497932434082031e-06 bandwidth (GB/s) 95.36093909592368
random, size, elements 262144 forward 7.791519165039062e-06 bandwidth (GB/s) 134.57914660660956
random, size, elements 524288 forward 1.2221336364746093e-05 bandwidth (GB/s) 171.59760090144363
random, size, elements 1048576 forward 2.0668506622314453e-05 bandwidth (GB/s) 202.93212647844044
random, size, elements 2097152 forward 3.9124488830566405e-05 bandwidth (GB/s) 214.40811754315664
random, size, elements 4194304 forward 7.290840148925782e-05 bandwidth (GB/s) 230.1136173239503
random, size, elements 8388608 forward 0.00013821840286254883 bandwidth (GB/s) 242.76385275098409
random, size, elements 16777216 forward 0.0002722597122192383 bandwidth (GB/s) 246.48841157211064
random, size, elements 33554432 forward 0.0005396437644958496 bandwidth (GB/s) 248.71542456418447

After:

random, size, elements 65536 forward 5.841255187988281e-06 bandwidth (GB/s) 44.878025623510204
random, size, elements 131072 forward 5.857944488525391e-06 bandwidth (GB/s) 89.5003360013024
random, size, elements 262144 forward 6.563663482666016e-06 bandwidth (GB/s) 159.75468620065382
random, size, elements 524288 forward 7.276535034179687e-06 bandwidth (GB/s) 288.207504004194
random, size, elements 1048576 forward 1.0349750518798827e-05 bandwidth (GB/s) 405.2565317764571
random, size, elements 2097152 forward 1.6405582427978516e-05 bandwidth (GB/s) 511.3264364021509
random, size, elements 4194304 forward 2.7208328247070314e-05 bandwidth (GB/s) 616.6206114411497
random, size, elements 8388608 forward 4.884481430053711e-05 bandwidth (GB/s) 686.9599665901694
random, size, elements 16777216 forward 9.639024734497071e-05 bandwidth (GB/s) 696.2204771591086
random, size, elements 33554432 forward 0.00017502307891845704 bandwidth (GB/s) 766.8573129291814

Double Type

Before:

random, size, elements 65536 forward 6.1082839965820315e-06 bandwidth (GB/s) 42.916144721935986
random, size, elements 131072 forward 8.215904235839844e-06 bandwidth (GB/s) 63.81379151340685
random, size, elements 262144 forward 1.3575553894042968e-05 bandwidth (GB/s) 77.240016001124
random, size, elements 524288 forward 2.3760795593261718e-05 bandwidth (GB/s) 88.26101768219948
random, size, elements 1048576 forward 4.4798851013183595e-05 bandwidth (GB/s) 93.62525835240021
random, size, elements 2097152 forward 8.335113525390626e-05 bandwidth (GB/s) 100.64179659276888
random, size, elements 4194304 forward 0.00015572309494018554 bandwidth (GB/s) 107.7374939564633
random, size, elements 8388608 forward 0.0003071308135986328 bandwidth (GB/s) 109.25127181751903
random, size, elements 16777216 forward 0.0006092119216918945 bandwidth (GB/s) 110.15684626398355
random, size, elements 33554432 forward 0.0011054635047912597 bandwidth (GB/s) 121.41307914578674

After:

random, size, elements 65536 forward 5.834102630615234e-06 bandwidth (GB/s) 44.93304567944422
random, size, elements 131072 forward 6.258487701416016e-06 bandwidth (GB/s) 83.77231449721906
random, size, elements 262144 forward 7.848739624023438e-06 bandwidth (GB/s) 133.5980106653706
random, size, elements 524288 forward 1.185894012451172e-05 bandwidth (GB/s) 176.84143591089668
random, size, elements 1048576 forward 2.0167827606201173e-05 bandwidth (GB/s) 207.97004426546874
random, size, elements 2097152 forward 3.463029861450195e-05 bandwidth (GB/s) 242.23319854617557
random, size, elements 4194304 forward 6.528139114379883e-05 bandwidth (GB/s) 256.9984448254775
random, size, elements 8388608 forward 0.00012089729309082031 bandwidth (GB/s) 277.544940355226
random, size, elements 16777216 forward 0.00023464202880859374 bandwidth (GB/s) 286.0053006733214
random, size, elements 33554432 forward 0.00044272661209106447 bandwidth (GB/s) 303.1616449846316
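
For anyone reproducing these numbers: the bandwidth column is consistent with a plain bytes-written-over-time calculation. Below is a minimal sketch of that arithmetic, under the assumption (inferred from the numbers, not stated in the PR) that the benchmark counts 4 bytes of store traffic per element. Interestingly, the double rows also match 4 bytes per element rather than 8, which suggests the element size may be hardcoded in the benchmark script.

```cpp
#include <cstdint>
#include <cstdio>

// Sketch of the "bandwidth (GB/s)" column: random_ writes each element
// once, so only store traffic is counted, at an assumed 4 bytes/element.
double effective_bandwidth_gbps(int64_t elements, double forward_seconds) {
  const double bytes_written = static_cast<double>(elements) * 4.0;
  return bytes_written / forward_seconds / 1e9;
}

int main() {
  // First float "Before" row: 65536 elements in 5.106925964355469e-06 s
  // -> ~51.33 GB/s, matching the table above.
  std::printf("%.6f GB/s\n",
              effective_bandwidth_gbps(65536, 5.106925964355469e-06));
  return 0;
}
```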

@pytorchbot added labels on May 17, 2019: module: cuda (Related to torch.cuda, and CUDA support in general), module: internals (Related to internal abstractions in c10 and ATen), module: operators
@ezyang (Contributor) commented May 17, 2019

Test failures look real:

May 17 07:01:29 ======================================================================
May 17 07:01:29 ERROR: test_nn_scalars_reductions (__main__.TestNN)
May 17 07:01:29 ----------------------------------------------------------------------
May 17 07:01:29 Traceback (most recent call last):
May 17 07:01:29   File "test_nn.py", line 3232, in test_nn_scalars_reductions
May 17 07:01:29     target = torch.empty(input_shape, device=device).random_(2)
May 17 07:01:29 RuntimeError: [from, to] is outside the range of the tensor data type: Double. Expected [from, to] to be inside [2.22507e-308, 1.79769e+308] but found [from, to] to be [0, 2]
May 17 07:01:29 
May 17 07:01:29 ======================================================================
May 17 07:01:29 FAIL: test_InstanceNorm3d_general_cuda (__main__.TestNN)
May 17 07:01:29 ----------------------------------------------------------------------
May 17 07:01:29 Traceback (most recent call last):
May 17 07:01:29   File "/var/lib/jenkins/workspace/test/common_utils.py", line 338, in wrapper
May 17 07:01:29     method(*args, **kwargs)
May 17 07:01:29   File "/var/lib/jenkins/workspace/test/common_utils.py", line 129, in wrapper
May 17 07:01:29     fn(*args, **kwargs)
May 17 07:01:29   File "test_nn.py", line 2990, in test_InstanceNorm3d_general_cuda
May 17 07:01:29     self._test_InstanceNorm_cuda_half(nn.InstanceNorm3d, input)
May 17 07:01:29   File "test_nn.py", line 2928, in _test_InstanceNorm_cuda_half
May 17 07:01:29     self.assertAlmostEqual(cudnn_output, thnn_output, delta=1e-4)
May 17 07:01:29   File "/var/lib/jenkins/workspace/test/common_utils.py", line 516, in assertAlmostEqual
May 17 07:01:29     self.assertEqual(x, y, prec, msg, allow_inf)
May 17 07:01:29   File "/var/lib/jenkins/workspace/test/common_utils.py", line 483, in assertEqual
May 17 07:01:29     assertTensorsEqual(x, y)
May 17 07:01:29   File "/var/lib/jenkins/workspace/test/common_utils.py", line 475, in assertTensorsEqual
May 17 07:01:29     self.assertLessEqual(max_err, prec, message)
May 17 07:01:29 AssertionError: tensor(0.0002, device='cuda:0', dtype=torch.float16, grad_fn=<MaxBackward1>) not less than or equal to 0.0001

}
}

TEST(DistributionsTest, TestPhiloxIncrementSmallTensor) {
Contributor:

Was there a substantive change to this file, or did you just refactor some duplicated code into a function?

Collaborator (author):

Just refactored duplicated code. Nothing else changed.

}

Tensor& clamped_random_cuda_(Tensor& self, int64_t from, int64_t to, Generator* gen) {
AT_CHECK(from < to, "random_ expects 'from' to be less than 'to', but got from=", from, " >= to=", to);
Contributor:

TORCH_CHECK now please! (I won't mention on other sites.)

Collaborator (author):

Sounds good!
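
For reference, a minimal sketch of that line after the suggested rename; TORCH_CHECK takes the same condition-plus-variadic-message form as AT_CHECK. The wrapper function name here is hypothetical, just to make the snippet self-contained:

```cpp
#include <cstdint>
#include <c10/util/Exception.h>  // provides TORCH_CHECK

// Same check as in the diff above, with the newer macro.
// check_range is an illustrative helper, not a function from the PR.
void check_range(int64_t from, int64_t to) {
  TORCH_CHECK(from < to,
              "random_ expects 'from' to be less than 'to', but got from=",
              from, " >= to=", to);
}
```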

Tensor& clamped_random_cuda_(Tensor& self, int64_t from, int64_t to, Generator* gen) {
AT_CHECK(from < to, "random_ expects 'from' to be less than 'to', but got from=", from, " >= to=", to);
auto iter = TensorIterator::nullary_op(self);
AT_DISPATCH_ALL_TYPES_AND(at::ScalarType::Half, iter->dtype(), "range_check", [&] {
Contributor:

Man, this dispatch /just/ to do error checking makes my eyes bleed, but I can't think of a convenient way to do this otherwise, so fine.

Collaborator (author):

By the way, I removed this range_check because of bool tensors, and because we don't seem to guarantee this condition anyway (the check didn't exist before, and the docs below say "usually"):

random_(from=0, to=None, *, generator=None) → Tensor
Fills self tensor with numbers sampled from the discrete uniform distribution over [from, to - 1]. If not specified, the values are usually only bounded by self tensor’s data type. However, for floating point types, if unspecified, range will be [0, 2^mantissa] to ensure that every value is representable. For example, torch.tensor(1, dtype=torch.double).random_() will be uniform in [0, 2^53].

We should have this discussion in #16944. As pointed out by @ngimel, the following is the behavior in numpy:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-587d8ebfc2e9> in <module>
----> 1 np.random.randint(0,500, size=100, dtype=np.int8)

mtrand.pyx in mtrand.RandomState.randint()

ValueError: high is out of bounds for int8

Same for bool:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-8c6e78ec7bf7> in <module>
----> 1 np.random.randint(0,3, size=100, dtype=np.bool)

mtrand.pyx in mtrand.RandomState.randint()

ValueError: high is out of bounds for bool

Collaborator:

I don't think this discussion belongs in #16944. #16944 is about valid arguments to random_ that currently do not work, and the reasons they fail are clear.
What we do need to decide (and possibly change behavior for) is what happens when random_ is called with no arguments; right now the behavior is inconsistent. The docs say the values are usually only bounded by self tensor’s data type, but try generating negative numbers for an int tensor. My suggestion is to use the full range, but #16944 will have to be fixed for this. I'm also not against documenting and preserving the current behavior. We also need to decide what happens when random_ arguments fall outside the numeric range of the source tensor; my suggestion is to follow numpy and throw an error.
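
As an aside, the CI failure quoted earlier ("Expected [from, to] to be inside [2.22507e-308, 1.79769e+308] but found [from, to] to be [0, 2]") shows why the floating-point branch of such a range check is easy to get wrong. A standalone illustration of the numeric_limits pitfall involved (not the PR's code):

```cpp
#include <cstdio>
#include <limits>

// For floating-point types, std::numeric_limits<T>::min() is the smallest
// *positive* normal value, not the most negative one; lowest() is the
// intended lower bound. With min() as the lower bound, [from, to] = [0, 2]
// falls "outside" [2.22507e-308, 1.79769e+308], which is exactly the
// RuntimeError in the test_nn failure quoted above.
int main() {
  std::printf("min()    = %g\n", std::numeric_limits<double>::min());
  std::printf("lowest() = %g\n", std::numeric_limits<double>::lowest());
  std::printf("max()    = %g\n", std::numeric_limits<double>::max());
  return 0;
}
```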

", ", std::numeric_limits<scalar_t>::max(), "]",
" but found [from, to] to be [", from, ", ", to, "]");
});
auto range = static_cast<uint64_t>(to) - static_cast<uint64_t>(from);
Contributor:

The additional static casts here look kind of questionable. Are you assuming that to and from are non-negative? I don't see any check to this effect. If to/from could be negative, casting them to unsigned is surprising (it might work out in the end, but I don't want to have to work that out in my head without a comment.)

Contributor:

The original code did max_val - min_val, without the casts. Now, the promotion rules make my head hurt, but what I think happens is you do the subtraction on int64_t, and then convert the (now known to be positive) int64_t into a uint64_t. So it seems the old code did something different.

@syed-ahmed (Collaborator, author), May 21, 2019:

I may have goofed here in an attempt to fix this bug: #16944. Reverted to what it was before for now. I'll fix that bug separately in a different PR once I come to a conclusion about this: #16944 (comment).
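
For what it's worth, here is a standalone check of the promotion question raised above. For inputs where to - from does not overflow, the two forms agree; the cast-first form is additionally well defined when the int64_t subtraction would overflow:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  int64_t from = -5, to = 10;
  // Cast first, subtract in uint64_t: the wrap-around from converting the
  // negative operand cancels in the modular subtraction.
  uint64_t a = static_cast<uint64_t>(to) - static_cast<uint64_t>(from);
  // Subtract in int64_t first (the pre-existing form), then convert.
  // Signed overflow in to - from would be UB; the form above never is.
  uint64_t b = static_cast<uint64_t>(to - from);
  std::printf("%llu %llu\n", (unsigned long long)a,
              (unsigned long long)b);  // both print 15
  return 0;
}
```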

auto gen = check_generator<CUDAGenerator>(gen_, &globalContext().defaultGenerator(kCUDA));
AT_DISPATCH_ALL_TYPES_AND(at::ScalarType::Half, iter.dtype(), "random_cuda", [&] {
using accscalar_t = at::acc_type<scalar_t, true>;
if (std::is_same<scalar_t, double>::value || std::is_same<scalar_t, int64_t>::value) {
Contributor:

nit: Consider using template specializations to do this in the future. You're generating dead code when you do it this way; fortunately that code type checks and will get DCE'd later... but best not to generate it at all in the first place.
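
A hedged sketch of the suggestion, using illustrative names rather than the PR's actual code; one idiomatic C++11 variant is tag dispatch, which keeps the unused branch from being instantiated at all:

```cpp
#include <cstdio>
#include <type_traits>

// Two overloads selected at compile time, so the 32-bit body is never
// instantiated for double/int64_t and vice versa (no dead code to DCE).
template <typename scalar_t>
void fill_random(std::true_type /*needs 64 bits of randomness*/) {
  std::printf("64-bit path\n");
}

template <typename scalar_t>
void fill_random(std::false_type /*32 bits suffice*/) {
  std::printf("32-bit path\n");
}

template <typename scalar_t>
void fill_random() {
  using needs64 = std::integral_constant<
      bool, std::is_same<scalar_t, double>::value ||
                std::is_same<scalar_t, int64_t>::value>;
  fill_random<scalar_t>(needs64{});
}

int main() {
  fill_random<float>();   // 32-bit path
  fill_random<double>();  // 64-bit path
  return 0;
}
```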

if (std::is_same<scalar_t, double>::value || std::is_same<scalar_t, int64_t>::value) {
// define lambda to mod with range and add base
auto random_func = [range, base] __device__ (uint64_t rand) {
return static_cast<scalar_t>(rand % range + base);
Contributor:

Heh heh, modulo isn't uniform, but we're unlikely to switch this to rejection sampling. No changes needed, but it did look funny to see :>

Collaborator (author):

Oh yeah :). Will leave that discussion for another day.

Collaborator:

There are two orthogonal problems with modulo (a toy illustration of the first follows below):

  1. When range is comparable to the range produced by the generator, modulo indeed produces non-uniform results. Usually not a problem in practice, though, because in typical use cases range is much smaller than the range of random numbers produced.
  2. Modulo essentially extracts the lower bits, and lower bits have low entropy. That's true for LCGs, but we are using Philox here, so it should be fine.
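
Here is the promised toy illustration of point 1, using a deliberately tiny hypothetical 8-bit generator standing in for Philox:

```cpp
#include <cstdint>
#include <cstdio>

// With 256 equally likely generator outputs and range = 200, residues
// 0..55 receive two preimages each (r and r + 200) while 56..199 receive
// only one, so the low residues are twice as likely.
int main() {
  const uint32_t kGenOutputs = 256;  // toy generator range, not Philox
  const uint32_t kRange = 200;
  uint32_t counts[kRange] = {0};
  for (uint32_t r = 0; r < kGenOutputs; ++r) {
    counts[r % kRange]++;
  }
  std::printf("count[0] = %u, count[199] = %u\n", counts[0],
              counts[kRange - 1]);  // prints 2 vs 1
  return 0;
}
```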

@ezyang (Contributor) commented May 17, 2019

Fix the errors. My plan is to review the other diffs in the stack when we get this one green; let me know if you want review earlier.

@ezyang (Contributor) left a review:

CI fails

Six commits were pushed to the branch, each with the same message:

Move THCTensor_{random, clampedRandom, cappedRandom} to ATen

gh-metadata: pytorch pytorch 20620 gh/syed-ahmed/1/head

@pytorchbot added the module: rocm (AMD GPU support for Pytorch) label on May 21, 2019
@syed-ahmed syed-ahmed requested a review from ezyang May 22, 2019 07:31
@ezyang (Contributor) left a review:

LGTM

@syed-ahmed (Collaborator, author) commented:

@ezyang @iotamudelta Any idea why these are failing for ROCm?

04:32:37 ======================================================================
04:32:37 ERROR: test___std_mean__ (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__std_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___std_mean___dim (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__std_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___std_mean___dim_1d (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__std_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___std_mean___keepdim_dim (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__std_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___std_mean___keepdim_dim_1d (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__std_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___var_mean__ (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__var_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___var_mean___dim (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__var_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___var_mean___dim_1d (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__var_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___var_mean___keepdim_dim (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__var_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___var_mean___keepdim_dim_1d (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__var_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test_profiler_aggregation_lstm (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 2498, in test_profiler_aggregation_lstm
04:32:37     with profile(record_shapes=True) as prof:
04:32:37 TypeError: __init__() got an unexpected keyword argument 'record_shapes'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test_profiler_shapes (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 2476, in test_profiler_shapes
04:32:37     with profile(record_shapes=True) as prof:
04:32:37 TypeError: __init__() got an unexpected keyword argument 'record_shapes'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test_var_mean_differentiable (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 2170, in test_var_mean_differentiable
04:32:37     var1, mean1 = torch.var_mean(input1, dim=dim, keepdim=keepdim)
04:32:37 AttributeError: 'module' object has no attribute 'var_mean'
04:32:37 
04:32:37 ======================================================================
04:32:37 FAIL: test_version_counter (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3048, in test_version_counter
04:32:37     self.assertTrue(x._version == xz._version)
04:32:37 AssertionError: False is not true
04:32:37 
04:32:37 ----------------------------------------------------------------------
04:32:37 Ran 949 tests in 253.074s
04:32:37 
04:32:37 FAILED (failures=1, errors=13, skipped=13)
04:32:37 
04:32:37 
04:32:37 Traceback (most recent call last):
04:32:37   File "test/run_test.py", line 440, in <module>
04:32:37     main()
04:32:37   File "test/run_test.py", line 432, in main
04:32:37     raise RuntimeError(message)
04:32:37 RuntimeError: test_autograd failed!

@syed-ahmed (Collaborator, author) commented May 23, 2019

Looks like the ROCm failure is not related to this PR (although it looked related because of var_mean/std_mean): there is a similar failure in #20801 (comment) as well.

@zou3519 zou3519 deleted the gh/syed-ahmed/1/head branch May 23, 2019 20:47
zdevito pushed a commit to zdevito/ATen that referenced this pull request May 23, 2019
Summary:
Pull Request resolved: pytorch/pytorch#20620
ghimport-source-id: 7c09c2462021e3fa5adef61570a575964ff16125

Differential Revision: D15454050

Pulled By: ezyang

fbshipit-source-id: 5b0421c56445baf19dbdbdd9680af128a5cdf443
@facebook-github-bot (Contributor):
@ezyang merged this pull request in b6d0f6c.


Labels: Merged; module: cuda (Related to torch.cuda, and CUDA support in general); module: internals (Related to internal abstractions in c10 and ATen); module: rocm (AMD GPU support for Pytorch); open source
