
@syed-ahmed (Collaborator) commented May 17, 2019

Stack from ghstack:

Differential Revision: D15454050

Effective Bandwidth Benchmark

Float Type

Before:

random, size, elements 65536 forward 5.106925964355469e-06 bandwidth (GB/s) 51.331075059570495
random, size, elements 131072 forward 5.497932434082031e-06 bandwidth (GB/s) 95.36093909592368
random, size, elements 262144 forward 7.791519165039062e-06 bandwidth (GB/s) 134.57914660660956
random, size, elements 524288 forward 1.2221336364746093e-05 bandwidth (GB/s) 171.59760090144363
random, size, elements 1048576 forward 2.0668506622314453e-05 bandwidth (GB/s) 202.93212647844044
random, size, elements 2097152 forward 3.9124488830566405e-05 bandwidth (GB/s) 214.40811754315664
random, size, elements 4194304 forward 7.290840148925782e-05 bandwidth (GB/s) 230.1136173239503
random, size, elements 8388608 forward 0.00013821840286254883 bandwidth (GB/s) 242.76385275098409
random, size, elements 16777216 forward 0.0002722597122192383 bandwidth (GB/s) 246.48841157211064
random, size, elements 33554432 forward 0.0005396437644958496 bandwidth (GB/s) 248.71542456418447

After:

random, size, elements 65536 forward 5.841255187988281e-06 bandwidth (GB/s) 44.878025623510204
random, size, elements 131072 forward 5.857944488525391e-06 bandwidth (GB/s) 89.5003360013024
random, size, elements 262144 forward 6.563663482666016e-06 bandwidth (GB/s) 159.75468620065382
random, size, elements 524288 forward 7.276535034179687e-06 bandwidth (GB/s) 288.207504004194
random, size, elements 1048576 forward 1.0349750518798827e-05 bandwidth (GB/s) 405.2565317764571
random, size, elements 2097152 forward 1.6405582427978516e-05 bandwidth (GB/s) 511.3264364021509
random, size, elements 4194304 forward 2.7208328247070314e-05 bandwidth (GB/s) 616.6206114411497
random, size, elements 8388608 forward 4.884481430053711e-05 bandwidth (GB/s) 686.9599665901694
random, size, elements 16777216 forward 9.639024734497071e-05 bandwidth (GB/s) 696.2204771591086
random, size, elements 33554432 forward 0.00017502307891845704 bandwidth (GB/s) 766.8573129291814

Double Type

Before:

random, size, elements 65536 forward 6.1082839965820315e-06 bandwidth (GB/s) 42.916144721935986
random, size, elements 131072 forward 8.215904235839844e-06 bandwidth (GB/s) 63.81379151340685
random, size, elements 262144 forward 1.3575553894042968e-05 bandwidth (GB/s) 77.240016001124
random, size, elements 524288 forward 2.3760795593261718e-05 bandwidth (GB/s) 88.26101768219948
random, size, elements 1048576 forward 4.4798851013183595e-05 bandwidth (GB/s) 93.62525835240021
random, size, elements 2097152 forward 8.335113525390626e-05 bandwidth (GB/s) 100.64179659276888
random, size, elements 4194304 forward 0.00015572309494018554 bandwidth (GB/s) 107.7374939564633
random, size, elements 8388608 forward 0.0003071308135986328 bandwidth (GB/s) 109.25127181751903
random, size, elements 16777216 forward 0.0006092119216918945 bandwidth (GB/s) 110.15684626398355
random, size, elements 33554432 forward 0.0011054635047912597 bandwidth (GB/s) 121.41307914578674

After:

random, size, elements 65536 forward 5.834102630615234e-06 bandwidth (GB/s) 44.93304567944422
random, size, elements 131072 forward 6.258487701416016e-06 bandwidth (GB/s) 83.77231449721906
random, size, elements 262144 forward 7.848739624023438e-06 bandwidth (GB/s) 133.5980106653706
random, size, elements 524288 forward 1.185894012451172e-05 bandwidth (GB/s) 176.84143591089668
random, size, elements 1048576 forward 2.0167827606201173e-05 bandwidth (GB/s) 207.97004426546874
random, size, elements 2097152 forward 3.463029861450195e-05 bandwidth (GB/s) 242.23319854617557
random, size, elements 4194304 forward 6.528139114379883e-05 bandwidth (GB/s) 256.9984448254775
random, size, elements 8388608 forward 0.00012089729309082031 bandwidth (GB/s) 277.544940355226
random, size, elements 16777216 forward 0.00023464202880859374 bandwidth (GB/s) 286.0053006733214
random, size, elements 33554432 forward 0.00044272661209106447 bandwidth (GB/s) 303.1616449846316
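
For anyone reproducing these numbers: the bandwidth column is consistent with a plain bytes-written-over-time calculation. Below is a minimal sketch of that arithmetic, under the assumption (inferred from the numbers, not stated in the PR) that the benchmark counts 4 bytes of store traffic per element. Interestingly, the double rows also match 4 bytes per element rather than 8, which suggests the element size may be hardcoded in the benchmark script.

```cpp
#include <cstdint>
#include <cstdio>

// Sketch of the "bandwidth (GB/s)" column: random_ writes each element
// once, so only store traffic is counted, at an assumed 4 bytes/element.
double effective_bandwidth_gbps(int64_t elements, double forward_seconds) {
  const double bytes_written = static_cast<double>(elements) * 4.0;
  return bytes_written / forward_seconds / 1e9;
}

int main() {
  // First float "Before" row: 65536 elements in 5.106925964355469e-06 s
  // -> ~51.33 GB/s, matching the table above.
  std::printf("%.6f GB/s\n",
              effective_bandwidth_gbps(65536, 5.106925964355469e-06));
  return 0;
}
```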

@pytorchbot added labels on May 17, 2019: module: cuda (Related to torch.cuda, and CUDA support in general), module: internals (Related to internal abstractions in c10 and ATen), module: operators
@ezyang (Contributor) commented May 17, 2019

Test failures look real:

May 17 07:01:29 ======================================================================
May 17 07:01:29 ERROR: test_nn_scalars_reductions (__main__.TestNN)
May 17 07:01:29 ----------------------------------------------------------------------
May 17 07:01:29 Traceback (most recent call last):
May 17 07:01:29   File "test_nn.py", line 3232, in test_nn_scalars_reductions
May 17 07:01:29     target = torch.empty(input_shape, device=device).random_(2)
May 17 07:01:29 RuntimeError: [from, to] is outside the range of the tensor data type: Double. Expected [from, to] to be inside [2.22507e-308, 1.79769e+308] but found [from, to] to be [0, 2]
May 17 07:01:29 
May 17 07:01:29 ======================================================================
May 17 07:01:29 FAIL: test_InstanceNorm3d_general_cuda (__main__.TestNN)
May 17 07:01:29 ----------------------------------------------------------------------
May 17 07:01:29 Traceback (most recent call last):
May 17 07:01:29   File "/var/lib/jenkins/workspace/test/common_utils.py", line 338, in wrapper
May 17 07:01:29     method(*args, **kwargs)
May 17 07:01:29   File "/var/lib/jenkins/workspace/test/common_utils.py", line 129, in wrapper
May 17 07:01:29     fn(*args, **kwargs)
May 17 07:01:29   File "test_nn.py", line 2990, in test_InstanceNorm3d_general_cuda
May 17 07:01:29     self._test_InstanceNorm_cuda_half(nn.InstanceNorm3d, input)
May 17 07:01:29   File "test_nn.py", line 2928, in _test_InstanceNorm_cuda_half
May 17 07:01:29     self.assertAlmostEqual(cudnn_output, thnn_output, delta=1e-4)
May 17 07:01:29   File "/var/lib/jenkins/workspace/test/common_utils.py", line 516, in assertAlmostEqual
May 17 07:01:29     self.assertEqual(x, y, prec, msg, allow_inf)
May 17 07:01:29   File "/var/lib/jenkins/workspace/test/common_utils.py", line 483, in assertEqual
May 17 07:01:29     assertTensorsEqual(x, y)
May 17 07:01:29   File "/var/lib/jenkins/workspace/test/common_utils.py", line 475, in assertTensorsEqual
May 17 07:01:29     self.assertLessEqual(max_err, prec, message)
May 17 07:01:29 AssertionError: tensor(0.0002, device='cuda:0', dtype=torch.float16, grad_fn=<MaxBackward1>) not less than or equal to 0.0001

}
}

TEST(DistributionsTest, TestPhiloxIncrementSmallTensor) {
Contributor:

Was there a substantive change to this file, or did you just refactor some duplicated code into a function?

Collaborator (author):

Just refactored duplicated code. Nothing else changed.

}

Tensor& clamped_random_cuda_(Tensor& self, int64_t from, int64_t to, Generator* gen) {
AT_CHECK(from < to, "random_ expects 'from' to be less than 'to', but got from=", from, " >= to=", to);
Contributor:

TORCH_CHECK now please! (I won't mention on other sites.)

Collaborator (author):

Sounds good!
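
For reference, a minimal sketch of that line after the suggested rename; TORCH_CHECK takes the same condition-plus-variadic-message form as AT_CHECK. The wrapper function name here is hypothetical, just to make the snippet self-contained:

```cpp
#include <cstdint>
#include <c10/util/Exception.h>  // provides TORCH_CHECK

// Same check as in the diff above, with the newer macro.
// check_range is an illustrative helper, not a function from the PR.
void check_range(int64_t from, int64_t to) {
  TORCH_CHECK(from < to,
              "random_ expects 'from' to be less than 'to', but got from=",
              from, " >= to=", to);
}
```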

Tensor& clamped_random_cuda_(Tensor& self, int64_t from, int64_t to, Generator* gen) {
AT_CHECK(from < to, "random_ expects 'from' to be less than 'to', but got from=", from, " >= to=", to);
auto iter = TensorIterator::nullary_op(self);
AT_DISPATCH_ALL_TYPES_AND(at::ScalarType::Half, iter->dtype(), "range_check", [&] {
Contributor:

Man, this dispatch /just/ to do error checking makes my eyes bleed, but I can't think of a convenient way to do this otherwise, so fine.

Collaborator (author):

By the way, I removed this range_check because of bool tensors, and because we don't seem to guarantee this condition anyway (the check didn't exist before, and the docs below say "usually"):

random_(from=0, to=None, *, generator=None) → Tensor
Fills self tensor with numbers sampled from the discrete uniform distribution over [from, to - 1]. If not specified, the values are usually only bounded by self tensor’s data type. However, for floating point types, if unspecified, range will be [0, 2^mantissa] to ensure that every value is representable. For example, torch.tensor(1, dtype=torch.double).random_() will be uniform in [0, 2^53].

We should have this discussion in #16944. As pointed out by @ngimel, the following is the behavior in numpy:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-587d8ebfc2e9> in <module>
----> 1 np.random.randint(0,500, size=100, dtype=np.int8)

mtrand.pyx in mtrand.RandomState.randint()

ValueError: high is out of bounds for int8

Same for bool:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-8c6e78ec7bf7> in <module>
----> 1 np.random.randint(0,3, size=100, dtype=np.bool)

mtrand.pyx in mtrand.RandomState.randint()

ValueError: high is out of bounds for bool

Collaborator:

I don't think this discussion belongs in #16944. #16944 is about valid arguments to random_ that currently do not work, and the reasons they fail are clear.
What we do need to decide (and possibly change behavior for) is what happens when random_ is called with no arguments; right now the behavior is inconsistent. The docs say the values are usually only bounded by self tensor’s data type, but try generating negative numbers for an int tensor. My suggestion is to use the full range, but #16944 will have to be fixed for this. I'm also not against documenting and preserving the current behavior. We also need to decide what happens when random_ arguments fall outside the numeric range of the source tensor; my suggestion is to follow numpy and throw an error.
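
As an aside, the CI failure quoted earlier ("Expected [from, to] to be inside [2.22507e-308, 1.79769e+308] but found [from, to] to be [0, 2]") shows why the floating-point branch of such a range check is easy to get wrong. A standalone illustration of the numeric_limits pitfall involved (not the PR's code):

```cpp
#include <cstdio>
#include <limits>

// For floating-point types, std::numeric_limits<T>::min() is the smallest
// *positive* normal value, not the most negative one; lowest() is the
// intended lower bound. With min() as the lower bound, [from, to] = [0, 2]
// falls "outside" [2.22507e-308, 1.79769e+308], which is exactly the
// RuntimeError in the test_nn failure quoted above.
int main() {
  std::printf("min()    = %g\n", std::numeric_limits<double>::min());
  std::printf("lowest() = %g\n", std::numeric_limits<double>::lowest());
  std::printf("max()    = %g\n", std::numeric_limits<double>::max());
  return 0;
}
```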

", ", std::numeric_limits<scalar_t>::max(), "]",
" but found [from, to] to be [", from, ", ", to, "]");
});
auto range = static_cast<uint64_t>(to) - static_cast<uint64_t>(from);
Contributor:

The additional static casts here look kind of questionable. Are you assuming that to and from are non-negative? I don't see any check to this effect. If to/from could be negative, casting them to unsigned is surprising (it might work out in the end, but I don't want to have to work that out in my head without a comment.)

Contributor:

The original code did max_val - min_val, without the casts. Now, the promotion rules make my head hurt, but what I think happens is you do the subtraction on int64_t, and then convert the (now known to be positive) int64_t into a uint64_t. So it seems the old code did something different.

@syed-ahmed (Collaborator, author), May 21, 2019:

I may have goofed here in an attempt to fix this bug: #16944. Reverted to what it was before for now. I'll fix that bug separately in a different PR once I come to a conclusion about this: #16944 (comment).
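
For what it's worth, here is a standalone check of the promotion question raised above. For inputs where to - from does not overflow, the two forms agree; the cast-first form is additionally well defined when the int64_t subtraction would overflow:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  int64_t from = -5, to = 10;
  // Cast first, subtract in uint64_t: the wrap-around from converting the
  // negative operand cancels in the modular subtraction.
  uint64_t a = static_cast<uint64_t>(to) - static_cast<uint64_t>(from);
  // Subtract in int64_t first (the pre-existing form), then convert.
  // Signed overflow in to - from would be UB; the form above never is.
  uint64_t b = static_cast<uint64_t>(to - from);
  std::printf("%llu %llu\n", (unsigned long long)a,
              (unsigned long long)b);  // both print 15
  return 0;
}
```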

auto gen = check_generator<CUDAGenerator>(gen_, &globalContext().defaultGenerator(kCUDA));
AT_DISPATCH_ALL_TYPES_AND(at::ScalarType::Half, iter.dtype(), "random_cuda", [&] {
using accscalar_t = at::acc_type<scalar_t, true>;
if (std::is_same<scalar_t, double>::value || std::is_same<scalar_t, int64_t>::value) {
Contributor:

nit: Consider using template specializations to do this in the future. You're generating dead code when you do it this way; fortunately that code type checks and will get DCE'd later... but best not to generate it at all in the first place.
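
A hedged sketch of the suggestion, using illustrative names rather than the PR's actual code; one idiomatic C++11 variant is tag dispatch, which keeps the unused branch from being instantiated at all:

```cpp
#include <cstdio>
#include <type_traits>

// Two overloads selected at compile time, so the 32-bit body is never
// instantiated for double/int64_t and vice versa (no dead code to DCE).
template <typename scalar_t>
void fill_random(std::true_type /*needs 64 bits of randomness*/) {
  std::printf("64-bit path\n");
}

template <typename scalar_t>
void fill_random(std::false_type /*32 bits suffice*/) {
  std::printf("32-bit path\n");
}

template <typename scalar_t>
void fill_random() {
  using needs64 = std::integral_constant<
      bool, std::is_same<scalar_t, double>::value ||
                std::is_same<scalar_t, int64_t>::value>;
  fill_random<scalar_t>(needs64{});
}

int main() {
  fill_random<float>();   // 32-bit path
  fill_random<double>();  // 64-bit path
  return 0;
}
```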

if (std::is_same<scalar_t, double>::value || std::is_same<scalar_t, int64_t>::value) {
// define lambda to mod with range and add base
auto random_func = [range, base] __device__ (uint64_t rand) {
return static_cast<scalar_t>(rand % range + base);
Contributor:

Heh heh, modulo isn't uniform, but we're unlikely to switch this to rejection sampling. No changes needed, but it did look funny to see :>

Collaborator (author):

Oh yeah :). Will leave that discussion for another day.

Collaborator:

There are two orthogonal problems with modulo (a toy illustration of the first follows below):

  1. When range is comparable to the range produced by the generator, modulo indeed produces non-uniform results. Usually not a problem in practice, though, because in typical use cases range is much smaller than the range of random numbers produced.
  2. Modulo essentially extracts the lower bits, and lower bits have low entropy. That's true for LCGs, but we are using Philox here, so it should be fine.
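
Here is the promised toy illustration of point 1, using a deliberately tiny hypothetical 8-bit generator standing in for Philox:

```cpp
#include <cstdint>
#include <cstdio>

// With 256 equally likely generator outputs and range = 200, residues
// 0..55 receive two preimages each (r and r + 200) while 56..199 receive
// only one, so the low residues are twice as likely.
int main() {
  const uint32_t kGenOutputs = 256;  // toy generator range, not Philox
  const uint32_t kRange = 200;
  uint32_t counts[kRange] = {0};
  for (uint32_t r = 0; r < kGenOutputs; ++r) {
    counts[r % kRange]++;
  }
  std::printf("count[0] = %u, count[199] = %u\n", counts[0],
              counts[kRange - 1]);  // prints 2 vs 1
  return 0;
}
```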

@ezyang (Contributor) commented May 17, 2019

Fix the errors. My plan is to review the other diffs in the stack when we get this one green; let me know if you want review earlier.

@ezyang (Contributor) left a review:

CI fails

Six commits were pushed to the branch, each with the same message:

Move THCTensor_{random, clampedRandom, cappedRandom} to ATen

gh-metadata: pytorch pytorch 20620 gh/syed-ahmed/1/head

@pytorchbot added the module: rocm (AMD GPU support for Pytorch) label on May 21, 2019
@syed-ahmed syed-ahmed requested a review from ezyang May 22, 2019 07:31
@ezyang (Contributor) left a review:

LGTM

@syed-ahmed (Collaborator, author) commented:

@ezyang @iotamudelta Any idea why these are failing for ROCm?

04:32:37 ======================================================================
04:32:37 ERROR: test___std_mean__ (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__std_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___std_mean___dim (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__std_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___std_mean___dim_1d (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__std_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___std_mean___keepdim_dim (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__std_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___std_mean___keepdim_dim_1d (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__std_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___var_mean__ (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__var_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___var_mean___dim (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__var_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___var_mean___dim_1d (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__var_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___var_mean___keepdim_dim (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__var_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test___var_mean___keepdim_dim_1d (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3255, in do_test
04:32:37     check(name)
04:32:37   File "test_autograd.py", line 3175, in check
04:32:37     output_variable = getattr(self_variable, name)(*args_variable, **kwargs_variable)
04:32:37 AttributeError: 'Tensor' object has no attribute '__var_mean__'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test_profiler_aggregation_lstm (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 2498, in test_profiler_aggregation_lstm
04:32:37     with profile(record_shapes=True) as prof:
04:32:37 TypeError: __init__() got an unexpected keyword argument 'record_shapes'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test_profiler_shapes (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 2476, in test_profiler_shapes
04:32:37     with profile(record_shapes=True) as prof:
04:32:37 TypeError: __init__() got an unexpected keyword argument 'record_shapes'
04:32:37 
04:32:37 ======================================================================
04:32:37 ERROR: test_var_mean_differentiable (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 2170, in test_var_mean_differentiable
04:32:37     var1, mean1 = torch.var_mean(input1, dim=dim, keepdim=keepdim)
04:32:37 AttributeError: 'module' object has no attribute 'var_mean'
04:32:37 
04:32:37 ======================================================================
04:32:37 FAIL: test_version_counter (__main__.TestAutograd)
04:32:37 ----------------------------------------------------------------------
04:32:37 Traceback (most recent call last):
04:32:37   File "test_autograd.py", line 3048, in test_version_counter
04:32:37     self.assertTrue(x._version == xz._version)
04:32:37 AssertionError: False is not true
04:32:37 
04:32:37 ----------------------------------------------------------------------
04:32:37 Ran 949 tests in 253.074s
04:32:37 
04:32:37 FAILED (failures=1, errors=13, skipped=13)
04:32:37 
04:32:37 
04:32:37 Traceback (most recent call last):
04:32:37   File "test/run_test.py", line 440, in <module>
04:32:37     main()
04:32:37   File "test/run_test.py", line 432, in main
04:32:37     raise RuntimeError(message)
04:32:37 RuntimeError: test_autograd failed!

@syed-ahmed (Collaborator, author) commented May 23, 2019

Looks like the ROCm failure is not related to this PR (although it looked related because of var_mean/std_mean): there is a similar failure in #20801 (comment) as well.

@zou3519 zou3519 deleted the gh/syed-ahmed/1/head branch May 23, 2019 20:47
zdevito pushed a commit to zdevito/ATen that referenced this pull request May 23, 2019
Summary:
Pull Request resolved: pytorch/pytorch#20620
ghimport-source-id: 7c09c2462021e3fa5adef61570a575964ff16125

Differential Revision: D15454050

Pulled By: ezyang

fbshipit-source-id: 5b0421c56445baf19dbdbdd9680af128a5cdf443
@facebook-github-bot (Contributor):
@ezyang merged this pull request in b6d0f6c.


Labels: Merged; module: cuda (Related to torch.cuda, and CUDA support in general); module: internals (Related to internal abstractions in c10 and ATen); module: rocm (AMD GPU support for Pytorch); open source
