Move THCTensor_{random, clampedRandom, cappedRandom} to ATen #20620
Conversation
Test failures look real.
TEST(DistributionsTest, TestPhiloxIncrementSmallTensor) {
Was there a substantive change to this file, or did you just refactor some duplicated code into a function?
Just refactored duplicated code. Nothing else changed.
Tensor& clamped_random_cuda_(Tensor& self, int64_t from, int64_t to, Generator* gen) {
  AT_CHECK(from < to, "random_ expects 'from' to be less than 'to', but got from=", from, " >= to=", to);
TORCH_CHECK now, please! (I won't mention this at the other call sites.)
Sounds good!
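For reference, a minimal sketch of that check rewritten with TORCH_CHECK, assuming the c10 exception header; this is only an illustration of the requested change, not the final diff, and the wrapper function name is made up:

```cpp
#include <c10/util/Exception.h>
#include <cstdint>

// Sketch only: the AT_CHECK above expressed with TORCH_CHECK, as requested.
void check_from_to(int64_t from, int64_t to) {
  TORCH_CHECK(from < to,
              "random_ expects 'from' to be less than 'to', but got from=",
              from, " >= to=", to);
}
```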
Tensor& clamped_random_cuda_(Tensor& self, int64_t from, int64_t to, Generator* gen) {
  AT_CHECK(from < to, "random_ expects 'from' to be less than 'to', but got from=", from, " >= to=", to);
  auto iter = TensorIterator::nullary_op(self);
  AT_DISPATCH_ALL_TYPES_AND(at::ScalarType::Half, iter->dtype(), "range_check", [&] {
Man, this dispatch /just/ to do error checking makes my eyes bleed, but I can't think of a convenient way to do this otherwise, so fine.
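For context, the per-dtype bound check that the dispatch performs boils down to something like this standalone sketch. It is plain C++ rather than ATen code, covers integral dtypes only, and the function name and error type are invented for illustration:

```cpp
#include <cstdint>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <type_traits>

// Standalone sketch (not ATen code): verify that [from, to] fits in the
// tensor's scalar type, which is the only reason the dispatch exists here.
template <typename scalar_t>
void check_random_bounds(int64_t from, int64_t to) {
  static_assert(std::is_integral<scalar_t>::value, "sketch covers integral dtypes only");
  const auto lo = static_cast<int64_t>(std::numeric_limits<scalar_t>::lowest());
  const auto hi = static_cast<int64_t>(std::numeric_limits<scalar_t>::max());
  if (from < lo || to > hi) {
    std::ostringstream msg;
    msg << "random_ expects [from, to] to lie within [" << lo << ", " << hi
        << "] but found [" << from << ", " << to << "]";
    throw std::runtime_error(msg.str());
  }
}

int main() {
  check_random_bounds<int16_t>(0, 100);    // fine
  // check_random_bounds<int8_t>(0, 500);  // would throw, mirroring numpy's int8 error
  return 0;
}
```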
I removed this range_check, btw, because of bool tensors and because we don't seem to guarantee this condition (we didn't have this check before, and the docs quoted below say "usually"):
random_(from=0, to=None, *, generator=None) → Tensor
Fills self tensor with numbers sampled from the discrete uniform distribution over [from, to - 1]. If not specified, the values are usually only bounded by self tensor’s data type. However, for floating point types, if unspecified, range will be [0, 2^mantissa] to ensure that every value is representable. For example, torch.tensor(1, dtype=torch.double).random_() will be uniform in [0, 2^53].
We should have this discussion in #16944. As pointed out by @ngimel, the following is the behavior in numpy:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-587d8ebfc2e9> in <module>
----> 1 np.random.randint(0,500, size=100, dtype=np.int8)
mtrand.pyx in mtrand.RandomState.randint()
ValueError: high is out of bounds for int8
Same for bool
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-8c6e78ec7bf7> in <module>
----> 1 np.random.randint(0,3, size=100, dtype=np.bool)
mtrand.pyx in mtrand.RandomState.randint()
ValueError: high is out of bounds for bool
I don't think this discussion belongs in #16944. #16944 is about valid arguments to random_ that are not working currently, and the reasons they are not working are clear.
The things we need to decide (and possibly change behavior for) are:
- What happens when random_ is called with no arguments. Right now the behavior is inconsistent: the docs say the values are usually only bounded by self tensor's data type, but try generating negative numbers for an int tensor. My suggestion is to use the full range, but #16944 will have to be fixed for that. I'm also not against documenting and preserving the current behavior.
- What happens when random_ arguments fall outside the numeric range of the source tensor. My suggestion is to follow numpy and throw an error.
| ", ", std::numeric_limits<scalar_t>::max(), "]", | ||
| " but found [from, to] to be [", from, ", ", to, "]"); | ||
| }); | ||
| auto range = static_cast<uint64_t>(to) - static_cast<uint64_t>(from); |
The additional static casts here look kind of questionable. Are you assuming that to and from are non-negative? I don't see any check to this effect. If to/from could be negative, casting them to unsigned is surprising (it might work out in the end, but I don't want to have to work that out in my head without a comment.)
The original code did max_val - min_val, without the casts. Now, the promotion rules make my head hurt, but what I think happens is you do the subtraction on int64_t, and then convert the (now known to be positive) int64_t into a uint64_t. So it seems the old code did something different.
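A small sketch of the difference being discussed, with made-up values: both forms agree whenever the signed subtraction doesn't overflow, and the all-unsigned form stays well defined at the extremes.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

int main() {
  int64_t from = -10, to = 20;

  // Old code (per the comment above): subtract as int64_t, then the
  // positive result converts to uint64_t.
  uint64_t range_old = static_cast<uint64_t>(to - from);  // 30

  // New code in the diff: cast each operand first; unsigned arithmetic
  // wraps modulo 2^64, so the difference is still 30.
  uint64_t range_new = static_cast<uint64_t>(to) - static_cast<uint64_t>(from);
  assert(range_old == range_new);

  // They diverge only when the signed subtraction itself would overflow
  // (undefined behavior), e.g. from = INT64_MIN, to = INT64_MAX; the
  // all-unsigned version is well defined and yields 2^64 - 1.
  uint64_t wide = static_cast<uint64_t>(std::numeric_limits<int64_t>::max()) -
                  static_cast<uint64_t>(std::numeric_limits<int64_t>::min());
  assert(wide == std::numeric_limits<uint64_t>::max());
  return 0;
}
```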
I goofed here, maybe in an attempt to fix this bug: #16944. Reverted to what it was before for now. I'll fix that bug separately in a different PR when I come to a conclusion about this: #16944 (comment).
  auto gen = check_generator<CUDAGenerator>(gen_, &globalContext().defaultGenerator(kCUDA));
  AT_DISPATCH_ALL_TYPES_AND(at::ScalarType::Half, iter.dtype(), "random_cuda", [&] {
    using accscalar_t = at::acc_type<scalar_t, true>;
    if (std::is_same<scalar_t, double>::value || std::is_same<scalar_t, int64_t>::value) {
nit: Consider using template specializations to do this in the future. You're generating dead code when you do it this way; fortunately that code type checks and will get DCE'd later... but best not to generate it at all in the first place.
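A hedged sketch of the suggestion (the function names are hypothetical, not the PR's code): with a runtime `if`, both branches must compile for every scalar_t, while specialization (or C++17 `if constexpr`) only ever generates the matching body.

```cpp
#include <cstdint>
#include <type_traits>

// Runtime check: both branches are instantiated for every scalar_t, so the
// "other" branch is dead code that only DCE removes later.
template <typename scalar_t>
uint64_t mask_range_runtime(uint64_t r) {
  if (std::is_same<scalar_t, double>::value ||
      std::is_same<scalar_t, int64_t>::value) {
    return r;                   // full 64-bit range
  } else {
    return r & 0xffffffffULL;   // 32-bit range, still compiled for double/int64_t
  }
}

// Specializations: only the matching definition exists for each type.
template <typename scalar_t>
uint64_t mask_range(uint64_t r) { return r & 0xffffffffULL; }

template <> inline uint64_t mask_range<double>(uint64_t r) { return r; }
template <> inline uint64_t mask_range<int64_t>(uint64_t r) { return r; }

int main() {
  uint64_t masked = mask_range<float>(0x100000001ULL);  // == 1
  return masked == 1 ? 0 : 1;
}
```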
    if (std::is_same<scalar_t, double>::value || std::is_same<scalar_t, int64_t>::value) {
      // define lambda to mod with range and add base
      auto random_func = [range, base] __device__ (uint64_t rand) {
        return static_cast<scalar_t>(rand % range + base);
Heh heh, modulo isn't uniform, but we're unlikely to switch this to rejection sampling. No changes needed, but it did look funny to see :>
Oh yeah :). Will leave that discussion for another day.
There are 2 orthogonal problems with modulo:
- when `range` is comparable to the range produced by the generator, modulo indeed produces non-uniform results. Usually not a problem though, because in typical use cases `range` is much less than the range of random numbers produced.
- modulo essentially extracts the lower bits, and lower bits have low entropy. That's true for LCGs, but we are using Philox here, so it should be fine.
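A toy illustration of the first point, with deliberately tiny numbers: a generator with 8 equally likely outputs reduced modulo 3 cannot be uniform, whereas with a 2^32 or 2^64 generator output and a small `range` the bias is on the order of range / 2^64, i.e. negligible.

```cpp
#include <array>
#include <cstdio>

int main() {
  // 3-bit "generator": 8 equally likely outputs 0..7, reduced modulo 3.
  constexpr int kRange = 3;
  std::array<int, kRange> counts{};
  for (int r = 0; r < 8; ++r) {
    ++counts[r % kRange];
  }
  // Prints 0 -> 3, 1 -> 3, 2 -> 2: values 0 and 1 are slightly favored.
  for (int i = 0; i < kRange; ++i) {
    std::printf("%d -> %d\n", i, counts[i]);
  }
  return 0;
}
```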
Fix the errors. My plan is to review the other diffs in the stack when we get this one green; let me know if you want review earlier.
ezyang left a comment
CI fails
Move THCTensor_{random, clampedRandom, cappedRandom} to ATen
gh-metadata: pytorch pytorch 20620 gh/syed-ahmed/1/head
ezyang left a comment
LGTM
@ezyang @iotamudelta Any idea why these are failing for rocm?
Looks like the rocm failure is not related to this PR (although it looked related because of var_mean, std_mean): there is a similar failure in #20801 (comment) as well.
Summary: Pull Request resolved: pytorch/pytorch#20620
ghimport-source-id: 7c09c2462021e3fa5adef61570a575964ff16125
Differential Revision: D15454050
Pulled By: ezyang
fbshipit-source-id: 5b0421c56445baf19dbdbdd9680af128a5cdf443
Stack from ghstack:
Differential Revision: D15454050
Effective Bandwidth Benchmark: before/after results for float and double types.