Conversation

@syed-ahmed syed-ahmed commented May 17, 2019

Stack from ghstack:

Differential Revision: D15454047

Effective Bandwidth Benchmark

Float Type

Before:

exponential, size, elements 65536 forward 4.951953887939453e-06 bandwidth (GB/s) 52.937488097063074
exponential, size, elements 131072 forward 5.1164627075195315e-06 bandwidth (GB/s) 102.47079476011184
exponential, size, elements 262144 forward 7.412433624267578e-06 bandwidth (GB/s) 141.46177263119975
exponential, size, elements 524288 forward 1.1911392211914063e-05 bandwidth (GB/s) 176.06271061265014
exponential, size, elements 1048576 forward 2.077341079711914e-05 bandwidth (GB/s) 201.90733437869852
exponential, size, elements 2097152 forward 3.968000411987305e-05 bandwidth (GB/s) 211.4064296631136
exponential, size, elements 4194304 forward 7.367134094238281e-05 bandwidth (GB/s) 227.73056368176054
exponential, size, elements 8388608 forward 0.0001375126838684082 bandwidth (GB/s) 244.00972372926472
exponential, size, elements 16777216 forward 0.0002710747718811035 bandwidth (GB/s) 247.5658783526883
exponential, size, elements 33554432 forward 0.0005277013778686524 bandwidth (GB/s) 254.34409237682056

After:

exponential, size, elements 65536 forward 5.60760498046875e-06 bandwidth (GB/s) 46.74794335782313
exponential, size, elements 131072 forward 7.700920104980468e-06 bandwidth (GB/s) 68.0812153421672
exponential, size, elements 262144 forward 6.551742553710938e-06 bandwidth (GB/s) 160.04536066608443
exponential, size, elements 524288 forward 6.9427490234375e-06 bandwidth (GB/s) 302.0636340043956
exponential, size, elements 1048576 forward 9.472370147705077e-06 bandwidth (GB/s) 442.7935072845709
exponential, size, elements 2097152 forward 1.4712810516357422e-05 bandwidth (GB/s) 570.1567345459731
exponential, size, elements 4194304 forward 2.4566650390625e-05 bandwidth (GB/s) 682.9264768795031
exponential, size, elements 8388608 forward 4.60505485534668e-05 bandwidth (GB/s) 728.6434810009216
exponential, size, elements 16777216 forward 9.00864601135254e-05 bandwidth (GB/s) 744.9384060094111
exponential, size, elements 33554432 forward 0.00017408370971679687 bandwidth (GB/s) 770.9953344764326

Double Type

Before:

exponential, size, elements 65536 forward 4.985332489013672e-06 bandwidth (GB/s) 52.58305250004783
exponential, size, elements 131072 forward 6.051063537597656e-06 bandwidth (GB/s) 86.64394229913319
exponential, size, elements 262144 forward 9.377002716064453e-06 bandwidth (GB/s) 111.82421843640988
exponential, size, elements 524288 forward 1.549959182739258e-05 bandwidth (GB/s) 135.30369208134132
exponential, size, elements 1048576 forward 2.866983413696289e-05 bandwidth (GB/s) 146.2967654421289
exponential, size, elements 2097152 forward 5.302190780639648e-05 bandwidth (GB/s) 158.2102256793561
exponential, size, elements 4194304 forward 9.615898132324219e-05 bandwidth (GB/s) 174.47372849762968
exponential, size, elements 8388608 forward 0.00018301725387573242 bandwidth (GB/s) 183.34026595537955
exponential, size, elements 16777216 forward 0.0003589057922363281 bandwidth (GB/s) 186.98183604629858
exponential, size, elements 33554432 forward 0.000672616958618164 bandwidth (GB/s) 199.5455604862227

After:

exponential, size, elements 65536 forward 5.755424499511719e-06 bandwidth (GB/s) 45.547291954266775
exponential, size, elements 131072 forward 6.275177001953125e-06 bandwidth (GB/s) 83.54951578844985
exponential, size, elements 262144 forward 7.97271728515625e-06 bandwidth (GB/s) 131.52052963827754
exponential, size, elements 524288 forward 1.2047290802001953e-05 bandwidth (GB/s) 174.07664797561844
exponential, size, elements 1048576 forward 2.0439624786376954e-05 bandwidth (GB/s) 205.20454968407793
exponential, size, elements 2097152 forward 3.5920143127441405e-05 bandwidth (GB/s) 233.53492691379267
exponential, size, elements 4194304 forward 6.896495819091797e-05 bandwidth (GB/s) 243.27160401598564
exponential, size, elements 8388608 forward 0.00012843608856201173 bandwidth (GB/s) 261.2539230653945
exponential, size, elements 16777216 forward 0.0002438235282897949 bandwidth (GB/s) 275.23539041005995
exponential, size, elements 33554432 forward 0.00046614646911621096 bandwidth (GB/s) 287.93037573462635
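
The benchmark script is not included here, but the float figures line up with timing an in-place exponential_() fill and dividing the bytes written by the elapsed time: for the first float row, 65536 elements * 4 bytes / 4.95e-06 s ≈ 53 GB/s. Below is a minimal sketch of such a measurement, with the iteration count and host-side timing being assumptions rather than the actual script that produced the numbers above:

```python
import time
import torch

def effective_bandwidth(numel, dtype=torch.float, iters=100):
    # Time an in-place exponential_() fill on the GPU and report the
    # effective write bandwidth in GB/s (bytes written per second).
    x = torch.empty(numel, dtype=dtype, device="cuda")
    x.exponential_()              # warm-up
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        x.exponential_()
    torch.cuda.synchronize()
    elapsed = (time.time() - start) / iters
    # One write per element; element_size() is 4 for float, 8 for double.
    return elapsed, numel * x.element_size() / elapsed / 1e9

for n in (2 ** p for p in range(16, 26)):
    t, bw = effective_bandwidth(n)
    print("exponential, size, elements", n, "forward", t, "bandwidth (GB/s)", bw)
```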

@pytorchbot pytorchbot added the module: cuda (Related to torch.cuda, and CUDA support in general), module: internals (Related to internal abstractions in c10 and ATen), and module: operators labels on May 17, 2019
Move THCTensor_(exponential) to ATen

gh-metadata: pytorch pytorch 20623 gh/syed-ahmed/4/head
@syed-ahmed syed-ahmed (Collaborator, Author) commented:

@ezyang If you are importing internally, can you please trigger again :). I added this bit of code:

// If rand lands exactly on 1.0, log(1.0) == 0 would make the resulting
// exponential sample exactly zero (see #20179), so clamp to the largest
// representable value strictly below 1.0 before taking the log.
if (rand == static_cast<accscalar_t>(1.0)) {
  sample = ::log(std::nextafter(1.0, 0.0));
} else {
  sample = ::log(rand);
}

to handle #20179
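
For reference, the guard matters because the exponential sample is derived from the log of a uniform draw, so a draw of exactly 1.0 would otherwise yield log(1.0) == 0 and a sample of exactly zero. A minimal Python sketch of the same idea, not the PR's CUDA kernel (the inverse-transform form and the 1/lambda placement here are assumptions, and math.nextafter requires Python 3.9+):

```python
import math

def exponential_sample(u, lambd):
    # Inverse-transform sampling: X = -log(U) / lambda for U in (0, 1].
    # If U lands exactly on 1.0, log(1.0) == 0 would yield a sample of
    # exactly zero, which an exponential distribution should not produce.
    # Clamp U to the largest float strictly below 1.0 first.
    if u == 1.0:
        u = math.nextafter(1.0, 0.0)
    return -math.log(u) / lambd
```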

@syed-ahmed syed-ahmed requested a review from ezyang May 24, 2019 18:50
Move THCTensor_(exponential) to ATen

gh-metadata: pytorch pytorch 20623 gh/syed-ahmed/4/head
@ezyang ezyang (Contributor) commented May 31, 2019

ROCm failed on every diff after this one in the stack, so I suspect this is a real failure.

@ezyang ezyang (Contributor) commented May 31, 2019

18:28:35 ======================================================================
18:28:35 ERROR: test_AdaptiveAvgPool1d_cuda (__main__.TestNN)
18:28:35 ----------------------------------------------------------------------
18:28:35 Traceback (most recent call last):
18:28:35   File "/var/lib/jenkins/workspace/test/common_utils.py", line 357, in wrapper
18:28:35     with self.assertLeaksNoCudaTensors():
18:28:35   File "/var/lib/jenkins/workspace/test/common_utils.py", line 346, in assertLeaksNoCudaTensors
18:28:35     return CudaMemoryLeakCheck(self, name)
18:28:35   File "/var/lib/jenkins/workspace/test/common_utils.py", line 294, in __init__
18:28:35     initialize_cuda_context_rng()
18:28:35   File "/var/lib/jenkins/workspace/test/common_cuda.py", line 33, in initialize_cuda_context_rng
18:28:35     torch.randn(1, device="cuda:{}".format(i))
18:28:35 RuntimeError: std::exception
18:28:35 

cc @iotamudelta @bddppq

Move THCTensor_(exponential) to ATen

gh-metadata: pytorch pytorch 20623 gh/syed-ahmed/4/head
@syed-ahmed syed-ahmed removed the module: onnx (Related to torch.onnx) and module: third_party labels on May 31, 2019
@syed-ahmed syed-ahmed closed this Jun 3, 2019
@syed-ahmed syed-ahmed deleted the gh/syed-ahmed/4/head branch June 3, 2019 19:58
Labels

module: cuda (Related to torch.cuda, and CUDA support in general), module: internals (Related to internal abstractions in c10 and ATen), open source
