Conversation

@syed-ahmed (Collaborator) commented Jun 3, 2019

Stack from ghstack:

Resubmit of #20623

Effective Bandwidth Benchmark

Float Type

Before:

| op | elements | forward time (s) | bandwidth (GB/s) |
| --- | --- | --- | --- |
| exponential | 65536 | 4.951953887939453e-06 | 52.937488097063074 |
| exponential | 131072 | 5.1164627075195315e-06 | 102.47079476011184 |
| exponential | 262144 | 7.412433624267578e-06 | 141.46177263119975 |
| exponential | 524288 | 1.1911392211914063e-05 | 176.06271061265014 |
| exponential | 1048576 | 2.077341079711914e-05 | 201.90733437869852 |
| exponential | 2097152 | 3.968000411987305e-05 | 211.4064296631136 |
| exponential | 4194304 | 7.367134094238281e-05 | 227.73056368176054 |
| exponential | 8388608 | 0.0001375126838684082 | 244.00972372926472 |
| exponential | 16777216 | 0.0002710747718811035 | 247.5658783526883 |
| exponential | 33554432 | 0.0005277013778686524 | 254.34409237682056 |

After:

| op | elements | forward time (s) | bandwidth (GB/s) |
| --- | --- | --- | --- |
| exponential | 65536 | 5.60760498046875e-06 | 46.74794335782313 |
| exponential | 131072 | 7.700920104980468e-06 | 68.0812153421672 |
| exponential | 262144 | 6.551742553710938e-06 | 160.04536066608443 |
| exponential | 524288 | 6.9427490234375e-06 | 302.0636340043956 |
| exponential | 1048576 | 9.472370147705077e-06 | 442.7935072845709 |
| exponential | 2097152 | 1.4712810516357422e-05 | 570.1567345459731 |
| exponential | 4194304 | 2.4566650390625e-05 | 682.9264768795031 |
| exponential | 8388608 | 4.60505485534668e-05 | 728.6434810009216 |
| exponential | 16777216 | 9.00864601135254e-05 | 744.9384060094111 |
| exponential | 33554432 | 0.00017408370971679687 | 770.9953344764326 |

Double Type

Before:

| op | elements | forward time (s) | bandwidth (GB/s) |
| --- | --- | --- | --- |
| exponential | 65536 | 4.985332489013672e-06 | 52.58305250004783 |
| exponential | 131072 | 6.051063537597656e-06 | 86.64394229913319 |
| exponential | 262144 | 9.377002716064453e-06 | 111.82421843640988 |
| exponential | 524288 | 1.549959182739258e-05 | 135.30369208134132 |
| exponential | 1048576 | 2.866983413696289e-05 | 146.2967654421289 |
| exponential | 2097152 | 5.302190780639648e-05 | 158.2102256793561 |
| exponential | 4194304 | 9.615898132324219e-05 | 174.47372849762968 |
| exponential | 8388608 | 0.00018301725387573242 | 183.34026595537955 |
| exponential | 16777216 | 0.0003589057922363281 | 186.98183604629858 |
| exponential | 33554432 | 0.000672616958618164 | 199.5455604862227 |

After:

| op | elements | forward time (s) | bandwidth (GB/s) |
| --- | --- | --- | --- |
| exponential | 65536 | 5.755424499511719e-06 | 45.547291954266775 |
| exponential | 131072 | 6.275177001953125e-06 | 83.54951578844985 |
| exponential | 262144 | 7.97271728515625e-06 | 131.52052963827754 |
| exponential | 524288 | 1.2047290802001953e-05 | 174.07664797561844 |
| exponential | 1048576 | 2.0439624786376954e-05 | 205.20454968407793 |
| exponential | 2097152 | 3.5920143127441405e-05 | 233.53492691379267 |
| exponential | 4194304 | 6.896495819091797e-05 | 243.27160401598564 |
| exponential | 8388608 | 0.00012843608856201173 | 261.2539230653945 |
| exponential | 16777216 | 0.0002438235282897949 | 275.23539041005995 |
| exponential | 33554432 | 0.00046614646911621096 | 287.93037573462635 |

Differential Revision: D15632931

@pytorchbot added labels on Jun 3, 2019: module: cuda (Related to torch.cuda, and CUDA support in general), module: internals (Related to internal abstractions in c10 and ATen), module: operators
Move THCTensor_(exponential) to ATen

gh-metadata: pytorch pytorch 21297 gh/syed-ahmed/9/head
@syed-ahmed (Collaborator, Author) commented Jun 4, 2019

@ezyang I think the std::nextafter in device code was causing the ROCm failure. Let's see if the tests pass now. (I'm puzzled, though, why it would cause a std::exception in a torch.randn call in the exponential PR! @.@)

CC: @iotamudelta @bddppq

@syed-ahmed syed-ahmed requested a review from ezyang June 4, 2019 05:53
@syed-ahmed (Collaborator, Author) commented:

Confirmed that it was std::nextafter in device code that was causing the ROCm failure.

@jerryzh168 added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jun 4, 2019
@ezyang (Contributor) commented Jun 4, 2019

Nice catch. Maybe we can add a note about this to CONTRIBUTING.md.

@syed-ahmed (Collaborator, Author) commented:

Added the std::nextafter note in #21386

@zou3519 zou3519 deleted the gh/syed-ahmed/9/head branch June 5, 2019 02:16
@facebook-github-bot (Contributor) commented:

@ezyang merged this pull request in d341bcb.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 5, 2019
Summary:
Pull Request resolved: pytorch/pytorch#21297
ghimport-source-id: 5f45154e714ab44dec961dabf1c64e54aaa063a2

Reviewed By: jerryzh168

Differential Revision: D15632931

Pulled By: ezyang

fbshipit-source-id: 0367eec0a9ef6812b1b3ab7597817ee40a011bb8