@ezyang commented on Jun 3, 2019

Effective Bandwidth Benchmark

Float Type

Before:

normal, size, elements 65536 forward 4.956722259521484e-06 bandwidth (GB/s) 52.88656218258779
normal, size, elements 131072 forward 5.285739898681641e-06 bandwidth (GB/s) 99.18914098114568
normal, size, elements 262144 forward 7.548332214355469e-06 bandwidth (GB/s) 138.91492454529376
normal, size, elements 524288 forward 1.1980533599853516e-05 bandwidth (GB/s) 175.0466273076219
normal, size, elements 1048576 forward 2.091646194458008e-05 bandwidth (GB/s) 200.52645667862762
normal, size, elements 2097152 forward 3.9961338043212894e-05 bandwidth (GB/s) 209.91809610901498
normal, size, elements 4194304 forward 7.39765167236328e-05 bandwidth (GB/s) 226.79110538115253
normal, size, elements 8388608 forward 0.0001377725601196289 bandwidth (GB/s) 243.5494555001696
normal, size, elements 16777216 forward 0.0002710080146789551 bandwidth (GB/s) 247.62686107087774
normal, size, elements 33554432 forward 0.0005375170707702637 bandwidth (GB/s) 249.69947058177252

After:

normal, size, elements 65536 forward 6.198883056640625e-06 bandwidth (GB/s) 42.288908760615385
normal, size, elements 131072 forward 6.756782531738281e-06 bandwidth (GB/s) 77.59432800112916
normal, size, elements 262144 forward 7.560253143310547e-06 bandwidth (GB/s) 138.6958849291706
normal, size, elements 524288 forward 7.550716400146485e-06 bandwidth (GB/s) 277.7421225831386
normal, size, elements 1048576 forward 1.1034011840820313e-05 bandwidth (GB/s) 380.1250225673293
normal, size, elements 2097152 forward 1.802682876586914e-05 bandwidth (GB/s) 465.34019427102237
normal, size, elements 4194304 forward 2.8417110443115234e-05 bandwidth (GB/s) 590.3913430460946
normal, size, elements 8388608 forward 4.8711299896240235e-05 bandwidth (GB/s) 688.8428777608927
normal, size, elements 16777216 forward 9.685993194580078e-05 bandwidth (GB/s) 692.8444265018856
normal, size, elements 33554432 forward 0.00018213510513305663 bandwidth (GB/s) 736.9130069787966
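
The bandwidth column in these logs is consistent with the number of bytes written divided by the forward time, assuming 4 bytes per element and that `forward` is in seconds. A minimal sketch of that arithmetic follows; the function name is hypothetical and this is not the benchmark script itself.

```python
def effective_bandwidth_gbs(elements, forward_seconds, bytes_per_element=4):
    """Bytes written divided by elapsed time, in GB/s (1 GB = 1e9 bytes).

    Assumption: each element is written once and nothing is read,
    which is what the logged figures appear to imply.
    """
    return elements * bytes_per_element / forward_seconds / 1e9


# Rows copied from the Float "Before" and "After" logs above:
# (label, elements, forward time in seconds, logged GB/s)
rows = [
    ("before", 33554432, 0.0005375170707702637, 249.69947058177252),
    ("after", 33554432, 0.00018213510513305663, 736.9130069787966),
]

for label, n, seconds, logged in rows:
    computed = effective_bandwidth_gbs(n, seconds)
    print(f"{label}: computed {computed:.3f} GB/s, logged {logged:.3f} GB/s")
```

Recomputing a logged row this way is a quick sanity check that the time and bandwidth columns agree with each other.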

Double Type

Before:

normal, size, elements 65536 forward 5.8841705322265624e-06 bandwidth (GB/s) 44.55071425348461
normal, size, elements 131072 forward 8.018016815185547e-06 bandwidth (GB/s) 65.38873789925661
normal, size, elements 262144 forward 1.2989044189453124e-05 bandwidth (GB/s) 80.72772597474304
normal, size, elements 524288 forward 2.2075176239013673e-05 bandwidth (GB/s) 95.00046465285668
normal, size, elements 1048576 forward 4.1041374206542965e-05 bandwidth (GB/s) 102.19696784254678
normal, size, elements 2097152 forward 7.57598876953125e-05 bandwidth (GB/s) 110.72624650312186
normal, size, elements 4194304 forward 0.00013725996017456056 bandwidth (GB/s) 122.22949779865557
normal, size, elements 8388608 forward 0.0002614736557006836 bandwidth (GB/s) 128.32815569921402
normal, size, elements 16777216 forward 0.0005080199241638184 bandwidth (GB/s) 132.0988819689674
normal, size, elements 33554432 forward 0.0009479570388793945 bandwidth (GB/s) 141.58629821311564

After:

normal, size, elements 65536 forward 5.991458892822265e-06 bandwidth (GB/s) 43.75294977222444
normal, size, elements 131072 forward 7.293224334716797e-06 bandwidth (GB/s) 71.88699756626349
normal, size, elements 262144 forward 8.094310760498048e-06 bandwidth (GB/s) 129.54481623281296
normal, size, elements 524288 forward 1.2805461883544922e-05 bandwidth (GB/s) 163.7701177100726
normal, size, elements 1048576 forward 2.2592544555664064e-05 bandwidth (GB/s) 185.64991604491345
normal, size, elements 2097152 forward 3.801822662353516e-05 bandwidth (GB/s) 220.6470092112881
normal, size, elements 4194304 forward 6.761550903320313e-05 bandwidth (GB/s) 248.1267425164457
normal, size, elements 8388608 forward 0.00013209104537963867 bandwidth (GB/s) 254.02503177684966
normal, size, elements 16777216 forward 0.0002667689323425293 bandwidth (GB/s) 251.56176699703818
normal, size, elements 33554432 forward 0.0004705166816711426 bandwidth (GB/s) 285.25604559501795
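
To compare the two runs, the log lines can be parsed and the forward times divided. The sketch below does that for two of the Double rows above, and also backs out the bytes per element implied by the logged bandwidth. The helper names are hypothetical, and the regular expression assumes the exact line format shown in these logs.

```python
import re

# Matches one benchmark log line of the form shown above.
LINE_RE = re.compile(r"elements (\d+) forward (\S+) bandwidth \(GB/s\) (\S+)")


def parse_log(text):
    """Return {elements: (forward_seconds, logged_gbs)} for each log line."""
    rows = {}
    for match in LINE_RE.finditer(text):
        n, seconds, gbs = match.groups()
        rows[int(n)] = (float(seconds), float(gbs))
    return rows


def speedups(before, after):
    """Before/after forward-time ratio for every size present in both logs."""
    return {n: before[n][0] / after[n][0] for n in before if n in after}


def implied_bytes_per_element(elements, forward_seconds, logged_gbs):
    """Bytes per element that would reproduce the logged bandwidth."""
    return logged_gbs * 1e9 * forward_seconds / elements


double_before = parse_log("""
normal, size, elements 65536 forward 5.8841705322265624e-06 bandwidth (GB/s) 44.55071425348461
normal, size, elements 33554432 forward 0.0009479570388793945 bandwidth (GB/s) 141.58629821311564
""")
double_after = parse_log("""
normal, size, elements 65536 forward 5.991458892822265e-06 bandwidth (GB/s) 43.75294977222444
normal, size, elements 33554432 forward 0.0004705166816711426 bandwidth (GB/s) 285.25604559501795
""")

for n, ratio in sorted(speedups(double_before, double_after).items()):
    seconds, gbs = double_after[n]
    implied = implied_bytes_per_element(n, seconds, gbs)
    print(f"{n:>9} elements: {ratio:.2f}x, implied bytes/element {implied:.2f}")
```

On these rows the smallest size is roughly unchanged and the largest is about twice as fast. The logged Double bandwidth also works out to 4 bytes per element, so if the tensors are 8-byte doubles, the bytes actually written per second would be about twice the logged figure.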

Resubmit of #20621

@pytorchbot added the module: cuda, module: internals, module: nn, and module: operators labels on Jun 3, 2019
@facebook-github-bot facebook-github-bot left a comment
@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@syed-ahmed syed-ahmed deleted the gh/syed-ahmed/2/head branch June 3, 2019 16:47
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 3, 2019
…ddevs} to ATen (#21287)

Summary:
## Effective Bandwidth Benchmark
- using https://gist.github.com/syed-ahmed/f8b7384d642f4bce484228b508b4bc68
- on V100
### Float Type
#### Before:
```
normal, size, elements 65536 forward 4.956722259521484e-06 bandwidth (GB/s) 52.88656218258779
normal, size, elements 131072 forward 5.285739898681641e-06 bandwidth (GB/s) 99.18914098114568
normal, size, elements 262144 forward 7.548332214355469e-06 bandwidth (GB/s) 138.91492454529376
normal, size, elements 524288 forward 1.1980533599853516e-05 bandwidth (GB/s) 175.0466273076219
normal, size, elements 1048576 forward 2.091646194458008e-05 bandwidth (GB/s) 200.52645667862762
normal, size, elements 2097152 forward 3.9961338043212894e-05 bandwidth (GB/s) 209.91809610901498
normal, size, elements 4194304 forward 7.39765167236328e-05 bandwidth (GB/s) 226.79110538115253
normal, size, elements 8388608 forward 0.0001377725601196289 bandwidth (GB/s) 243.5494555001696
normal, size, elements 16777216 forward 0.0002710080146789551 bandwidth (GB/s) 247.62686107087774
normal, size, elements 33554432 forward 0.0005375170707702637 bandwidth (GB/s) 249.69947058177252
```
#### After:
```
normal, size, elements 65536 forward 6.198883056640625e-06 bandwidth (GB/s) 42.288908760615385
normal, size, elements 131072 forward 6.756782531738281e-06 bandwidth (GB/s) 77.59432800112916
normal, size, elements 262144 forward 7.560253143310547e-06 bandwidth (GB/s) 138.6958849291706
normal, size, elements 524288 forward 7.550716400146485e-06 bandwidth (GB/s) 277.7421225831386
normal, size, elements 1048576 forward 1.1034011840820313e-05 bandwidth (GB/s) 380.1250225673293
normal, size, elements 2097152 forward 1.802682876586914e-05 bandwidth (GB/s) 465.34019427102237
normal, size, elements 4194304 forward 2.8417110443115234e-05 bandwidth (GB/s) 590.3913430460946
normal, size, elements 8388608 forward 4.8711299896240235e-05 bandwidth (GB/s) 688.8428777608927
normal, size, elements 16777216 forward 9.685993194580078e-05 bandwidth (GB/s) 692.8444265018856
normal, size, elements 33554432 forward 0.00018213510513305663 bandwidth (GB/s) 736.9130069787966
```
### Double Type
#### Before:
```
normal, size, elements 65536 forward 5.8841705322265624e-06 bandwidth (GB/s) 44.55071425348461
normal, size, elements 131072 forward 8.018016815185547e-06 bandwidth (GB/s) 65.38873789925661
normal, size, elements 262144 forward 1.2989044189453124e-05 bandwidth (GB/s) 80.72772597474304
normal, size, elements 524288 forward 2.2075176239013673e-05 bandwidth (GB/s) 95.00046465285668
normal, size, elements 1048576 forward 4.1041374206542965e-05 bandwidth (GB/s) 102.19696784254678
normal, size, elements 2097152 forward 7.57598876953125e-05 bandwidth (GB/s) 110.72624650312186
normal, size, elements 4194304 forward 0.00013725996017456056 bandwidth (GB/s) 122.22949779865557
normal, size, elements 8388608 forward 0.0002614736557006836 bandwidth (GB/s) 128.32815569921402
normal, size, elements 16777216 forward 0.0005080199241638184 bandwidth (GB/s) 132.0988819689674
normal, size, elements 33554432 forward 0.0009479570388793945 bandwidth (GB/s) 141.58629821311564
```
#### After:
```
normal, size, elements 65536 forward 5.991458892822265e-06 bandwidth (GB/s) 43.75294977222444
normal, size, elements 131072 forward 7.293224334716797e-06 bandwidth (GB/s) 71.88699756626349
normal, size, elements 262144 forward 8.094310760498048e-06 bandwidth (GB/s) 129.54481623281296
normal, size, elements 524288 forward 1.2805461883544922e-05 bandwidth (GB/s) 163.7701177100726
normal, size, elements 1048576 forward 2.2592544555664064e-05 bandwidth (GB/s) 185.64991604491345
normal, size, elements 2097152 forward 3.801822662353516e-05 bandwidth (GB/s) 220.6470092112881
normal, size, elements 4194304 forward 6.761550903320313e-05 bandwidth (GB/s) 248.1267425164457
normal, size, elements 8388608 forward 0.00013209104537963867 bandwidth (GB/s) 254.02503177684966
normal, size, elements 16777216 forward 0.0002667689323425293 bandwidth (GB/s) 251.56176699703818
normal, size, elements 33554432 forward 0.0004705166816711426 bandwidth (GB/s) 285.25604559501795
```

Resubmit of #20621
Pull Request resolved: pytorch/pytorch#21287

Differential Revision: D15603695

Pulled By: ezyang

fbshipit-source-id: f8c5032678d503d45ac99fb1475a929df7c2b361
@facebook-github-bot
@ezyang merged this pull request in 155f767.

Labels

Merged, module: cuda, module: internals, module: nn

6 participants