Labels: `module: performance` (issues related to performance, either of kernel code or framework glue), `triaged` (this issue has been looked at by a team member and prioritized into an appropriate module)
Description
Found by @mirandaconrado
We wrote a simple benchmarking script to compare the speed of PyTorch on OSX between the conda install and the PyPI wheel install.
Here are the results:
**Install from PyPI wheel (using pip)** (version 0.1.12_2)

| volatile | batch size | time (s) | samples/sec |
|----------|-----------:|---------:|------------:|
| False    | 1          | 2.668    | 3.748       |
| True     | 1          | 2.125    | 4.707       |
| False    | 2          | 3.031    | 6.598       |
| True     | 2          | 2.222    | 9.002       |
| False    | 4          | 2.714    | 14.736      |
| True     | 4          | 2.305    | 17.356      |
| False    | 8          | 3.506    | 22.821      |
| True     | 8          | 3.012    | 26.558      |
| False    | 16         | 4.008    | 39.916      |
| True     | 16         | 3.616    | 44.243      |
| False    | 32         | 4.557    | 70.220      |
| True     | 32         | 3.822    | 83.730      |
**From conda install** (version 0.1.12_2)

| volatile | batch size | time (s) | samples/sec |
|----------|-----------:|---------:|------------:|
| False    | 1          | 2.234    | 4.476       |
| True     | 1          | 1.711    | 5.843       |
| False    | 2          | 2.359    | 8.479       |
| True     | 2          | 1.939    | 10.316      |
| False    | 4          | 2.443    | 16.371      |
| True     | 4          | 2.017    | 19.831      |
| False    | 8          | 2.444    | 32.730      |
| True     | 8          | 2.172    | 36.828      |
| False    | 16         | 2.773    | 57.708      |
| True     | 16         | 2.351    | 68.052      |
| False    | 32         | 3.424    | 93.453      |
| True     | 32         | 2.996    | 106.800     |
Here's the script I'm using to generate the results:
`speed_comparisson_test.sh`:

```sh
# Benchmark the PyPI wheel in a throwaway conda environment.
conda create --name pytorch_speed_from_pypi -y python=2.7.13 numpy pyyaml
source activate pytorch_speed_from_pypi
wget http://download.pytorch.org/whl/torch-0.1.12.post2-cp27-none-macosx_10_7_x86_64.whl
pip uninstall -y torch
pip install torch-0.1.12.post2-cp27-none-macosx_10_7_x86_64.whl --user
python test.py
pip uninstall -y torch
source deactivate
conda-env remove --name pytorch_speed_from_pypi -y
rm torch-0.1.12.post2-cp27-none-macosx_10_7_x86_64.whl

# Benchmark the conda package in a second throwaway environment.
conda create --name pytorch_speed_conda_only -y python=2.7.13 numpy pyyaml
source activate pytorch_speed_conda_only
conda install pytorch -y -c soumith
python test.py
source deactivate
conda-env remove -y --name pytorch_speed_conda_only
```

And here's a dump of test.py.
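(The dump itself isn't reproduced here.) As a rough sketch only, a script producing output in the same format might look like the following, assuming a torchvision model, 224×224 inputs, and an arbitrary iteration count; the model choice, shapes, and counts are all hypothetical and the actual test.py may differ:

```python
import time

import torch
import torchvision.models as models
from torch.autograd import Variable

# Hypothetical stand-in for the model benchmarked in test.py.
model = models.resnet18()
model.eval()

n_iters = 10  # assumed; the original iteration count is unknown

for volatile in (False, True):
    for batchsize in (1, 2, 4, 8, 16, 32):
        # volatile=True disables autograd bookkeeping (pre-0.4 API),
        # matching the "volatile" column in the results above.
        x = Variable(torch.randn(batchsize, 3, 224, 224), volatile=volatile)
        start = time.time()
        for _ in range(n_iters):
            model(x)
        elapsed = time.time() - start
        print('volatile = %s batchsize = %d time = %.3f samples/sec = %.3f'
              % (volatile, batchsize, elapsed, n_iters * batchsize / elapsed))
```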
It's also worth noting that when speed_comparisson_test.sh is run with OMP_NUM_THREADS=1, I don't see a significant difference in speed between the conda install and the wheel.
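For anyone reproducing that observation, the thread count can also be inspected and pinned from inside Python rather than via the environment variable; a minimal sketch, assuming `torch.get_num_threads`/`torch.set_num_threads` behave as in current releases (the script above only used `OMP_NUM_THREADS`):

```python
import torch

# Report how many threads torch uses by default on this install,
# then pin to a single thread to mimic running with OMP_NUM_THREADS=1.
print(torch.get_num_threads())
torch.set_num_threads(1)
```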
Metadata
Metadata
Assignees
Labels
module: performanceIssues related to performance, either of kernel code or framework glueIssues related to performance, either of kernel code or framework gluetriagedThis issue has been looked at a team member, and triaged and prioritized into an appropriate moduleThis issue has been looked at a team member, and triaged and prioritized into an appropriate module