Description
🐛 DLRM performance regression
This check-in causes a small regression on the DLRM benchmark.
Without this check-in, the DLRM benchmark result is:
numactl --physcpubind=0-27 -m 0 python dlrm_s_pytorch.py --mini-batch-size=2048 --num-batches=100 --data-generation=random --arch-mlp-bot=512-512-64 --arch-mlp-top=1024-1024-1024-1 --arch-sparse-feature-size=64 --arch-embedding-size=1000000-1000000-1000000-1000000-1000000-1000000-1000000-1000000 --num-indices-per-lookup=100 --arch-interaction-op=dot --numpy-rand-seed=727 --print-freq=100 --print-time --enable-profiling > model1_CPU_PT_28.log
Min time per iteration = 3591.85
With this check-in, the result is:
numactl --physcpubind=0-27 -m 0 python dlrm_s_pytorch.py --mini-batch-size=2048 --num-batches=100 --data-generation=random --arch-mlp-bot=512-512-64 --arch-mlp-top=1024-1024-1024-1 --arch-sparse-feature-size=64 --arch-embedding-size=1000000-1000000-1000000-1000000-1000000-1000000-1000000-1000000 --num-indices-per-lookup=100 --arch-interaction-op=dot --numpy-rand-seed=727 --print-freq=100 --print-time --enable-profiling > model1_CPU_PT_28.log
Min time per iteration = 3649.57
The profiling data show that more time is spent in the index_select, mm, and addmm operations:

| Operation | This | Before |
|---|---|---|
| index_select | 1559.31ms | 1503.48ms |
| mm | 27.14ms | 14.58ms |
| addmm | 22.38ms | 9.79966ms |
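To put the numbers above in proportion, the following sketch computes the absolute and relative increase for each operation and for the minimum time per iteration. It only re-uses the figures quoted in this report; the helper name `slowdown` and the dictionary layout are illustrative choices, not part of DLRM or PyTorch.

```python
# Figures copied from this report ("This" vs. "Before").
min_time_per_iteration = {"this": 3649.57, "before": 3591.85}
op_ms = {
    "index_select": (1559.31, 1503.48),
    "mm": (27.14, 14.58),
    "addmm": (22.38, 9.79966),
}


def slowdown(this, before):
    """Return (absolute increase, percent increase relative to `before`)."""
    delta = this - before
    return delta, 100.0 * delta / before


# Per-operation increase, keyed by operation name.
rows = {op: slowdown(this, before) for op, (this, before) in op_ms.items()}

# Increase of the headline "Min time per iteration" figure.
total_delta, total_pct = slowdown(
    min_time_per_iteration["this"], min_time_per_iteration["before"]
)

for op, (delta, pct) in rows.items():
    print(f"{op:13s} +{delta:6.2f} ms ({pct:6.1f}% slower)")
print(f"min time per iteration: +{total_delta:.2f} ({total_pct:.2f}% slower)")
```

Read this way, index_select accounts for the largest absolute increase, while mm and addmm show the largest relative increase.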
To Reproduce
Steps to reproduce the behavior:
- Download DLRM from https://github.com/facebookresearch/dlrm
- Modify bench/dlrm_s_benchmark.sh to run only the PyTorch CPU version:

      build=0
      cpu=1
      gpu=0
      pt=1
      c2=0

  and export two KMP variables:

      export KMP_BLOCKTIME=1
      export KMP_AFFINITY="granularity=fine,compact,1,0"

- Run bench/dlrm_s_benchmark.sh on an SKX8180 machine. The performance profiling data is stored in the file model1_CPU_PT_28.prof.

'This' was measured at commit d0a4b2f.
'Before' was measured at commit 42e7eb0.
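For anyone who prefers to bypass the shell script, here is a minimal sketch of a Python driver that assembles the same command line and KMP settings shown in this report. It assumes it is run from the root of a DLRM checkout on a machine with `numactl` installed; the helper names (`build_env`, `build_command`, `run_benchmark`) are hypothetical and not part of DLRM. The `--enable-profiling` flag is kept as in the original command; the shell redirection is replaced by writing stdout to the log file.

```python
import os
import shlex
import subprocess

# Flags copied verbatim from the command line in this report.
DLRM_ARGS = [
    "--mini-batch-size=2048",
    "--num-batches=100",
    "--data-generation=random",
    "--arch-mlp-bot=512-512-64",
    "--arch-mlp-top=1024-1024-1024-1",
    "--arch-sparse-feature-size=64",
    "--arch-embedding-size=" + "-".join(["1000000"] * 8),
    "--num-indices-per-lookup=100",
    "--arch-interaction-op=dot",
    "--numpy-rand-seed=727",
    "--print-freq=100",
    "--print-time",
    "--enable-profiling",
]


def build_env(base=None):
    """Copy the environment and add the two KMP variables from the report."""
    env = dict(os.environ if base is None else base)
    env["KMP_BLOCKTIME"] = "1"
    env["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
    return env


def build_command(cores="0-27", numa_node="0"):
    """Pin the run to the given cores and NUMA memory node via numactl."""
    return [
        "numactl", f"--physcpubind={cores}", "-m", numa_node,
        "python", "dlrm_s_pytorch.py", *DLRM_ARGS,
    ]


def run_benchmark(log_path="model1_CPU_PT_28.log"):
    """Run the benchmark and write its stdout to `log_path` (not called here)."""
    with open(log_path, "w") as log:
        return subprocess.run(
            build_command(), env=build_env(), stdout=log, check=True
        )


# Print the assembled command for inspection without running it.
print(shlex.join(build_command()))
```

Calling `run_benchmark()` once per commit (d0a4b2f and 42e7eb0), with a different `log_path` each time, would give the two logs to compare.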
Expected behavior
DLRM performance should not be impacted by this check-in. Thanks.
Environment
Please copy and paste the output from our environment collection script (or fill out the checklist below manually).
You can get the script and run it with:
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
- PyTorch Version (e.g., 1.0): commit-id: d0a4b2f
- OS (e.g., Linux): Ubuntu 16.04.5 LTS
- How you installed PyTorch (conda, pip, source):
- Build command you used (if compiling from source): python setup.py install
- Python version: 3.7
- CUDA/cuDNN version: N/A
- GPU models and configuration: N/A
- Any other relevant information:

GCC version: (Ubuntu 8.3.0-16ubuntu3~16.04) 8.3.0
CMake version: version 3.14.4

[pip3] numpy==1.16.2
[pip3] numpydoc==0.8.0
[conda] blas 1.0 mkl
[conda] mkl 2019.0 pypi_0 pypi
[conda] mkl-devel 2019.3 200
[conda] mkl-include 2019.0 pypi_0 pypi
[conda] mkl-service 1.1.2 py37he904b0f_5
[conda] mkl_fft 1.0.10 py37ha843d7b_0
[conda] mkl_random 1.0.2 py37hd81dba3_0