Performance (speed) regression from 0.4.0 to 0.4.1

## Issue description

I have a fairly optimized CNN-BLSTM-CRF tagger [here](https://github.com/dpressel/baseline/blob/master/python/baseline/pytorch/tagger/model.py) with the CRF defined [here](https://github.com/dpressel/baseline/blob/f0204432076c166ce3d6672705826f33aec95dbe/python/baseline/pytorch/torchy.py#L669).

On PyTorch `0.4.0` with CUDA `9.0` and cuDNN `7102`, a single epoch of the CoNLL 2003 NER task runs in 21.41 +/- 0.28.

When the only thing I change is the PyTorch version, to `0.4.1` (the current conda install), a single epoch takes 27.99 +/- 0.26.
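To put the two timings side by side, here is a small sketch of the arithmetic. The numbers are the ones quoted above; the `slowdown` helper is only an illustrative name, not part of baseline or PyTorch.

```python
# Epoch timings quoted in this report (units as printed by the training run).
old_mean, old_err = 21.41, 0.28   # PyTorch 0.4.0
new_mean, new_err = 27.99, 0.26   # PyTorch 0.4.1


def slowdown(old, new):
    """Return (absolute difference, ratio new/old) for two timings."""
    return new - old, new / old


diff, ratio = slowdown(old_mean, new_mean)
print(f"difference: {diff:.2f}")
print(f"ratio: {ratio:.3f} ({(ratio - 1) * 100:.1f}% slower)")

# The gap is many times larger than either reported spread,
# so it is unlikely to be run-to-run noise.
print(f"difference / larger spread: {diff / max(old_err, new_err):.1f}")
```

By this calculation an epoch on 0.4.1 takes roughly 30% longer than on 0.4.0.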

These models are run on the GPU.

On the PyTorch forums I was told to post here: https://discuss.pytorch.org/t/large-preformance-regression-on-0-4-1/25037

## Code example


Assuming PyTorch is installed:

```
git clone https://github.com/blester125/baseline.git
cd baseline
git checkout speed-test
cd python
./install_dev.sh baseline no-test
echo "{}" > mead/config/mead-settings.json
mead-train --config config/conll-bio.json
```
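For anyone reproducing the numbers, here is a minimal sketch of how a "mean +/- spread" timing could be collected over repeated runs. It is not the timing code used by baseline: `time_runs` and `run_epoch` are hypothetical names, and `run_epoch` is a CPU stand-in for a real training epoch. On GPU, CUDA kernels launch asynchronously, so one would presumably pass `torch.cuda.synchronize` as the `sync` argument to avoid reading the clock before the work has finished.

```python
import statistics
import time


def time_runs(fn, repeats=5, sync=None):
    """Time fn() `repeats` times and return (mean, sample stdev) in seconds.

    `sync` is an optional callable invoked before each clock read, for
    example torch.cuda.synchronize when fn launches asynchronous GPU work.
    """
    durations = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # make sure earlier queued work is not counted
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work launched by fn to finish
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


def run_epoch():
    # Hypothetical stand-in workload; replace with one real training epoch.
    sum(i * i for i in range(100_000))


mean, std = time_runs(run_epoch, repeats=5)
print(f"{mean:.4f} +/- {std:.4f} s")
```

Running the same harness under both PyTorch versions, with everything else held fixed, gives directly comparable figures.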

## System Info

### 0.4.0

Collecting environment information...
PyTorch version: 0.4.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.3 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: GPU 0: GeForce GTX 1070 with Max-Q Design
Nvidia driver version: 384.130
cuDNN version: Probably one of the following:
/usr/local/cuda-9.0-cudnn-7/lib64/libcudnn.so
/usr/local/cuda-9.0-cudnn-7/lib64/libcudnn.so.7
/usr/local/cuda-9.0-cudnn-7/lib64/libcudnn.so.7.0.5
/usr/local/cuda-9.0-cudnn-7/lib64/libcudnn.so.7.1.2
/usr/local/cuda-9.0-cudnn-7/lib64/libcudnn_static.a
/usr/local/cuda-9.0/lib64/libcudnn.so
/usr/local/cuda-9.0/lib64/libcudnn.so.7
/usr/local/cuda-9.0/lib64/libcudnn.so.7.0.5
/usr/local/cuda-9.0/lib64/libcudnn.so.7.1.2
/usr/local/cuda-9.0/lib64/libcudnn.so.7.2.1
/usr/local/cuda-9.0/lib64/libcudnn_static.a

Versions of relevant libraries:
[pip] numpy (1.14.3)
[pip] torch (0.4.0)
[pip] torchfile (0.1.0)
[pip] torchvision (0.2.1)
[conda] cuda90                    1.0                  h6433d27_0    pytorch
[conda] pytorch                   0.4.0            py36hdf912b8_0  
[conda] torchfile                 0.1.0                     <pip>
[conda] torchvision               0.2.1                    py36_1    pytorch


### 0.4.1
Collecting environment information...
PyTorch version: 0.4.1.post2
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.3 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: GPU 0: GeForce GTX 1070 with Max-Q Design
Nvidia driver version: 384.130
cuDNN version: Probably one of the following:
/usr/local/cuda-9.0-cudnn-7/lib64/libcudnn.so
/usr/local/cuda-9.0-cudnn-7/lib64/libcudnn.so.7
/usr/local/cuda-9.0-cudnn-7/lib64/libcudnn.so.7.0.5
/usr/local/cuda-9.0-cudnn-7/lib64/libcudnn.so.7.1.2
/usr/local/cuda-9.0-cudnn-7/lib64/libcudnn_static.a
/usr/local/cuda-9.0/lib64/libcudnn.so
/usr/local/cuda-9.0/lib64/libcudnn.so.7
/usr/local/cuda-9.0/lib64/libcudnn.so.7.0.5
/usr/local/cuda-9.0/lib64/libcudnn.so.7.1.2
/usr/local/cuda-9.0/lib64/libcudnn.so.7.2.1
/usr/local/cuda-9.0/lib64/libcudnn_static.a

Versions of relevant libraries:
[pip] numpy (1.14.3)
[pip] torch (0.4.1.post2)
[pip] torchfile (0.1.0)
[pip] torchvision (0.2.1)
[conda] cuda90                    1.0                  h6433d27_0    pytorch
[conda] pytorch                   0.4.1           py36_py35_py27__9.0.176_7.1.2_2    pytorch
[conda] torchfile                 0.1.0                     <pip>
[conda] torchvision               0.2.1                    py36_1    pytorch
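
As a quick check that the two environments differ only in the PyTorch package, the pip versions listed in the two reports above can be compared mechanically. This is a sketch: the dictionaries are transcribed by hand from those reports, and `changed_packages` is an illustrative helper name.

```python
# pip package versions transcribed from the two environment reports.
env_040 = {
    "numpy": "1.14.3",
    "torch": "0.4.0",
    "torchfile": "0.1.0",
    "torchvision": "0.2.1",
}
env_041 = {
    "numpy": "1.14.3",
    "torch": "0.4.1.post2",
    "torchfile": "0.1.0",
    "torchvision": "0.2.1",
}


def changed_packages(before, after):
    """Return {name: (old, new)} for packages whose versions differ.

    A package present on only one side shows up with None on the other.
    """
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


print(changed_packages(env_040, env_041))
```

Only `torch` is reported as changed; the OS, driver, CUDA, and cuDNN lines in the two reports are identical as well.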

Performance (speed) regression from 0.4.0 to 0.4.1 #11647
