[CUDNN] PoolWindow::reserve crash, vector out of range. Race condition #19394

@xsacha

Description

🐛 Bug

When using a JIT model from two threads at the same time, the first concurrent forward call crashes here.

PoolWindow::reserve seems to try to access a vector out of range.

The responsible code was added in #14861

Backtrace:

 	msvcp140d.dll!std::_Debug_message(const wchar_t * message, const wchar_t * file, unsigned int line) Line 9	C++
>	caffe2_gpu.dll!std::vector<std::_List_unchecked_iterator<std::_List_val<std::_List_simple_types<std::pair<int const ,cudnnContext * __ptr64> > > >,std::allocator<std::_List_unchecked_iterator<std::_List_val<std::_List_simple_types<std::pair<int const ,cudnnContext * __ptr64> > > > > >::operator[](const unsigned __int64 _Pos) Line 1796	C++
 	caffe2_gpu.dll!std::_Hash<std::_Umap_traits<int,cudnnContext * __ptr64,std::_Uhash_compare<int,std::hash<int>,std::equal_to<int> >,std::allocator<std::pair<int const ,cudnnContext * __ptr64> >,0> >::_Vec_lo(unsigned __int64 _Bucket) Line 822	C++
 	caffe2_gpu.dll!std::_Hash<std::_Umap_traits<int,cudnnContext * __ptr64,std::_Uhash_compare<int,std::hash<int>,std::equal_to<int> >,std::allocator<std::pair<int const ,cudnnContext * __ptr64> >,0> >::_Begin(unsigned __int64 _Bucket) Line 841	C++
 	caffe2_gpu.dll!std::_Hash<std::_Umap_traits<int,cudnnContext * __ptr64,std::_Uhash_compare<int,std::hash<int>,std::equal_to<int> >,std::allocator<std::pair<int const ,cudnnContext * __ptr64> >,0> >::lower_bound(const int & _Keyval) Line 647	C++
 	caffe2_gpu.dll!std::_Hash<std::_Umap_traits<int,cudnnContext * __ptr64,std::_Uhash_compare<int,std::hash<int>,std::equal_to<int> >,std::allocator<std::pair<int const ,cudnnContext * __ptr64> >,0> >::find(const int & _Keyval) Line 630	C++
 	caffe2_gpu.dll!at::native::`anonymous namespace'::PoolWindow::reserve(int device) Line 88	C++
 	caffe2_gpu.dll!at::native::getCudnnHandle() Line 149	C++
 	caffe2_gpu.dll!at::native::setCuDNNStreamToCurrent() Line 13	C++
 	caffe2_gpu.dll!at::native::cudnn_convolution(const at::Tensor & input_t, const at::Tensor & weight_t, const at::Tensor & bias_t, c10::ArrayRef<__int64> padding, c10::ArrayRef<__int64> stride, c10::ArrayRef<__int64> dilation, __int64 groups, bool benchmark, bool deterministic) Line 930	C++
 	caffe2_gpu.dll!at::CUDAFloatType::cudnn_convolution(const at::Tensor & self, const at::Tensor & weight, const at::Tensor & bias, c10::ArrayRef<__int64> padding, c10::ArrayRef<__int64> stride, c10::ArrayRef<__int64> dilation, __int64 groups, bool benchmark, bool deterministic) Line 5315	C++
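
For context, a hypothetical sketch of the failure mode (not the actual PyTorch source, just an illustration): if the per-device handle cache that PoolWindow::reserve consults is a shared std::unordered_map with no synchronization, one thread's insert can rehash the map while another thread's find() is walking its buckets, which matches the out-of-range bucket access that the debug assertion above reports. A lock around the lookup, as sketched below, would avoid that.

    #include <unordered_map>
    #include <mutex>

    struct cudnnContext;                 // opaque stand-in for the real cuDNN handle type
    using Handle = cudnnContext*;

    class PoolWindowSketch {
        std::unordered_map<int, Handle> handles_;  // shared per-device handle cache
        std::mutex mutex_;                         // the synchronization the racy path lacks

    public:
        Handle reserve(int device) {
            // Without the lock, one thread's emplace() can rehash the map while
            // another thread's find() walks its buckets -> out-of-range access.
            std::lock_guard<std::mutex> guard(mutex_);
            auto it = handles_.find(device);
            if (it != handles_.end())
                return it->second;
            Handle handle = nullptr;      // real code would call cudnnCreate(&handle) here
            handles_.emplace(device, handle);
            return handle;
        }
    };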

To Reproduce

Steps to reproduce the behavior:

  1. Load a JIT model (once, on one thread)
  2. After the model is loaded, forward it from two threads simultaneously
  3. Observe the crash above (the out-of-range assertion fires in the MSVC <vector> header)

Called from two threads at the same time:

    static std::once_flag model_flag;
    std::call_once(model_flag, [&modelFile]() {
        model = torch::jit::load(modelFile, torch::kCUDA);
    });
    // All works fine up until here.
    model->forward({torch::randn({1, 3, 1024, 1024}, torch::kCUDA)});
    // Crashes when called from a thread other than the one the model was created on.
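
Roughly, the two-thread call pattern looks like the sketch below. The global `model`, the `modelFile` path, and the 1.1-era API in which `torch::jit::load` returns a `std::shared_ptr<torch::jit::script::Module>` are assumptions standing in for the real application code.

    #include <torch/script.h>
    #include <thread>
    #include <string>

    // Assumed globals, matching the snippet above.
    std::shared_ptr<torch::jit::script::Module> model;
    std::string modelFile = "model.pt";

    void run_forward() {
        // The first forward on each new thread goes through getCudnnHandle().
        model->forward({torch::randn({1, 3, 1024, 1024}, torch::kCUDA)});
    }

    int main() {
        // Load once, on the main thread.
        model = torch::jit::load(modelFile, torch::kCUDA);
        // Forward from two threads at the same time -> crash in PoolWindow::reserve.
        std::thread t1(run_forward);
        std::thread t2(run_forward);
        t1.join();
        t2.join();
        return 0;
    }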

Expected behavior

Forwarding the model from any thread should work once it has been loaded on one thread, without having to wait some arbitrary amount of time to avoid the race condition.

Environment

  • PyTorch Version (e.g., 1.0): 1.1.0-pre
  • OS (e.g., Linux): Windows 64-bit
  • How you installed PyTorch (conda, pip, source): nightly
  • Build command you used (if compiling from source): N/A
  • Python version: 3.7
  • CUDA/cuDNN version: 10.0/7.5
  • GPU models and configuration: GTX1060

Workarounds

Other models of mine run through a dedicated model thread that batches their inputs, and those work fine because that thread ensures forward calls never run simultaneously.
This particular model takes tensors of different sizes, which I am unable to batch together.
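
A cruder way to get the same "never run simultaneously" guarantee, sketched under the same assumed globals as the repro above (`guarded_forward` and `forward_mutex` are hypothetical names), is to serialize all forward() calls on this model with a mutex, at the cost of any concurrency:

    #include <mutex>
    #include <vector>

    // Hypothetical guard; every caller goes through guarded_forward() instead
    // of calling model->forward() directly.
    static std::mutex forward_mutex;

    torch::jit::IValue guarded_forward(std::vector<torch::jit::IValue> inputs) {
        std::lock_guard<std::mutex> lock(forward_mutex);  // serialize forward() calls
        return model->forward(std::move(inputs));
    }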

Labels

  • high priority
  • module: cudnn
  • module: multithreading
  • module: windows
  • triaged