CUBLAS_STATUS_EXECUTION_FAILED error on torch >= 1.8.0 and CUDA 11.1

## 🐛 Bug


When trying to run fairscale unittests with torch >= 1.8.0 and cuda 11.1, I am getting many CUBLAS failures This did not happen with 1.7.1. I've also tried March 30 nightly torch 1.9.0 and see the same error.

> attn_output_weights = torch.bmm(q, k.transpose(1, 2))
E RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
https://app.circleci.com/pipelines/github/facebookresearch/fairscale/2192/workflows/9d16d5d4-d104-4aee-971e-4eb329da80c8/jobs/14498

This does not occur with torch 1.7.1 and cuda 11.1



## To Reproduce

Steps to reproduce the behavior:

1.
1.
1.



## Expected behavior



## Environment

Please copy and paste the output from our
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).

You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
PyTorch version: 1.9.0.dev20210330+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 16.04.7 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Clang version: Could not collect
CMake version: version 3.5.1

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla M60
GPU 1: Tesla M60
GPU 2: Tesla M60
GPU 3: Tesla M60

Nvidia driver version: 455.32.00
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.9.0.dev20210330+cu111
[pip3] torchtext==0.6.0
[pip3] torchvision==0.10.0.dev20210330+cu111
[conda] Could not collect

 - PyTorch Version (e.g., 1.0):
 - OS (e.g., Linux):
 - How you installed PyTorch (`conda`, `pip`, source):
 - Build command you used (if compiling from source):
 - Python version:
 - CUDA/cuDNN version:
 - GPU models and configuration:
 - Any other relevant information:

## Additional context




cc @ngimel

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CUBLAS_STATUS_EXECUTION_FAILED error on torch >= 1.8.0 and CUDA 11.1 #54975

🐛 Bug

To Reproduce

Expected behavior

Environment

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

CUBLAS_STATUS_EXECUTION_FAILED error on torch >= 1.8.0 and CUDA 11.1 #54975

Description

🐛 Bug

To Reproduce

Expected behavior

Environment

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions