🐛 Bug
There seems to be a limit of N = 256x256 = 65536 on the number of matrices that can be inverted in a single batched torch.inverse call on the GPU. This limit does not appear to correspond to available memory: a batch of 256x256 - 1 matrices inverts fine on a GPU with less available memory. For comparison, TensorFlow on Google Colab allows a batched inverse with a batch size of N = 6144x6144.
To Reproduce
import torch
N = 256
# A batch of exactly 256*256 = 65536 2x2 matrices triggers the error.
x = torch.randn(N * N, 2, 2).cuda()
y = torch.inverse(x)
print(y.cpu().numpy())
Error:
----> 1 y.cpu().numpy()
RuntimeError: cuda runtime error (9) : invalid configuration argument at /opt/conda/conda-bld/pytorch-nightly_1540802486426/work/aten/src/THC/THCTensorCopy.cu:205
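Note that CUDA errors are often reported asynchronously, so the copy in y.cpu() may only be where a failure from the inverse kernel surfaces. A minimal sketch to narrow this down (assuming the asynchronous-reporting behaviour applies to this build):
import torch
N = 256
x = torch.randn(N * N, 2, 2).cuda()
y = torch.inverse(x)
# Force any pending CUDA error from the inverse kernel to surface here,
# rather than on the later device-to-host copy.
torch.cuda.synchronize()
print(y.cpu().numpy())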
Expected behavior
The same code works if the batch dimension is reduced by one:
x = torch.randn(N * N - 1, 2, 2).cuda()
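A possible workaround (untested against the affected build, so only a sketch): split the batch into chunks below the apparent 256x256 limit, invert each chunk separately, and concatenate the results.
import torch
N = 256
x = torch.randn(N * N, 2, 2).cuda()
# Stay just below the batch size that triggers the error.
chunk_size = N * N - 1
y = torch.cat([torch.inverse(chunk) for chunk in torch.split(x, chunk_size)])
print(y.shape)  # torch.Size([65536, 2, 2])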
Environment
PyTorch version: 1.0.0.dev20181029
Is debug build: No
CUDA used to build PyTorch: 8.0.61
OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
CMake version: version 3.11.0
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.1.85
GPU models and configuration:
GPU 0: GeForce GTX 1080
GPU 1: GeForce GTX 1080
Nvidia driver version: 387.26
cuDNN version: Probably one of the following:
/usr/local/MATLAB/R2016b/bin/glnxa64/libcudnn.so.4.0.7
/usr/local/cuda-9.1/lib64/libcudnn.so
/usr/local/cuda-9.1/lib64/libcudnn.so.7
/usr/local/cuda-9.1/lib64/libcudnn.so.7.0.5
/usr/local/cuda-9.1/lib64/libcudnn.so.7.1.1
/usr/local/cuda-9.1/lib64/libcudnn_static.a
Versions of relevant libraries:
[pip] Could not collect
[conda] cuda80 1.0 0 soumith
[conda] cuda91 1.0 h4c16780_0 pytorch
[conda] guided-filter-pytorch 1.1.1
[conda] magma-cuda91 2.3.0 1 pytorch
[conda] pytorch-nightly 1.0.0.dev20181029 py3.6_cuda8.0.61_cudnn7.1.2_0 [cuda80] pytorch
[conda] torch 0.4.0
[conda] torch 0.3.1