🐛 Bug
There seems to be a limit of N = 256x256 = 65536 on the number of matrices that can be inverted in a single batched torch.inverse call on the GPU. This limit does not appear to correspond to available memory: a batch of 256x256 - 1 matrices inverts fine on a GPU with less available memory. For comparison, TensorFlow on Google Colab allows a batched inverse with a batch size of N = 6144x6144.
To Reproduce
import torch
N = 256
# A batch of exactly 256*256 = 65536 2x2 matrices triggers the error.
x = torch.randn(N * N, 2, 2).cuda()
y = torch.inverse(x)
print(y.cpu().numpy())
Error:
----> 1 y.cpu().numpy()
RuntimeError: cuda runtime error (9) : invalid configuration argument at /opt/conda/conda-bld/pytorch-nightly_1540802486426/work/aten/src/THC/THCTensorCopy.cu:205
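Note that CUDA errors are often reported asynchronously, so the copy in y.cpu() may only be where a failure from the inverse kernel surfaces. A minimal sketch to narrow this down (assuming the asynchronous-reporting behaviour applies to this build):
import torch
N = 256
x = torch.randn(N * N, 2, 2).cuda()
y = torch.inverse(x)
# Force any pending CUDA error from the inverse kernel to surface here,
# rather than on the later device-to-host copy.
torch.cuda.synchronize()
print(y.cpu().numpy())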
Expected behavior
The same code works if the batch dimension is reduced by one:
x = torch.randn(N * N - 1, 2, 2).cuda()
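A possible workaround (untested against the affected build, so only a sketch): split the batch into chunks below the apparent 256x256 limit, invert each chunk separately, and concatenate the results.
import torch
N = 256
x = torch.randn(N * N, 2, 2).cuda()
# Stay just below the batch size that triggers the error.
chunk_size = N * N - 1
y = torch.cat([torch.inverse(chunk) for chunk in torch.split(x, chunk_size)])
print(y.shape)  # torch.Size([65536, 2, 2])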
Environment
PyTorch version: 1.0.0.dev20181029
Is debug build: No
CUDA used to build PyTorch: 8.0.61
OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
CMake version: version 3.11.0
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.1.85
GPU models and configuration:
GPU 0: GeForce GTX 1080
GPU 1: GeForce GTX 1080
Nvidia driver version: 387.26
cuDNN version: Probably one of the following:
/usr/local/MATLAB/R2016b/bin/glnxa64/libcudnn.so.4.0.7
/usr/local/cuda-9.1/lib64/libcudnn.so
/usr/local/cuda-9.1/lib64/libcudnn.so.7
/usr/local/cuda-9.1/lib64/libcudnn.so.7.0.5
/usr/local/cuda-9.1/lib64/libcudnn.so.7.1.1
/usr/local/cuda-9.1/lib64/libcudnn_static.a
Versions of relevant libraries:
[pip] Could not collect
[conda] cuda80 1.0 0 soumith
[conda] cuda91 1.0 h4c16780_0 pytorch
[conda] guided-filter-pytorch 1.1.1
[conda] magma-cuda91 2.3.0 1 pytorch
[conda] pytorch-nightly 1.0.0.dev20181029 py3.6_cuda8.0.61_cudnn7.1.2_0 [cuda80] pytorch
[conda] torch 0.4.0
[conda] torch 0.3.1