🐛 Bug
Shuffling a huge tensor by reindexing into it results in all values past a certain point being set to 0 when the reindexing is done in GPU memory.
@colesbury This likely relates to #20562 as the issue is also with reindexing and also shows up at the same point in the tensor. Hopefully this helps with narrowing the other issue.
I initially thought this was limited to LongTensors, but if you use a FloatTensor with twice as many values you run into the exact same issue in the exact same place.
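An observation that is my inference, not stated in the report: the cutoff of ~536870700 elements is just under 2**29 = 536870912, which for 8-byte int64 elements is exactly 2**32 bytes (4 GiB), and a 4-byte float32 tensor reaches the same byte offset at twice as many elements, which would match "2x values" failing in the same place.

```python
# Hedged arithmetic sketch (my inference, not confirmed in the report):
# the element cutoff may correspond to a 2**32-byte (4 GiB) boundary.
cutoff = 2 ** 29                       # 536870912, just above ~536870700
assert cutoff * 8 == 2 ** 32           # int64: 8 bytes/element -> 4 GiB
assert (2 * cutoff) * 4 == 2 ** 32     # float32: twice the elements, same byte offset
print(cutoff)                          # 536870912
```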
To Reproduce
Steps to reproduce the behavior:
- Create a huge LongTensor (>536870700 values) of random values. I went with 15M x 45.
- Load it into GPU memory
- Reindex to shuffle the values

Data past ~536870700 will be all 0. The data is zeroed mid-tensor.
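The steps above can be sketched as follows. This is a scaled-down version (small sizes so it runs anywhere, with a CPU fallback); the original report used a ~15M x 45 LongTensor on a 32 GB V100, and the corruption only appeared on GPU past roughly element 536870700.

```python
import torch

# Fall back to CPU when no GPU is available (the bug only reproduced on GPU).
device = "cuda" if torch.cuda.is_available() else "cpu"

# Step 1-2: create a random LongTensor and place it on the device.
t = torch.randint(0, 1000, (1000, 45), dtype=torch.long, device=device)

# Step 3: reindex with a random permutation to shuffle the rows.
perm = torch.randperm(t.size(0), device=device)
shuffled = t[perm]

# A shuffle must only reorder rows, never zero data mid-tensor:
# sorting all values of both tensors should give identical sequences.
assert torch.equal(
    torch.sort(t.flatten()).values,
    torch.sort(shuffled.flatten()).values,
)
```

At the sizes in the report, the same sorted-values comparison fails on GPU because everything past the cutoff comes back as 0.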
Here's a notebook that reproduces the issue:
https://github.com/EvenOldridge/HugeTensor/blob/master/Huge%20Tensor%20Bug%20-%20Data%20Loss.ipynb
Expected behavior
Data should not be lost when reindexing
Environment
- PyTorch Version (e.g., 1.0): 1.0.1
- OS (e.g., Linux): Linux (Ubuntu 16.04)
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.6
- CUDA/cuDNN version: 10.0
- GPU models and configuration: Tesla V100 32 GB (on a DGX-1)