Reindexing a huge tensor to shuffle it results in data loss at the end of the tensor #20888

@EvenOldridge

Description

🐛 Bug

Shuffling a huge tensor by reindexing into it results in all of the values past a certain point being set to 0 when the reindexing is done in GPU memory.

@colesbury This likely relates to #20562 as the issue is also with reindexing and also shows up at the same point in the tensor. Hopefully this helps with narrowing the other issue.

I initially thought this was limited to `LongTensor`s, but if you use a `FloatTensor` with twice as many values you run into the exact same issue in the exact same place.

To Reproduce

Steps to reproduce the behavior:

  1. Create a huge `LongTensor` (>536,870,700 values) of random values. I went with 15M×45
  2. Load it into GPU memory
  3. Reindex into it to shuffle the values

Data past ~element 536,870,700 will be all 0. The data is zeroed mid-tensor.
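The steps above can be sketched as below. This is a minimal repro sketch, not a confirmed diagnosis: the GPU portion assumes a card with enough memory (the report used a 32 GB V100), and the overflow hypothesis in the comments is my own reading of the numbers, since 536,870,912 = 2^29 elements × 8 bytes (int64) = 2^32 bytes.

```python
import torch

# Observed cutoff from the report: values past ~element 536,870,700 become 0.
# 2**29 = 536,870,912 elements; at 8 bytes per int64 element that is exactly
# 2**32 bytes, which would be consistent with a 32-bit byte-offset overflow
# in the indexing kernel (hypothesis only, not confirmed here).
THRESHOLD = 2 ** 29
print(THRESHOLD, THRESHOLD * 8)

# Repro sketch (only meaningful on affected versions, e.g. PyTorch 1.0.1,
# and needs a large-memory GPU):
if torch.cuda.is_available():
    rows, cols = 15_000_000, 45  # 15M x 45 ~= 675M values, past the threshold
    t = torch.randint(1, 100, (rows, cols), dtype=torch.int64, device='cuda')
    perm = torch.randperm(rows, device='cuda')
    shuffled = t[perm]  # reindex to shuffle the rows
    # On affected builds this prints True even though t contains no zeros:
    print('zeros past threshold:',
          (shuffled.view(-1)[THRESHOLD:] == 0).all().item())
```

On a fixed build (or on CPU), the final check should report no zeros past the threshold.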

Here's a notebook that reproduces the issue:
https://github.com/EvenOldridge/HugeTensor/blob/master/Huge%20Tensor%20Bug%20-%20Data%20Loss.ipynb

Expected behavior

Data should not be lost when reindexing.

Environment

  • PyTorch Version (e.g., 1.0): 1.0.1
  • OS (e.g., Linux): Linux (Ubuntu 16.04)
  • How you installed PyTorch (conda, pip, source): pip
  • Python version: 3.6
  • CUDA/cuDNN version: 10.0
  • GPU models and configuration: Tesla V100 32 GB (on a DGX-1)


Labels

triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
