Skip to content

Conversation

@eternalNight
Copy link
Contributor

CUDA tensors may have a larger storage than numel() * dtype.itemsize due to alignment considerations. Creating dummy tensors by torch.zero().as_strided() leads to out-of-bound errors in such cases.

Create dummy inputs by empty_strided().zero_() instead.

CUDA tensors may have a larger storage than numel() * dtype.itemsize due
to alignment considerations. Creating dummy tensors by
torch.zero().as_strided() leads to out-of-bound errors in such cases.

Create dummy inputs by empty_strided().zero_() instead.

Signed-off-by: Junjie Mao <[email protected]>
@tohtana tohtana merged commit 660ee89 into deepspeedai:master Sep 15, 2025
19 of 24 checks passed
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Oct 4, 2025
CUDA tensors may have a larger storage than numel() * dtype.itemsize due
to alignment considerations. Creating dummy tensors by
torch.zero().as_strided() leads to out-of-bound errors in such cases.

Create dummy inputs by empty_strided().zero_() instead.

Signed-off-by: Junjie Mao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants