
Conversation

@gdb (Contributor) commented Aug 13, 2024


Currently, `multi_tensor_apply` causes an illegal memory access due to an overflow in the `sizes` field of `TensorListMetadata`. This can be reproduced using the following standalone script:

```python
import torch, amp_C
from apex.multi_tensor_apply import multi_tensor_applier
multi_tensor_adam = amp_C.multi_tensor_adam

size = 2**32 + 1  # element count too large for a 32-bit sizes slot
g_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]
p_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]
m_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]
v_32 = [torch.zeros(size, dtype=torch.float32, device='cuda')]
_dummy_overflow_buf = torch.zeros(1, dtype=torch.int32, device='cuda')

multi_tensor_applier(multi_tensor_adam, _dummy_overflow_buf, [g_32, p_32, m_32, v_32], 0.0, 0.9, 0.95, 1e-08, 1, 1, 1, 0.1)
print(g_32)
```
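
For context on why `2**32 + 1` elements triggers the bug: the element count no longer fits in a 32-bit signed int, so a 32-bit `sizes` slot silently wraps. A minimal standalone C++ sketch of that narrowing (illustrative only, not the apex code):

```cpp
// Illustrative narrowing demo: what a 32-bit `sizes` slot ends up holding when
// the tensor has 2**32 + 1 elements, as in the repro script above.
#include <cstdint>
#include <cstdio>

int main() {
    const int64_t numel = (int64_t{1} << 32) + 1;           // 4294967297 elements
    const int32_t narrowed = static_cast<int32_t>(numel);   // wraps modulo 2**32

    std::printf("numel         = %lld\n", static_cast<long long>(numel));
    std::printf("as 32-bit int = %d\n", narrowed);          // prints 1 on typical platforms
    // The metadata then describes a 1-element tensor instead of the real one, so
    // downstream chunking and indexing no longer match the actual allocation.
    return 0;
}
```
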
@awgu commented Aug 13, 2024

cc @crcrpar: are the following out of date?

```cpp
// TODO: Kernel arg size limit may be <4KB for some other cards (ie Jetson)
constexpr int depth_to_max_tensors[6] = {110, 64, 48, 36, 30, 24};
constexpr int depth_to_max_blocks[6] = {320, 320, 320, 320, 320, 320};
```

I see the same limits in PyTorch, where you already updated to use `int64_t` in pytorch/pytorch#101760. Otherwise, I would expect that changing to `int64_t` increases the `TensorListMetadata` struct size and hence the kernel arg size.

(Though it seems that CUDA 12.1 on Volta+ increased the kernel arg size limit from 4 KB to 32 KB.)
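
As a rough sanity check on that concern, here is a back-of-the-envelope sketch. The mock struct below assumes the layout of `TensorListMetadata` in `csrc/multi_tensor_apply.cuh` (it is not the verbatim apex code) and ignores the kernel's other scalar arguments; it prints the approximate metadata size per depth with 32-bit vs 64-bit `sizes` against a 4096-byte budget:

```cpp
// Mock of the apex TensorListMetadata layout (assumed, not verbatim), used to
// estimate how much of the kernel-argument budget the metadata consumes.
#include <cstdint>
#include <cstdio>

constexpr int depth_to_max_tensors[6] = {110, 64, 48, 36, 30, 24};
constexpr int depth_to_max_blocks[6] = {320, 320, 320, 320, 320, 320};

template <typename SizeT, int n>
struct MockTensorListMetadata {
    void* addresses[n][depth_to_max_tensors[n - 1]];
    SizeT sizes[depth_to_max_tensors[n - 1]];   // int today, int64_t after the fix
    unsigned char block_to_tensor[depth_to_max_blocks[n - 1]];
    int block_to_chunk[depth_to_max_blocks[n - 1]];
    int start_tensor_this_launch;
};

template <int n>
void report() {
    std::printf("depth %d: %4zu bytes with int sizes, %4zu bytes with int64_t sizes (budget 4096)\n",
                n,
                sizeof(MockTensorListMetadata<int, n>),
                sizeof(MockTensorListMetadata<int64_t, n>));
}

int main() {
    report<1>();
    report<2>();
    report<3>();
    report<4>();
    report<5>();
    report<6>();
    return 0;
}
```

If the assumed layout is accurate, every depth stays well under 4 KB even with 64-bit `sizes`, which lines up with the reply below that apex may not need to retune these constants.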

@crcrpar (Collaborator) commented Aug 17, 2024

> I would expect that changing to use `int64_t` increases the `TensorListMetadata` struct size and hence the kernel arg size.

Yes, but apex does not have a multi-tensor-apply variant that takes a list of scalars, so we might be able to dodge a tweak of `depth_to_max_tensors` and `depth_to_max_blocks`.

@crcrpar (Collaborator) left a comment

Excuse my delay, thank you.

@crcrpar merged commit 79e3dc4 into NVIDIA:master on Aug 17, 2024
hwchen2017 pushed a commit to deepspeedai/DeepSpeed that referenced this pull request Oct 21, 2025
…7639)

This change fixes an overflow issue in `TensorListMetadata` where the
`sizes` array used `int` (32-bit signed integer). This caused incorrect
behavior (e.g., no parameter updates) when handling tensor sizes
exceeding `INT_MAX` (2^31 - 1).

The change here is identical to NVIDIA/apex PR
[#1825](NVIDIA/apex#1825) for
`multi_tensor_apply.cuh`.

For further details regarding this fix, please refer to issue
[#7640](#7640).

Signed-off-by: Wang Yan <[email protected]>
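
In other words, the DeepSpeed change follows the same recipe as the apex fix: widen the per-tensor `sizes` slot to a 64-bit type (shown as `int64_t` in the sketch below) so a numel above `INT_MAX` is stored exactly. A compact before/after sketch (field names illustrative, not the verbatim patch):

```cpp
// Before/after sketch of the fix's effect (illustrative; not the verbatim patch).
#include <cstdint>
#include <climits>
#include <cstdio>

struct MetadataBefore { int     sizes[110]; };  // 32-bit slot: large numels wrap
struct MetadataAfter  { int64_t sizes[110]; };  // 64-bit slot: numel preserved

int main() {
    const int64_t numel = (int64_t{1} << 32) + 1;   // > INT_MAX, as in the repro

    MetadataBefore before{};
    MetadataAfter after{};
    before.sizes[0] = static_cast<int>(numel);      // truncates (old behavior)
    after.sizes[0] = numel;                         // exact (fixed behavior)

    std::printf("before: %d, after: %lld, INT_MAX: %d\n",
                before.sizes[0], static_cast<long long>(after.sizes[0]), INT_MAX);
    return 0;
}
```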