3D Densenet does not support deterministic training in GPU mode #3716

@yiheng-wang-nv

Description

@holgerroth and I found that 3D DenseNet cannot be trained deterministically on GPU devices.

I prepared a simple code block that can help to reproduce this issue:

import torch
import monai
from monai.utils import set_determinism
import os

# os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
# torch.use_deterministic_algorithms(True)
set_determinism(seed=0)

loss_function = torch.nn.CrossEntropyLoss()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = "cpu"

spatial_dims = 3
model = monai.networks.nets.DenseNet121(
    spatial_dims=spatial_dims, in_channels=1, out_channels=2).to(device)
optimizer = torch.optim.Adam(model.parameters(), 1e-5)
model.train()

lb = torch.tensor([1, 0]).to(device)
all_loss = 0
shape = [2, 1, 64, 64, 64][:2+spatial_dims]
for i in range(10):

    inp = torch.randn(shape).to(device)
    optimizer.zero_grad()
    out = model(inp)
    loss = loss_function(out, lb)
    loss.backward()
    optimizer.step()
    
    all_loss += loss.item()
print(all_loss)

When the device is a GPU and spatial_dims = 3, running the above code twice produces different results.
In addition, if you uncomment the two lines before set_determinism(seed=0), you will see the related error, which shows that the 3D adaptive average pooling layer is non-deterministic on GPU.
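
To narrow this down without the full network, the pooling layer can be exercised on its own. The following is only a minimal sketch, not part of MONAI: the helper name probe_adaptive_avg_pool3d and the tensor sizes are made up for illustration, and it assumes a PyTorch version that provides torch.use_deterministic_algorithms. It runs one forward and backward pass through torch.nn.AdaptiveAvgPool3d with deterministic mode enabled and returns the error message, if any.

```python
import torch


def probe_adaptive_avg_pool3d(device):
    """Run AdaptiveAvgPool3d forward + backward under deterministic mode.

    Returns None if no error is raised, otherwise the RuntimeError message.
    Hypothetical helper for illustration only; it is not a MONAI or PyTorch API.
    """
    previous = torch.are_deterministic_algorithms_enabled()
    torch.use_deterministic_algorithms(True)
    try:
        pool = torch.nn.AdaptiveAvgPool3d(1)
        # Small arbitrary input: batch of 2, 4 channels, 8x8x8 volume.
        x = torch.randn(2, 4, 8, 8, 8, device=device, requires_grad=True)
        pool(x).sum().backward()
        return None
    except RuntimeError as err:
        return str(err)
    finally:
        # Restore the global setting so the probe has no side effects.
        torch.use_deterministic_algorithms(previous)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    message = probe_adaptive_avg_pool3d(device)
    print(device, "->", "no error raised" if message is None else message)
```

On CPU the probe should return None. On a CUDA device, based on the behaviour described above, it is expected to return an error message that names the adaptive average pooling operation, although the exact wording may differ between PyTorch versions.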

Labels: enhancement (New feature or request)
