GeneralizedRCNN returns NaNs with torch.uint8 inputs #3228
🐛 Bug
The FasterRCNN model (and, more generally, the GeneralizedRCNN class) expects a list of floating-point PyTorch tensors as input images, but if you pass it a list of tensors with dtype torch.uint8, the model returns NaN values in the normalization step and, as a consequence, in the loss computation.
To Reproduce
Steps to reproduce the behavior:
1. Load an image as a PyTorch tensor with dtype torch.uint8, along with its corresponding target dictionary
2. Create an instance of FasterRCNN and pass that image to the model
3. Observe the output of the model, which should be the dictionary of losses with all NaN values
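The failure can also be reproduced in isolation, without building a full model, by applying the normalization arithmetic directly to a uint8 tensor. This sketch (plain torch, no torchvision needed) shows how the default ImageNet mean/std truncate to zero when cast to the image dtype:

```python
import torch

# A fake uint8 image, as it would come from decoding a file.
image = torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8)

# The same conversion GeneralizedRCNNTransform.normalize performs:
# the fractional ImageNet statistics truncate to 0 in uint8.
mean = torch.as_tensor([0.485, 0.456, 0.406], dtype=image.dtype)
std = torch.as_tensor([0.229, 0.224, 0.225], dtype=image.dtype)
print(mean)  # all zeros

# Subtracting zero and dividing by zero yields Inf/NaN values.
out = (image - mean[:, None, None]) / std[:, None, None]
print(torch.isinf(out).any() or torch.isnan(out).any())
```

Those Inf/NaN pixel values then propagate through the network into every entry of the loss dictionary.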
Expected behavior
I would have expected the model to raise an exception, or at least a warning. In particular, since the GeneralizedRCNN class takes care of transformations such as normalization and resizing, in my opinion it should also check the dtype of the input images in order to avoid such silent errors.
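The check suggested above could look like the following sketch. `check_image_dtype` is a hypothetical helper name, not part of torchvision; it simply rejects non-floating-point images with a clear error message before normalization runs:

```python
import torch

def check_image_dtype(images):
    """Hypothetical validation helper: fail fast on integer images
    instead of letting normalization silently produce NaNs."""
    for img in images:
        if not img.is_floating_point():
            raise TypeError(
                f"Expected images with a floating-point dtype, got {img.dtype}. "
                "Convert with image.float() before passing them to the model."
            )

# Example: a uint8 image is rejected, a float image passes through.
check_image_dtype([torch.rand(3, 32, 32)])  # OK
# check_image_dtype([torch.zeros(3, 32, 32, dtype=torch.uint8)])  # raises TypeError
```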
Environment
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 10.15.7 (x86_64)
GCC version: Could not collect
Clang version: 12.0.0 (clang-1200.0.32.28)
CMake version: version 3.18.4
Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.7.1
[pip3] torchvision==0.8.2
[conda] Could not collect
Additional context
I realized that the error I was facing is caused by the normalize function of the GeneralizedRCNNTransform class, which converts the mean and standard-deviation lists to tensors using the image's dtype. In the default case (ImageNet mean/std), the fractional values truncate to all zeros when cast to torch.uint8, so the subsequent division produces NaN/Inf values.
```python
def normalize(self, image):
    dtype, device = image.dtype, image.device
    mean = torch.as_tensor(self.image_mean, dtype=dtype, device=device)
    std = torch.as_tensor(self.image_std, dtype=dtype, device=device)
    return (image - mean[:, None, None]) / std[:, None, None]
```

To avoid this problem, a simple `image.float()` would suffice.
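A standalone sketch of that fix (written as a free function for illustration, assuming the same mean/std handling as the method above) could be:

```python
import torch

def normalize(image, image_mean, image_std):
    """Sketch of the suggested fix: cast integer images to float before
    building the mean/std tensors, so the ImageNet statistics are not
    truncated to integer zeros."""
    if not image.is_floating_point():
        image = image.float()
    dtype, device = image.dtype, image.device
    mean = torch.as_tensor(image_mean, dtype=dtype, device=device)
    std = torch.as_tensor(image_std, dtype=dtype, device=device)
    return (image - mean[:, None, None]) / std[:, None, None]

# A uint8 input now produces finite values instead of NaN/Inf.
img = torch.randint(0, 256, (3, 32, 32), dtype=torch.uint8)
out = normalize(img, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
print(torch.isfinite(out).all())
```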