Error with encountering NaNs in dlc_pytorch #2703

@AnnaStuckert

Is there an existing issue for this?

  • I have searched the existing issues

Bug description

When running model training using the dlc_pytorch branch, I got the following error:

Traceback (most recent call last):
  File "/media1/data/anna/script_facemap_full.py", line 61, in <module>
    main()
  File "/media1/data/anna/script_facemap_full.py", line 45, in main
    deeplabcut.train_network(config, shuffle=shuffle) #step 2 - to edit training parameters such as max epochs to run, edit thepytorch_config.yaml file under dlc-models
  File "/media1/data/anna/DeepLabCut/deeplabcut/compat.py", line 245, in train_network
    return train_network(
  File "/media1/data/anna/DeepLabCut/deeplabcut/pose_estimation_pytorch/apis/train.py", line 339, in train_network
    train(
  File "/media1/data/anna/DeepLabCut/deeplabcut/pose_estimation_pytorch/apis/train.py", line 189, in train
    runner.fit(
  File "/media1/data/anna/DeepLabCut/deeplabcut/pose_estimation_pytorch/runners/train.py", line 177, in fit
    valid_loss = self._epoch(
  File "/media1/data/anna/DeepLabCut/deeplabcut/pose_estimation_pytorch/runners/train.py", line 217, in _epoch
    losses_dict = self.step(batch, mode)
  File "/media1/data/anna/DeepLabCut/deeplabcut/pose_estimation_pytorch/runners/train.py", line 321, in step
    self.logger.log_images(batch, outputs, target, step=self.current_epoch)
  File "/media1/data/anna/DeepLabCut/deeplabcut/pose_estimation_pytorch/runners/logger.py", line 326, in log_images
    images = self._prepare_images(inputs, outputs, targets)
  File "/media1/data/anna/DeepLabCut/deeplabcut/pose_estimation_pytorch/runners/logger.py", line 227, in _prepare_images
    image_logs[f"{base}.input"] = self._prepare_image(
  File "/media1/data/anna/DeepLabCut/deeplabcut/pose_estimation_pytorch/runners/logger.py", line 198, in _prepare_image
    image = draw_keypoints(
  File "/media1/data/anna/miniconda/envs/dlc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/media1/data/anna/miniconda/envs/dlc/lib/python3.10/site-packages/torchvision/utils.py", line 427, in draw_keypoints
    draw.ellipse([x1, y1, x2, y2], fill=colors, outline=None, width=0)
  File "/media1/data/anna/miniconda/envs/dlc/lib/python3.10/site-packages/PIL/ImageDraw.py", line 223, in ellipse
    self.draw.draw_ellipse(xy, fill_ink, 1)
ValueError: x1 must be greater than or equal to x0

As a temporary workaround, I changed _prepare_image in logger.py so that keypoints with NaN coordinates are set to 0 before they are drawn:

def _prepare_image(
    self,
    image: torch.Tensor,
    denormalize: bool = False,
    keypoints: torch.Tensor | None = None,
    bboxes: torch.Tensor | None = None,
) -> np.ndarray:
    """
    Args:
        image: the image to log, of shape (C, H, W), of any data type
        denormalize: whether to remove ImageNet channel normalization
        keypoints: size (num_instances, K, 2) the K keypoints location
        bboxes: size (N, 4) containing bboxes in (xmin, ymin, xmax, ymax)

    Returns:
        an uint8 array with keypoints and bounding boxes drawn
    """
    if denormalize:
        image = self._denormalize(image.unsqueeze(0)).squeeze()

    image = F.convert_image_dtype(image.detach().cpu(), dtype=torch.uint8)
    if keypoints is not None and len(keypoints) > 0:
        assert len(keypoints.shape) == 3
        # Work on a copy so the tensors used for the loss are not modified
        keypoints = keypoints.clone()
        # Move keypoints with NaN coordinates to (0, 0) so drawing does not fail
        keypoints[torch.isnan(keypoints).any(dim=2)] = 0

        image = draw_keypoints(
            image, keypoints=keypoints[..., :2], colors="red", radius=5
        )

    if bboxes is not None and len(bboxes) > 0:
        assert len(bboxes.shape) == 2
        image = draw_bounding_boxes(image, boxes=bboxes[:, :4], width=1)

    return image.permute(1, 2, 0).numpy()
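
For anyone who wants to try the same idea outside of logger.py, here is a minimal standalone sketch of the NaN handling. The helper name sanitize_keypoints and its fill_value parameter are hypothetical (they are not part of DeepLabCut or torchvision), and this is only one possible approach, not the project's fix. It returns a NaN-free copy of the (x, y) coordinates together with a mask of which keypoints were actually defined.

```python
import torch


def sanitize_keypoints(keypoints: torch.Tensor, fill_value: float = 0.0):
    """Return a NaN-free copy of the (x, y) keypoints and a visibility mask.

    Hypothetical helper for illustration only.

    Args:
        keypoints: tensor of shape (num_instances, K, 2 or more)
        fill_value: placeholder coordinate written for undefined keypoints
    """
    if keypoints.ndim != 3:
        raise ValueError(f"expected a 3D tensor, got shape {tuple(keypoints.shape)}")
    # Copy the coordinates so the original tensor is left untouched.
    xy = keypoints[..., :2].detach().clone()
    # A keypoint counts as visible only if both coordinates are finite.
    visible = torch.isfinite(xy).all(dim=-1)
    # Overwrite every undefined keypoint with the placeholder coordinate.
    xy[~visible] = fill_value
    return xy, visible


if __name__ == "__main__":
    kpts = torch.tensor([[[10.0, 20.0], [float("nan"), float("nan")]]])
    xy, visible = sanitize_keypoints(kpts)
    print(xy.tolist())
    print(visible.tolist())
```

The sanitized xy tensor can be passed as the keypoints argument of draw_keypoints. With a fill value of 0, undefined keypoints are still drawn in the top-left corner of the image; the visible mask is returned so they could be skipped instead, for example through a visibility argument if the installed torchvision version provides one (I have not checked which version is in this environment).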

Operating System

operating system: macOS Sonoma 14.5

DeepLabCut version

dlc version 3.0.0rc3.

DeepLabCut mode

single animal

Device type

gpu 2x Titan RTX

Steps To Reproduce

No response

Relevant log output

Same traceback as in the bug description above.

Anything else?

No response
