
FasterRCNN: TypeError: target labels must of int64 type, instead got torch.int32 #2642

@Kpraetori

Description


Is there an existing issue for this?

  • I have searched the existing issues

Bug description

I'm trying to do transfer learning with the SuperAnimal-Quadruped model, but I keep getting a couple of errors and it won't train. It may be an install issue, but I'm not sure and don't know how to resolve it.

  1. It throws a notice that I set a batch size of 1 and/or freeze_bn_stats=false, neither of which is accurate according to either config file. The config files are attached as text files (GitHub won't let me upload .yaml).
  2. When it begins object detector training, a traceback is thrown and training does not proceed. I'm attaching the log output.

I have tried on two different servers and rebuilt the environment several times. I have verified that nvidia-smi shows the GPU and that torch.cuda.is_available() returns True in the environment. I have tried the latest version of DLC. I have also tried PyTorch builds for CUDA 12.1 and 12.4 in case a newer version was needed. I have followed the regular installation guide in the docs and also the instructions at #2613, to no avail.
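
For reference, this is roughly how I verified the environment before launching training from the DLC GUI (a minimal sketch of the checks described above, not the full DLC run):

# Quick sanity check that the conda env picks up the CUDA build of PyTorch.
import torch

print(torch.__version__)              # installed PyTorch build
print(torch.cuda.is_available())      # True on both servers
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA RTX A4000"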

I am coming up to a submission deadline so help is really appreciated!

Complete_Log_DLC_Failing.txt
config.txt
pose_cfg.txt
pytorch_config.txt

Operating System

Windows Server 2022

DeepLabCut version

deeplabcut-3.0.0rc1

DeepLabCut mode

single animal

Device type

GPU, NVIDIA RTX A4000. I also tried a different server with an NVIDIA A10.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.42                 Driver Version: 537.42       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4000             WDDM  | 00000000:21:00.0 Off |                  Off |
| 41%   30C    P8               9W / 140W |    508MiB / 16376MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      4872    C+G   C:\Windows\explorer.exe                   N/A      |
|    0   N/A  N/A      8696    C+G   ...cal\Microsoft\OneDrive\OneDrive.exe    N/A      |
|    0   N/A  N/A      9176    C+G   ...m Files\Mozilla Firefox\firefox.exe    N/A      |
|    0   N/A  N/A     10988    C+G   ...crosoft\Edge\Application\msedge.exe    N/A      |
|    0   N/A  N/A     11060    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe    N/A      |
|    0   N/A  N/A     12368    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe    N/A      |
|    0   N/A  N/A     17328    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe    N/A      |
+---------------------------------------------------------------------------------------+

Steps To Reproduce

Relevant log output

Note: According to your model configuration, you're training with batch size 1 and/or ``freeze_bn_stats=false``. This is not an optimal setting if you have powerful GPUs.
This is good for small batch sizes (e.g., when training on a CPU), where you should keep ``freeze_bn_stats=true``.
If you're using a GPU to train, you can obtain faster performance by setting a larger batch size (the biggest power of 2 where you don't geta CUDA out-of-memory error, such as 8, 16, 32 or 64 depending on the model, size of your images, and GPU memory) and ``freeze_bn_stats=false`` for the backbone of your model.
This also allows you to increase the learning rate (empirically you can scale the learning rate by sqrt(batch_size) times).

Using 628 images and 157 for testing

Starting object detector training...
--------------------------------------------------
Traceback (most recent call last):
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\deeplabcut\gui\tabs\train_network.py", line 190, in train_network
    compat.train_network(config, shuffle, **kwargs)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\deeplabcut\compat.py", line 245, in train_network
    return train_network(
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\apis\train.py", line 326, in train_network
    train(
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\apis\train.py", line 189, in train
    runner.fit(
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 170, in fit
    train_loss = self._epoch(
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 221, in _epoch
    losses_dict = self.step(batch, mode)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\runners\train.py", line 503, in step
    losses, predictions = self.model(images, target)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\deeplabcut\pose_estimation_pytorch\models\detectors\fasterRCNN.py", line 106, in forward
    return self.model(x, targets)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\torchvision\models\detection\generalized_rcnn.py", line 105, in forward
    detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Other_Program_Files\miniforge3\envs\deeplabcut3\lib\site-packages\torchvision\models\detection\roi_heads.py", line 749, in forward
    raise TypeError(f"target labels must of int64 type, instead got {t['labels'].dtype}")
TypeError: target labels must of int64 type, instead got torch.int32
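
For what it's worth, the error suggests that the bounding-box labels tensor passed to the detector ends up as torch.int32 (numpy's default integer type on Windows) rather than the torch.int64 that torchvision's roi_heads expects. Below is a minimal sketch of the kind of cast that satisfies that check, assuming targets are dicts in torchvision's detection format; cast_detector_labels is a hypothetical helper for illustration only, and I don't know where in DLC's data pipeline such a cast would belong.

import torch

def cast_detector_labels(targets):
    # Hypothetical helper, for illustration only: torchvision's roi_heads
    # raises a TypeError when target["labels"] is not int64, which can happen
    # on Windows where numpy integers default to int32.
    for target in targets:
        if target["labels"].dtype != torch.int64:
            target["labels"] = target["labels"].to(torch.int64)
    return targets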

Anything else?

No response

Code of Conduct

Labels

DLC3.0🔥WORK IN PROGRESS! (developers are currently working on this feature... stay tuned), bug (Something isn't working)
