Missing cuda libraries? #2115

@laurelrr

Is there an existing issue for this?

  • I have searched the existing issues

Bug description

I installed DLC using the provided conda environment, but when I try to run it I get an error about missing TensorFlow/CUDA libraries.

Operating System

operating system: Debian Buster v10.X.
CPU Count: 64 : "AuthenticAMD AMD EPYC 7513 32-Core Processor 1799.602 MHz (2 chips x 32 cores)"

DeepLabCut version

DLC version: 2.3.0 (based on the "Loading DLC 2.3.0..." line in the log)

DeepLabCut mode

single animal

Device type

GPU: A40, NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2

Steps To Reproduce

I followed the conda installation instructions for Linux:
git clone https://github.com/DeepLabCut/DeepLabCut.git
cd DeepLabCut/conda-environments
conda env create -f DEEPLABCUT.yaml

The environment was created without errors. After activating it with
conda activate DEEPLABCUT
running
python -m deeplabcut
gave the error below.
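
To narrow down which package actually fails, each import can be tried in its own interpreter. In the log, the warnings come from TensorFlow while the traceback ends in `import torch`. The following is a minimal sketch, not part of DeepLabCut; the helper name `try_import` is hypothetical, and it assumes it is run with the Python of the activated DEEPLABCUT environment.

```python
import subprocess
import sys


def try_import(module, python=sys.executable):
    """Import `module` in a fresh interpreter; return (ok, last stderr line)."""
    proc = subprocess.run(
        [python, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    lines = proc.stderr.strip().splitlines()
    return proc.returncode == 0, (lines[-1] if lines else "")


if __name__ == "__main__":
    # Each import runs in isolation, so one failure cannot mask another.
    for name in ("tensorflow", "torch", "deeplabcut"):
        ok, detail = try_import(name)
        print(f"{name}: {'ok' if ok else 'FAILED: ' + detail}")
```

If `torch` fails on its own with the same `OSError`, the problem is likely independent of TensorFlow and of DeepLabCut itself.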

Relevant log output

2023-01-09 14:41:15.353074: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-09 14:41:16.169473: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-01-09 14:41:21.432570: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:
2023-01-09 14:41:21.433516: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:
2023-01-09 14:41:21.433525: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Loading DLC 2.3.0...
Traceback (most recent call last):
  File "/home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/site-packages/torch/__init__.py", line 172, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/runpy.py", line 185, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/runpy.py", line 144, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/nadata/snlkt/home/lkeyes/Projects/GIT/DeepLabCut/deeplabcut/__init__.py", line 76, in <module>
    from deeplabcut.pose_tracking_pytorch import transformer_reID
  File "/nadata/snlkt/home/lkeyes/Projects/GIT/DeepLabCut/deeplabcut/pose_tracking_pytorch/__init__.py", line 14, in <module>
    from .train_dlctransreid import train_tracking_transformer
  File "/nadata/snlkt/home/lkeyes/Projects/GIT/DeepLabCut/deeplabcut/pose_tracking_pytorch/train_dlctransreid.py", line 15, in <module>
    import torch
  File "/home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/site-packages/torch/__init__.py", line 217, in <module>
    _load_global_deps()
  File "/home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/site-packages/torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File "/home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/lkeyes/anaconda3/envs/DEEPLABCUT/lib/python3.8/site-packages/nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
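
The log shows `LD_LIBRARY_PATH` set to `/usr/local/cuda/lib64:` while PyTorch loads a `libcublas.so.11` bundled under `site-packages/nvidia/cublas/lib`. One possible explanation, not confirmed in this report, is that a different `libcublasLt.so.11` is resolved first and lacks the `cublasLtGetStatusString` symbol. The sketch below only lists where copies of these libraries exist; the helper names are hypothetical, and the bundled directory layout is an assumption taken from the paths in the traceback.

```python
import os
from pathlib import Path


def find_shared_libs(lib_name, search_dirs):
    """Return every existing path to `lib_name` in `search_dirs`, in order."""
    hits = []
    for d in search_dirs:
        if not d:  # skip empty entries, e.g. from a trailing ':'
            continue
        candidate = Path(d) / lib_name
        if candidate.exists():
            hits.append(str(candidate))
    return hits


def ld_library_dirs(env=None):
    """Split LD_LIBRARY_PATH into its non-empty directory entries."""
    env = os.environ if env is None else env
    return [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]


if __name__ == "__main__":
    import site

    # Assumption: pip-installed NVIDIA wheels live in
    # site-packages/nvidia/cublas/lib, as the traceback paths suggest.
    bundled = [
        str(Path(sp) / "nvidia" / "cublas" / "lib")
        for sp in site.getsitepackages()
    ]
    for name in ("libcublas.so.11", "libcublasLt.so.11"):
        print(name)
        for path in find_shared_libs(name, ld_library_dirs() + bundled):
            print("   ", path)
```

If `libcublasLt.so.11` shows up both under `/usr/local/cuda/lib64` and in the environment's `site-packages`, a version mismatch between the two copies would be worth ruling out.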

Anything else?

This is the first time I am trying out DeepLabCut. My apologies in advance if there is an obvious fix for this that I missed. Thanks for the help!
