Error deploying VSS Blueprint using Kubernetes

I am following the instructions in Setup the Prerequisites — Video Search and Summarization Agent to install and try the VSS. But so far I have not been able to successfully deploy the application due to some errors (see the attached pod describe log).

Please help me troubleshoot this problem.

Thanks in advance.

log.txt (32.9 KB)

  • Hardware Platform 8 x H100
  • System Memory 1.97TB
  • Ubuntu Version 22.04.5 LTS
  • NVIDIA GPU Driver Version 560.35.03

Could you refer to our FAQ to get the detailed log?
We also recommend installing driver version 535.161.08, which has better compatibility.

Hello,

I have installed NVIDIA driver 535.161.08 and restarted the server to be sure, but the problem now seems to be the initialization of Milvus. I am attaching the “pod describe” output for reference.

“Pod was rejected: Allocate failed due to cannot allocate unregistered device nvidia.com/gpu, which is unexpected”

milvus_pod_describe.txt (3.0 KB)
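
For anyone hitting the same message: one way to see whether the cluster has any GPUs registered is to read each node's allocatable `nvidia.com/gpu` count. The sketch below is only an illustration, not part of the VSS documentation. It assumes `kubectl` is on the PATH and pointed at the right cluster; the helper names `gpu_allocatable` and `fetch_nodes` are hypothetical.

```python
import json
import subprocess


def gpu_allocatable(nodes_json: dict) -> dict:
    """Map each node name to its allocatable nvidia.com/gpu count.

    A node that does not advertise the resource at all is reported as 0.
    """
    result = {}
    for node in nodes_json.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        result[name] = int(allocatable.get("nvidia.com/gpu", "0"))
    return result


def fetch_nodes() -> dict:
    """Return the parsed output of `kubectl get nodes -o json`."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Calling `gpu_allocatable(fetch_nodes())` on the control node returns a dictionary of per-node counts. A count of 0 on a GPU node would be consistent with the "unregistered device" message, although it does not by itself identify the cause.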

Could you check whether the “nvidia-cuda-validator-” pod works properly in your environment? If not, please refer to our FAQ to install the fabricmanager.
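
The check suggested above can be scripted roughly as follows. This is a minimal sketch that assumes `kubectl` is available and that the validator pod's name starts with `nvidia-cuda-validator`, as quoted in the reply; the helper names `validator_pods` and `fetch_pods` are hypothetical.

```python
import json
import subprocess


def validator_pods(pods_json: dict, prefix: str = "nvidia-cuda-validator") -> list:
    """Return (namespace, name, phase) for every pod whose name starts with prefix."""
    found = []
    for pod in pods_json.get("items", []):
        meta = pod.get("metadata", {})
        if meta.get("name", "").startswith(prefix):
            phase = pod.get("status", {}).get("phase", "Unknown")
            found.append((meta.get("namespace", ""), meta["name"], phase))
    return found


def fetch_pods() -> dict:
    """Return the parsed output of `kubectl get pods --all-namespaces -o json`."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

`validator_pods(fetch_pods())` lists the matching pods with their phase. An empty list, or a phase such as `Pending` or `Failed`, suggests the validator is not working and that the pod's `kubectl describe` output and logs are worth a closer look.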

I have installed the nvidia-fabricmanager with the same version as the driver. The problem is now resolved. Thanks!
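
Since the fix was to install nvidia-fabricmanager at the same version as the driver, a quick comparison like the one below may help others confirm the match. It is a sketch under assumptions: the host is Debian/Ubuntu with `dpkg-query` available, and the package name passed to `package_version` (for example `nvidia-fabricmanager-535`) is a guess that should be checked against what is actually installed. All helper names are hypothetical.

```python
import subprocess


def base_version(version: str) -> str:
    """Strip a package epoch and revision: '535.161.08-1' -> '535.161.08'."""
    version = version.strip()
    if ":" in version:
        version = version.split(":", 1)[1]
    return version.split("-", 1)[0]


def versions_match(driver: str, fabricmanager: str) -> bool:
    """True when both version strings share the same upstream version."""
    return base_version(driver) == base_version(fabricmanager)


def driver_version() -> str:
    """Read the driver version of the first GPU reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()[0].strip()


def package_version(package: str) -> str:
    """Read an installed package's version via dpkg-query (package name is an assumption)."""
    return subprocess.run(
        ["dpkg-query", "-W", "-f=${Version}", package],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
```

For example, `versions_match(driver_version(), package_version("nvidia-fabricmanager-535"))` should return `True` on a host set up as described in this thread.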
