I followed the procedure from “k8s-driver-manager Failed to unload nouveau driver”.
I ran the bash setup.sh installation process.
After I rebooted the OS, Ubuntu 20.04 got stuck on the loading screen.
I used grub rescue to fix the Linux boot failure.
Now, however, the pod gpu-operator-7bfc5f55-8jx8r keeps restarting:
$ kubectl get pod -n nvidia-gpu-operator
NAME                                                              READY   STATUS             RESTARTS      AGE
gpu-operator-1672814931-node-feature-discovery-master-b6f69h5fd   1/1     Running            8 (21m ago)   18h
gpu-operator-1672814931-node-feature-discovery-worker-fqx67       1/1     Running            8 (21m ago)   18h
gpu-operator-7bfc5f55-8jx8r                                       0/1     CrashLoopBackOff   5 (85s ago)   18m
The log of the crashing pod is attached below:
│ 1.672879965216436e+09 INFO controller-runtime.metrics Metrics server is starting to listen │
│ 1.672879965216878e+09 INFO setup starting manager │
│ 1.6728799652173266e+09 INFO Starting server {"kind": "health probe", "addr": ":8081"} │
│ 1.6728799652173545e+09 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": │
│ I0105 00:52:45.217405 1 leaderelection.go:248] attempting to acquire leader lease nvidia-gpu-o │
│ I0105 00:53:03.509532 1 leaderelection.go:258] successfully acquired lease nvidia-gpu-operator │
│ 1.672879983509869e+09 INFO controller.clusterpolicy-controller Starting EventSource {"so │
│ 1.6728799835099726e+09 INFO controller.clusterpolicy-controller Starting EventSource {"s │
│ 1.6728799835099897e+09 INFO controller.clusterpolicy-controller Starting EventSource {"s │
│ 1.6728799835099986e+09 INFO controller.clusterpolicy-controller Starting Controller │
│ 1.6728799835095918e+09 DEBUG events Normal {"object": {"kind":"ConfigMap","namespace":"n │
│ 1.6728799835100212e+09 DEBUG events Normal {"object": {"kind":"Lease","namespace":"nvidi │
│ 1.6728799837144682e+09 ERROR controller-runtime.source if kind is a CRD, it should be insta │
│ sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1 │
│ /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go:137 │
│ k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext │
│ /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:233 │
│ k8s.io/apimachinery/pkg/util/wait.poll │
│ /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:580 │
│ k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext │
│ /workspace/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:545 │
│ sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1 │
│ /workspace/vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go:131 │
│ I0105 00:53:04.861065 1 request.go:665] Waited for 1.040331054s due to client-side throttling, │
│ 1.6728799854137976e+09 ERROR controllers.ClusterPolicy Unable to list ClusterPolicies {" │
│ sigs.k8s.io/controller-runtime/pkg/handler.(*enqueueRequestsFromMapFunc).mapAndEnqueue