Prerequisites
Bug Description
When running deploy.sh after generating a bundle from a recipe (specifically CUJ2), I ran into several deployment failures:
- The first deploy failed with a "context deadline exceeded" error while applying CRDs.
- The second deploy attempted to run idempotently, but hit an issue with the webhook configurations that is being fixed.
- The third deploy, run after an undeploy.sh, failed: the undeploy left some cert-manager CRDs behind because of resource policies.
- The fourth deploy failed, even though it was preceded by an undeploy.sh that succeeded.
- The fifth deploy succeeded.
It would be nice to have better retry mechanisms here, since the current recovery is to undeploy and redeploy, which is not ideal.
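To illustrate the kind of retry behavior I have in mind, here is a minimal sketch of a wrapper that retries a command with exponential backoff. The function name `retry_with_backoff` and its arguments are hypothetical and are not part of deploy.sh today; this only shows the general shape, not a proposed implementation.

```shell
# Hypothetical helper (not part of the existing scripts): retry a command
# with exponential backoff until it succeeds or attempts run out.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command> [args...]
retry_with_backoff() {
  local max_attempts=$1
  local delay=$2
  shift 2
  local attempt=1

  while true; do
    # Run the command; stop immediately on success.
    if "$@"; then
      return 0
    fi

    # Give up once the attempt budget is exhausted.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi

    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"

    # Double the wait before the next attempt.
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

With something like this, `retry_with_backoff 5 10 ./deploy.sh` would try the deploy up to five times, waiting 10s, 20s, 40s, and 80s between attempts. It would only help with transient failures such as the CRD timeout; the leftover cert-manager CRDs after undeploy would still need a separate fix.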
Impact
Low (minor issue)
Component
Other / Unknown
Regression?
Yes, this worked before (please specify version below)
Steps to Reproduce
CUJ2.md
Expected Behavior
I expect it to flake less, or to retry persistently enough to eventually succeed.
Actual Behavior
It flaked and only succeeded on the fifth attempt.
Environment
- AICR version (CLI aicr version, API image tag, or commit SHA):
- Install method (release binary / build from source / container image):
- Platform (eks/gke/aks/self-managed):
- Kubernetes version:
- OS (ubuntu/cos/other) + version:
- Kernel version:
- GPU type (h100/gb200/a100/l40/other):
- Workload intent (training/inference):
Command / Request Used
No response
Logs / Error Output
Additional Context
No response