-
Notifications
You must be signed in to change notification settings - Fork 498
Description
/kind bug
What steps did you take and what happened:
Deploy an experiment which spec.objective.additionalMetricNames contains spec.objective.objectiveMetricName.
The experiment is created, but you can see nothing on the experiment UI, like below.
- You can see the log
GetLastConditionType failed: Experiment doesn't have any conditionin the Katib UI pod. - Also in katib controller, you can see the log
"admission webhook \"validator.experiment.katib.kubeflow.org\" denied the request: only spec.parallelTrialCount, spec.maxTrialCount and spec.maxFailedTrialCount are editable",.
After some investigation, we found that the experiment is created, but failed to update when the additionalMetricNames contains objectiveMetricName.
Detailed investigation below
This is because of the behavior of the mutating & validation webhook.
If additionalMetricNames contains objectiveMetricName, mutating webhook tries to add objectiveMetricName to metricStrategies like below.
- https://github.com/kubeflow/katib/blob/v0.13.0/pkg/apis/controller/experiments/v1beta1/experiment_defaults.go#L57-L59
- https://github.com/kubeflow/katib/blob/v0.13.0/pkg/apis/controller/experiments/v1beta1/experiment_defaults.go#L80-L90
So after the mutation, metric strategies look like below.
metricStrategies:
- name: Validation-accuracy
value: max
- name: Train-accuracy
value: max
- name: Validation-accuracy
value: maxAnother mutation happens when the controller update finalizer
However, it is not allowed to update metricsStrategies when oldInst is not nil
So the updating experiment is failed and status of experiment will be never updated
What did you expect to happen:
The experiment should not be created when additionalMetricNames contains objectiveMetricName.
Environment:
- Katib version (check the Katib controller image version): v0.13
Impacted by this bug? Give it a 👍 We prioritize the issues with the most 👍

