-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Description
Description
Sagemaker Transform Job fails if there are job with Same name exist. Let say I create a job name as 'transform-2021-01-01T00-30-00' . So if I clear the airflow task run id for this so that the operator re-triggers then the Sagemaker Job creation fails because job with same name exists. So can we add 'action_if_job_exists flag where Behaviour if the job name already exists. Possible options are "increment" (default) and "fail".'
Use case/motivation
Now in production environment failures are inevitable and with Sagemaker Jobs we have to ensure there is unique name for each run of the Job. So like the Sagemaker Processing Job operator or training operator we have an option to increment a job name by appending the count like if I run same job twice the job name will be 'transform-2021-01-01T00-30-00-1' where 1 is appended at end with the help of 'action_if_job_exists (str) -- Behaviour if the job name already exists. Possible options are "increment" (default) and "fail".'
I have faced this issue personally on one of the task I am working on and think will save time and cost instead of running entire workflow again to get unique job names if there are other dependent task in the job by just clearing failed task id post fixing the failure in Sagemaker code , docker image input etc and that will just continue from where it failed
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct