SageMaker Transform job fails if a job with the same name exists #21941

@hsrocks

Description

The SageMaker Transform job fails if a job with the same name already exists. Say I create a job named 'transform-2021-01-01T00-30-00'. If I clear the Airflow task run for it so that the operator re-triggers, the SageMaker job creation fails because a job with the same name exists. Can we add an 'action_if_job_exists' flag: 'Behaviour if the job name already exists. Possible options are "increment" (default) and "fail".'

Use case/motivation

In a production environment failures are inevitable, and with SageMaker jobs we have to ensure a unique name for each run of the job. Like the SageMaker Processing job operator or the training operator, the Transform operator should have an option to increment the job name by appending a count: if I run the same job twice, the second job name will be 'transform-2021-01-01T00-30-00-1', where 1 is appended at the end. This would be controlled by 'action_if_job_exists (str) -- Behaviour if the job name already exists. Possible options are "increment" (default) and "fail".'
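
To make the requested behaviour concrete, here is a rough sketch of the naming logic. It is not the provider's actual implementation: `resolve_job_name` and the `existing_names` set are hypothetical names used only for illustration, and a real operator would have to ask SageMaker which job names are already taken.

```python
def resolve_job_name(job_name, existing_names, action_if_job_exists="increment"):
    """Return a usable job name, following the requested flag semantics.

    Hypothetical helper: `existing_names` stands in for whatever lookup
    the operator would do against SageMaker.
    """
    if action_if_job_exists not in ("increment", "fail"):
        raise ValueError(
            f"action_if_job_exists must be 'increment' or 'fail', "
            f"got {action_if_job_exists!r}"
        )

    # No clash: keep the requested name as-is.
    if job_name not in existing_names:
        return job_name

    if action_if_job_exists == "fail":
        raise RuntimeError(f"A SageMaker job named {job_name!r} already exists.")

    # "increment": append -1, -2, ... until the name is unused.
    suffix = 1
    while f"{job_name}-{suffix}" in existing_names:
        suffix += 1
    return f"{job_name}-{suffix}"


existing = {"transform-2021-01-01T00-30-00"}
print(resolve_job_name("transform-2021-01-01T00-30-00", existing))
```

With the default "increment", clearing and re-running the task would yield 'transform-2021-01-01T00-30-00-1' instead of failing on the name clash.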

I have faced this issue personally on a task I am working on. I think this will save time and cost: when other tasks depend on the job, instead of rerunning the entire workflow to get unique job names, I could fix the failure (SageMaker code, Docker image, input, etc.), clear only the failed task, and the workflow would continue from where it failed.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
