Skip to content

Unable to specify Python version for AwsGlueJobOperator #20832

@jfamestad

Description

@jfamestad

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

No response

Apache Airflow version

2.0.2

Operating System

Amazon Linux

Deployment

MWAA

Deployment details

No response

What happened

When a new Glue job is created using the AwsGlueJobOperator, the job is defaulting to Python2. Setting the version in create_job_kwargs fails with key error.

What you expected to happen

Expected the Glue job to be created with a Python3 runtime. create_job_kwargs are passed to the boto3 glue client create_job method which includes a "Command" parameter that is a dictionary containing the Python version.

How to reproduce

Create a dag with an AwsGlueJobOperator and pass a "Command" parameter in the create_job_kwargs argument.

    create_glue_job_args = {
        "Command": {
            "Name": "abalone-preprocess",
            "ScriptLocation": f"s3://{output_bucket}/code/preprocess.py",
            "PythonVersion": "3"
        }
    }
    glue_etl = AwsGlueJobOperator(  
        task_id="glue_etl",  
        s3_bucket=output_bucket,
        script_args={
                '--S3_INPUT_BUCKET': data_bucket,
                '--S3_INPUT_KEY_PREFIX': 'input/raw',
                '--S3_UPLOADS_KEY_PREFIX': 'input/uploads',
                '--S3_OUTPUT_BUCKET': output_bucket,
                '--S3_OUTPUT_KEY_PREFIX': str(determine_dataset_id.output) +'/input/data' 
            },
        iam_role_name="MLOps",  
        retry_limit=2,
        concurrent_run_limit=3,
        create_job_kwargs=create_glue_job_args,
        dag=dag) 
[2022-01-04 16:43:42,053] {{logging_mixin.py:104}} INFO - [2022-01-04 16:43:42,053] {{glue.py:190}} ERROR - Failed to create aws glue job, error: 'Command'
[2022-01-04 16:43:42,081] {{logging_mixin.py:104}} INFO - [2022-01-04 16:43:42,081] {{glue.py:112}} ERROR - Failed to run aws glue job, error: 'Command'
[2022-01-04 16:43:42,101] {{taskinstance.py:1482}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 166, in get_or_create_glue_job
    get_job_response = glue_client.get_job(JobName=self.job_name)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 676, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the GetJob operation: Job with name: abalone-preprocess not found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 121, in execute
    glue_job_run = glue_job.initialize_job(self.script_args)
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 108, in initialize_job
    job_name = self.get_or_create_glue_job()
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 186, in get_or_create_glue_job
    **self.create_job_kwargs,
KeyError: 'Command'

Anything else

When a new job is being created.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions