Closed
Labels: area:providers, kind:bug, provider:amazon
Description
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
No response
Apache Airflow version
2.0.2
Operating System
Amazon Linux
Deployment
MWAA
Deployment details
No response
What happened
When a new Glue job is created with the AwsGlueJobOperator, the job defaults to a Python 2 runtime. Setting the Python version through create_job_kwargs fails with a KeyError.
What you expected to happen
I expected the Glue job to be created with a Python 3 runtime. create_job_kwargs is passed to the boto3 Glue client's create_job method, whose "Command" parameter is a dictionary that includes the Python version.
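For reference, a create_job request that asks for a Python 3 runtime looks roughly like this. It is a minimal sketch: build_create_job_request is a hypothetical helper, the job name, role ARN and script path are placeholders, and "glueetl" is the Glue command type for a Spark ETL job ("pythonshell" would select a Python shell job).

```python
from typing import Any, Dict


def build_create_job_request(
    job_name: str,
    role_arn: str,
    script_location: str,
    python_version: str = "3",
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for boto3's Glue create_job.

    Only the parameter names (Name, Role, Command, ScriptLocation, PythonVersion)
    follow the Glue API; the values passed in are placeholders.
    """
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            # Command type, not the job name: "glueetl" is a Spark ETL job.
            "Name": "glueetl",
            "ScriptLocation": script_location,
            # Explicitly request the Python 3 runtime.
            "PythonVersion": python_version,
        },
    }


if __name__ == "__main__":
    request = build_create_job_request(
        job_name="abalone-preprocess",
        role_arn="arn:aws:iam::123456789012:role/MLOps",  # placeholder ARN
        script_location="s3://my-bucket/code/preprocess.py",  # placeholder path
    )
    print(request["Command"])
    # With AWS credentials configured, the request could be sent as:
    # import boto3
    # boto3.client("glue").create_job(**request)
```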
How to reproduce
Create a DAG with an AwsGlueJobOperator and pass a "Command" entry in the create_job_kwargs argument:
```python
from airflow.providers.amazon.aws.operators.glue import AwsGlueJobOperator

# output_bucket, data_bucket, determine_dataset_id and dag are defined elsewhere in the DAG file.
create_glue_job_args = {
    "Command": {
        # Note: in the Glue API, Command["Name"] is the command type (e.g. "glueetl"), not the job name.
        "Name": "abalone-preprocess",
        "ScriptLocation": f"s3://{output_bucket}/code/preprocess.py",
        "PythonVersion": "3",
    }
}

glue_etl = AwsGlueJobOperator(
    task_id="glue_etl",
    s3_bucket=output_bucket,
    script_args={
        '--S3_INPUT_BUCKET': data_bucket,
        '--S3_INPUT_KEY_PREFIX': 'input/raw',
        '--S3_UPLOADS_KEY_PREFIX': 'input/uploads',
        '--S3_OUTPUT_BUCKET': output_bucket,
        '--S3_OUTPUT_KEY_PREFIX': str(determine_dataset_id.output) + '/input/data',
    },
    iam_role_name="MLOps",
    retry_limit=2,
    concurrent_run_limit=3,
    create_job_kwargs=create_glue_job_args,
    dag=dag,
)
```
```
[2022-01-04 16:43:42,053] {{logging_mixin.py:104}} INFO - [2022-01-04 16:43:42,053] {{glue.py:190}} ERROR - Failed to create aws glue job, error: 'Command'
[2022-01-04 16:43:42,081] {{logging_mixin.py:104}} INFO - [2022-01-04 16:43:42,081] {{glue.py:112}} ERROR - Failed to run aws glue job, error: 'Command'
[2022-01-04 16:43:42,101] {{taskinstance.py:1482}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 166, in get_or_create_glue_job
    get_job_response = glue_client.get_job(JobName=self.job_name)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 676, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.EntityNotFoundException: An error occurred (EntityNotFoundException) when calling the GetJob operation: Job with name: abalone-preprocess not found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 121, in execute
    glue_job_run = glue_job.initialize_job(self.script_args)
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 108, in initialize_job
    job_name = self.get_or_create_glue_job()
  File "/usr/local/lib/python3.7/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 186, in get_or_create_glue_job
    **self.create_job_kwargs,
KeyError: 'Command'
```
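The last frame points at `**self.create_job_kwargs` inside get_or_create_glue_job. One plausible reading (an inference from the traceback, not from the provider's source) is that the hook passes its own Command argument to create_job and then unpacks create_job_kwargs on top of it, so a user-supplied "Command" key collides with it. The sketch below reproduces that kind of collision with a hypothetical stand-in, fake_create_job, and shows one possible fix: a hypothetical merge_create_job_kwargs helper that lets user keys override the defaults and merges the nested Command dictionary key by key. With an ordinary Python function the collision raises TypeError; the task log above reports KeyError: 'Command', so the exact error path inside the hook may differ. Bucket and script values are placeholders.

```python
from typing import Any, Dict


def fake_create_job(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for glue_client.create_job; it just echoes its arguments."""
    return kwargs


def merge_create_job_kwargs(
    defaults: Dict[str, Any], overrides: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical helper: user-supplied create_job_kwargs win over defaults.

    The nested "Command" dict is merged key by key, so a user can set
    PythonVersion without repeating ScriptLocation.
    """
    merged = {**defaults, **overrides}
    if "Command" in defaults and "Command" in overrides:
        merged["Command"] = {**defaults["Command"], **overrides["Command"]}
    return merged


defaults = {
    "Name": "abalone-preprocess",
    "Command": {"Name": "glueetl", "ScriptLocation": "s3://my-bucket/code/preprocess.py"},
    "MaxRetries": 2,
}
user_kwargs = {"Command": {"PythonVersion": "3"}}

# Passing "Command" explicitly *and* through ** is rejected by Python.
try:
    fake_create_job(Command=defaults["Command"], **user_kwargs)
except TypeError as exc:
    print("collision:", exc)

# Merging first yields a single, combined Command argument.
print(fake_create_job(**merge_create_job_kwargs(defaults, user_kwargs))["Command"])
```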
Anything else
When a new job is being created.
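Until the hook accepts a custom Command, one possible workaround follows from the traceback: get_or_create_glue_job calls get_job first and only goes down the create path (where the KeyError is raised) when the job is not found. Creating the job beforehand with boto3, with "PythonVersion": "3" in its Command, should therefore let the operator reuse it. This is a sketch under that assumption; I have not checked whether the hook later modifies an existing job. ensure_glue_job is a hypothetical helper, and glue_client is assumed to be a boto3 Glue client such as boto3.client("glue").

```python
from typing import Any, Dict


def ensure_glue_job(glue_client: Any, job_name: str, create_request: Dict[str, Any]) -> str:
    """Hypothetical helper: create the Glue job only if it does not exist yet.

    Mirrors the get-then-create order visible in the traceback.
    create_request should use the same "Name" as job_name and include
    a Command dict with "PythonVersion": "3".
    """
    try:
        glue_client.get_job(JobName=job_name)
    except glue_client.exceptions.EntityNotFoundException:
        # Job is missing: create it ourselves so the operator never hits its create path.
        glue_client.create_job(**create_request)
    return job_name
```

For example, ensure_glue_job(boto3.client("glue"), "abalone-preprocess", request) could run before glue_etl (from an upstream task or outside Airflow), as long as the name matches the job name the operator uses.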
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct