EMR URI Parse Fails to resolve nested dict objects #13934

@jonasrla

Apache Airflow version:
1.10.12

Environment:

  • Cloud provider or hardware configuration:
    • MacBook Pro Mid 2014
  • OS (e.g. from /etc/os-release):
    • macOS Big Sur 11.1
  • Kernel (e.g. uname -a):
    • 20.2.0 Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64 x86_64

What happened:

I was trying to set up an EMR connection using the environment variable feature, but found that it has limitations: nested objects in the config are not fully parsed. The config I used looks like this:

{
	"Instances": {
		"MasterInstanceType": "m5.xlarge",
		"SlaveInstanceType": "m5.xlarge",
		"InstanceCount": 2,
		"Ec2SubnetId": "subnet-XXXXXXXXXXXXXXXXX"
	},
	"ServiceRole": "EMR_DefaultRole",
	"JobFlowRole": "EMR_EC2_DefaultRole",
	"ReleaseLabel": "emr-5.32.0",
	"Applications": [{
		"Name": "Spark"
	}],
	"Configurations": [{
		"Classification": "spark-hive-site",
		"Properties": {
			"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
		}
	}]
}

I got this:

[Screenshot: Screen Shot 2021-01-27 at 18 41 26]

What you expected to happen:

I expected the JSON to be parsed completely, including the nested objects.

How to reproduce it:

Export an EMR connection in your environment, such as:

AIRFLOW_CONN_EMR_ALT='emr://?Instances=%7B%27MasterInstanceType%27%3A+%27m5.xlarge%27%2C+%27SlaveInstanceType%27%3A+%27m5.xlarge%27%2C+%27InstanceCount%27%3A+2%2C+%27Ec2SubnetId%27%3A+%27subnet-XXXXXXXXXXXXXXXXX%27%7D&ServiceRole=EMR_DefaultRole&JobFlowRole=EMR_EC2_DefaultRole&ReleaseLabel=emr-5.32.0&Applications=%5B%7B%27Name%27%3A+%27Spark%27%7D%5D&Configurations=%5B%7B%27Classification%27%3A+%27spark-hive-site%27%2C+%27Properties%27%3A+%7B%27hive.metastore.client.factory.class%27%3A+%27com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory%27%7D%7D%5D'

and try to connect using the emr_alt connection.
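The URI above has the shape that `urllib.parse.urlencode` produces when given the config dict directly: nested values are serialized with Python's `str()`, which is why they contain single quotes (`%27`) rather than JSON double quotes. This is an assumption about how the URI was built, not something stated in the report. A minimal sketch that generates an equivalent URI:

```python
from urllib.parse import urlencode

# The EMR config from this report.
config = {
    "Instances": {
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 2,
        "Ec2SubnetId": "subnet-XXXXXXXXXXXXXXXXX",
    },
    "ServiceRole": "EMR_DefaultRole",
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ReleaseLabel": "emr-5.32.0",
    "Applications": [{"Name": "Spark"}],
    "Configurations": [
        {
            "Classification": "spark-hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": (
                    "com.amazonaws.glue.catalog.metastore."
                    "AWSGlueDataCatalogHiveClientFactory"
                )
            },
        }
    ],
}

# urlencode calls str() on non-string values, so nested dicts and lists
# end up as percent-encoded Python literals, not as JSON.
uri = "emr://?" + urlencode(config)
print(f"AIRFLOW_CONN_EMR_ALT='{uri}'")
```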

Anything else we need to know:

I have already located the relevant line, in both 1.10.12 and 2.0.0:
https://github.com/apache/airflow/blob/1.10.12/airflow/models/connection.py#L148
https://github.com/apache/airflow/blob/2.0.0/airflow/models/connection.py#L164
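The behavior can be reproduced outside Airflow. The sketch below assumes the linked lines build the connection's extra field from the URI query with something equivalent to `urllib.parse.parse_qsl`; that is an assumption, not a quotation of the Airflow source. It shows that nested values come back as plain strings. The `decode_nested` helper is hypothetical and only illustrates one way the values could be recovered; it is not a proposed patch.

```python
import ast
from urllib.parse import parse_qsl, urlparse

# Shortened version of the URI from this report.
uri = (
    "emr://?Instances=%7B%27MasterInstanceType%27%3A+%27m5.xlarge%27"
    "%2C+%27InstanceCount%27%3A+2%7D"
    "&ServiceRole=EMR_DefaultRole"
    "&Applications=%5B%7B%27Name%27%3A+%27Spark%27%7D%5D"
)

# Assumed equivalent of the linked parsing step: every value is a str.
flat = dict(parse_qsl(urlparse(uri).query, keep_blank_values=True))
print(type(flat["Instances"]))  # the nested dict is still a string here


def decode_nested(value):
    """Hypothetical helper: turn dict/list literals back into objects."""
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value  # ordinary strings such as role names stay as-is
    # Only accept containers; leave scalars as the original string.
    return parsed if isinstance(parsed, (dict, list)) else value


decoded = {key: decode_nested(value) for key, value in flat.items()}
print(decoded["Instances"]["InstanceCount"])
```

`ast.literal_eval` is used rather than `json.loads` because the encoded values use single quotes, which are not valid JSON.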

Labels: area:core, kind:bug, priority:medium