Description
Hello there, everyone. :)
Apache Airflow version: 1.10.9, 1.10.10, trunk
- OS (e.g. from /etc/os-release): Linux
- Others: Bash/sh
What happened:
Password masking was added to SparkSubmitOperator (SparkSubmitHook, to be precise) in December 2019 (under AIRFLOW-6350; PR: #6917), but it only masks passwords given in the --foo.password='value' form, i.e. the value must be wrapped in single quotes and joined to the argument's name with an equals sign.
What you expected to happen:
I would expect this mechanism to also cover the forms (a) with double quotes or with no quotes at all, and (b) with whitespace instead of an equals sign, e.g.
--foo.password=value
--foo.password="value"
--foo.password 'value'
--foo.password value
--foo.password "value"
But I may be missing something. Is there any reason the initial version only covers the single-quoted-with-equal-sign form? The regular expression used in the masking code (1.10.9 version, trunk version) looks pretty intentional:
def _mask_cmd(self, connection_cmd):
    # Mask any password related fields in application args with key value pair
    # where key contains password (case insensitive), e.g. HivePassword='abc'
    connection_cmd_masked = re.sub(
        r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')",
        r'\1******', ' '.join(connection_cmd), flags=re.I)

How to reproduce it:
from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator  # Airflow 1.10.9

dag = DAG(...)
SparkSubmitOperator(
    ...,
    conf={"spark.foo.password": "this_should_get_masked_but_it_doesnt"},
    dag=dag,
)

Running such a task will leak the password into Airflow logs.
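To make the gap concrete without running Airflow, here is a small standalone check that applies the regular expression quoted above to each of the forms I listed. The `mask` helper and the `forms` list are just illustrative names; the last entry assumes the hook renders a `conf` entry as `--conf key=value`, which is my reading of the code rather than something I verified against every version.

```python
import re

# Pattern copied verbatim from the _mask_cmd snippet quoted in this issue.
PATTERN = r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')"


def mask(cmd):
    """Apply the quoted masking regex to an already-joined command string."""
    return re.sub(PATTERN, r"\1******", cmd, flags=re.I)


forms = [
    "--foo.password='value'",   # the only form the pattern recognises
    "--foo.password=value",
    '--foo.password="value"',
    "--foo.password 'value'",
    "--foo.password value",
    '--foo.password "value"',
    # Assumption: how a conf entry ends up on the spark-submit command line.
    "--conf spark.foo.password=this_should_get_masked_but_it_doesnt",
]

results = {form: mask(form) for form in forms}
for form, masked in results.items():
    status = "masked" if masked != form else "NOT masked"
    print(f"{form!r} -> {masked!r} ({status})")
```

Only the first form comes back changed; every other entry, including the `--conf` one, is returned as-is.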
Anything else we need to know:
Again, I may be missing something, e.g. something OS-specific. I'd be happy to learn something here. :)
If all or some of the other forms I mentioned should also get the masking treatment, I have a change ready to open as a PR.
(Note there's no JIRA issue referenced in the commit messages: I cannot create issues in Airflow's Jira for some reason)
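For discussion purposes, a broader pattern covering the other forms could look roughly like the sketch below. This is an illustration only, not Airflow's implementation; `BROAD_PATTERN` and `mask_broad` are hypothetical names. It accepts an equals sign or plain whitespace as the separator, and a single-quoted, double-quoted, or unquoted value.

```python
import re

# Hypothetical broader pattern (a sketch, not the Airflow code):
#   group 1: key containing "secret"/"password" plus the separator
#            (an equals sign with optional whitespace, or whitespace only)
#   group 2: the value, single-quoted, double-quoted, or unquoted
BROAD_PATTERN = re.compile(
    r"""(\S*?(?:secret|password)\S*?(?:\s*=\s*|\s+))('[^']*'|"[^"]*"|[^\s'"]\S*)""",
    flags=re.I,
)


def _replace(match):
    value = match.group(2)
    # Keep the original quoting style around the mask, if there was any.
    quote = value[0] if value[0] in "'\"" else ""
    return f"{match.group(1)}{quote}******{quote}"


def mask_broad(cmd):
    """Mask values following secret/password keys in a joined command string."""
    return BROAD_PATTERN.sub(_replace, cmd)


for form in [
    "--foo.password='value'",
    "--foo.password=value",
    '--foo.password="value"',
    "--foo.password 'value'",
    "--foo.password value",
    '--foo.password "value"',
]:
    print(f"{form!r} -> {mask_broad(form)!r}")
```

One caveat with the whitespace-separated forms: the pattern cannot tell a key/value pair from two unrelated arguments, so something like `--use-password --verbose` would have `--verbose` masked. That over-masking is harmless for logs, but it may be one reason the original pattern was kept narrow.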