SparkSubmitOperator only masks one "form" of password arguments #9595

@Unit03

Hello there, everyone. :)

Apache Airflow version: 1.10.9, 1.10.10, trunk

  • OS (e.g. from /etc/os-release): Linux
  • Others: Bash/sh

What happened:

Password masking was added to SparkSubmitOperator (to be precise, to SparkSubmitHook) in December 2019 (AIRFLOW-6350; PR: #6917), but it only masks passwords written in the --foo.password='value' form, i.e. the value must be wrapped in single quotes and joined to the argument's name with an equals sign.

What you expected to happen:

I would expect this mechanism to also cover the forms (a) with double quotes or with no quotes at all, and (b) with whitespace instead of an equals sign, e.g.:

  • --foo.password=value
  • --foo.password="value"
  • --foo.password 'value'
  • --foo.password value
  • --foo.password "value"

But I may be missing something. Is there any reason the initial version only covers the single-quoted-with-equal-sign form? The regular expression used in the masking code (1.10.9 version, trunk version) looks pretty intentional:

    def _mask_cmd(self, connection_cmd):
        # Mask any password related fields in application args with key value pair
        # where key contains password (case insensitive), e.g. HivePassword='abc'

        connection_cmd_masked = re.sub(
            r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')",
            r'\1******', ' '.join(connection_cmd), flags=re.I)

        return connection_cmd_masked
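
The behaviour can be checked outside Airflow with a minimal standalone sketch. It copies only the regular expression from the excerpt; the `mask_current` helper and the `FORMS` argument lists are hypothetical names made up for illustration and are not part of Airflow.

```python
import re

# Pattern copied verbatim from the _mask_cmd excerpt in this issue.
CURRENT_PATTERN = r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')"


def mask_current(connection_cmd):
    """Hypothetical helper: apply the existing substitution to a list of command parts."""
    return re.sub(CURRENT_PATTERN, r"\1******", " ".join(connection_cmd), flags=re.I)


# Hypothetical argument lists, one per form discussed in this issue.
FORMS = [
    ["--foo.password='value'"],
    ["--foo.password=value"],
    ['--foo.password="value"'],
    ["--foo.password", "'value'"],
    ["--foo.password", "value"],
    ["--foo.password", '"value"'],
]

for parts in FORMS:
    original = " ".join(parts)
    masked = mask_current(parts)
    # If the substitution changed nothing, the value would reach the logs as-is.
    status = "masked" if masked != original else "LEAKED"
    print(f"{status:7} {masked}")
```

Only the first entry should come back changed, because the pattern requires both an equals sign and an opening single quote before it starts replacing anything.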

How to reproduce it:

from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator  # Airflow 1.10.9

dag = DAG(...)
SparkSubmitOperator(
    ...,
    # The value below is rendered without single quotes, so it is not masked.
    conf={"spark.foo.password": "this_should_get_masked_but_it_doesnt"},
    dag=dag,
)

Running such a task will leak the password into Airflow logs.

Anything else we need to know:

Again, I may be missing something, e.g. something OS-specific. I'd be happy to learn something here. :)

If all or some of the other forms I mentioned should also get the masking treatment, I have a change ready to open as a PR.
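
To make the other forms concrete, here is one possible broader pattern, written as a rough standalone sketch. It is not the change mentioned above and not Airflow's code: `BROADER_PATTERN`, `_mask_value` and `mask_broader` are hypothetical names, and the pattern is only one of several ways to accept an equals sign or whitespace as the separator and single, double, or no quotes around the value.

```python
import re

# Hypothetical pattern (an assumption for illustration, not Airflow's code).
BROADER_PATTERN = re.compile(
    r"""
    (?P<key>
        \S*?(?:secret|password)\S*?   # argument name containing "secret" or "password"
        (?:\s*=\s*|\s+)               # separator: "=" (optionally padded) or whitespace
    )
    (?:
        '(?P<sq>[^']*)'               # single-quoted value
      | "(?P<dq>[^"]*)"               # double-quoted value
      | (?P<bare>\S+)                 # unquoted value: runs until the next whitespace
    )
    """,
    re.I | re.X,
)


def _mask_value(match):
    """Replace the matched value with asterisks, keeping the quoting style."""
    if match.group("sq") is not None:
        quote = "'"
    elif match.group("dq") is not None:
        quote = '"'
    else:
        quote = ""
    return f"{match.group('key')}{quote}******{quote}"


def mask_broader(connection_cmd):
    """Hypothetical helper: mask password-like values in a list of command parts."""
    return BROADER_PATTERN.sub(_mask_value, " ".join(connection_cmd))


print(mask_broader(["spark-submit", "--conf", "spark.foo.password=abc", "--name", "job"]))
```

A trade-off of accepting whitespace as the separator is over-masking: any token that follows an argument whose name contains "password" or "secret" gets replaced, even if it is not a credential. That may well be the reason the original pattern is stricter.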

(Note that no Jira issue is referenced in the commit messages: for some reason, I cannot create issues in Airflow's Jira.)

Labels: kind:bug (This is clearly a bug)