Skip to content

Conversation

@XD-DENG
Copy link
Member

@XD-DENG XD-DENG commented Aug 10, 2018

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

In my earlier PRs, #3651 and #3729 , I proposed to generate random SECRET_KEY for the webserver (Flask App).

However, I realise that we may encounter CSRF error The CSRF session token is missing when we have multiple workers for the Flask webserver, since the secret_key is not consistent among workers.

On the other hand, it's still very important to have as random SECRET_KEY as possible for security reasons. We can deal with it like how we dealt with FERNET_KEY (i.e. generate a random value when the airflow.cfg file is initiated).

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes git diff upstream/master -u -- "*.py" | flake8 --diff

@XD-DENG
Copy link
Member Author

XD-DENG commented Aug 10, 2018

@feng-tao @Fokko PTAL.

We need to have random SECRET_KEY for Flask App, while it must be consistent among workers. So I propose to deal with it like how we dealt with FERNET_KEY.

Thanks.

@feng-tao
Copy link
Member

hey @XD-DENG , it seems that it will still be an issue if we have a cluster for webservers based on the comment in https://issues.apache.org/jira/browse/AIRFLOW-2866. What do you think? cc @Fokko , @kaxil as they comment / approve the original prs.

@XD-DENG
Copy link
Member Author

XD-DENG commented Aug 10, 2018

Hi @feng-tao , in cluster mode, given the .cfg files are initiated on different nodes, the SECRET_KEY will be different. That's also why Craig mentioned it fixes now on a single webserver instance (single worker or multiple workers), but still having CSRF error on a cluster of webservers (LINK).

I insist in that we should have random SECRET_KEY for the webserver. If we use the previous way, that is to have a default value in the template, it's gonna be risky.

If users want to use a cluster of webservers, they need to either

  • make sure that SECRET_KEYs are consistent across nodes (to manually specify). This is similar to what people need to do for sql_alchemy_conn if they're running multiple nodes for webserver.
    OR
  • apply ip-hash strategy for the cluster load-balancing, rather than round-robin.

Please let me know your thoughts?

cc: @Fokko @kaxil

@Fokko
Copy link
Contributor

Fokko commented Aug 12, 2018

Are people using clusters of webservers? I've never seen such a setup to be honest.

@XD-DENG
Copy link
Member Author

XD-DENG commented Aug 13, 2018

@Fokko @feng-tao :
If users are running a cluster for webservers, we can leave it to users to ensure configuration settings being homogeneous across the cluster (just like how users set sql_alchemy_conn or broker_url).

In a more common scenario, users have webserver on a single node (starting multiple processes as multiple workers), the solution in this PR should be able to improve the security in terms of SECRET_KEY without CSRF issue.

@XD-DENG XD-DENG force-pushed the patch-2 branch 3 times, most recently from 12950af to d6dc16d Compare August 13, 2018 06:47
The Flask SECRET_KEY should be as random as possible.

On the other hand, we can nott genrate random value when
we launch the webserver (the secret_key will be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.
@codecov-io
Copy link

Codecov Report

Merging #3738 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3738      +/-   ##
==========================================
+ Coverage   77.66%   77.67%   +0.01%     
==========================================
  Files         204      204              
  Lines       15849    15846       -3     
==========================================
  Hits        12309    12309              
+ Misses       3540     3537       -3
Impacted Files Coverage Δ
airflow/www_rbac/app.py 97.8% <100%> (+1.02%) ⬆️
airflow/configuration.py 84.07% <100%> (+0.11%) ⬆️
airflow/www/app.py 100% <100%> (+0.98%) ⬆️
airflow/models.py 88.82% <0%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d2f83b...9e01476. Read the comment docs.

@feng-tao
Copy link
Member

feng-tao commented Aug 13, 2018

looking at this #3651 pr, I am wondering whether we could do better than use random function to assign the value to secret_key. I felt the better approach is to have a random key store in an internal key management service(this should be common for company, e.g lyft uses https://github.com/lyft/confidant) and assign it to secret_key in the config file. This will make sure the key is random but uniform across all the webservers. This won't cause any csrf issues as well. What do you think @Fokko , @kaxil , @XD-DENG ?

@feng-tao
Copy link
Member

feng-tao commented Aug 13, 2018

This pr currently can't address the issue when we have a cluster for webserver. IMO, I would prefer to revert the orignal prs and improve the secret_key config documentation. Let me know what you guys think.

@Noremac201
Copy link
Contributor

Noremac201 commented Aug 13, 2018

As far as I can tell, I can't save any passwords in Connections in Airflow webUI from a fresh master and this commit cherry-picked.

I don't know if I could've caused this though.

@XD-DENG
Copy link
Member Author

XD-DENG commented Aug 13, 2018

Hi @feng-tao , I agree that PRs #3651 and #3729 must be reverted as they are causing CSRF issues in web UI at this moment(they are also reverts in the commit of this PR).

On the other hand, for Airflow, I don’t think it’s common to run multiple nodes for webserver for a single Airflow instance. Normally people use multiple processes on a single machine as multiple workers for each single Airflow instance. Then the solution in this PR can provide out-of-the-box security improvement.

If you're talking about a cluster of Airflow instances, then Airflow instances do not need to have consistent secret_key, and they should not (different apps should use different secret_keys).

Please let me know your thoughts. Thanks.

CC @Fokko @kaxil

@ashb
Copy link
Member

ashb commented Aug 14, 2018

My vote is to go with this approach (secure by default and easy for the common case) plus a bit of documentation saying that you will want to make sure that this value is the same across multiple machines if you run more than one behind a load balancer etc.

@ashb
Copy link
Member

ashb commented Aug 14, 2018

To use an external secret store this could be extended to use the (already existing) _cmd functionality that exist in airflow for certain config options (sql alchemy connection instance) - we'd just need to add this option to the list that are read from cmds.

@feng-tao
Copy link
Member

sounds good @ashb . Let's go ahead to merge this pr to unblock master and document the functionality you mentioned later.

@feng-tao
Copy link
Member

thanks @XD-DENG for fixing the issue.

@feng-tao feng-tao merged commit f7602f8 into apache:master Aug 14, 2018
@XD-DENG XD-DENG deleted the patch-2 branch August 14, 2018 23:08
@XD-DENG
Copy link
Member Author

XD-DENG commented Aug 14, 2018

Thank you @feng-tao @ashb

lxneng pushed a commit to lxneng/incubator-airflow that referenced this pull request Aug 20, 2018
The Flask SECRET_KEY should be as random as possible.

On the other hand, we can nott genrate random value when
we launch the webserver (the secret_key will be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.
aliceabe pushed a commit to aliceabe/incubator-airflow that referenced this pull request Jan 3, 2019
The Flask SECRET_KEY should be as random as possible.

On the other hand, we can nott genrate random value when
we launch the webserver (the secret_key will be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.
@kaxil kaxil added this to the Airflow 1.10.14 milestone Dec 1, 2020
ashb pushed a commit that referenced this pull request Dec 1, 2020
)

The Flask SECRET_KEY should be as random as possible.

On the other hand, we can nott genrate random value when
we launch the webserver (the secret_key will be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.

(cherry picked from commit f7602f8)
kaxil pushed a commit that referenced this pull request Dec 3, 2020
)

The Flask SECRET_KEY should be as random as possible.

On the other hand, we can nott genrate random value when
we launch the webserver (the secret_key will be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.

(cherry picked from commit f7602f8)
ashb pushed a commit that referenced this pull request Dec 3, 2020
)

The Flask SECRET_KEY should be as random as possible.

On the other hand, we can nott genrate random value when
we launch the webserver (the secret_key will be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.

(cherry picked from commit f7602f8)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Dec 4, 2020
…ache#3738)

The Flask SECRET_KEY should be as random as possible.

On the other hand, we can nott genrate random value when
we launch the webserver (the secret_key will be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.

(cherry picked from commit f7602f8)
(cherry picked from commit 6b06584)
(cherry picked from commit 18e9816b2961152972a41edde38ef648039a915f)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Dec 4, 2020
…ache#3738)

The Flask SECRET_KEY should be as random as possible.

On the other hand, we can nott genrate random value when
we launch the webserver (the secret_key will be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.

(cherry picked from commit f7602f8)
(cherry picked from commit 6b06584)
(cherry picked from commit 18e9816b2961152972a41edde38ef648039a915f)
(cherry picked from commit cc91e3a)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Dec 4, 2020
…ache#3738)

The Flask SECRET_KEY should be as random as possible.

On the other hand, we can nott genrate random value when
we launch the webserver (the secret_key will be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.

(cherry picked from commit f7602f8)
(cherry picked from commit 6b06584)
AntonyRileyAtVerto pushed a commit to vertoanalytics/incubator-airflow that referenced this pull request Feb 2, 2021
- BugFix: Tasks with ``depends_on_past`` or ``task_concurrency`` are stuck (apache#12663)
- Fix issue with empty Resources in executor_config (apache#12633)
- Fix: Deprecated config ``force_log_out_after`` was not used (apache#12661)
- Fix empty asctime field in JSON formatted logs (apache#10515)
- [AIRFLOW-2809] Fix security issue regarding Flask SECRET_KEY (apache#3651)
- [AIRFLOW-2884] Fix Flask SECRET_KEY security issue in www_rbac (apache#3729)
- [AIRFLOW-2886] Generate random Flask SECRET_KEY in default config (apache#3738)
- Add missing comma in setup.py (apache#12790)
- Bugfix: Unable to import Airflow plugins on Python 3.8 (apache#12859)
- Fix setup.py missing comma in ``setup_requires`` (apache#12880)
- Don't emit first_task_scheduling_delay metric for only-once dags (apache#12835)

- Update setup.py to get non-conflicting set of dependencies (apache#12636)
- Rename ``[scheduler] max_threads`` to ``[scheduler] parsing_processes`` (apache#12605)
- Add metric for scheduling delay between first run task & expected start time (apache#9544)
- Add new-style 2.0 command names for Airflow 1.10.x (apache#12725)
- Add Kubernetes cleanup-pods CLI command for Helm Chart (apache#11802)
- Don't let webserver run with dangerous config (apache#12747)
- Replace pkg_resources with importlib.metadata to avoid VersionConflict errors (apache#12694)

- Clarified information about supported Databases
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
The Flask SECRET_KEY should be as random as possible.

On the other hand, we can nott genrate random value when
we launch the webserver (the secret_key will be
inconsistent across the workers).

We can generate a random one in the configuration file
airflow.cfg, just like how we deal with FERNET_KEY.

The SECRET_KEY is generated using os.urandom, as
recommended by Flask community.

(cherry picked from commit f7602f8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants