Description
Apache Airflow version: 1.10.10
Environment:
- Cloud provider or hardware configuration: VMs
- OS: Ubuntu 16.04
- Kernel: Linux {host} 4.15.0-88-generic #88~16.04.1-Ubuntu SMP Wed Feb 12 04:19:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: pip
- Others:
- OS is configured in UTC
- UI is configured to show UTC
- default_timezone in airflow config set to UTC
- DB is MySQL and configured in PT
What happened:
I noticed that execution_dates and other datetimes were off in the ModelView parts of the RBAC UI (e.g. /dagrun/list/). Even though all DAGs were set to UTC schedules, execution_dates showed 5pm (which is what midnight UTC looks like when shown in PT).
It seems that the MySQL timezone leaked and caused the UI to show wrong datetimes. This should not happen, since datetimes are stored as UTC in the database, and other parts of the UI did not show this error.
This also impacts links to log files in the task instance list view, which end up nowhere because the datetime in the URL is now wrong too.
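To make the 5pm symptom concrete, here is a minimal sketch of the conversion. The date is arbitrary, and the fixed UTC-7 offset is an assumption standing in for Pacific Daylight Time; neither comes from the actual deployment.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset stands in for Pacific Daylight Time.
# A real setup would use a named zone with DST rules.
PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7), name="PDT")

# A daily DAG on a UTC schedule has execution_dates at midnight UTC.
# The date itself is hypothetical.
execution_date_utc = datetime(2020, 4, 15, 0, 0, tzinfo=timezone.utc)

# The same instant rendered in the database's time zone.
as_seen_in_pt = execution_date_utc.astimezone(PACIFIC_DAYLIGHT)

print(execution_date_utc.isoformat())  # 2020-04-15T00:00:00+00:00
print(as_seen_in_pt.isoformat())       # 2020-04-14T17:00:00-07:00
```

Both values describe the same instant, but the second one is what shows up when the offset leaks into the rendered string, including any URL built from it.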
What you expected to happen:
I would expect the datetimes in ModelView pages to also be aligned with UTC.
How to reproduce it:
- Install Airflow on an OS that is set to UTC
- Set it up with a MySQL database that is configured in some other timezone
- Run some example DAGs to get some dag runs in the database
- Go to /dagrun/list or /taskinstance/list and notice the wrong datetimes
Anything else we need to know:
Other things I looked at
- Other parts of the UI, such as the home page and tooltips on the tree view showed the expected execution_dates at midnight
- I double-checked the datetimes in the database, and they were all correct: after setting my session timezone to UTC in MySQL, execution_dates were at midnight
--> Something seemed to be wrong specifically for these ModelViews.
I went down the rabbit hole and I think I figured it out (at least enough to propose a fix).
- The /last_dagruns API returns correct datetimes, and it queries the database using a session that is provided by Airflow through the @provide_session decorator. This was my beacon for understanding what went wrong in the other views.
airflow/airflow/www_rbac/views.py
Lines 474 to 483 in 9669718
    @expose('/last_dagruns', methods=['POST'])
    @has_access
    @provide_session
    def last_dagruns(self, session=None):
        DagRun = models.DagRun
        allowed_dag_ids = appbuilder.sm.get_accessible_dag_ids()
        if 'all_dags' in allowed_dag_ids:
            allowed_dag_ids = [dag_id for dag_id, in session.query(models.DagModel.dag_id)]
- The /dagrun/list view returns bad datetimes, and it queries the database through a very different path. This is the view:
airflow/airflow/www_rbac/views.py
Lines 2502 to 2505 in 9669718
    class DagRunModelView(AirflowModelView):
        route_base = '/dagrun'
        datamodel = AirflowModelView.CustomSQLAInterface(models.DagRun)
It extends AirflowModelView, which extends ModelView, where we finally find the /list/ route.
That view ends up querying its datamodel using self.datamodel.query(), where self.datamodel is the datamodel = AirflowModelView.CustomSQLAInterface(models.DagRun) that was set in our DagRunModelView.
This datamodel object is a SQLAInterface and this is the query function that gets called:
Note that it expects the datamodel / SQLAInterface to have a self.session, which we did not provide! A session was still supplied somehow, because the view is able to query DAG runs, but I inspected the session object and it is very different from the sessions we create in Airflow, i.e. very different from the one used by /last_dagruns in the first point above, which gave us correct datetimes.
If we do provide our own session to the datamodel object, the problem goes away and the datetimes are correct again. The constructor even accepts a session, but for some reason we never pass one.
So my proposed fix is this:

    class CustomSQLAInterface(SQLAInterface):
        """
        FAB does not know how to handle columns with leading underscores because
        they are not supported by WTForm. This hack will remove the leading
        '_' from the key to lookup the column names.
        """
        @provide_session  # <------ did someone forget this decorator?
        def __init__(self, obj, session=None):
            super(CustomSQLAInterface, self).__init__(obj, session=session)

I have tested this locally and it works.
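For readers unfamiliar with the pattern, here is a small self-contained sketch of why decorating the constructor changes which session the interface ends up with. Everything in it is hypothetical: FakeSession, BaseInterface, and this stripped-down provide_session are stand-ins written for illustration, not Airflow's or Flask-AppBuilder's actual code, and the sketch leaves out any session lifecycle handling.

```python
import functools
import inspect


class FakeSession:
    """Hypothetical stand-in for a database session."""

    def __init__(self, label):
        self.label = label


def provide_session(func):
    """Simplified imitation: inject a session when the caller gave none."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind_partial(*args, **kwargs)
        if bound.arguments.get("session") is None:
            bound.arguments["session"] = FakeSession("airflow-managed")
        return func(*bound.args, **bound.kwargs)

    return wrapper


class BaseInterface:
    """Hypothetical parent that falls back to its own session."""

    def __init__(self, obj, session=None):
        self.obj = obj
        self.session = session or FakeSession("framework-default")


class PlainInterface(BaseInterface):
    # No decorator: session stays None and the parent picks its fallback.
    def __init__(self, obj, session=None):
        super().__init__(obj, session=session)


class DecoratedInterface(BaseInterface):
    # With the decorator, the parent receives the injected session.
    @provide_session
    def __init__(self, obj, session=None):
        super().__init__(obj, session=session)


print(PlainInterface("DagRun").session.label)      # framework-default
print(DecoratedInterface("DagRun").session.label)  # airflow-managed
```

An explicitly passed session still wins in the decorated class, so callers that already supply one are unaffected in this sketch.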
If someone can give this a thumbs up, I'd be happy to propose a PR.