[AIRFLOW-4835] Refactor render_template #5461

BasPH · 2019-06-23T17:24:42Z

Make sure you have checked all steps below.

Jira

My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
- https://issues.apache.org/jira/browse/AIRFLOW-4835
- In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
- In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
- In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

Here are some details about my PR, including screenshots of any UI changes:

I got started on making airflow/models Pylint compatible, and bumped into the render_template function(s) which were real hard to understand at first sight. Because of the large change required to improve it, I decided to split it off in a separate PR.

This PR:

Refactors BaseOperator.render_template() and removes render_template_from_field(). The functionality could be greatly simplified into a single render_template() function.
Removes six usage.
Improves performance by removing two hasattr calls and avoiding recreating Jinja environments.
Removes the argument attr to render_template() which wasn't used.
Squashes multiple similar tests into two parameterized tests.
Adheres to 110 line length.
Adds support for templating sets.
Adds Pydoc.
Adds typing.

Tests

My PR adds the following unit tests OR does not need testing for this extremely good reason:

No unit tests added, but multiple similar tests merged into a single, parameterized, test.

Commits

My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters (not including Jira issue reference)
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"

Documentation

In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and the classes in the PR contain docstrings that explain what it does
- If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

Passes flake8

codecov-io · 2019-06-23T17:57:08Z

Codecov Report

Merging #5461 into master will decrease coverage by <.01%.
The diff coverage is 90.9%.

@@            Coverage Diff             @@
##           master    #5461      +/-   ##
==========================================
- Coverage   79.13%   79.13%   -0.01%     
==========================================
  Files         488      488              
  Lines       30608    30601       -7     
==========================================
- Hits        24222    24215       -7     
  Misses       6386     6386

Impacted Files	Coverage Δ
airflow/models/dag.py	`91.8% <100%> (ø)`	⬆️
airflow/models/taskinstance.py	`92.98% <85.71%> (-0.04%)`	⬇️
airflow/models/baseoperator.py	`94.66% <92.85%> (-0.06%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06dba66...8f61e07. Read the comment docs.

codecov-io · 2019-06-23T17:57:08Z

Codecov Report

Merging #5461 into master will decrease coverage by <.01%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##           master    #5461      +/-   ##
==========================================
- Coverage   79.04%   79.04%   -0.01%     
==========================================
  Files         489      489              
  Lines       30685    30678       -7     
==========================================
- Hits        24256    24249       -7     
  Misses       6429     6429

Impacted Files	Coverage Δ
airflow/models/dag.py	`91.83% <100%> (ø)`	⬆️
airflow/models/taskinstance.py	`92.98% <85.71%> (-0.04%)`	⬇️
airflow/models/baseoperator.py	`94.64% <90%> (-0.06%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38bbaeb...8812606. Read the comment docs.

turbaszek

Maybe you can remove refactored models from pylint TODO list?

BasPH · 2019-07-15T19:58:13Z

Maybe you can remove refactored models from pylint TODO list?

@nuclearpinguin Maybe I don't understand correctly, but this only refactors the template rendering function, not the entire models.taskinstance script.

turbaszek · 2019-07-15T21:19:43Z

I just wanted to suggest that maybe after your refactor the files could be remove from the todo list. If there's only few errors left then maybe it's worth to refactor them too? 🙂

potiuk · 2019-07-16T05:07:35Z

Side comment @nuclearpinguin @BasPH -> I think we should adapt our approach for pylint changes a bit after first experiences with it. This (and other) cases shows that we rarely naturally fix all the pylint problems per-file.

While it worked fine for GCP operators/tests, I think the approach proposed by @mik-laj works a bit nicer: #5587 (comment), however maybe we can modify it slightly.

I like the idea that you won't fix it file-by-file but warning-by-warning. It sounds much more natural. But I do not like the idea that this means that files will not disappear from the todo list as quickly as we would like them to, and will not prevent people from adding new pylint errors in the meantime to files that are already pretty much cleared.

@mik-laj already has a script that produces updated pylint_todo.txt so we can regularly re-run and update it (it does not have to be always when we fix some errors - it can be independent and regular).

I think what might make sense is to combine this approach - maybe we could extend the script that will run pylint on the whole codebase and produce updated pylint_todo.txt list + list of candidates that can be cleaned really quickly (basically with a few pylint problems left). This way our natural process of fixing pylint problems would look as follows:

Fix particular warning across the board (rinse-repeat for different warnings types) - this can be done independently by different people
Regularly (weekly ??) regenerate pylint-todo.txt + fix those files with just a few errors left
Repeat.

What do you think @BasPH @mik-laj @nuclearpinguin ?

@mik-laj - I know you have some tools/scripts that you use to automate some of that work (like creating JIRA issues etc). Maybe you could create a PR with those tools that others (us?) could use as well and we could setup a small 'process' for that. We are down to 4000 errors or so from 5000 but I feel that if we establish this kind of small process, we can invite others to help with the effort and get it done much more quickly. Especially we could encourage new contributors of Airflow to help with that if we give them easy to use process/tools to do it.

mik-laj · 2019-07-16T06:16:22Z

First part of discussion: #5587

BasPH · 2019-07-16T06:30:08Z

Commented in #5587.

Any remarks on this PR?

airflow/models/baseoperator.py

BasPH · 2019-07-30T11:20:36Z

I've rebased on master after #5673, now it's up to date. Once again, any comments or can we merge this? @potiuk?

ashb

A few small changes, but much much nicer overall!

airflow/models/baseoperator.py

ashb · 2019-07-31T08:46:03Z

airflow/models/taskinstance.py

While we are refactoring this: How about we move all this logic in to a function in BaseOperator -- it all seems to operate on self.task.X which is probably a good indicator that's where this belongs. (We should keep this for back-compat though?)

Good point, will move around some code tonight.

tests/models/test_dag.py

BasPH · 2019-07-31T18:39:46Z

@ashb I've addressed all your comments:

Added a method render_template_fields to the BaseOperator, which is called from TaskInstance.render_templates (which doesn't perform any logic anymore and simply calls BaseOperator.render_template_fields. I think eventually the whole TaskInstance.render_templates method could be removed, but it's being called in many tests and thus would introduce a lot more code changes, so would do that another time.
Updated the description of BaseOperator.render_template, hopefully it makes sense now.
Merged the uuid and objects tests into a single parameterized test.
Merged the namedtuple test into the big parameterized test, by defining the namedtuple class at the top of the script.

ashb

Like the tests now, much easier to follow

A few small changes I just noticed this time around. One bug too I think

ashb · 2019-08-01T10:13:04Z

airflow/models/baseoperator.py

This is going to create a new jinja env for each template field - we probably want to reuse a jinja env for all fields don't we?

Wow that's pretty serious, I missed this one when moving code to the BaseOperator. Thanks for pointing it out.

ashb · 2019-08-01T10:15:09Z

airflow/models/baseoperator.py

Suggested change

return self.dag.get_template_env() if hasattr(self, 'dag') else jinja2.Environment(cache_size=0)

return self.dag.get_template_env() if self.has_dag() else jinja2.Environment(cache_size=0)

Especially given the attribute is _dag :)

ashb · 2019-08-01T10:16:26Z

airflow/models/baseoperator.py

We shouldn't need to do this -- as dag.get_template_env() will handle this for us (once we fix the bug there).

Removed 👍

ashb · 2019-08-01T10:17:26Z

airflow/models/baseoperator.py

I also think we can drop this if and just iterate over the self.template_fields

True, the BaseOperator holds an empty list so should be fine.

ashb · 2019-08-01T10:19:15Z

tests/contrib/operators/test_databricks_operator.py

Is it worth changing these sort of uses in the tests to op.render_template_fields({'ds': DATE}) now?

Yes, no point in calling render_tempate anymore directly because it's now more of an internal method being called by render_template_fields.

ashb · 2019-08-01T10:20:47Z

tests/models/test_dag.py

Suggested change

({"user_defined_filters": {"hello": lambda name: 'Hello %s' % name}}, "{{ 'world' | hello }}", {}, "Hello world"), # noqa: E501, pylint: disable=line-too-long

({"user_defined_filters": {"hello": lambda name: 'Hello %s' % name}},

"{{ 'world' | hello }}",

{},

"Hello world"),

perhaps?

Since we moved the templating logic to the BaseOperator, I also moved the tests for the respective methods with it. There was no test_baseoperator.py yet so all code in there is now nicely formatted with Black.

ashb · 2019-08-01T10:21:10Z

tests/models/test_dag.py

Look at all these deletions 😍

BasPH · 2019-08-03T08:36:57Z

@ashb addressed your comments. I've created tests/models/test_baseoperator.py (this didn't exist to my surprise) because some logic from the TaskInstance moved to the BaseOperator. In addition, I added a test to verify if a Jinja environment is instantiated only once, even with multiple templated fields (test_jinja_env_creation in tests/models/test_baseoperator.py).

I think a few bits and pieces could still improve (e.g. render_template could be a staticmethod, because it doesn't read anything directly from self, sets are currently not templatable), etc... but think we should also keep the PR as small as possible. With that idea, if all is okay logically, is this okay now? 🙂

ashb · 2019-08-03T15:49:13Z

Yeah I think most of the tests that should be in test_baseoperator are currently in test_taskinstance or core. We can move them later

ashb · 2019-08-03T16:02:59Z

I can't easily check on my phone but it looks good. Will check on a bigger screen tonight

airflow/models/baseoperator.py

BasPH · 2019-08-05T16:37:44Z

I resolved @bjoernpollex-sc's comment and while I was at it, added support for templating sets, since it was such a small change.

BasPH · 2019-08-10T11:30:32Z

@ashb any other remarks on this PR, or is it good to go?

ashb · 2019-08-10T19:52:52Z

@bazph I took a week off, will look again in the morning

- Refactors `BaseOperator.render_template()` and removes `render_template_from_field()`. The functionality could be greatly simplified into a single `render_template()` function. - Improves performance by removing two `hasattr` calls and avoiding recreating Jinja environments. - Removes the argument `attr` to `render_template()` which wasn't used. - Squashes multiple similar tests into two parameterized tests. - Adheres to 110 line length. - Adds support for templating sets. - Adds Pydoc. - Adds typing. (cherry picked from commit 47dd4c9)

turbaszek reviewed Jul 10, 2019

View reviewed changes

simpleranchero reviewed Jul 17, 2019

View reviewed changes

airflow/models/baseoperator.py Outdated Show resolved Hide resolved

ashb requested changes Jul 31, 2019

View reviewed changes

ashb requested changes Aug 1, 2019

View reviewed changes

bjoernpollex-sc reviewed Aug 5, 2019

View reviewed changes

airflow/models/baseoperator.py Outdated Show resolved Hide resolved

galak75 mentioned this pull request Aug 5, 2019

[AIRFLOW-3871] render Operators template fields recursively #4743

Merged

6 tasks

[AIRFLOW-4835] Refactor operator render_template

e0af1d4

ashb approved these changes Aug 12, 2019

View reviewed changes

ashb merged commit 47dd4c9 into apache:master Aug 12, 2019

BasPH deleted the bash-improve-templating-logic branch August 12, 2019 09:08

smith-m mentioned this pull request Jan 8, 2021

Nested templated variables do not always render #13559

Closed

	return self.dag.get_template_env() if hasattr(self, 'dag') else jinja2.Environment(cache_size=0)
	return self.dag.get_template_env() if self.has_dag() else jinja2.Environment(cache_size=0)

-            ({"user_defined_filters": {"hello": lambda name: 'Hello %s' % name}}, "{{ 'world' | hello }}", {}, "Hello world"),  # noqa: E501, pylint: disable=line-too-long
+            ({"user_defined_filters": {"hello": lambda name: 'Hello %s' % name}},
+             "{{ 'world' | hello }}",
+             {},
+             "Hello world"),

[AIRFLOW-4835] Refactor render_template #5461

[AIRFLOW-4835] Refactor render_template #5461

Uh oh!

Conversation

BasPH commented Jun 23, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Jira

Description

Tests

Commits

Documentation

Code Quality

Uh oh!

codecov-io commented Jun 23, 2019

Codecov Report

Uh oh!

codecov-io commented Jun 23, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

turbaszek left a comment

Choose a reason for hiding this comment

Uh oh!

BasPH commented Jul 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

turbaszek commented Jul 15, 2019

Uh oh!

potiuk commented Jul 16, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mik-laj commented Jul 16, 2019

Uh oh!

BasPH commented Jul 16, 2019

Uh oh!

Uh oh!

BasPH commented Jul 30, 2019

Uh oh!

ashb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

BasPH commented Jul 31, 2019

Uh oh!

ashb left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ashb Aug 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BasPH commented Jun 23, 2019 •

edited

Loading

codecov-io commented Jun 23, 2019 •

edited

Loading

BasPH commented Jul 15, 2019 •

edited

Loading

potiuk commented Jul 16, 2019 •

edited

Loading

ashb Aug 1, 2019 •

edited

Loading