Skip to content

Conversation

@BasPH
Copy link
Contributor

@BasPH BasPH commented Jun 23, 2019

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

I got started on making airflow/models Pylint compatible, and bumped into the render_template function(s) which were real hard to understand at first sight. Because of the large change required to improve it, I decided to split it off in a separate PR.

This PR:

  • Refactors BaseOperator.render_template() and removes render_template_from_field(). The functionality could be greatly simplified into a single render_template() function.
  • Removes six usage.
  • Improves performance by removing two hasattr calls and avoiding recreating Jinja environments.
  • Removes the argument attr to render_template() which wasn't used.
  • Squashes multiple similar tests into two parameterized tests.
  • Adheres to 110 line length.
  • Adds support for templating sets.
  • Adds Pydoc.
  • Adds typing.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

No unit tests added, but multiple similar tests merged into a single, parameterized, test.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@codecov-io
Copy link

Codecov Report

Merging #5461 into master will decrease coverage by <.01%.
The diff coverage is 90.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5461      +/-   ##
==========================================
- Coverage   79.13%   79.13%   -0.01%     
==========================================
  Files         488      488              
  Lines       30608    30601       -7     
==========================================
- Hits        24222    24215       -7     
  Misses       6386     6386
Impacted Files Coverage Δ
airflow/models/dag.py 91.8% <100%> (ø) ⬆️
airflow/models/taskinstance.py 92.98% <85.71%> (-0.04%) ⬇️
airflow/models/baseoperator.py 94.66% <92.85%> (-0.06%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 06dba66...8f61e07. Read the comment docs.

@codecov-io
Copy link

codecov-io commented Jun 23, 2019

Codecov Report

Merging #5461 into master will decrease coverage by <.01%.
The diff coverage is 88.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5461      +/-   ##
==========================================
- Coverage   79.04%   79.04%   -0.01%     
==========================================
  Files         489      489              
  Lines       30685    30678       -7     
==========================================
- Hits        24256    24249       -7     
  Misses       6429     6429
Impacted Files Coverage Δ
airflow/models/dag.py 91.83% <100%> (ø) ⬆️
airflow/models/taskinstance.py 92.98% <85.71%> (-0.04%) ⬇️
airflow/models/baseoperator.py 94.64% <90%> (-0.06%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38bbaeb...8812606. Read the comment docs.

Copy link
Member

@turbaszek turbaszek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can remove refactored models from pylint TODO list?

@BasPH
Copy link
Contributor Author

BasPH commented Jul 15, 2019

Maybe you can remove refactored models from pylint TODO list?

@nuclearpinguin Maybe I don't understand correctly, but this only refactors the template rendering function, not the entire models.taskinstance script.

@turbaszek
Copy link
Member

I just wanted to suggest that maybe after your refactor the files could be remove from the todo list. If there's only few errors left then maybe it's worth to refactor them too? 🙂

@potiuk
Copy link
Member

potiuk commented Jul 16, 2019

Side comment @nuclearpinguin @BasPH -> I think we should adapt our approach for pylint changes a bit after first experiences with it. This (and other) cases shows that we rarely naturally fix all the pylint problems per-file.

While it worked fine for GCP operators/tests, I think the approach proposed by @mik-laj works a bit nicer: #5587 (comment), however maybe we can modify it slightly.

I like the idea that you won't fix it file-by-file but warning-by-warning. It sounds much more natural. But I do not like the idea that this means that files will not disappear from the todo list as quickly as we would like them to, and will not prevent people from adding new pylint errors in the meantime to files that are already pretty much cleared.

@mik-laj already has a script that produces updated pylint_todo.txt so we can regularly re-run and update it (it does not have to be always when we fix some errors - it can be independent and regular).

I think what might make sense is to combine this approach - maybe we could extend the script that will run pylint on the whole codebase and produce updated pylint_todo.txt list + list of candidates that can be cleaned really quickly (basically with a few pylint problems left). This way our natural process of fixing pylint problems would look as follows:

  1. Fix particular warning across the board (rinse-repeat for different warnings types) - this can be done independently by different people
  2. Regularly (weekly ??) regenerate pylint-todo.txt + fix those files with just a few errors left
  3. Repeat.

What do you think @BasPH @mik-laj @nuclearpinguin ?

@mik-laj - I know you have some tools/scripts that you use to automate some of that work (like creating JIRA issues etc). Maybe you could create a PR with those tools that others (us?) could use as well and we could setup a small 'process' for that. We are down to 4000 errors or so from 5000 but I feel that if we establish this kind of small process, we can invite others to help with the effort and get it done much more quickly. Especially we could encourage new contributors of Airflow to help with that if we give them easy to use process/tools to do it.

@mik-laj
Copy link
Member

mik-laj commented Jul 16, 2019

First part of discussion: #5587

@BasPH
Copy link
Contributor Author

BasPH commented Jul 16, 2019

Commented in #5587.

Any remarks on this PR?

@BasPH
Copy link
Contributor Author

BasPH commented Jul 30, 2019

I've rebased on master after #5673, now it's up to date. Once again, any comments or can we merge this? @potiuk?

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small changes, but much much nicer overall!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are refactoring this: How about we move all this logic in to a function in BaseOperator -- it all seems to operate on self.task.X which is probably a good indicator that's where this belongs. (We should keep this for back-compat though?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will move around some code tonight.

@BasPH
Copy link
Contributor Author

BasPH commented Jul 31, 2019

@ashb I've addressed all your comments:

  • Added a method render_template_fields to the BaseOperator, which is called from TaskInstance.render_templates (which doesn't perform any logic anymore and simply calls BaseOperator.render_template_fields. I think eventually the whole TaskInstance.render_templates method could be removed, but it's being called in many tests and thus would introduce a lot more code changes, so would do that another time.
  • Updated the description of BaseOperator.render_template, hopefully it makes sense now.
  • Merged the uuid and objects tests into a single parameterized test.
  • Merged the namedtuple test into the big parameterized test, by defining the namedtuple class at the top of the script.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the tests now, much easier to follow

A few small changes I just noticed this time around. One bug too I think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to create a new jinja env for each template field - we probably want to reuse a jinja env for all fields don't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow that's pretty serious, I missed this one when moving code to the BaseOperator. Thanks for pointing it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return self.dag.get_template_env() if hasattr(self, 'dag') else jinja2.Environment(cache_size=0)
return self.dag.get_template_env() if self.has_dag() else jinja2.Environment(cache_size=0)

Especially given the attribute is _dag :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Member

@ashb ashb Aug 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to do this -- as dag.get_template_env() will handle this for us (once we fix the bug there).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think we can drop this if and just iterate over the self.template_fields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, the BaseOperator holds an empty list so should be fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth changing these sort of uses in the tests to op.render_template_fields({'ds': DATE}) now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, no point in calling render_tempate anymore directly because it's now more of an internal method being called by render_template_fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
({"user_defined_filters": {"hello": lambda name: 'Hello %s' % name}}, "{{ 'world' | hello }}", {}, "Hello world"), # noqa: E501, pylint: disable=line-too-long
({"user_defined_filters": {"hello": lambda name: 'Hello %s' % name}},
"{{ 'world' | hello }}",
{},
"Hello world"),

perhaps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we moved the templating logic to the BaseOperator, I also moved the tests for the respective methods with it. There was no test_baseoperator.py yet so all code in there is now nicely formatted with Black.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at all these deletions 😍

@BasPH
Copy link
Contributor Author

BasPH commented Aug 3, 2019

@ashb addressed your comments. I've created tests/models/test_baseoperator.py (this didn't exist to my surprise) because some logic from the TaskInstance moved to the BaseOperator. In addition, I added a test to verify if a Jinja environment is instantiated only once, even with multiple templated fields (test_jinja_env_creation in tests/models/test_baseoperator.py).

I think a few bits and pieces could still improve (e.g. render_template could be a staticmethod, because it doesn't read anything directly from self, sets are currently not templatable), etc... but think we should also keep the PR as small as possible. With that idea, if all is okay logically, is this okay now? 🙂

@ashb
Copy link
Member

ashb commented Aug 3, 2019

Yeah I think most of the tests that should be in test_baseoperator are currently in test_taskinstance or core. We can move them later

@ashb
Copy link
Member

ashb commented Aug 3, 2019

I can't easily check on my phone but it looks good. Will check on a bigger screen tonight

@BasPH
Copy link
Contributor Author

BasPH commented Aug 5, 2019

I resolved @bjoernpollex-sc's comment and while I was at it, added support for templating sets, since it was such a small change.

@BasPH
Copy link
Contributor Author

BasPH commented Aug 10, 2019

@ashb any other remarks on this PR, or is it good to go?

@ashb
Copy link
Member

ashb commented Aug 10, 2019

@bazph I took a week off, will look again in the morning

@ashb ashb merged commit 47dd4c9 into apache:master Aug 12, 2019
@BasPH BasPH deleted the bash-improve-templating-logic branch August 12, 2019 09:08
ashb pushed a commit to ashb/airflow that referenced this pull request Sep 18, 2019
- Refactors `BaseOperator.render_template()` and removes `render_template_from_field()`. The functionality could be greatly simplified into a single `render_template()` function.
- Improves performance by removing two `hasattr` calls and avoiding recreating Jinja environments.
- Removes the argument `attr` to `render_template()` which wasn't used.
- Squashes multiple similar tests into two parameterized tests.
- Adheres to 110 line length.
- Adds support for templating sets.
- Adds Pydoc.
- Adds typing.

(cherry picked from commit 47dd4c9)
ashb pushed a commit that referenced this pull request Sep 24, 2019
- Refactors `BaseOperator.render_template()` and removes `render_template_from_field()`. The functionality could be greatly simplified into a single `render_template()` function.
- Improves performance by removing two `hasattr` calls and avoiding recreating Jinja environments.
- Removes the argument `attr` to `render_template()` which wasn't used.
- Squashes multiple similar tests into two parameterized tests.
- Adheres to 110 line length.
- Adds support for templating sets.
- Adds Pydoc.
- Adds typing.

(cherry picked from commit 47dd4c9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants