Skip to content

Conversation

@amichai07
Copy link
Contributor

@amichai07 amichai07 commented Feb 22, 2020

This change fixes the bug identified by @houqp (described in #7402), but without losing the performance improvement which was introduced in refactor #4751.


Issue link: AIRFLOW-3607

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@codecov-io
Copy link

Codecov Report

Merging #7503 into master will decrease coverage by 0.28%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7503      +/-   ##
==========================================
- Coverage   86.82%   86.54%   -0.29%     
==========================================
  Files         891      891              
  Lines       42095    42091       -4     
==========================================
- Hits        36549    36426     -123     
- Misses       5546     5665     +119
Impacted Files Coverage Δ
airflow/models/dagrun.py 96.5% <100%> (-0.06%) ⬇️
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 47.18% <0%> (-45.08%) ⬇️
...viders/cncf/kubernetes/operators/kubernetes_pod.py 69.38% <0%> (-25.52%) ⬇️
airflow/kubernetes/refresh_config.py 50.98% <0%> (-23.53%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 61a8bb6...342515e. Read the comment docs.

@amichai07 amichai07 requested a review from ashb February 22, 2020 16:01
@ashb
Copy link
Member

ashb commented Feb 24, 2020

WDYT @houqp ?

@mik-laj
Copy link
Member

mik-laj commented Mar 2, 2020

@houqp What do you think?

@houqp
Copy link
Member

houqp commented Apr 1, 2020

sorry, my github mention is overwhelmed, so I missed them. taking a look at this now.

@houqp
Copy link
Member

houqp commented Apr 1, 2020

This change looks correct to me. In my original fix, I also added unit tests to prevent regressions and looks like those tests are still passing, so we should be good 👍

_get_ready_tis also has a stricter depContext, so it should yield a more correct scheduling behavior.

That said, @amichai07 are you sure your patch actually results in performance improvement? Looking at the code, _get_ready_tis actually does more work than the else branch being replaced. I don't think this is a blocker, but just curious if I missed anything in the code. If it's a refactor without performance gain, we should change the commit message accordingly.

@mik-laj
Copy link
Member

mik-laj commented Apr 1, 2020

@houqp In Q2 I want to add more tests that allow you to count the number of queries.
We have tests fo DagFilepProcessor: https://github.com/apache/airflow/blob/6bd2e59/tests/jobs/test_scheduler_job.py#L1130-L1233

@amichai07
Copy link
Contributor Author

@houqp it results in a performance improvement because it passes 'finished_tasks' as an argument in dep_context so no more db queries are made for getting those finished_tasks afterwards.

@houqp
Copy link
Member

houqp commented Apr 2, 2020

@amichai07 ha, i see, thanks for the explanation.

The performance regression test setup by @mik-laj looks great. You might want to add for your optimization to guard against regressions from future changes :)

@amichai07
Copy link
Contributor Author

@ashb WDYT?

@stale
Copy link

stale bot commented May 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label May 31, 2020
@houqp
Copy link
Member

houqp commented May 31, 2020

@ashb @mik-laj gentle ping for a final review :)

@stale stale bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label May 31, 2020
@ashb ashb changed the title [AIRFLOW-3607] fixed the bug with picking just special cases while maintaining the p… [AIRFLOW-3607] Optimize dep checking when depends on past set and concurrency limit Jun 5, 2020
@ashb
Copy link
Member

ashb commented Jun 5, 2020

@mik-laj Is it worth adding one of your query-count style tests here do you think?

@mik-laj
Copy link
Member

mik-laj commented Jun 5, 2020

@ashb Yes. My team wants to do it. We have already added the basic test case, and as another PR we want to add more advanced test cases.
Here is PR: #9088
I don't think it needs to be added in this change.

@mik-laj
Copy link
Member

mik-laj commented Jun 5, 2020

We want to add the following test case:

  • One DAG Run with many task instances in one pool.
  • One DAG Run with many tasks instances with many pools
  • Many DAGs Run with a limited number of tasks - below DAG concurrency limit
  • Many DAGs Run with a limited number of tasks - above DAG concurrency limit

@ashb ashb merged commit 32ef0cd into apache:master Jun 5, 2020
turbaszek pushed a commit to PolideaInternal/airflow that referenced this pull request Oct 1, 2020
[AIRFLOW-3607] Only query DB once per DAG run for TriggerRuleDep (apache#4751)

This decreases scheduler delay between tasks by about 20% for larger DAGs,
sometimes more for larger or more complex DAGs.

The delay between tasks can be a major issue, especially when we have dags with
many subdags, figures out that the scheduling process spends plenty of time in
dependency checking, we took the trigger rule dependency which calls the db for
each task instance, we made it call the db just once for each dag_run

[AIRFLOW-3607] fix scheduler bug related to concurrency and depends on past (apache#7402)

commit 50efda5 introduced a bug that
prevents scheduler from scheduling tasks with the following properties:

* has depends on past set to True
* has custom concurrency limit

[AIRFLOW-3607] Optimize dep checking when depends on past set and concurrency limit (apache#7503)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants