Skip to content

Conversation

@normster
Copy link

@normster normster commented Aug 9, 2016

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:

Testing Done:

  • Expanded the jobs.test_scheduler_verify_max_active_runs test to test if scheduler respects max_active_dag_runs

Fix a bug in the scheduler where dag runs cleared via CLI would be picked up without checking max_active_dag_runs first, resulting in too many simultaneous dag runs.

Fix a bug in the scheduler where dag runs cleared via CLI would be picked up without checking max_active_dag_runs first, resulting in too many simultaneous dag runs.
@r39132
Copy link
Contributor

r39132 commented Aug 10, 2016

https://travis-ci.org/normster/incubator-airflow/builds/151091603 passed..
It doesn't seem that travis is running anymore on apache/incubator-airflow... waited 1 hour at the front of the queue with nothing yet kicked off.
+1

@ldct
Copy link
Contributor

ldct commented Aug 18, 2016

@r39132 @normster I believe the logic here is incorrect, as it causes all scheduling to stop (eg starting task instances for running dagruns, changing dagrun state) https://issues.apache.org/jira/browse/AIRFLOW-434

@r39132
Copy link
Contributor

r39132 commented Aug 18, 2016

@zodiac hmn... interesting. The motivation for the original patch was to have airflow backfill abide by the max_active_runs parameter, but I see how this can result in the case you have pointed out. I will say that regardless of our patch, this limitation already happens to us in some of our dags. It sounds like our "fix" is making the problem more pronounced. Do you changes in this PR aim to work around our earlier fix? If so, I can shoot for a code review later today. I'm heading to the dentist right now.

@ldct
Copy link
Contributor

ldct commented Aug 18, 2016

@r39132

I will say that regardless of our patch, this limitation already happens to us in some of our dags. It sounds like our "fix" is making the problem more pronounced.

I'm not sure I understand, what is "this limitation"? I tested the test cases included in JIRA on the commit right before this PR, and the test case behaved as expected. I am not sure what is the best way to achieve what this PR does while avoiding the bug. I don't have a fix for it.

I'm curious how you avoided running into the bug in your production environment because in my testing, once max_active_dag_runs is hit the state of all dagruns will always stop changing, so nothing gets scheduled.

@r39132
Copy link
Contributor

r39132 commented Aug 19, 2016

Continuing this thread in https://issues.apache.org/jira/browse/AIRFLOW-434

@bolkedebruin
Copy link
Contributor

bolkedebruin commented Sep 8, 2016

@plypaul @zodiac @r39132 @normster @mistercrunch I would like to revert this change. It has caused a serious regression as tasks are not being scheduled anymore in master. This has been encountered in multiple reports and several PRs try to address this, however none of them have come to a conclusive end.

I do understand the need to fix the bug mentioned in this PR however I consider not scheduling tasks more important than adhering to max_active_runs. Before accepting a fix to this it should be accompanied by test evidence. At a minimum a unit test should be added that verifies tasks from running dag_runs are scheduled even if max_dag_runs has been reached.

A solution might be as simple as moving _process_task_instances to above all the tests or reorder them so that the check for max_dag_runs occurs after the processing of the tasks. This however should be part of this PR and properly tested.

@r39132
Copy link
Contributor

r39132 commented Sep 8, 2016

@bolkedebruin I'm okay with reverting this change.

@ldct
Copy link
Contributor

ldct commented Sep 9, 2016

👍

Lets reopen the jira this closes too

On Thursday, 8 September 2016, Sid Anand [email protected] wrote:

@bolkedebruin https://github.com/bolkedebruin I'm okay with reverting
this change.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1716 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAH9OofUBoxe5sBDWyCYzQkkjA7mFR1eks5qoHyegaJpZM4Jgm5R
.

Im Xuan Ji!

alekstorm pushed a commit to alekstorm/incubator-airflow that referenced this pull request Jun 1, 2017
Fix a bug in the scheduler where dag runs cleared via CLI would be picked up without checking max_active_dag_runs first, resulting in too many simultaneous dag runs.

Dear Airflow Maintainers,

Please accept this PR that addresses the following issues:
- https://issues.apache.org/jira/browse/AIRFLOW-78

Testing Done:
- Expanded the jobs.test_scheduler_verify_max_active_runs test to test if scheduler respects max_active_dag_runs

Fix a bug in the scheduler where dag runs cleared via CLI would be picked up without checking max_active_dag_runs first, resulting in too many simultaneous dag runs.

Closes apache#1716 from normster/clear_dagrun
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants