[airflow] Implement task-branch-as-short-circuit (AIR004)#23579
[airflow] Implement task-branch-as-short-circuit (AIR004)#23579ntBre merged 1 commit intoastral-sh:mainfrom
airflow] Implement task-branch-as-short-circuit (AIR004)#23579Conversation
|
@Lee-W would you mind taking a look at this? |
|
Will be out these days. Will take a look next week. Thanks! |
sjyangkevin
left a comment
There was a problem hiding this comment.
Thanks for implementing it, some small feedback
There was a problem hiding this comment.
Thank you for implementing it! Just want to share some feedback about the rule and the code AIR003. Do you think it is a good idea to have a more generic name to group similar rules? I feel the rule is more like “best practice” we can suggest change. So, later on we could extend on this to add more rules, instead of making this rule specific to the branch -> short circuit
There was a problem hiding this comment.
Do you mean reserving a range of codes (e.g. AIR1XX for best practices), or have one giant rule that has various best practices in it? If the latter, I would be careful with increasing the scope of a rule, to provide users the ability to selectively turn them on/off.
There was a problem hiding this comment.
I can see currently AIR001 and AIR002 are also for best practices. I am just trying to think how many such rules we might introduce. I think we have a trade-off here:
- use one code to implement a rule cover a best practice, users have flexibility to selectively apply the rule.
- use one code to group multiple rules, and introduce (suggest fixes) all best practices at once
The second might be a better approach from my perspective, as if we would like to introduce a better way to write the code, or best practices, then probably it is not harm to introduce them all 🤔 , but I am open for different view.
There was a problem hiding this comment.
Do you have a specific example in mind of a rule that does multiple different things?
We could have a "simplify branching" rule with multiple possible fixes.
I can also imagine users wanting to transition to taskflow would like a rule that flags operators with an equivalent taskflow decorator - this would be multiple checks under the same code.
Is it a problem to rename rules later if we want to expand the functionality? In my personal projects that would be a standard refactor for when code evolves.
There was a problem hiding this comment.
What I saw in other rule families is a format like AIR01X - for rules dealing with topic A, AIR02X for rules on topic B, etc., but individual rules deal with a very specific pattern.
There was a problem hiding this comment.
I think we should differentiate different practices so that users can configurate it
|
ce2f212 to
0dec43a
Compare
|
Hi @Dev-iL, probably similar to AIR005 in some ways (probably implementation detail). As we are validating the |
Sounds good. Are we going to suggest to replace it with the taskflow version? |
Personally, I think it is something that the community should decide if TaskFlow is the recommended way of writing Dags in Airflow (if it hasn't decided already). |
40466f0 to
bee9aef
Compare
I think there should be a separate rule to assist with migration to taskflow. Users may choose to disable/ignore it when unwanted. |
It could be a separate rule for TaskFlow API to make it easier for some teams to migrate from one way to another, but it will not resolve the issue which I've mentioned. The rules themselves should be Dag implementation-agnostic in the sense that it doesn't matter if Dag was written in a standard way or with TaskFlow API. The different approach would cover only a subset of cases, while both ways are equally supported, and there is no recommended way of Dags writing in that sense (at least I didn't find anything in the documentation regarding it). Looking at the current implementation of
P.S. I found two notes in the Airflow documentation regarding the recommendation related to the TaskFlow API. One is connected to the |
bee9aef to
63ee3d1
Compare
Done in this commit + SKILL updated to be aware of this pattern. |
|
Hey, thanks for creating this. I'm actaully good with adding this, but it would be better if we had a stronger consensus across the community. I'll start a discussion in the dev list today. |
63ee3d1 to
9d964b6
Compare
|
Like #23583, I'll convert this to a draft for now. Thanks very much for the reviews (and the PR!)! |
9d964b6 to
05cbc24
Compare
|
The Airflow community approved this rule, so I'm marking it "Ready for review". Refs: |
05cbc24 to
7076aa3
Compare
Looks good to me |
There was a problem hiding this comment.
I think we should differentiate different practices so that users can configurate it
|
Sorry for the delay on review here! Is this ready for another look from me? I wasn't 100% sure that this thread was resolved, but I think it probably is. |
Yeah, I'd say there's nothing left to discuss there. It was a "meta" discussion anyway. Please take another look! |
7076aa3 to
93bd5e4
Compare
|
yep, I think that one is not for this PR and I don't think we'll do it that way. but we'll need to update the code in the descirption and title |
93bd5e4 to
e01f6e6
Compare
airflow] Implement task-branch-as-short-circuit (AIR004)
|
@MichaReiser perhaps you can take a look? |
airflow] Implement task-branch-as-short-circuit (AIR004)airflow] Implement task-branch-as-short-circuit (AIR004)
Summary
Adds a new rule
AIR004that detects@task.branchdecorated functions that could be replaced with@task.short_circuit.In Airflow,
@task.branchselects which downstream tasks to run by returning a list of task IDs (or an empty list to skip all). When the function has at least tworeturnstatements and exactly one of them returns a non-empty list, it is effectively acting as a boolean short-circuit (i.e. either run one specific set of downstream tasks or skip them all). In that case,@task.short_circuitis a simpler and more readable alternative that returnsTrue/Falseinstead.Implementation details
@task.branchdecorator via the semantic model (airflow.decorators.task+.branchattribute), handling both@task.branchand@task.branch()call forms viamap_callable.ReturnStatementVisitorto collect allreturnstatements recursively (including those inside nestedif/else/for/whileblocks).len(returns) >= 2and exactly one return has a non-empty list value.What it does NOT flag
@task.branch.None, etc.).Test Plan
Added snapshot tests in
AIR004.pycovering both violation and non-violation cases:@task.short_circuitdecorated functions