Skip to content

[airflow] Implement airflow-xcom-pull-in-template-string (AIR201)#23583

Merged
MichaReiser merged 5 commits intoastral-sh:mainfrom
Dev-iL:2602/airflow/xcom_template
Apr 14, 2026
Merged

[airflow] Implement airflow-xcom-pull-in-template-string (AIR201)#23583
MichaReiser merged 5 commits intoastral-sh:mainfrom
Dev-iL:2602/airflow/xcom_template

Conversation

@Dev-iL
Copy link
Copy Markdown
Contributor

@Dev-iL Dev-iL commented Feb 26, 2026

Summary

Implements rule AIR004 (airflow-xcom-pull-in-template-string) that detects Airflow operator/sensor keyword arguments using a Jinja template string containing a single xcom_pull call (e.g., "{{ ti.xcom_pull(task_ids='some_task') }}") and suggests replacing it with the .output attribute on the task object (e.g., some_task.output).

Using .output instead of xcom_pull template strings:

  • Makes task dependencies explicit and visible to the DAG parser
  • Provides better IDE support (autocompletion, go-to-definition)
  • Is the recommended pattern for both traditional operators and TaskFlow API (@task-decorated functions)

What the rule flags

from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator

task_1 = PythonOperator(task_id="task_1", python_callable=my_func)
task_2 = BashOperator(
    task_id="task_2",
    bash_command="{{ ti.xcom_pull(task_ids='task_1') }}",  # AIR004
)

Suggested fix

task_2 = BashOperator(
    task_id="task_2",
    bash_command=task_1.output,
)

Template patterns detected

  • {{ ti.xcom_pull(task_ids='...') }} and {{ task_instance.xcom_pull(task_ids='...') }}
  • Positional argument: {{ ti.xcom_pull('...') }}
  • Both task_id= and task_ids= keyword forms
  • Various whitespace and quote styles

What it allows (no false positives)

  • Mixed content strings: "echo {{ ti.xcom_pull(task_ids='task_1') }}"
  • Non-default key arguments: "{{ ti.xcom_pull(task_ids='task_1', key='my_key') }}"
  • Already using .output: task_1.output
  • List of task_ids: "{{ ti.xcom_pull(task_ids=['a', 'b']) }}"
  • Non-operator/sensor calls (e.g., DAG(...))

Unsafe fix

When the referenced task_id matches a variable in scope (either an operator assignment or a @task-decorated function), an unsafe fix is provided that replaces the template string with <variable>.output. When no matching variable is found, the diagnostic is still reported but without an auto-fix.

Test Plan

  • Snapshot tests in AIR004.py covering:
    • Violations with fixes: ti.xcom_pull, task_instance.xcom_pull, positional args, double quotes, no-space braces, singular task_id keyword, sensors, provider operators, and @task-decorated function sources.
    • Violations without fixes: referencing a task_id not visible as a variable in the current scope.
    • Non-violations: mixed content strings, extra keyword arguments, already using .output, regular strings, non-string arguments, list task_ids, and non-operator calls.
  • Unit tests for the parse_xcom_pull_template parser covering all template variants and rejection cases.
  • Clippy passes with no warnings.

related: apache/airflow#43176

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot Bot commented Feb 26, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+3 -0 violations, +0 -0 fixes in 1 projects; 55 projects unchanged)

apache/airflow (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --no-fix --output-format concise --preview --select ALL

+ providers/amazon/tests/system/amazon/aws/example_dms_serverless.py:329:32: AIR201 Use the `.output` attribute on the task object for "create_replication_config" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/vertex_ai/example_vertex_ai_feature_store.py:174:32: AIR201 Use the `.output` attribute on the task object for "sync_task" instead of `xcom_pull` in a template string
+ providers/google/tests/system/google/cloud/vertex_ai/example_vertex_ai_feature_store.py:185:32: AIR201 Use the `.output` attribute on the task object for "sync_task" instead of `xcom_pull` in a template string

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
AIR201 3 3 0 0 0

@ntBre ntBre added rule Implementing or modifying a lint rule preview Related to preview mode features labels Feb 26, 2026
Dev-iL added a commit to Dev-iL/airflow that referenced this pull request Feb 26, 2026
Dev-iL added a commit to Dev-iL/airflow that referenced this pull request Feb 26, 2026
@Dev-iL Dev-iL force-pushed the 2602/airflow/xcom_template branch from c0c2205 to fb18572 Compare February 26, 2026 17:20
Dev-iL added a commit to Dev-iL/airflow that referenced this pull request Feb 26, 2026
@amyreese
Copy link
Copy Markdown
Contributor

cc @sjyangkevin for review?

@sjyangkevin
Copy link
Copy Markdown
Contributor

cc @sjyangkevin for review?

would be great to also have @Lee-W to have a look when he is available.

Dev-iL added a commit to Dev-iL/airflow that referenced this pull request Feb 27, 2026
@Dev-iL Dev-iL force-pushed the 2602/airflow/xcom_template branch 3 times, most recently from ca33d94 to 25ab2b5 Compare March 3, 2026 20:19
@Lee-W
Copy link
Copy Markdown
Contributor

Lee-W commented Mar 5, 2026

Hey, thanks for creating this. I'm actaully good with adding this, but it would be better if we had a stronger consensus across the community. I'll start a discussion in the dev list today.

@Dev-iL Dev-iL force-pushed the 2602/airflow/xcom_template branch from 25ab2b5 to 12d2489 Compare March 5, 2026 08:09
@ntBre
Copy link
Copy Markdown
Contributor

ntBre commented Mar 5, 2026

Thanks @Lee-W and @sjyangkevin for taking a look! I think I'll convert this back to a draft for now until there's consensus on the Airflow side.

@ntBre ntBre marked this pull request as draft March 5, 2026 16:53
@Dev-iL Dev-iL force-pushed the 2602/airflow/xcom_template branch from 12d2489 to e51918c Compare March 20, 2026 11:15
@Dev-iL
Copy link
Copy Markdown
Contributor Author

Dev-iL commented Mar 23, 2026

@Dev-iL Dev-iL marked this pull request as ready for review March 23, 2026 11:31
@astral-sh-bot astral-sh-bot Bot requested a review from amyreese March 23, 2026 11:31
Copy link
Copy Markdown
Contributor

@Lee-W Lee-W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few nits. but overall looks good!

Comment thread crates/ruff_linter/src/codes.rs Outdated
Comment thread crates/ruff_linter/resources/test/fixtures/airflow/AIR004.py Outdated
# Mixed content (not just xcom_pull)
task_10 = BashOperator(
task_id="task_10",
bash_command="echo {{ ti.xcom_pull(task_ids='task_1') }}",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we're worrying about false possitive?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we? What do you suggest?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant why shouldn't we replace if it's mixed content

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other cases, the template (string) is replaced by a different type (xcom). In a mixed context we need to keep a string, which relies on the correct serializability of xcoms. IMO this is more trouble than it's worth.

However, let's consider your proposal for a second - what should be the replacement in this example?

bash_command="echo {{ ti.xcom_pull(task_ids='task_1') }}",  # original
bash_command="echo {{ task_1.output }}",
bash_command="echo " + task_1.output,
bash_command=f"echo {task_1.output}",
...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be bash_command="echo {{ task_1.output }}",

Comment thread crates/ruff_linter/src/rules/airflow/mod.rs Outdated
@Dev-iL Dev-iL force-pushed the 2602/airflow/xcom_template branch from e51918c to 64581b7 Compare March 24, 2026 13:55
@Dev-iL Dev-iL changed the title [airflow] Implement airflow-xcom-pull-in-template-string (AIR004) [airflow] Implement airflow-xcom-pull-in-template-string (AIR201) Mar 24, 2026
@Dev-iL Dev-iL force-pushed the 2602/airflow/xcom_template branch from 64581b7 to dcdeb6a Compare March 25, 2026 08:11
Copy link
Copy Markdown
Contributor

@sjyangkevin sjyangkevin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Overall looks good to me but one more small feedback. @Lee-W when you have time, would you mind also have a look see if the following valid?

Should we also handle an edge case that when the task is defined in a TaskGroup, let's say taskgroup_1. The callable task_id will be group_id.task_id, for example

task_4 = BashOperator(
    task_id="task_4",
    bash_command="{{ ti.xcom_pull('taskgroup_1.task_1') }}",
)

https://www.astronomer.io/docs/learn/task-groups#task_id-in-task-groups

@MichaReiser
Copy link
Copy Markdown
Member

MichaReiser commented Mar 30, 2026

@Lee-W, how many rules do you plan on adding? I'm asking because adding and maintaining all these airflow rules is a considerable effort on our end. Don't get me wrong, the contributions are great, but it's unfortunately not at zero cost for us. If you plan on adding many rules, I think it's best if we discuss that first and how we can scale this process.

@Dev-iL
Copy link
Copy Markdown
Contributor Author

Dev-iL commented Mar 30, 2026

If you plan on adding many rules, I think it's best if we discuss that first and how we can scale this process.

By all means - let's discuss!

  • We don't know how many of these rules there will be, but they will likely come one at a time in the future.
  • If ruff supported plugins, so that rules could be proposed/maintained/reviewed entirely on the Airflow side, that would probably be the most scalable solution long term.

@potiuk
Copy link
Copy Markdown

potiuk commented Mar 30, 2026

Yeah. I know we've heard in the past that plugin support is not really priority, but possibly we could do something about it now? We would actually love if Airflow rules were somewhere outside of main ruff core and if we could just configure it or even ask our users to install plugins separately. I am not sure if plugins is something on the roadmap now @MichaReiser ?

@Lee-W
Copy link
Copy Markdown
Contributor

Lee-W commented Mar 31, 2026

@MichaReiser, we don't have a clear understanding of whether we should add more rules at this time.

However, we definitely need to discuss our next steps regarding this matter!

Currently, if anyone has ideas about best practices for Airflow, they will need to initiate a discussion and build consensus within the Airflow community. This process takes time and doesn't happen often, so I don’t expect many changes.

Copy link
Copy Markdown
Contributor

@Lee-W Lee-W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MichaReiser
Copy link
Copy Markdown
Member

Currently, if anyone has ideas about best practices for Airflow, they will need to initiate a discussion and build consensus within the Airflow community. This process takes time and doesn't happen often, so I don’t expect many changes.

Thanks, this sounds good to me.

Plugin support is on our mind. But we first want to push some highly demanded and larger user-facing features (warning severity, human-readable names, rule recategorization, unified format/check command). I suspect that plugins will be high up on our feature list once these are out.

Copy link
Copy Markdown
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This overall looks good. The Jinja parsing makes me a bit uneasy. I think there are a few more cases that need handling and there's potential for more code reuse.

Comment thread crates/ruff_linter/src/rules/airflow/rules/xcom_pull_in_template_string.rs Outdated
Comment thread crates/ruff_linter/src/rules/airflow/rules/xcom_pull_in_template_string.rs Outdated
Comment thread crates/ruff_linter/src/rules/airflow/rules/xcom_pull_in_template_string.rs Outdated
Comment thread crates/ruff_linter/src/rules/airflow/rules/xcom_pull_in_template_string.rs Outdated
}

// Check keyword arguments for xcom_pull template strings.
for keyword in &*call.arguments.keywords {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this indeed apply to all keyword arguments or can we restrict the rule to a few known keyword names? What if the arguments are passed as positional arguments (or are they required to be keyword arguments)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Airflow, any operator argument can be a template field — it's determined by the operator's template_fields class attribute and varies per operator. Since the {{ ti.xcom_pull(...) }} pattern is specific enough to avoid false positives, I now check all arguments (both positional and keyword) with a code comment explaining the rationale. Positional template strings are rare in operator calls but technically possible.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there some known fields that we can always skip? E.g. task_id, dag and task_group?

Copy link
Copy Markdown
Contributor Author

@Dev-iL Dev-iL Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate or give an example of the use case you have in mind?

This rule was made to detect a common (and outdated) pattern where specifically the return value of a task is retrieved. The reason for the proliferation of the pattern is because it was the recommended (only?) way of doing things for a long time, and since this is what appeared in the docs - that's what users ended up doing. While it is possible to use positional args to call the function i.e.

ti.xcom_pull("task")  # Plausible
ti.xcom_pull("task", None)  # Uncommon
ti.xcom_pull("task", None, "return_value")  # Uncommon

I think the above are so uncommon that false-negatives are not a concern in practice. Moreover, xcom_pull has 5 "positional-able" args (see below) - and we don't want to touch it if it the user provides anything except just task_ids and (optionally) key.

To conclude - I think that dealing with arbitrary templates is outside the scope of this rule.

CC: @Lee-W @sjyangkevin @potiuk


The current signature of xcom_pull:

def xcom_pull(
    self,
    task_ids: str | Iterable[str] | None = None,
    dag_id: str | None = None,
    key: str = XCOM_RETURN_KEY,  # "return_value"
    include_prior_dates: bool = False,
    session: Session = NEW_SESSION,
    *,
    map_indexes: int | Iterable[int] | None = None,
    default: Any = None,
    run_id: str | None = None,
) -> Any:

@MichaReiser MichaReiser assigned MichaReiser and unassigned amyreese Apr 9, 2026
@Dev-iL Dev-iL force-pushed the 2602/airflow/xcom_template branch from dcdeb6a to 2ab3e1b Compare April 10, 2026 06:52
Comment thread crates/ruff_linter/src/rules/airflow/rules/xcom_pull_in_template_string.rs Outdated
Comment thread crates/ruff_linter/src/rules/airflow/rules/xcom_pull_in_template_string.rs Outdated
Comment thread crates/ruff_linter/resources/test/fixtures/airflow/AIR201.py
}

// Check keyword arguments for xcom_pull template strings.
for keyword in &*call.arguments.keywords {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there some known fields that we can always skip? E.g. task_id, dag and task_group?

Dev-iL and others added 4 commits April 10, 2026 14:27
- Rewrite parse_xcom_pull_template using ruff_python_trivia::Cursor
  instead of strip_prefix/strip_suffix chains
- Extract reusable helpers: eat_whitespace, parse_identifier,
  parse_quoted_string
- Add whitespace tolerance between all token pairs (e.g. ti . xcom_pull)
- Bail on escaped quotes in string content
- Broaden argument checking to cover both positional and keyword args
  with explanatory comment about template_fields
- Add Fix safety doc section
- Add unit tests for new whitespace/escape/unknown-keyword handling

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…_whitespace

- Use `iter_source_order` for argument iteration (MichaReiser)
- Use `char::is_whitespace` instead of `is_ascii_whitespace` to match
  Jinja's Unicode whitespace semantics (MichaReiser)
- Support reordered keyword arguments: `key='return_value', task_ids='...'`
  is now recognized in addition to the existing order (MichaReiser + Lee-W)
- Extract `parse_task_id_value` helper for list/tuple wrapping logic
- Add unit tests for reordered keyword patterns
- Add fixture trigger case for reordered keywords (task_25)
- Show modern `.output` replacement pattern as comment on mixed-content test
- Regenerate snapshot in current insta format

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@Dev-iL Dev-iL force-pushed the 2602/airflow/xcom_template branch from 1765da5 to 485dc03 Compare April 10, 2026 12:31
@Dev-iL Dev-iL requested a review from MichaReiser April 12, 2026 09:32
@MichaReiser
Copy link
Copy Markdown
Member

MichaReiser commented Apr 13, 2026

This looks good to me, but Codex had two findings. I find it difficult to assess whether they're correct because I'm unfamiliar with airflow. Could someone take a look:

  1. False positive on a non-templated field:
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def f():
    pass

task_1 = PythonOperator(task_id="task_1", python_callable=f)

BashOperator(
    task_id="{{ ti.xcom_pull(task_ids='task_1') }}",  # currently flagged by AIR201
    bash_command="echo hi",
)

Why this is wrong: task_id is not a Jinja template_field, so Airflow will not render it. AIR201 should not inspect it.

  1. Wrong autofix for a TaskFlow @task:
from airflow.decorators import task
from airflow.operators.bash import BashOperator

@task
def extract_data():
    return "value"

BashOperator(
    task_id="consumer",
    bash_command="{{ ti.xcom_pull(task_ids='extract_data') }}",
)

Current suggested fix in the PR:

BashOperator(
    task_id="consumer",
    bash_command=extract_data.output,
)

Why this is wrong: for TaskFlow tasks, you need the called task object / XComArg, e.g. extract_data() or a variable bound to that result, not the decorator object’s .output.

@Dev-iL
Copy link
Copy Markdown
Contributor Author

Dev-iL commented Apr 13, 2026

  1. False positive on a non-templated field:

Codex is technically correct here, but I don't think anyone does this. If they do, it's likely a mistake, and while it may be flagged for the wrong reason, it would indicate to the user that something's up. Also, not sure a template like that is even a valid task_id (there are some rules it needs to follow which I cannot find at the moment).

  1. Wrong autofix for a TaskFlow @task:

This one's true, but the original syntax is already wrong for the exact same reason. We wouldn't be replacing working code with broken one.

@MichaReiser
Copy link
Copy Markdown
Member

This is great work. Thank you

@MichaReiser MichaReiser merged commit 0921ed2 into astral-sh:main Apr 14, 2026
44 checks passed
@Dev-iL Dev-iL deleted the 2602/airflow/xcom_template branch April 14, 2026 06:49
@Dev-iL
Copy link
Copy Markdown
Contributor Author

Dev-iL commented Apr 14, 2026

This is great work. Thank you

Thank you for bringing this over the finish line!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

preview Related to preview mode features rule Implementing or modifying a lint rule

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants