Skip to content

fix: filter pushdown when merge filter#20110

Merged
alamb merged 8 commits intoapache:mainfrom
haohuaijin:fix-filterpushdown-issue
Feb 3, 2026
Merged

fix: filter pushdown when merge filter#20110
alamb merged 8 commits intoapache:mainfrom
haohuaijin:fix-filterpushdown-issue

Conversation

@haohuaijin
Copy link
Contributor

@haohuaijin haohuaijin commented Feb 2, 2026

Which issue does this PR close?

Rationale for this change

see issue #20109

What changes are included in this PR?

  1. Remap parent filter expressions: When a FilterExec has a projection, remap unsupported parent filter expressions from output schema coordinates to input schema coordinates using reassign_expr_columns() before combining them with the current filter's predicates.

  2. Preserve projection: When creating the merged FilterExec, preserve the original projection instead of discarding it .

Are these changes tested?

yes, add some test case

Are there any user-facing changes?

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Feb 2, 2026
@github-actions github-actions bot added the core Core DataFusion crate label Feb 2, 2026
@adriangb adriangb self-requested a review February 2, 2026 14:02
@adriangb
Copy link
Contributor

adriangb commented Feb 2, 2026

@haohuaijin please ping me when this is ready for review

@haohuaijin haohuaijin marked this pull request as ready for review February 3, 2026 02:50
@haohuaijin
Copy link
Contributor Author

cc @adriangb, ready for reviews

Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! It would be great to add an SLT test but not a blocker for merging this if it's hard to hit this case from SQL.

Comment on lines -626 to +633
let unsupported_parent_filters =
child_pushdown_result.parent_filters.iter().filter_map(|f| {
matches!(f.all(), PushedDown::No).then_some(Arc::clone(&f.filter))
});
let mut unsupported_parent_filters: Vec<Arc<dyn PhysicalExpr>> =
child_pushdown_result
.parent_filters
.iter()
.filter_map(|f| {
matches!(f.all(), PushedDown::No).then_some(Arc::clone(&f.filter))
})
.collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is the same code, just added the mut which caused a reformat.

// If this FilterExec has a projection, the unsupported parent filters
// are in the output schema (after projection) coordinates. We need to
// remap them to the input schema coordinates before combining with self filters.
if self.projection.is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 note that this is because filters get evaluated before the projection


/// related to https://github.com/apache/datafusion/issues/20109
#[tokio::test]
async fn test_filter_with_projection_pushdown() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you could add an SLT test that reproduces this? It could go in datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt

@haohuaijin
Copy link
Contributor Author

Thanks for you reviews @adriangb, i try add a SLT test, but use sql is hard to do.

@adriangb
Copy link
Contributor

adriangb commented Feb 3, 2026

No worries. Let's merge this once CI passes.

@alamb alamb added this pull request to the merge queue Feb 3, 2026
@alamb
Copy link
Contributor

alamb commented Feb 3, 2026

🚀

Merged via the queue into apache:main with commit b80bf2c Feb 3, 2026
33 checks passed
@haohuaijin haohuaijin deleted the fix-filterpushdown-issue branch February 4, 2026 01:13
@hengfeiyang
Copy link
Contributor

Hi @alamb, Is it possible to release version 52.2.0? This issue has affected our online business, and we have temporarily disabled this feature. It would be great if a minor version could be released quickly.

@adriangb
Copy link
Contributor

adriangb commented Feb 4, 2026

@hengfeiyang we can certainly put this on the docket for the next minor release, but we don’t tend to do more than 1 minor release per major version, so this may have to wait until the next major version if there aren’t many other fixes. Would you be able to point your system at a git commit or fork with this commit cherry picked? That’s what we do

@hengfeiyang
Copy link
Contributor

Yes. definitely we can do that, actually we did that in the past if we can't get a release soon. just we don't like to use fork version if we can have official release.

Thank you for you reply. we will use fork version before next release.

@alamb
Copy link
Contributor

alamb commented Feb 4, 2026

Hi @alamb, Is it possible to release version 52.2.0? This issue has affected our online business, and we have temporarily disabled this feature. It would be great if a minor version could be released quickly.

I am happy to help support another release (run the voting, etc) if someone else is willing do make the tickets, backport and changelog PRs

@alamb
Copy link
Contributor

alamb commented Feb 4, 2026

(BTW I think I may also have a bug in 52 that I'll file tomorrow which might be good to fix too). I'll post here if I do

haohuaijin added a commit to openobserve/datafusion that referenced this pull request Feb 10, 2026
## Which issue does this PR close?

- Closes apache#20109

## Rationale for this change

see issue apache#20109

## What changes are included in this PR?

1. Remap parent filter expressions: When a FilterExec has a projection,
remap unsupported parent filter expressions from output schema
coordinates to input schema coordinates using `reassign_expr_columns()`
before combining them with the current filter's predicates.

2. Preserve projection: When creating the merged FilterExec, preserve
the original projection instead of discarding it .

## Are these changes tested?

yes, add some test case

## Are there any user-facing changes?

---------

Co-authored-by: Adrian Garcia Badaracco <[email protected]>
@alamb
Copy link
Contributor

alamb commented Feb 11, 2026

Hi @alamb, Is it possible to release version 52.2.0? This issue has affected our online business, and we have temporarily disabled this feature. It would be great if a minor version could be released quickly.

I am happy to help support another release (run the voting, etc) if someone else is willing do make the tickets, backport and changelog PRs

Update: Filed a ticket to track a next patch release

haohuaijin added a commit to haohuaijin/arrow-datafusion that referenced this pull request Feb 11, 2026
## Which issue does this PR close?

- Closes apache#20109

## Rationale for this change

see issue apache#20109

## What changes are included in this PR?

1. Remap parent filter expressions: When a FilterExec has a projection,
remap unsupported parent filter expressions from output schema
coordinates to input schema coordinates using `reassign_expr_columns()`
before combining them with the current filter's predicates.

2. Preserve projection: When creating the merged FilterExec, preserve
the original projection instead of discarding it .

## Are these changes tested?

yes, add some test case

## Are there any user-facing changes?

---------

Co-authored-by: Adrian Garcia Badaracco <[email protected]>
alamb pushed a commit that referenced this pull request Feb 12, 2026
## Which issue does this PR close?

- related to #20287

## Rationale for this change

see issue #20109

## What changes are included in this PR?

1. Remap parent filter expressions: When a FilterExec has a projection,
remap unsupported parent filter expressions from output schema
coordinates to input schema coordinates using `reassign_expr_columns()`
before combining them with the current filter's predicates.

2. Preserve projection: When creating the merged FilterExec, preserve
the original projection instead of discarding it .

## Are these changes tested?

yes, add some test case

## Are there any user-facing changes?

---------

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Adrian Garcia Badaracco <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FilterPushdown do not generate correct column index when merge FilterExec

4 participants