
[parquet] Avoid read parquet index when there is no filter pushdown. #6317

Merged: alamb merged 4 commits into apache:main from Ted-Jiang:6317 on May 11, 2023

Conversation

@Ted-Jiang (Member)

Which issue does this PR close?

Closes #6315

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@Ted-Jiang requested a review from alamb on May 10, 2023 08:02
@github-actions bot added the core (Core DataFusion crate) label on May 10, 2023
@alamb (Contributor) left a comment

Thanks @Ted-Jiang -- this is looking good. I think we should remove the unwrap to avoid panics but otherwise I think this is looking good

}

/// Returns the number of filters in the [`PagePruningPredicate`]
pub fn filter_number(&self) -> usize {

Contributor

Given the use of this code, I think using is_empty() rather than len() would be clearer

Member, Author

I think we will need the number of filters in the future 😆
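
For readers following the `is_empty()` versus `len()` exchange above, here is a minimal sketch of how both accessors could coexist. `PagePredicateSketch` and `should_read_page_index` are hypothetical names invented for illustration; the real `PagePruningPredicate` holds compiled predicates rather than strings, and the decision rule shown is an assumption about the intent of this PR, not its actual code.

```rust
/// Hypothetical stand-in for a page-level pruning predicate.
/// Assumption: filters are modeled as plain strings for illustration only.
pub struct PagePredicateSketch {
    predicates: Vec<String>,
}

impl PagePredicateSketch {
    pub fn new(predicates: Vec<String>) -> Self {
        Self { predicates }
    }

    /// Number of filters, mirroring the `filter_number` accessor under review.
    pub fn filter_number(&self) -> usize {
        self.predicates.len()
    }

    /// Boolean form suggested in review; reads more directly at call sites
    /// that only care whether any filter exists.
    pub fn is_empty(&self) -> bool {
        self.predicates.is_empty()
    }
}

/// Assumed decision rule: load the page index only when at least one
/// filter could make use of it.
pub fn should_read_page_index(predicate: Option<&PagePredicateSketch>) -> bool {
    predicate.map_or(false, |p| !p.is_empty())
}
```

Keeping `filter_number` alongside `is_empty` costs little, so the count stays available if it is needed later while the call site stays readable.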

.unwrap();

// Without filter will not read pageIndex.
assert!(bytes_scanned_with_filter > bytes_scanned_without_filter);

Contributor

nice test
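
The idea behind the assertion above can be sketched with a toy model. Everything here is an assumption made for illustration: `FileLayoutSketch`, `bytes_scanned`, and the byte sizes are hypothetical, and the model assumes the filter prunes no pages, so the only difference between the two scans is the extra page index read.

```rust
/// Hypothetical byte sizes for one Parquet file; not taken from the PR.
pub struct FileLayoutSketch {
    pub data_bytes: u64,
    pub page_index_bytes: u64,
}

/// Bytes fetched by a scan under the toy model: the page index is read
/// only when a filter is pushed down. Assumes the filter prunes nothing,
/// so the data bytes are identical in both cases.
pub fn bytes_scanned(layout: &FileLayoutSketch, has_filter: bool) -> u64 {
    if has_filter {
        layout.data_bytes + layout.page_index_bytes
    } else {
        layout.data_bytes
    }
}
```

Under these assumptions, a scan with a filter always reports more bytes than one without whenever the page index is non-empty, which is the relationship the test checks.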

@github-actions bot added the logical-expr (Logical plan and expressions), physical-expr (Changes to the physical-expr crates), and sqllogictest (SQL Logic Tests (.slt)) labels on May 11, 2023
@github-actions bot removed the logical-expr (Logical plan and expressions), sqllogictest (SQL Logic Tests (.slt)), and physical-expr (Changes to the physical-expr crates) labels on May 11, 2023
@Ted-Jiang requested a review from alamb on May 11, 2023 05:51

@alamb (Contributor) left a comment

Looks great -- thank you @Ted-Jiang

@alamb merged commit a07d6eb into apache:main on May 11, 2023

Labels

core Core DataFusion crate

Development

Successfully merging this pull request may close these issues.

Avoid read parquet index when there is no filter pushdown

2 participants