Skip to content

Comments

Extract parquet statistics for StructArray#6090

Closed
Lordworms wants to merge 3 commits intoapache:mainfrom
Lordworms:struct_statistics
Closed

Extract parquet statistics for StructArray#6090
Lordworms wants to merge 3 commits intoapache:mainfrom
Lordworms:struct_statistics

Conversation

@Lordworms
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@alamb
Copy link
Contributor

alamb commented Aug 2, 2024

closing/reopening to rerun ci

@alamb alamb closed this Aug 2, 2024
@alamb alamb reopened this Aug 2, 2024
@github-actions github-actions bot added the parquet Changes to the parquet crate label Aug 5, 2024
@alamb
Copy link
Contributor

alamb commented Sep 18, 2024

I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more

1 similar comment
@alamb
Copy link
Contributor

alamb commented Sep 18, 2024

I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more

@duongcongtoai
Copy link

duongcongtoai commented Sep 19, 2024

i'm interested in this topic, but not too aquantanted to the source code of this crate yet, give me more time and i can share some review load!

]),
};
// Due to https://github.com/apache/datafusion/issues/8334,
// statistics for struct arrays are not supported

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we remove this comment after this was resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure


expected_max: struct_array(vec![
(Some(true), Some(3)),
(Some(true), Some(0)),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

original values was

                (Some(true), Some(0)),
                (Some(false), Some(5)),
                (None, None),

Then isn't min should be Some(false), Some(5) and max should be Some(true),Some(0)

@alamb alamb marked this pull request as draft December 17, 2024 19:32
@alamb
Copy link
Contributor

alamb commented Dec 17, 2024

This PR looks like it has some comments that are still waiting to be addressed. I am sorry I don't have time to push this along -- if anyone cares about this particular feature I think the first thing would be do some research and confirm what the behavior of other parquet implementations are

It is not clear to me if there is much value / precident for extracting min/max statistics for structs (the min/max for field values seem much more important)

I am sorry this has dragged out so long @Lordworms -- I think we should close this PR unless someone else would like to push it along.

@Jefffrey
Copy link
Contributor

Closing this PR as stale. Feel free to reopen if it becomes active again.

@Jefffrey Jefffrey closed this Dec 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants