Conversation

Contributor

@mbutrovich mbutrovich commented Mar 31, 2025

Which issue does this PR close?

Rationale for this change

We would like to enforce testing on a challenging INT96 file that Spark generated with microsecond timestamps. The file is challenging because it includes dates that fall outside the range a 64-bit nanosecond timestamp can represent.
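
To make the range problem concrete, here is a minimal sketch (not code from this PR) of widening a microsecond timestamp to nanoseconds in an `i64`. The function name `micros_to_nanos` is hypothetical, and the year-9999 input is an assumption back-derived from the wrapped value quoted in the review comments on this PR, not a value read directly from the file.

```rust
/// Hypothetical helper: widen microseconds since the Unix epoch to
/// nanoseconds, returning `None` if the result does not fit in an i64.
fn micros_to_nanos(micros: i64) -> Option<i64> {
    micros.checked_mul(1_000)
}

fn main() {
    // A 2024 timestamp widens without trouble (note the 3 extra zeros).
    let recent: i64 = 1_704_141_296_123_456;
    assert_eq!(micros_to_nanos(recent), Some(1_704_141_296_123_456_000));

    // Assumed year-9999 value in microseconds. It fits in an i64 as
    // microseconds, but i64 nanoseconds only reach the year 2262.
    let far_future: i64 = 253_402_225_200_000_000;
    assert_eq!(micros_to_nanos(far_future), None);

    // With wrapping arithmetic the overflow silently produces a
    // negative, meaningless nanosecond value.
    assert_eq!(far_future.wrapping_mul(1_000), -4_852_191_831_933_722_624);
}
```

The checked variant is used here only to show where the boundary lies; whether a reader should wrap, error, or return null on overflow is a separate design decision.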

What changes are included in this PR?

Adds a test that depends on apache/parquet-testing#73 being merged first.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Mar 31, 2025
@mbutrovich mbutrovich marked this pull request as ready for review April 3, 2025 15:02
Contributor Author

mbutrovich commented Apr 3, 2025

apache/parquet-testing#73 has merged, so I think this is ready for review.

Contributor

@alamb alamb left a comment

Thanks @mbutrovich -- this is a nice contribution. Testing for the WIN!

})
}

#[test]
Contributor

It might also be worth adding a test that shows what happens when a schema is not supplied for this file (i.e., that the data is read out with nanosecond precision).
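
As a rough illustration of the difference between the two read paths, the sketch below decodes one INT96 value by hand, first to microseconds and then to nanoseconds. It assumes the conventional INT96 layout (8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes of Julian day number). All function names are hypothetical and this is not the parquet crate's API; the wrapping arithmetic in the nanosecond path is an assumption that happens to reproduce the year-9999 value expected by the test in this PR.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const MICROS_PER_DAY: i64 = 86_400_000_000;
const NANOS_PER_DAY: i64 = 86_400_000_000_000;

/// Hypothetical helper: split 12 INT96 bytes into (nanos_of_day, julian_day).
fn split_int96(bytes: [u8; 12]) -> (i64, i64) {
    let mut nanos = [0u8; 8];
    nanos.copy_from_slice(&bytes[..8]);
    let mut day = [0u8; 4];
    day.copy_from_slice(&bytes[8..]);
    (i64::from_le_bytes(nanos), u32::from_le_bytes(day) as i64)
}

/// Microseconds since the epoch; year 9999 still fits in an i64.
fn int96_to_micros(bytes: [u8; 12]) -> i64 {
    let (nanos_of_day, julian_day) = split_int96(bytes);
    (julian_day - JULIAN_DAY_OF_EPOCH) * MICROS_PER_DAY + nanos_of_day / 1_000
}

/// Nanoseconds since the epoch, wrapping on overflow (an assumption).
fn int96_to_nanos_wrapping(bytes: [u8; 12]) -> i64 {
    let (nanos_of_day, julian_day) = split_int96(bytes);
    (julian_day - JULIAN_DAY_OF_EPOCH)
        .wrapping_mul(NANOS_PER_DAY)
        .wrapping_add(nanos_of_day)
}

/// Hypothetical helper: build INT96 bytes from their two components.
fn make_int96(julian_day: u32, nanos_of_day: i64) -> [u8; 12] {
    let mut bytes = [0u8; 12];
    bytes[..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    bytes[8..].copy_from_slice(&julian_day.to_le_bytes());
    bytes
}

fn main() {
    // Assumed value: Julian day 5373484 (9999-12-31) at 03:00:00.
    let value = make_int96(5_373_484, 10_800_000_000_000);
    assert_eq!(int96_to_micros(value), 253_402_225_200_000_000);
    assert_eq!(int96_to_nanos_wrapping(value), -4_852_191_831_933_722_624);
}
```

Reading with a microsecond target unit keeps the far-future date intact, while the nanosecond path can only return a wrapped value for it.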

@mbutrovich mbutrovich requested a review from alamb April 3, 2025 21:26
Contributor

@alamb alamb left a comment

Thanks again @mbutrovich

let expected = Arc::new(Int64Array::from(vec![
Some(1704141296123456000), // Reads as nanosecond fine (note 3 extra 0s)
Some(1704070800000000000), // Reads as nanosecond fine (note 3 extra 0s)
Some(-4852191831933722624), // Cannot be represented with nanos timestamp (year 9999)
Contributor

nice

@alamb alamb merged commit 71d27b6 into apache:main Apr 4, 2025
16 checks passed
@mbutrovich mbutrovich deleted the parquet_test branch April 4, 2025 15:39
Labels

parquet Changes to the parquet crate

Development

Successfully merging this pull request may close these issues.

Support different TimeUnits and timezones when reading Timestamps from INT96

2 participants