ARROW-242: Support Timestamp Data Type#107
Conversation
There was a problem hiding this comment.
Can you indicate the unit in this dtype= explicitly?
There was a problem hiding this comment.
Can you add time + milliseconds precision to these sample values (otherwise the time bit is all zero)?
|
To make life more interesting, Impala stores timestamps in Parquet as Int96 JIRA indicates that that this may be the only real use for INT96: I think it would be safe for us to insert logic that converts all INT96 values into arrow Timestamps, what do you think? @nongli @julienledem if you have any thoughts on this let us know |
There was a problem hiding this comment.
TimestampType defaults to TimestampType::Unit::MILLI -- not sure if you wanted to make this explicit here or add tests for the other units (otherwise create follow-up JIRA)
There was a problem hiding this comment.
oh, I actually did not see that. ups, will have to improve a bit on my code then.
|
I think we might as well set the I'd be perfectly fine deferring the Int96 auto-conversion to a separate patch. Sorry this took me almost a week to get to. |
|
@wesm |
Change-Id: I8e5551cddec30888bbe37f1fc010774eba43f5dd
Change-Id: I0c33c43e4fe099b3c9261eac1544cb3343ea5834
|
@wesm Any chance to get a review soon? |
|
Sorry for the delay — while there wasn't complete consensus on the ML I think we should definitely get something merged that works for us for now. Taking a look |
|
+1 -- some items to address in follow up JIRAs
|
Author: Uwe L. Korn <[email protected]> Closes apache#107 from xhochy/parquet-619 and squashes the following commits: 612a2f2 [Uwe L. Korn] Check that the correct data was written 66fb32a [Uwe L. Korn] Add unit test for LocalFileOutputStream a1179d1 [Uwe L. Korn] Add missing header 6ecb940 [Uwe L. Korn] Add OutputStream for local files Change-Id: Ie9403af419f8712982752b4f5256136bf9d7b26f
* Let the index start from 0 for split_part * Correct the expected error message in unit test
For the Pandas<->Parquet bridge this is a lossy conversion but must be explicitly activated by the user.
Regarding Parquet 1.0: Yes, the logical type is not supported but should be simply ignored by the reader. Implementation for INT96 timestamps is not in the scope of this PR.