ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type#10176
ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type#10176rok wants to merge 17 commits intoapache:masterfrom
Conversation
|
Missing features at moment:
Different input / output types should also be handled. |
Unfortunately yes .. But since that's a lot more complicated, and I don't think we already have tz-related functionality, for now those kernels could maybe just raise an error if the timestamp type has a |
2c4968d to
2765cef
Compare
|
It seems tests on windows are not able to download IANA timezone database or access it post-download (https://howardhinnant.github.io/date/tz.html#Installation). I'll solve it but meanwhile I I'd like to ask for review. Open questions I have at the moment:
|
|
I've adopted some code from iso_week.h. Would it make more sense to add iso_week.h as a vendored library instead? Should we add a note to our license? (iso_week.h is MIT licensed). |
|
General comment: the PR is quite big, and the timezone-related logic further added complexity. So from a workflow perspective (to ensure reviewing / getting this merged is manageable), could it make sense to keep the timezone-related logic for a separate PR? (didn't check the code to see whether that's actually feasible) |
Keeping tz-aware logic would be fairly easy to move to a separate PR. I'll move it out. |
f8c6665 to
23f51e7
Compare
7bcc910 to
69c5760
Compare
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Thanks for the updates! Did a final pass through the docs.
Also, it would maybe be good to add a test for a timestamp that doesn't fit into the nanosecond range?
| {"values"}}; | ||
|
|
||
| const FunctionDoc millisecond_doc{ | ||
| "Extract millisecond values", |
There was a problem hiding this comment.
Can you add here the clarification about it being the number of milliseconds since the last full second, as you did for the C++ functions? (and same for microsecond/nanosecond below)
| {"values"}}; | ||
|
|
||
| const FunctionDoc subsecond_doc{ | ||
| "Extract subsecond values", |
There was a problem hiding this comment.
Here also best to add the clarification of what subsecond is from the Subsecond docstring
| } | ||
| return {static_cast<int64_t>(static_cast<int32_t>(y)), | ||
| static_cast<int64_t>(trunc<weeks>(t - start).count() + 1), | ||
| static_cast<int64_t>(weekday(ymd).iso_encoding())}; |
There was a problem hiding this comment.
In the DayOfWeek implementation you subtract 1 (to have 0-6), while here not. I assume that's the ISO definition? If so, we should clearly document that difference in the ISOCalendar documentation (since now it seems to imply that the "day_of_week" field is the same as the "day_of_week" kernel. Maybe also prepend it with "iso_" ?
4dc50cb to
ee53452
Compare
|
Thanks for the comments and suggestions @jorisvandenbossche! :) |
Co-authored-by: Joris Van den Bossche <[email protected]>
|
Thank you for this PR @rok ! I believe this is ready for merging. I'll just wait for CI. |
Co-authored-by: Joris Van den Bossche <[email protected]>
|
Just noticed I missed this:
Do you mind if I move it to ARROW-12980? |
|
Actually @jorisvandenbossche what do you mean by "doesn't fit into the nanosecond range"? |
|
Cool, this is a nice start for datetime kernels!
I meant a date that falls outside the date range of 1677-09-21 - 2262-04-11 (the range that ns resolution can cover). |
|
Thanks @jorisvandenbossche @pitrou for the help and feedback! I'm very glad this is merged! :) |
This is to resolve ARROW-11759.