ARROW-13548: [C++] Implement temporal difference kernels#10960
ARROW-13548: [C++] Implement temporal difference kernels#10960lidavidm wants to merge 4 commits intoapache:masterfrom
Conversation
|
I think Python doesn't have bindings for the interval types - that should be fixed in a separate issue I would think. |
|
Would it be straightforward to add a week variant of this that takes |
It should be doable. I'll update the PR. |
5bcb371 to
f4ce70e
Compare
|
@rok can you take a look here? |
|
I'll *rebase this in a little while |
55dcad6 to
342e6d8
Compare
There was a problem hiding this comment.
I don't have a better suggestion, but boundaries sounds very implementation-like.
There was a problem hiding this comment.
How about this?
const FunctionDoc years_between_doc{
"Compute the number of years between two timestamps",
("Returns the number of year boundaries crossed from the first timestamp to the\n"
"second. That is, the difference is calculated as if the timestamps were\n"
"truncated to the year.\n"
"Null values emit null."),
{"start", "end"}};
There was a problem hiding this comment.
Better. How about using 'complete years passed/elapsed/..' instead of 'year boundaries crossed'?
There was a problem hiding this comment.
I suppose the reason why I'm using 'boundaries' is because the difference between 1998-12-01 and 1999-01-1 is '1 year' here even though a complete year hasn't passed.
There was a problem hiding this comment.
Oh, I thought the entire year needs to pass. Right, boundaries are better yeah.
There was a problem hiding this comment.
Perhaps you could find the first week starting weekday after the starting date and then calculate the difference // 7?
4d17d71 to
6807df5
Compare
|
Changes:
|
|
I'll take another look tomorrow. But this already looks pretty good. |
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Sometimes you use _interval_ and sometimes you don't. I realize it's due to the type returned but it still bothers my eyes :). Probably best keep it as is.
There was a problem hiding this comment.
Yeah, I wasn't sure how to disambiguate these. Note that month_interval_between and months_between are basically redundant, except MonthIntervalType is int32_t, not int64_t.
There was a problem hiding this comment.
Well, is it useful to have a int64_t return?
There was a problem hiding this comment.
Not particularly, except for consistency; that said for Python, we don't have the interval types bound yet, so there'll be a period where this kernel is unavailable if we remove this version. I think overall it's not worth keeping it though so I'll remove it.
66b79bc to
7dbd1ee
Compare
|
LGTM @lidavidm! |
There was a problem hiding this comment.
Is it a recommended format for months? Should we use "M" instead?
There was a problem hiding this comment.
I was following the existing formatting of the array printers, but M probably makes more sense/I can align these with strftime.
There was a problem hiding this comment.
Actually, I suppose that doesn't make any sense. M seems preferable to m though so I'll do that.
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Well, is it useful to have a int64_t return?
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
If the "notes" column is empty, then can omit it.
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
"has a defined timezone" is less weird than "is zoned".
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
I would say "because the local year is the same even though the UTC years would be different".
There was a problem hiding this comment.
Is MakeArrayFromScalar necessary or is the kernel able to broadcast a scalar input?
There was a problem hiding this comment.
At some point, we may instead define a Validate() method on FunctionOptions.
There was a problem hiding this comment.
Nit, but "Extract" refers to component extraction, which this isn't doing?
There was a problem hiding this comment.
How about: Return the number of year boundaries crossed from `start` to `end`.
20742d5 to
5a5421d
Compare
Co-authored-by: Rok Mihevc <[email protected]>
5a5421d to
7124872
Compare
|
I've rebased this. |
| static int64_t GetQuarters(const year_month_day& ymd) { | ||
| return static_cast<int64_t>(static_cast<int32_t>(ymd.year())) * 4 + | ||
| (static_cast<uint32_t>(ymd.month()) - 1) / 3; | ||
| } |
There was a problem hiding this comment.
You could use this function for the quarter extractor and delete this logic there.
There was a problem hiding this comment.
I mean this:
arrow/cpp/src/arrow/compute/kernels/scalar_temporal.cc
Lines 405 to 410 in 7124872
Thanks for the review. Yeah, let me add support for the other temporal types - I suppose all that's merged by now. |
|
I've added the support for date/time types (years, months, etc. don't accept time types). |
| &default_day_of_week_options, DayOfWeekState::Init); | ||
| DCHECK_OK(registry->AddFunction(std::move(weeks_between))); | ||
|
|
||
| auto day_time_between = |
There was a problem hiding this comment.
Nit: day_time_interval_between would be more consistent. Same for month_day_nano_interval_between. I don't feel strongly about it, just noticing :).
|
Thanks, fixed those issues. |
|
I think this is ready to go. |
Implements kernels to get the number of {years, months, days and millis, days, hours, minutes, seconds, millis, micros, nanos} between two timestamps. Everything returns either Int64 or DayTimeInterval.
Closes apache#10960 from lidavidm/arrow-13548
Authored-by: David Li <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Implements kernels to get the number of {years, months, days and millis, days, hours, minutes, seconds, millis, micros, nanos} between two timestamps. Everything returns either Int64 or DayTimeInterval.