ARROW-13022: [R] bindings for lubridate's year, isoyear, quarter, month, day, wday, yday, isoweek, hour, minute, and second functions#10507
Conversation
r/R/dplyr-functions.R
Outdated
There was a problem hiding this comment.
I was going to mention that we actually also have an "ISO weekday" in the C++ kernels, but only as field in the iso_calendar kernel, and not as separate one. We could also add iso_day_of_week if that helps, or even just make day_of_week follow the ISO conventions. Because the 0 (monday) - 6 (sunday) might be something Python specific.
But then looked at what C++ does, and there the default is actually 0 (sunday) - 6 (saturday), so yet something else ;) (although I see that also Postgres uses that for dow)
In the end, once you have it in one form, it's an easy conversion to any of the other of course, so it might not matter that much.
There was a problem hiding this comment.
We could add options to the C++ kernel to enable different behaviors there.
There was a problem hiding this comment.
That probably makes more sense than my workaround here actually.
There was a problem hiding this comment.
If we go that way it would probably be best to have another Jira for "TemporalOptions". It's probably best you proceed with the workaround and we loop back to this later.
There was a problem hiding this comment.
OK, will do, and will open a JIRA, thanks!
There was a problem hiding this comment.
There was a problem hiding this comment.
Please leave a comment with that JIRA issue somewhere in this code that we expect to remove.
r/R/expression.R
Outdated
There was a problem hiding this comment.
"hour" is missing here?
There was a problem hiding this comment.
Oops, will add in now.
r/R/expression.R
Outdated
There was a problem hiding this comment.
Note that second might do something different. I think "second" in lubridate is the equivalent of "second + subsecond" in arrow
There was a problem hiding this comment.
Good catch. Hmm, I think this might be more complicated than that actually, as lubridate rounds it to 1 decimal place, using R's odd-even rounding whereas Arrow doesn't do the rounding, so I'm not sure if this can be done without rounding being implemented (see https://issues.apache.org/jira/browse/ARROW-12744)
There was a problem hiding this comment.
What do you mean exactly with rounding? A quick try gives me:
> second(ymd_hms("2011-06-04 12:00:01.123456"))
[1] 1.123456
which seems to give all decimals
There was a problem hiding this comment.
So it does; I think I must have been getting mixed up with something I'd changed when I needed to update a test I wrote, never mind,
There was a problem hiding this comment.
What about nanoseconds?
> second(ymd_hms("2011-06-04 12:00:01.123456789"))
[1] 1.123457Arrow would probably return 1.123456789.
There was a problem hiding this comment.
I don't think lubridate / R supports nanosecond resolution
There was a problem hiding this comment.
I think so too. So probably it should be "second = second + round(subsecond, 6)" to match that behavior?
There was a problem hiding this comment.
IDK that we should truncate the data like this. A slight (technical) API difference is probably better than throwing away precision.
There was a problem hiding this comment.
Agreed, I would also say that it's not because lubridate does not support nanoseconds that if you actually have nanoseconds (which is possible since arrow supports it) those should be discarded.
0e0a832 to
660ccfd
Compare
d67a224 to
942e92f
Compare
r/R/dplyr-functions.R
Outdated
| Expression$create( | ||
| "cast", | ||
| Expression$create("divide_checked", e1, e2), | ||
| options = cast_options(to_type = int32(), allow_float_truncate = TRUE, |
There was a problem hiding this comment.
If you make all of the scalars integers (e.g. 1L), do you still need to cast?
r/R/dplyr-functions.R
Outdated
|
|
||
| e2 = Expression$scalar(7) | ||
|
|
||
| # (e1 - e2 * ( e1 %/% e2 )) + 1 |
There was a problem hiding this comment.
Did you try doing exactly this expression? I would think it should just work because Ops.Expression is defined for all of those.
There was a problem hiding this comment.
Awesome, I had no idea, deleted a lot of unnecessary lines of code now, thanks @nealrichardson !
r/R/expression.R
Outdated
| "str_to_lower" = "utf8_lower", | ||
| "str_to_upper" = "utf8_upper" | ||
| "str_to_upper" = "utf8_upper", | ||
| # str_trim is defined in dplyr.R |
There was a problem hiding this comment.
Can you update this comment to say dplyr-functions.R?
nealrichardson
left a comment
There was a problem hiding this comment.
One whitespace suggestion but otherwise LGTM, great work!
Co-authored-by: Neal Richardson <[email protected]>
…r the "day_of_week" temporal kernel This is to resolve [ARROW-13054](https://issues.apache.org/jira/browse/ARROW-13054). This will be needed for casting timezone-naive timestamps [ARROW-13033](https://issues.apache.org/jira/browse/ARROW-13033) and defining [starting day of the week](#10507 (review)). Closes #10598 from rok/ARROW-13054 Lead-authored-by: Rok <[email protected]> Co-authored-by: Rok Mihevc <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
No description provided.