ARROW-835: [Format][C++][Java] Create a new Duration type#3644
ARROW-835: [Format][C++][Java] Create a new Duration type#3644emkornfield wants to merge 3 commits intoapache:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3644 +/- ##
==========================================
- Coverage 88.07% 88.06% -0.01%
==========================================
Files 773 774 +1
Lines 96887 97162 +275
Branches 1251 1251
==========================================
+ Hits 85329 85566 +237
- Misses 11322 11360 +38
Partials 236 236
Continue to review full report at Codecov.
|
java/vector/src/main/java/org/apache/arrow/vector/IntervalEpochMilliVector.java
Outdated
Show resolved
Hide resolved
|
@wesm could you look at the format changes to see if you are ok with them? |
|
Failure was NodeJs |
|
since this is a format change (backward compatible though), I'll wait for @wesm to approve too before merging. |
|
@pravindra thanks for the review |
format/Schema.fbs
Outdated
There was a problem hiding this comment.
Looking over the old e-mail threads it might be that TimeDelta or TimestampDelta is a better name for this type?
There was a problem hiding this comment.
Are you happy with "DurationInterval"? It could also be called simply "Duration" or "TimeDuration"
|
@wesm based on comments on the other JIRAs CLs it seems like it would be desirable to add an implementation for C++, I can add that to this CL but would you mind taking a look to see if you are ok with naming/modeling (I would prefer not to have to refactor just due to naming issues if possible). |
|
I believe the Python on OS/X error is spurious (it was green on previous build). |
|
Will work on reviewing the C++ side of this |
wesm
left a comment
There was a problem hiding this comment.
The C++ side of this, and the changes to the Flatbuffers files, all looks good to me. Thanks @emkornfield for slogging through this and implementing C++, Java, and integration tests in one go!
One question I have is whether we want to commit to the name "DurationInterval" rather than alternatives such as "Duration" or "TimeDelta" or "TimeDuration". I don't think the names are sacred
Can someone (@siddharthteotia ?) carve out a little time to review the Java so we can try to get this merged soon?
There was a problem hiding this comment.
Do you think it's possible to collapse this into a single class? There was an effort to do this for Timestamp (with a TimeUnit class) but it failed only because Dremio had taken on dependency with the TimestampUNIT classes
There was a problem hiding this comment.
Potentially, I followed the pattern already established in the codebase, but I'm open to trying if it is desirable. @pravindra @siddharthteotia do you think this is desirable/possible?
There was a problem hiding this comment.
Sounds okay to me. Let's see what @siddharthteotia and @pravindra say.
There was a problem hiding this comment.
IMO, having a single class is definitely desirable. Single this one is starting on a fresh slate, it's worth a try.
cpp/src/arrow/array.h
Outdated
There was a problem hiding this comment.
FWIW, I'm a bit disappointed that we aren't deprecating this type =/
There was a problem hiding this comment.
I didn't want to get dragged into what it means deprecate this, should be easy enough to do if we want it. As noted below I tried to say it was optional which I seem to recall seeing this was ok on a previous ML thread.
cpp/src/arrow/array/builder_time.h
Outdated
There was a problem hiding this comment.
Yes, it appears so, I'll push a commit removing them. It looks like there might be more redundancy in builder_primitive.h
cpp/src/arrow/builder.cc
Outdated
There was a problem hiding this comment.
I think static_cast (or checked_cast) is sufficient here?
cpp/src/arrow/type.h
Outdated
There was a problem hiding this comment.
nit: as far as naming I could see an argument for using
interval_durationinterval_day_timeinterval_month
but I'm OK with the way things are here
There was a problem hiding this comment.
I could see this to, but I'm happier without the reverse notation.
There was a problem hiding this comment.
note "duration" conflicts with some of our time classes so I renamed to "duration_type", hopefully that is ok
There was a problem hiding this comment.
Need to look back I think it was in the vendored time library we were using. I can look deeper if you don't think there should be a conflict
There was a problem hiding this comment.
looking again they shouldn't conflict, I will try to fix this today.
There was a problem hiding this comment.
renamed. had to touch some of the vendored code.
format/Schema.fbs
Outdated
There was a problem hiding this comment.
Got it. If it isn't being deprecated then it's reasonable to implement it in C++ and have integration tests
format/Schema.fbs
Outdated
There was a problem hiding this comment.
Are you happy with "DurationInterval"? It could also be called simply "Duration" or "TimeDuration"
|
@wesm Thanks for the review I'll try to address comments in the next few days. In terms of naming, I don't know the etiquette for changing the name once it gets approved on the ML. If was going to change it, I think Duration would be my choice. @pravindra already reviewed most of the java code, the main changes that happened where plumbing through support for the integration test. |
cpp/src/arrow/array/builder_dict.cc
Outdated
There was a problem hiding this comment.
Thanks, they were removed.
|
@wesm I think I addressed all your feedback. The OS/X build seems to in an infinite CMake loop with something like I'm not sure why my change would have caused this. Any thoughts? @pravindra I was able to consolidate all the java vectors into one, could you take a look? |
2913b09 to
773a2b0
Compare
|
Baffling |
|
It must have something to do with this being an older PR. Is Appveyor enabled on your fork? If not I can push this to my fork to get an Appveyor build going |
|
No appveyor isn't enabled on my fork. let me try squashing everything, see if that helps. |
and mark DayTimeInterval as deprecated. Provides implementation for Duration, DayTimeInterval and YearMonth interval in C++ and Java (for duration). Adds integration test.
|
Ahh, squash seems to have worked, lets hope things stay green. |
|
@emkornfield it looks you made some changes to vendored code which I think we should try to avoid doing in general, can we open a JIRA to restore to the vendored version and address the name conflict another way? I don't want this issue to block the patch since I need this to go in and then rebase #4316 on top |
|
Appveyor appears to be about ~6 hours backed up so I'm going to push a branch on my fork to try to get a quicker thumbs up to merge this |
|
Here's an Appveyor build starting now on my fork which will take a while to run https://ci.appveyor.com/project/wesm/arrow/builds/24577591 |
|
Thanks. https://issues.apache.org/jira/browse/ARROW-5346 opened to revert the changes. Right now the only solution seems like reordering includes in some cases or fixing upstream. I will have to think about this. |
|
Appveyor build looks OK, I'm going to wait a little while yet to make sure the MinGW build works before merging |
|
@wesm appveyor on your branch looks all green. |
|
Awesome. Thanks @emkornfield!! |
- Create a new DurationInterval type in format. - Implement all interval types in C++ and Java including integration test. Once this is checked in, I think https://issues.apache.org/jira/browse/ARROW-352 can be resolved as a won't fix? Author: Micah Kornfield <[email protected]> Closes apache#3644 from emkornfield/new_type and squashes the following commits: cc6d39c <Micah Kornfield> remove system clock daf8b4f <Micah Kornfield> duration_type->duration b064816 <Micah Kornfield> Introduce a new Duration type can that represent time deltas, and mark DayTimeInterval as deprecated.
Once this is checked in, I think https://issues.apache.org/jira/browse/ARROW-352 can be resolved as a won't fix?