[c++] Arrow utils with current-domain option#2911
Conversation
8ffc6f8 to
ad3b5d1
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2911 +/- ##
==========================================
+ Coverage 89.87% 90.02% +0.15%
==========================================
Files 38 38
Lines 3999 3999
==========================================
+ Hits 3594 3600 +6
+ Misses 405 399 -6
Flags with carried forward coverage won't be shown. Click here to find out more.
|
|
A more general comment orthogonal to this PR: we really should think about making more use of nanoarrow (as I have) to remove most if not all of this arrow hand-holding boilerplate code (ie the bits that have to set each element of the struct explicitly). |
|
Thanks @eddelbuettel . That's a good idea for another PR. At the moment I want to separate refactors / dependency changes from the careful sequencing of steps necesary for confident rollout of the current-domain-backed new-shape feature. Introducing nanoarrow here would be too many variables: a dependent-package change, along with the current-domain support. As follow-on -- separately -- yes, I am behind this. I'll file an issue to track this great idea. |
655210f to
fb3e634
Compare
def9082 to
5f55777
Compare
86d4f5b to
db0348b
Compare
db0348b to
3441d9e
Compare
a51f6bd to
daecd1b
Compare
| // objects for which soma_joinid is not a dim at all | ||
| // * These cases are all actively unit-tested within apis/python/tests | ||
| if (dim.type() != TILEDB_INT64) { | ||
| throw TileDBSOMAError("Found unexpected non-int64 dimension type."); |
There was a problem hiding this comment.
I know where you are coming from here, and I slept over it one night, but I am not sure I am fully on-board.
We have the factory methods. We have the SOMA definitions. It is clear what a well-formed SOMA object is.
But I still cannot shake the feeling that this seems "wrong" in the sense of working, testing, testable, useful, ... general code being removed.
I'll sleep over it another night. Not objecting to the change, but scratching my chin. 🤔
There was a problem hiding this comment.
The removed code is dead ... unit tests would be throwing if it were not. :)
There was a problem hiding this comment.
That was not really my point / I understand that. But such a converter is also useful outside of SOMA, the test is fine, and I have tested and developed too with non-soma dataframes. That can be useful.
Again, not vetoing anything here, just saying that I simultaneously understand why you propose what you and that I likely would not do it. No more, no less, and surely no veto.
(And as an aside, "we can remove because doing so does not break tests" is not really a viable rule because we all know that coverage metrics are never perfect and there may well be some code that potentially calls this (this is open source after all). Anyway all good by me either way ...)
41c6d96 to
1db05d7
Compare
Co-authored-by: nguyenv <[email protected]>
b58dba1 to
ee61b8e
Compare
c6f7f73 to
3517a1e
Compare
Issue and/or context: As tracked on issue #2407 / [sc-51048].
The intended Python and R API changes are all agreed on and finalized as described in #2407.
Changes:
create_arrow_schemaandcreate_index_column_infotake flags for status quo (use_current_domain=false) as well as new-shape (use_current_domain=true-- [python/r/c++] Revisitshapefor component arrays #2407) . These flags will allow for a reasoned, well-tested transition to new-shape functionality at the R and Python levels, implemented on upcoming PRs.libtiledbsomalevel will validate correctness of the libtiledbsoma-to-core contracts, thereby increasing confidence of the correctness of the libtiledbsoma-level implementation.resizelogic tolibtiledbsoma, and unit-test its correctness. However, this PR is already more than big enough as-is.Notes for Reviewer:
As of 2024-08-18, this is stacked on top of #2910. #2910 should be approved and merged first, then, this should be rebased on the new
main. (What should not happen is for this PR to be merged into #2910.)As of 2024-08-18 this incorporates #2915 without being stacked on top of it. Here, too, #2915 should be approved and merged first, then, this should be rebased on the new
main.