Skip to content

[c++] Centralize some nanoarrow helpers#2994

Merged
johnkerl merged 3 commits intomainfrom
kerl/nanoarrow-helpers
Sep 16, 2024
Merged

[c++] Centralize some nanoarrow helpers#2994
johnkerl merged 3 commits intomainfrom
kerl/nanoarrow-helpers

Conversation

@johnkerl
Copy link
Copy Markdown
Contributor

@johnkerl johnkerl commented Sep 13, 2024

Issue and/or context: As tracked on issue #2407 / [sc-51048].

Changes:

Extracting readback of core domain, current domain, and non-empty domain are all currently:

  • Done slotwise within libtiledbsoma, templatized;
  • For Python: there is a loop over dim names in the pybind11 layer lke this
    .def(
    "non_empty_domain",
    [](SOMAArray& array, std::string name, py::dtype dtype) {
    switch (np_to_tdb_dtype(dtype)) {
    case TILEDB_UINT64:
    return py::cast(
    array.non_empty_domain_slot<uint64_t>(name));
    case TILEDB_DATETIME_YEAR:
    case TILEDB_DATETIME_MONTH:
    case TILEDB_DATETIME_WEEK:
    case TILEDB_DATETIME_DAY:
    case TILEDB_DATETIME_HR:
    case TILEDB_DATETIME_MIN:
    case TILEDB_DATETIME_SEC:
    case TILEDB_DATETIME_MS:
    case TILEDB_DATETIME_US:
    case TILEDB_DATETIME_NS:
    case TILEDB_DATETIME_PS:
    case TILEDB_DATETIME_FS:
    case TILEDB_DATETIME_AS:
    case TILEDB_INT64:
    return py::cast(
    array.non_empty_domain_slot<int64_t>(name));
    case TILEDB_UINT32:
    return py::cast(
    array.non_empty_domain_slot<uint32_t>(name));
    case TILEDB_INT32:
    return py::cast(
    array.non_empty_domain_slot<int32_t>(name));
    case TILEDB_UINT16:
    return py::cast(
    array.non_empty_domain_slot<uint16_t>(name));
    case TILEDB_INT16:
    return py::cast(
    array.non_empty_domain_slot<int16_t>(name));
    case TILEDB_UINT8:
    return py::cast(
    array.non_empty_domain_slot<uint8_t>(name));
    case TILEDB_INT8:
    return py::cast(
    array.non_empty_domain_slot<int8_t>(name));
    case TILEDB_FLOAT64:
    return py::cast(
    array.non_empty_domain_slot<double>(name));
    case TILEDB_FLOAT32:
    return py::cast(
    array.non_empty_domain_slot<float>(name));
    case TILEDB_STRING_UTF8:
    case TILEDB_STRING_ASCII:
    return py::cast(array.non_empty_domain_slot_var(name));
    default:
    throw TileDBSOMAError(
    "Unsupported dtype for nonempty domain.");
    }
    })
    .def(
    "soma_domain_slot",
    [](SOMAArray& array, std::string name, py::dtype dtype) {
    switch (np_to_tdb_dtype(dtype)) {
    case TILEDB_UINT64:
    return py::cast(array.soma_domain_slot<uint64_t>(name));
    case TILEDB_DATETIME_YEAR:
    case TILEDB_DATETIME_MONTH:
    case TILEDB_DATETIME_WEEK:
    case TILEDB_DATETIME_DAY:
    case TILEDB_DATETIME_HR:
    case TILEDB_DATETIME_MIN:
    case TILEDB_DATETIME_SEC:
    case TILEDB_DATETIME_MS:
    case TILEDB_DATETIME_US:
    case TILEDB_DATETIME_NS:
    case TILEDB_DATETIME_PS:
    case TILEDB_DATETIME_FS:
    case TILEDB_DATETIME_AS:
    case TILEDB_INT64:
    return py::cast(array.soma_domain_slot<int64_t>(name));
    case TILEDB_UINT32:
    return py::cast(array.soma_domain_slot<uint32_t>(name));
    case TILEDB_INT32:
    return py::cast(array.soma_domain_slot<int32_t>(name));
    case TILEDB_UINT16:
    return py::cast(array.soma_domain_slot<uint16_t>(name));
    case TILEDB_INT16:
    return py::cast(array.soma_domain_slot<int16_t>(name));
    case TILEDB_UINT8:
    return py::cast(array.soma_domain_slot<uint8_t>(name));
    case TILEDB_INT8:
    return py::cast(array.soma_domain_slot<int8_t>(name));
    case TILEDB_FLOAT64:
    return py::cast(array.soma_domain_slot<double>(name));
    case TILEDB_FLOAT32:
    return py::cast(array.soma_domain_slot<float>(name));
    case TILEDB_STRING_UTF8:
    case TILEDB_STRING_ASCII: {
    std::pair<std::string, std::string> str_domain;
    return py::cast(std::make_pair("", ""));
    }
    default:
    throw TileDBSOMAError(
    "Unsupported dtype for Dimension's domain");
    }
    })
    .def(
    "soma_maxdomain_slot",
    [](SOMAArray& array, std::string name, py::dtype dtype) {
    switch (np_to_tdb_dtype(dtype)) {
    case TILEDB_UINT64:
    return py::cast(
    array.soma_maxdomain_slot<uint64_t>(name));
    case TILEDB_DATETIME_YEAR:
    case TILEDB_DATETIME_MONTH:
    case TILEDB_DATETIME_WEEK:
    case TILEDB_DATETIME_DAY:
    case TILEDB_DATETIME_HR:
    case TILEDB_DATETIME_MIN:
    case TILEDB_DATETIME_SEC:
    case TILEDB_DATETIME_MS:
    case TILEDB_DATETIME_US:
    case TILEDB_DATETIME_NS:
    case TILEDB_DATETIME_PS:
    case TILEDB_DATETIME_FS:
    case TILEDB_DATETIME_AS:
    case TILEDB_INT64:
    return py::cast(
    array.soma_maxdomain_slot<int64_t>(name));
    case TILEDB_UINT32:
    return py::cast(
    array.soma_maxdomain_slot<uint32_t>(name));
    case TILEDB_INT32:
    return py::cast(
    array.soma_maxdomain_slot<int32_t>(name));
    case TILEDB_UINT16:
    return py::cast(
    array.soma_maxdomain_slot<uint16_t>(name));
    case TILEDB_INT16:
    return py::cast(
    array.soma_maxdomain_slot<int16_t>(name));
    case TILEDB_UINT8:
    return py::cast(
    array.soma_maxdomain_slot<uint8_t>(name));
    case TILEDB_INT8:
    return py::cast(
    array.soma_maxdomain_slot<int8_t>(name));
    case TILEDB_FLOAT64:
    return py::cast(
    array.soma_maxdomain_slot<double>(name));
    case TILEDB_FLOAT32:
    return py::cast(array.soma_maxdomain_slot<float>(name));
    case TILEDB_STRING_UTF8:
    case TILEDB_STRING_ASCII: {
    std::pair<std::string, std::string> str_domain;
    return py::cast(std::make_pair("", ""));
    }
    default:
    throw TileDBSOMAError(
    "Unsupported dtype for Dimension's domain");
    }
    })
    ;
  • For R: currently these are using TileDB-R -- which is a dependency we're currently removing -- and I don't want to imitate the above in R

The best option for Python/R interop is:

  • Push down loop-over-dim-names to libtiledbsoma
  • Make more use of nanoarrow to interface with Python and R

The current PR simply moves some code currently used in libtiledbsoma's unit-test logic to ArrowAdapter, where it can -- and will -- be used for readback of core domain, current domain, and non-empty domain for #2407.

Notes for Reviewer:

This is just a code-move. If all existing unit tests pass, that is success. No new unit tests are introduced here, since no new unit tests are needed.

@johnkerl johnkerl requested a review from nguyenv September 13, 2024 21:38
@johnkerl johnkerl force-pushed the kerl/nanoarrow-helpers branch 3 times, most recently from a4f230e to eb019b5 Compare September 13, 2024 21:41
@johnkerl johnkerl force-pushed the kerl/nanoarrow-helpers branch from eb019b5 to 2cf3790 Compare September 13, 2024 21:43
@johnkerl johnkerl changed the title [c++] Centralize some nanoarrow helpers [c++] Centralize some nanoarrow helpers Sep 13, 2024
@codecov
Copy link
Copy Markdown

codecov Bot commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 96.49123% with 4 lines in your changes missing coverage. Please review.

Project coverage is 61.15%. Comparing base (0efe51a) to head (800479d).
Report is 7 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2994       +/-   ##
===========================================
- Coverage   89.64%   61.15%   -28.50%     
===========================================
  Files          39       94       +55     
  Lines        4096    12378     +8282     
  Branches        0      781      +781     
===========================================
+ Hits         3672     7570     +3898     
- Misses        424     4617     +4193     
- Partials        0      191      +191     
Flag Coverage Δ
libtiledbsoma 46.99% <96.49%> (?)
python 89.79% <ø> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 89.79% <ø> (+0.14%) ⬆️
libtiledbsoma 49.39% <92.10%> (∅)

@johnkerl johnkerl removed the request for review from nguyenv September 14, 2024 17:14
@johnkerl johnkerl marked this pull request as draft September 14, 2024 17:14
@johnkerl johnkerl changed the title [c++] Centralize some nanoarrow helpers [c++] Centralize some nanoarrow helpers [WIP] Sep 14, 2024
@johnkerl johnkerl changed the title [c++] Centralize some nanoarrow helpers [WIP] [c++] Centralize some nanoarrow helpers Sep 15, 2024
@johnkerl johnkerl requested a review from nguyenv September 15, 2024 00:03
@johnkerl johnkerl marked this pull request as ready for review September 15, 2024 00:03
@johnkerl johnkerl force-pushed the kerl/nanoarrow-helpers branch from c9c4a87 to 800479d Compare September 15, 2024 00:04
@johnkerl johnkerl merged commit 1116d43 into main Sep 16, 2024
@johnkerl johnkerl deleted the kerl/nanoarrow-helpers branch September 16, 2024 19:35
@johnkerl
Copy link
Copy Markdown
Contributor Author

Thanks @nguyenv ! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants