Skip to content

[python] Dataframe read path#1793

Merged
nguyenv merged 24 commits intomainfrom
viviannguyen/python-read-path
Jan 19, 2024
Merged

[python] Dataframe read path#1793
nguyenv merged 24 commits intomainfrom
viviannguyen/python-read-path

Conversation

@nguyenv
Copy link
Copy Markdown
Contributor

@nguyenv nguyenv commented Oct 16, 2023

Issue and/or context:

Running Changes:

  • When opening a DataFrame in read-mode, use DataFrameWrapper which wraps around clib.SOMADataFrame. Otherwise, DataFrame should use the already existing write-path with ArrayWrapper which wraps around a TileDB-Py Array
  • Corrections to C++ unit tests where SOMADataFrame needs to use sparse array
  • SOMADataFrame::count needs to use nnz
  • C++ API create needs to set soma_object_type and exists needs to check soma_object_type
  • Get full nonempty domains for SOMADataFrame
  • Add getters and methods to DataFrameWrapper and ArrayWrapper (i.e nonempty domain, attr names, dim names, etc)
  • Support query conditions for SOMADataFrame
  • _query_condition.py no longer has dependency on TileDB-Py and uses Pyarrow schema instead of TileDB ArraySchema and SOMAError instead of TileDBError
  • Reorganize Pybind11 code to place util functions and PyQueryCondition (note: should refactor QueryCondition to completely remove PyQueryCondition usage) into common.h
  • SOMAArray now resides in its own soma_array.cc file
  • pytiledbsoma.cc is now the top-level file that contains stats bindings and loads modules from query_condition.cc, soma_array.cc, soma_dataframe.cc
  • Support enumerations / dictionaries for SOMADataFrame
  • Support all enumeration values in ArrowAdapter::to_arrow
  • read Pybind11 function now uses correct Arrow schema to convert dictionary arrays when importing from C to Python
  • SOMAObject::schema returns ArrowSchema which calls from SOMAArray::arrow_schema
  • Create C++ factory unique_ptr<SOMAObject> SOMAObject::open and bind in Pybind11 as clib.SOMAObject.open which returns the correct Python SOMA class
  • Fix for [python] Errors raised as RuntimeError should raise SOMAError #783

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Oct 16, 2023

Codecov Report

Merging #1793 (53cca7e) into main (08931d3) will decrease coverage by 1.55%.
The diff coverage is 66.01%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1793      +/-   ##
==========================================
- Coverage   77.59%   76.05%   -1.55%     
==========================================
  Files          87      136      +49     
  Lines        7784    10552    +2768     
  Branches        0      206     +206     
==========================================
+ Hits         6040     8025    +1985     
- Misses       1744     2433     +689     
- Partials        0       94      +94     
Flag Coverage Δ
libtiledbsoma 67.69% <26.31%> (?)
python 91.40% <84.36%> (+2.31%) ⬆️
r 67.72% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 91.40% <84.36%> (+2.31%) ⬆️
libtiledbsoma 50.61% <14.28%> (∅)

@nguyenv nguyenv force-pushed the viviannguyen/python-read-path branch from 7ee0b85 to 8992661 Compare October 17, 2023 15:44
@johnkerl johnkerl changed the title [python] Read Path [python] Read path Oct 18, 2023
@nguyenv nguyenv force-pushed the viviannguyen/python-read-path branch from c0d02b0 to 4e63bc4 Compare October 19, 2023 00:36
@nguyenv nguyenv force-pushed the viviannguyen/python-read-path branch from 7db68e1 to 2c8fa29 Compare October 20, 2023 21:32
@nguyenv nguyenv changed the title [python] Read path [python] Dataframe read path Oct 20, 2023
@nguyenv nguyenv marked this pull request as ready for review October 20, 2023 23:16
Comment thread apis/python/src/tiledbsoma/_collection.py Outdated
Comment thread apis/python/src/tiledbsoma/_dataframe.py Outdated
Comment thread apis/python/src/tiledbsoma/_dataframe.py Outdated
Comment thread apis/python/src/tiledbsoma/_dataframe.py Outdated
Comment thread apis/python/src/tiledbsoma/_tdb_handles.py Outdated
Comment thread apis/python/src/tiledbsoma/_tdb_handles.py Outdated
Comment thread apis/python/src/tiledbsoma/_tdb_handles.py Outdated
@nguyenv nguyenv force-pushed the viviannguyen/python-read-path branch 5 times, most recently from 4477714 to 4306121 Compare November 7, 2023 16:26
Copy link
Copy Markdown
Contributor

@thetorpedodog thetorpedodog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the usual disclaimer that I’m mostly looking at the Python side for style etc. and didn’t look at the C++ code in detail)

Comment thread apis/python/src/tiledbsoma/_dataframe.py Outdated
Comment thread apis/python/src/tiledbsoma/_factory.py Outdated
Comment thread apis/python/src/tiledbsoma/_sparse_nd_array.py Outdated
Comment thread apis/python/src/tiledbsoma/_tdb_handles.py Outdated
Comment thread apis/python/src/tiledbsoma/_tdb_handles.py Outdated
Comment thread apis/python/src/tiledbsoma/_tdb_handles.py
Comment thread apis/python/src/tiledbsoma/_tdb_handles.py Outdated
Comment thread apis/python/src/tiledbsoma/_tdb_handles.py
Comment thread apis/python/src/tiledbsoma/_tdb_handles.py Outdated
Comment thread apis/python/src/tiledbsoma/_util.py Outdated
@thetorpedodog
Copy link
Copy Markdown
Contributor

(the usual disclaimer that I’m mostly looking at the Python side for style etc. and didn’t look at the C++ code in detail)

I should add that I like the way the open-and-select-the-right-wrapper thing works now. It’s very well structured and understandable.

@nguyenv nguyenv requested a review from thetorpedodog November 9, 2023 01:45
@nguyenv nguyenv force-pushed the viviannguyen/python-read-path branch 2 times, most recently from d5f4dc5 to 8f7951e Compare November 9, 2023 22:29
Copy link
Copy Markdown
Contributor

@johnkerl johnkerl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing huge to sugggest -- some spelling changes etc

Comment thread libtiledbsoma/test/unit_soma_dense_ndarray.cc
Comment thread libtiledbsoma/test/unit_soma_sparse_ndarray.cc
Comment thread libtiledbsoma/test/unit_soma_dataframe.cc
Comment thread libtiledbsoma/test/unit_soma_array.cc
Comment thread libtiledbsoma/src/soma/soma_sparse_ndarray.h Outdated
Comment thread apis/python/src/tiledbsoma/soma_array.cc
Comment thread apis/python/tests/test_platform_config.py
Comment thread libtiledbsoma/src/soma/soma_array.h Outdated
Comment thread libtiledbsoma/src/soma/soma_array.h Outdated
Comment thread libtiledbsoma/src/soma/soma_dataframe.cc
Vivian Nguyen added 22 commits January 19, 2024 12:59
* Move `PyQueryCondition` Into `common.h`
* Use Pyarrow Schema instead of TileDB ArraySchema
* Remove TileDB-Py dependency
* No longer requires attr-to-enum mapping passed for dictionaries as
  this can be checked in Pyarrow Schema now
* Eventually the `arrow_schema` calls should replace `schema` but quite
  a few things still depend on the TileDB ArraySchema so this is going
  to be temporarily punted for now
* The R API should be refactored soon to not rely on tiledb::ArraySchema
@nguyenv nguyenv force-pushed the viviannguyen/python-read-path branch from 7c4b511 to 03e7a61 Compare January 19, 2024 19:17
@nguyenv nguyenv force-pushed the viviannguyen/python-read-path branch from 03e7a61 to 53cca7e Compare January 19, 2024 19:47
@nguyenv
Copy link
Copy Markdown
Contributor Author

nguyenv commented Jan 19, 2024

@beroy I have now merged your changes into my PR. The only change I did was move your reindexer unit tests to reside within the apis/python/tests directory. This is a part of changes within this PR where I've moved all Python unit tests from libtilesoma/tests into the Python API directory as to not have them split between two directories. Because this PR is already large enough as is, I have decided to leave the reindexer bindings within pytiledbsoma.cc for now but will be opening a subsequent PR where this gets separated out into its own reindexer.cc.

@nguyenv nguyenv merged commit 44bfad4 into main Jan 19, 2024
@nguyenv nguyenv deleted the viviannguyen/python-read-path branch January 19, 2024 21:14
jdblischak added a commit to TileDB-Inc/centralized-tiledb-nightlies that referenced this pull request Jan 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants