Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅ ❗ Your organization needs to install the Codecov GitHub app to enable full functionality. see 110 files with indirect coverage changes 📢 Thoughts on this report? Let us know!. |
thetorpedodog
left a comment
There was a problem hiding this comment.
With the caveat that I have not yet exensively looked at the C++ side of things, here is a pass on the Python side. Sorry for missing this yesterday!
| uri = entry.entry.uri | ||
| wrapper: type[Wrapper[Any | Any | Any]] | ||
| if self.mode == "r" and clib.SOMADataFrame.exists(entry.entry.uri): | ||
| if self.mode == "r" and clib.SOMADataFrame.exists(uri): |
There was a problem hiding this comment.
I probably should have noticed this earlier, but: we probably need to be passing the context we have to this function, right? It might contain user credentials or other important configuration (e.g. endpoint locations)
| Experimental. | ||
| """ | ||
| return cast(Tuple[int, ...], tuple(self._soma_reader().shape)) | ||
| handle: SOMAArray = self._handle |
There was a problem hiding this comment.
This comment is only tangentially related to this line, so:
Given that _handle is now something different for all of the TileDB Array–based types, it’s probably worth it to pull that generic specialization out of TileDBArray and put it into each of the concrete ones (so this becomes a TileDBArray[SOMAArray], or however the wrapper type generic stuff works because I kind of forget, etc.).
This isn’t critical, so if it ends up being a mess I wouldn’t worry about it but if it is reasonably straightforward it’s worth considering.
|
|
||
| def column_to_enumeration(self, name: str) -> str: | ||
| return str(self._soma_reader().get_enum_label_on_attr(name)) | ||
| def enumeration(self, name: str) -> Optional[Tuple[Any, ...]]: |
There was a problem hiding this comment.
To avoid use of Any, it may be better to call this Tuple[object, ...]. Doing so would have the advantage that the caller would need to either check or assert that it gets the returned type it wants, which is safer. But also it has the disadvantage that the caller would need to either check or assert that it gets the returned type it wants, which is kind of annoying.
| f"cannot open {hdl.uri!r}: a {type(hdl._handle)}" | ||
| f" cannot be converted to a {typename}" | ||
| ) | ||
| print(typename, cls, type(hdl)) |
| handle: clib.DataFrameWrapper = self._handle | ||
| return cast(int, handle.count) |
There was a problem hiding this comment.
Could some of these be avoided by making a libtiledbsoma.pyi file? That would also give us some tooling help in editors. Up to you, and even if you do want to, it’s not something that needs to happen in this change.
| to_clib_result_order = { | ||
| options.ResultOrder.AUTO: clib.ResultOrder.automatic, | ||
| options.ResultOrder.ROW_MAJOR: clib.ResultOrder.rowmajor, | ||
| options.ResultOrder.COLUMN_MAJOR: clib.ResultOrder.colmajor, | ||
| "auto": clib.ResultOrder.automatic, | ||
| "row-major": clib.ResultOrder.rowmajor, | ||
| "column-major": clib.ResultOrder.colmajor, | ||
| } | ||
| if result_order not in to_clib_result_order: | ||
| raise ValueError(f"Invalid result_order: {result_order}") |
There was a problem hiding this comment.
to go with the above it looks like this could be pulled into a function.
| {k: str(v) for k, v in context.tiledb_config.items()}, | ||
| [], | ||
| clib.ResultOrder.automatic, | ||
| (0, timestamp), |
There was a problem hiding this comment.
Could you add names to these arguments? The floating [] makes me nervous because I have no idea what it belongs to.
| (0, timestamp), | ||
| ) | ||
| class SOMAArrayWrapper(Wrapper[SOMAArray]): | ||
| """Wrapper for Array-derived SOMAObject classes.""" |
There was a problem hiding this comment.
If you add ARRAY_IMPL: Type[SOMAArray] here…
| @classmethod | ||
| def _opener( | ||
| cls, | ||
| uri: str, | ||
| mode: options.OpenMode, | ||
| context: SOMATileDBContext, | ||
| timestamp: int, | ||
| ) -> clib.SOMADataFrame: | ||
| open_mode = clib.OpenMode.read if mode == "r" else clib.OpenMode.write | ||
| return clib.SOMADataFrame.open( | ||
| uri, | ||
| open_mode, | ||
| {k: str(v) for k, v in context.tiledb_config.items()}, | ||
| [], | ||
| clib.ResultOrder.automatic, | ||
| (0, timestamp), | ||
| ) |
There was a problem hiding this comment.
…and move this to SOMAArrayWrapper, replacing clib.SOMADataFrame with cls.ARRAY_IMPL, and set ARRAY_IMPL = SOMADataFrame here…
| @classmethod | ||
| def _opener( | ||
| cls, | ||
| uri: str, | ||
| mode: options.OpenMode, | ||
| context: SOMATileDBContext, | ||
| timestamp: int, | ||
| ) -> clib.SOMASparseNDArray: | ||
| open_mode = clib.OpenMode.read if mode == "r" else clib.OpenMode.write | ||
| return clib.SOMASparseNDArray.open( | ||
| uri, | ||
| open_mode, | ||
| {k: str(v) for k, v in context.tiledb_config.items()}, | ||
| [], | ||
| clib.ResultOrder.automatic, | ||
| (0, timestamp), | ||
| ) |
There was a problem hiding this comment.
…then we can eliminate all this duplicated code (by similarly setting an ARRAY_IMPL here and below).
There was a problem hiding this comment.
@nguyenv the PR also does a major refactoring on the pybind11 files which is not mentioned in the description! Would that stay as a part of this PR?
There was a problem hiding this comment.
It is in the description for the PR above which is higher prescendent than this PR. I should mark this PR as draft for now as I'm dealing with segfault issues arising from the newly blockchain iterator.
There was a problem hiding this comment.
I will definitely help you with reorganizing the reindexer code if that is your main concern with the Pybind11 refactoring.
d5f4dc5 to
8f7951e
Compare
a6dc282 to
98d069d
Compare
* When opening a `DataFrame` in read-mode, use `DataFrameWrapper` which wraps around `clib.SOMADataFrame`. Otherwise, `DataFrame` should use the already existing write-path with `ArrayWrapper` which wraps around a TileDB-Py Array * Necessary changes to `_dataframe.py` to support the read-path already exist on another branch. That branch will be merged into this PR shortly
* Take care of formatting / typing * Correct datetime domains * Get full nonempty domains for `SOMADataFrame` * Find missing open that needs to use `DataframeWrapper`
* Move `PyQueryCondition` Into `common.h` * Use Pyarrow Schema instead of TileDB ArraySchema * Remove TileDB-Py dependency * No longer requires attr-to-enum mapping passed for dictionaries as this can be checked in Pyarrow Schema now
* Eventually the `arrow_schema` calls should replace `schema` but quite a few things still depend on the TileDB ArraySchema so this is going to be temporarily punted for now
98d069d to
9268d35
Compare
8f7951e to
0934313
Compare
21a16a0 to
6a12ef2
Compare
03e7a61 to
53cca7e
Compare
Issue and/or context:
To be merged on top of #1793.
Changes:
SparseNDArrayin read-mode, useSparseNDArrayWrapperwhich wraps aroundclib.SOMASparseNDArrayDenseNDArrayin read-mode, useDenseNDArrayWrapperwhich wraps aroundclib.SOMADenseNDArraySOMAArraySOMAArraywith the newSOMADataFrame,SOMASparseNDArray, andSOMADenseNDArrayclasses, we can reorganize the C++ install headers so that they no longer install internal headers