Skip to content

[python] add axis delete to Experiment#4215

Merged
bkmartinjr merged 13 commits intomainfrom
bkm/soma-187
Sep 3, 2025
Merged

[python] add axis delete to Experiment#4215
bkmartinjr merged 13 commits intomainfrom
bkm/soma-187

Conversation

@bkmartinjr
Copy link
Copy Markdown
Member

Add Experiment.var_axis_delete and Experiment.obs_axis_delete and their unit tests.

@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 29, 2025

Codecov Report

❌ Patch coverage is 92.70833% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.71%. Comparing base (8ecd7e3) to head (f856eb5).
⚠️ Report is 72 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4215       +/-   ##
===========================================
+ Coverage   69.46%   89.71%   +20.25%     
===========================================
  Files         169       61      -108     
  Lines       18972     7284    -11688     
  Branches     1233        0     -1233     
===========================================
- Hits        13179     6535     -6644     
+ Misses       5359      749     -4610     
+ Partials      434        0      -434     
Flag Coverage Δ
libtiledbsoma ?
python 89.71% <92.70%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 89.71% <92.70%> (+0.01%) ⬆️
libtiledbsoma ∅ <ø> (∅)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bkmartinjr bkmartinjr marked this pull request as ready for review August 29, 2025 18:06
@jp-dark
Copy link
Copy Markdown
Collaborator

jp-dark commented Sep 2, 2025

I keep forgetting that we need to explicitly add new Python functions to the Sphinx docs (see doc/source dir). We can add that to a separate PR since I need to do it for all the new delete methods as well.

Comment on lines +257 to +259
if isinstance(obj, (DataFrame, SparseNDArray, GeometryDataFrame, PointCloudDataFrame)) and all(
obj.schema.get_field_index(col) >= 0 for col in join_on
):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The support for deletes in GeometryDataFrame isn't implemented yet.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I didn't want to duplicate the error checking already implemented in delete_cells.

That said, we can remove this easily and ignore the spatial arrays, at the cost of potential user confusion. My sense is we need to raise a proactive error if the Experiment contains any unsupported sub-array.

What do you recommend?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we raise a NotImplementedError here if there is a GeometryDataFrame. I think it's unlikely anyone is using the GeometryDataFrame in the wild right now, but I don't want to corrupt an Experiment if they are.

Copy link
Copy Markdown
Collaborator

@jp-dark jp-dark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple small comments, but overall this looks good to me.

Comment on lines +84 to +86
# var_id, n_cells
# len 1837

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray comments?



@pytest.mark.spatialdata
def test_experiment_obs_axis_delete_spatial(soma_spatial_experiment, soma_tiledb_context) -> None: # noqa: F811
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a similar test for var_axis_delete

Copy link
Copy Markdown
Member

@aaronwolen aaronwolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great and the docstrings are all very clear/informative. Most of my suggestions are formatting-related.

Comment on lines +121 to +123
This operation affects all Measurements within an Experiment. Deleted observations are specified:
- by `obs` DataFrame soma_joinid: a sequence of soma_joinid or a slice
- or, by `obs` DataFrame value filter / query condition
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This operation affects all Measurements within an Experiment. Deleted observations are specified:
- by `obs` DataFrame soma_joinid: a sequence of soma_joinid or a slice
- or, by `obs` DataFrame value filter / query condition
This operation affects all Measurements within an Experiment. Deleted observations are specified:
- by ``obs`` DataFrame soma_joinid: a sequence of soma_joinid or a slice
- or, by ``obs`` DataFrame value filter / query condition

Inserted space so the list will render properly and fixed inline code formatting.

Comment on lines +125 to +126
Sub-objects affected by this operation:
- exp.obs - delete matching DataFrame elements by soma_joinid
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Sub-objects affected by this operation:
- exp.obs - delete matching DataFrame elements by soma_joinid
Sub-objects affected by this operation:
- exp.obs - delete matching DataFrame elements by soma_joinid

Comment on lines +133 to +134
The following sub-objects are unaffected by this operation:exp.ms[*].var , exp.ms[*].varm[*], exp.ms[*].varp[*]
and any user-defined arrays within the collection.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The following sub-objects are unaffected by this operation:exp.ms[*].var , exp.ms[*].varm[*], exp.ms[*].varp[*]
and any user-defined arrays within the collection.
The following sub-objects are unaffected by this operation:
- exp.ms[*].var , exp.ms[*].varm[*], exp.ms[*].varp[*]
- any user-defined arrays within the collection

If any affected sub-objects are of type DenseNDArray (dense TileDB array), the entire operation results in an error.

Either ``coords`` or ``value_filter`` must be provided. When both ``coords`` and ``value_filter`` are provided,
the cells that match both constraints will be removed.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the cells that match both constraints will be removed.
the observations that match both constraints will be removed.

Comment on lines +181 to +183
Deleted features are specified:
- by `var` DataFrame joinid: a sequence of joinid or a slice
- by `var` DataFrame value filter / query condition
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Deleted features are specified:
- by `var` DataFrame joinid: a sequence of joinid or a slice
- by `var` DataFrame value filter / query condition
Deleted features are specified:
- by ``var`` DataFrame joinid: a sequence of joinid or a slice
- by ``var`` DataFrame value filter / query condition

Comment on lines +185 to +186
Sub-objects affected by this operation, given a named Measurement (ms_name):
- exp.ms[*ms_name*].var - delete matching DataFrame elements
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Sub-objects affected by this operation, given a named Measurement (ms_name):
- exp.ms[*ms_name*].var - delete matching DataFrame elements
Sub-objects affected by this operation, given a named Measurement (ms_name):
- exp.ms[*ms_name*].var - delete matching DataFrame elements

- exp.ms[*ms_name*].varp[*] - delete matching SparseNDMatrix values, joining on the soma_dim_0 and soma_dim_1 dimensions
- exp.spatial[*].varl[*ms_name*][*] - delete matching PointCloudDataFrame/GeometryDataFrame elements, joining on the soma_joinid dimension

The following Measurement sub-objects are unaffected by this operation: exp.ms[*ms_name*].obsm[*] , exp.ms[*ms_name*].obsp[*] ,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The following Measurement sub-objects are unaffected by this operation: exp.ms[*ms_name*].obsm[*] , exp.ms[*ms_name*].obsp[*] ,
The following Measurement sub-objects are unaffected by this operation:
- exp.ms[*ms_name*].obsm[*] and exp.ms[*ms_name*].obsp[*]

If any affected sub-objects are a DenseNDArray (dense TileDB array), the entire operation results in an error.

Either ``coords`` or ``value_filter`` must be provided. When both ``coords`` and ``value_filter`` are provided,
the cells that match both constraints will be removed.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the cells that match both constraints will be removed.
the variables that match both constraints will be removed.

>>> with tiledbsoma.Experiment.open(experiment_uri, mode="d") as exp:
... exp.var_axis_delete(value_filter="feature_biotype == 'gene'")

Note: Deleting cells does not change the size of the current domain or possible enumeration values.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note: Deleting cells does not change the size of the current domain or possible enumeration values.
Note: Deleting variables does not change the size of the current domain or possible enumeration values.

Comment on lines +243 to +244
"""Append the candidate array to the final selection list if:
- not a dense array
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Append the candidate array to the final selection list if:
- not a dense array
"""Append the candidate array to the final selection list if:
- not a dense array

@bkmartinjr
Copy link
Copy Markdown
Member Author

I believe I have addressed all of the issues:

  1. formatting & copy editing in docstrings
  2. added missing unit test
  3. added silent filtering of any joinids outside the domain/shape of the target array

Copy link
Copy Markdown
Member

@aaronwolen aaronwolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you!

@bkmartinjr bkmartinjr merged commit 5550c6f into main Sep 3, 2025
17 checks passed
@bkmartinjr bkmartinjr deleted the bkm/soma-187 branch September 3, 2025 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants