Skip to content

[python] add dup obs id check to tiledbsoma.io register functions#4086

Merged
bkmartinjr merged 8 commits intomainfrom
bkm/soma-131-dup-obs-id
Jun 4, 2025
Merged

[python] add dup obs id check to tiledbsoma.io register functions#4086
bkmartinjr merged 8 commits intomainfrom
bkm/soma-131-dup-obs-id

Conversation

@bkmartinjr
Copy link
Copy Markdown
Member

Issue and/or context:

Add check for duplicate obs axis IDs in tiledbsoma.io.register_anndatas and tiledbsoma.io.register_h5ads. Controlled by a parameter (allow_duplicate_obs_ids, default: True). Will raise an error if there are any IDs in the intersection of provided AnnData and existing SOMA.

Fixes SOMA-131

Changes:

  • Add ID check logic to the registration code
  • Add test cases and modify existing tests to account for the new behavior

@codecov
Copy link
Copy Markdown

codecov Bot commented May 27, 2025

Codecov Report

Attention: Patch coverage is 92.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.38%. Comparing base (afa1ca7) to head (fb6413e).
Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4086       +/-   ##
===========================================
+ Coverage   65.87%   89.38%   +23.51%     
===========================================
  Files         158       59       -99     
  Lines       20982     7084    -13898     
  Branches     1242        0     -1242     
===========================================
- Hits        13821     6332     -7489     
+ Misses       6748      752     -5996     
+ Partials      413        0      -413     
Flag Coverage Δ
libtiledbsoma ?
python 89.38% <92.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 89.38% <92.00%> (-0.01%) ⬇️
libtiledbsoma ∅ <ø> (∅)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bkmartinjr bkmartinjr marked this pull request as ready for review May 27, 2025 15:27
Copy link
Copy Markdown
Collaborator

@jp-dark jp-dark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good once the HISTORY comment is resolved.

Comment thread apis/python/HISTORY.md Outdated
Copy link
Copy Markdown
Member

@aaronwolen aaronwolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I noted one nit.

Before merging could we update the docstrings to include info about this new behavior? Something similar to your error message like:

The registration process will raise an error if any obs IDs (from obs_field_name) are duplicated across the combination of all inputs and the target SOMA Experiment. You can set allow_duplicate_obs_ids=True to bypass this check if you are adding a new Measurement to existing observations.

msg = f"""Duplicate obs IDs found during registration. {len(examples)} obs IDs are not unique across the provided inputs.
Example duplicate obs ID(s): {a_few_examples}.

Please ensure obs IDs are unique across all input files for append operations.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Please ensure obs IDs are unique across all input files for append operations.
Please ensure obs IDs are unique across all inputs for append operations.

Minor tweak since this could be thrown from register_h5ads or register_anndatas.

@bkmartinjr
Copy link
Copy Markdown
Member Author

could we update the docstrings

will do. I'll note that the docstrings for all of these functions are inadequate - you might want to file some backlog for improving them if that becomes a priority.

@bkmartinjr bkmartinjr merged commit e2135d0 into main Jun 4, 2025
24 of 30 checks passed
@bkmartinjr bkmartinjr deleted the bkm/soma-131-dup-obs-id branch June 4, 2025 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants