[python] add dup obs id check to tiledbsoma.io register functions#4086
bkmartinjr merged 8 commits into main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4086       +/-   ##
===========================================
+ Coverage   65.87%   89.38%   +23.51%
===========================================
  Files         158       59       -99
  Lines       20982     7084    -13898
  Branches     1242        0     -1242
===========================================
- Hits        13821     6332     -7489
+ Misses       6748      752     -5996
+ Partials      413        0      -413
jp-dark left a comment
Looks good once the HISTORY comment is resolved.
Looks great. I noted one nit.
Before merging, could we update the docstrings to include info about this new behavior? Something similar to your error message, like:
The registration process will raise an error if any obs IDs (from obs_field_name) are duplicated across the combination of all inputs and the target SOMA Experiment. You can set allow_duplicate_obs_ids=True to bypass this check if you are adding a new Measurement to existing observations.
msg = f"""Duplicate obs IDs found during registration. {len(examples)} obs IDs are not unique across the provided inputs.
Example duplicate obs ID(s): {a_few_examples}.

Please ensure obs IDs are unique across all input files for append operations.
Suggested change:
- Please ensure obs IDs are unique across all input files for append operations.
+ Please ensure obs IDs are unique across all inputs for append operations.
Minor tweak since this could be thrown from register_h5ads or register_anndatas.
Will do. I'll note that the docstrings for all of these functions are inadequate; you might want to file a backlog item for improving them if that becomes a priority.
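For readers following the thread, here is a minimal sketch of how a message like the one under review could be assembled. The function name `duplicate_obs_id_message` and the `max_examples` parameter are hypothetical and are not taken from the PR; only the message wording (with the suggested "all inputs" phrasing) comes from the discussion above.

```python
from __future__ import annotations


def duplicate_obs_id_message(duplicates: list[str], max_examples: int = 3) -> str:
    """Build an error message that reports the duplicate count and a few examples.

    Hypothetical helper: the name and the max_examples cap are illustrative only.
    """
    # Show only a handful of IDs so the message stays readable for large inputs.
    a_few_examples = ", ".join(sorted(duplicates)[:max_examples])
    return (
        "Duplicate obs IDs found during registration. "
        f"{len(duplicates)} obs IDs are not unique across the provided inputs.\n"
        f"Example duplicate obs ID(s): {a_few_examples}.\n\n"
        "Please ensure obs IDs are unique across all inputs for append operations."
    )


message = duplicate_obs_id_message(["c3", "c1", "c2", "c4"])
print(message)
```

Capping the number of examples keeps the message short even when thousands of IDs collide, while the count still conveys the scale of the problem.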
Issue and/or context:
Add check for duplicate obs axis IDs in tiledbsoma.io.register_anndatas and tiledbsoma.io.register_h5ads. Controlled by a parameter (allow_duplicate_obs_ids, default: True). Will raise an error if there are any IDs in the intersection of provided AnnData and existing SOMA.
Fixes SOMA-131
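The kind of check described here can be sketched in plain Python. This is an illustration under stated assumptions, not the code in this PR: the helpers `find_duplicate_obs_ids` and `check_obs_ids` are hypothetical, obs IDs are modeled as plain strings, and `ValueError` stands in for whatever exception type the library actually raises. Only the flag name `allow_duplicate_obs_ids` comes from the PR.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Sequence


def find_duplicate_obs_ids(
    inputs: Sequence[Iterable[str]],
    existing_ids: Iterable[str] = (),
) -> list[str]:
    """Return, sorted, every obs ID seen more than once across inputs and existing IDs."""
    # Count IDs already present in the target, then add each input's IDs.
    counts: Counter[str] = Counter(existing_ids)
    for ids in inputs:
        counts.update(ids)
    return sorted(obs_id for obs_id, n in counts.items() if n > 1)


def check_obs_ids(inputs, existing_ids=(), *, allow_duplicate_obs_ids: bool) -> None:
    """Raise ValueError on duplicates unless the caller opts out of the check."""
    if allow_duplicate_obs_ids:
        return
    duplicates = find_duplicate_obs_ids(inputs, existing_ids)
    if duplicates:
        # ValueError is an assumption; the real exception type may differ.
        raise ValueError(f"{len(duplicates)} duplicate obs IDs, e.g. {duplicates[:3]}")


new_inputs = [["cell_a", "cell_b"], ["cell_b", "cell_c"]]
already_registered = ["cell_c", "cell_d"]
duplicates = find_duplicate_obs_ids(new_inputs, already_registered)
print(duplicates)
```

In this sketch an ID counts as a duplicate whether it repeats between two inputs or between an input and the existing IDs, which matches the reviewer's wording about "the combination of all inputs and the target SOMA Experiment".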
Changes: