[python] Sanitize keys when generating URIs#4031
Conversation
25be975 to
b6502a2
Compare
SafeURI class to validate or sanitize URIs
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4031 +/- ##
===========================================
+ Coverage 65.27% 89.27% +23.99%
===========================================
Files 160 59 -101
Lines 21320 7076 -14244
Branches 1259 0 -1259
===========================================
- Hits 13917 6317 -7600
+ Misses 6992 759 -6233
+ Partials 411 0 -411
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
SafeURI class to validate or sanitize URIsCollection.set
Collection.setGroup.set
Group.set
bkmartinjr
left a comment
There was a problem hiding this comment.
Needs HISTORY.md entry as this definitely changes API behavior
|
Is there an equivalent code path we need to fix in the R bindings? |
bkmartinjr
left a comment
There was a problem hiding this comment.
There are a couple of spots in the spatial ingestion code (io/spatia/ingest.py) which are lacking the sanitize_key call.
Example: line 527 uses the measurement_name string without escaping it
bkmartinjr
left a comment
There was a problem hiding this comment.
Several comments/change requests:
- the escape algo should not unescape - you need the function to create a unique path for every unique key, and this breaks that constraint
- io.spatial needs to also escape user-provided keys
- we need similar changes to R (where applicable). I'm fine if that is in this PR or another, as long as we don't forget about it
fdf6a78 to
2561f1b
Compare
|
Will do the R changes in a separate PR. |
bkmartinjr
left a comment
There was a problem hiding this comment.
This looks great. My only final request is a better change log entry - I suggested some text, but please feel free to do whatever you feel is appropriate. We just need the change log to summarize major changes like this.
Approving on the assumption you will fix the change log prior to landing this.
jp-dark
left a comment
There was a problem hiding this comment.
I have one comment that's not directly related to this PR, but might make sense to address here.
I also noticed missing sanitizers in the following places:
- io/spatial/ingest.py::575 (generating
X_layer_uri) - io/spatial/ingest.py::612 (generating
tissue_uri)
Assuming a no-op isn't expensive, would it easier to maintain adding the sanitize_key directly to the _util.uri_joinpath function?
I recommend not doing this. They are orthogonal functions, and blending them conflates all uses of URI joining with paths generated from keys. Just my $0.02 :-) |
That makes sense. Sorry for the back-and-forth @nguyenv |
d806ccb to
7b678ae
Compare
|
Closes SOMA-96 |
7b678ae to
8bae59a
Compare
|
Also closes SOMA-153 |
|
closes SOMA-96 |
Issue and/or context:
[sc-63402]
Changes:
Add
_util.sanitize_keywhich encodes unsafe characters when generating URIs.