Skip to content

[python] Sanitize keys when generating URIs#4031

Merged
nguyenv merged 12 commits intomainfrom
viviannguyen/sc-63402/python-c-security-tiledbsoma-io-from-anndata
May 20, 2025
Merged

[python] Sanitize keys when generating URIs#4031
nguyenv merged 12 commits intomainfrom
viviannguyen/sc-63402/python-c-security-tiledbsoma-io-from-anndata

Conversation

@nguyenv
Copy link
Copy Markdown
Contributor

@nguyenv nguyenv commented May 5, 2025

Issue and/or context:

[sc-63402]

Changes:

Add _util.sanitize_key which encodes unsafe characters when generating URIs.

@nguyenv nguyenv force-pushed the viviannguyen/sc-63402/python-c-security-tiledbsoma-io-from-anndata branch from 25be975 to b6502a2 Compare May 7, 2025 15:06
@nguyenv nguyenv changed the title [WIP] Add testing for valid and invalid names [python] Add SafeURI class to validate or sanitize URIs May 7, 2025
@codecov
Copy link
Copy Markdown

codecov Bot commented May 9, 2025

Codecov Report

Attention: Patch coverage is 95.83333% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.27%. Comparing base (9c10365) to head (8bae59a).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4031       +/-   ##
===========================================
+ Coverage   65.27%   89.27%   +23.99%     
===========================================
  Files         160       59      -101     
  Lines       21320     7076    -14244     
  Branches     1259        0     -1259     
===========================================
- Hits        13917     6317     -7600     
+ Misses       6992      759     -6233     
+ Partials      411        0      -411     
Flag Coverage Δ
libtiledbsoma ?
python 89.27% <95.83%> (+0.30%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 89.27% <95.83%> (+0.30%) ⬆️
libtiledbsoma ∅ <ø> (∅)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@nguyenv nguyenv changed the title [python] Add SafeURI class to validate or sanitize URIs [python] Sanitize URIs in Collection.set May 9, 2025
@nguyenv nguyenv changed the title [python] Sanitize URIs in Collection.set [python] Sanitize URIs in Group.set May 9, 2025
@nguyenv nguyenv changed the title [python] Sanitize URIs in Group.set [python] Sanitize keys when generating URIs May 12, 2025
@nguyenv nguyenv marked this pull request as ready for review May 12, 2025 19:49
Copy link
Copy Markdown
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs HISTORY.md entry as this definitely changes API behavior

@bkmartinjr
Copy link
Copy Markdown
Member

Is there an equivalent code path we need to fix in the R bindings?

Comment thread apis/python/src/tiledbsoma/_util.py Outdated
Comment thread apis/python/src/tiledbsoma/_util.py Outdated
Copy link
Copy Markdown
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of spots in the spatial ingestion code (io/spatia/ingest.py) which are lacking the sanitize_key call.

Example: line 527 uses the measurement_name string without escaping it

Copy link
Copy Markdown
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several comments/change requests:

  • the escape algo should not unescape - you need the function to create a unique path for every unique key, and this breaks that constraint
  • io.spatial needs to also escape user-provided keys
  • we need similar changes to R (where applicable). I'm fine if that is in this PR or another, as long as we don't forget about it

@nguyenv nguyenv force-pushed the viviannguyen/sc-63402/python-c-security-tiledbsoma-io-from-anndata branch from fdf6a78 to 2561f1b Compare May 14, 2025 03:49
@nguyenv nguyenv requested a review from bkmartinjr May 14, 2025 17:50
@nguyenv
Copy link
Copy Markdown
Contributor Author

nguyenv commented May 14, 2025

Will do the R changes in a separate PR.

Comment thread apis/python/HISTORY.md Outdated
Copy link
Copy Markdown
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. My only final request is a better change log entry - I suggested some text, but please feel free to do whatever you feel is appropriate. We just need the change log to summarize major changes like this.

Approving on the assumption you will fix the change log prior to landing this.

Copy link
Copy Markdown
Collaborator

@jp-dark jp-dark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one comment that's not directly related to this PR, but might make sense to address here.

I also noticed missing sanitizers in the following places:

  • io/spatial/ingest.py::575 (generating X_layer_uri)
  • io/spatial/ingest.py::612 (generating tissue_uri)

Assuming a no-op isn't expensive, would it easier to maintain adding the sanitize_key directly to the _util.uri_joinpath function?

Comment thread apis/python/src/tiledbsoma/io/ingest.py
@bkmartinjr
Copy link
Copy Markdown
Member

Assuming a no-op isn't expensive, would it easier to maintain adding the sanitize_key directly to the _util.uri_joinpath function?

I recommend not doing this. They are orthogonal functions, and blending them conflates all uses of URI joining with paths generated from keys.

Just my $0.02 :-)

@jp-dark
Copy link
Copy Markdown
Collaborator

jp-dark commented May 15, 2025

I recommend not doing this. They are orthogonal functions, and blending them conflates all uses of URI joining with paths generated from keys.

That makes sense. Sorry for the back-and-forth @nguyenv

@nguyenv nguyenv force-pushed the viviannguyen/sc-63402/python-c-security-tiledbsoma-io-from-anndata branch from d806ccb to 7b678ae Compare May 15, 2025 17:10
@nguyenv nguyenv requested a review from jp-dark May 15, 2025 17:53
Comment thread apis/python/src/tiledbsoma/io/spatial/ingest.py
Comment thread apis/python/src/tiledbsoma/io/spatial/ingest.py
@jp-dark
Copy link
Copy Markdown
Collaborator

jp-dark commented May 20, 2025

Closes SOMA-96

@nguyenv nguyenv force-pushed the viviannguyen/sc-63402/python-c-security-tiledbsoma-io-from-anndata branch from 7b678ae to 8bae59a Compare May 20, 2025 20:38
@nguyenv
Copy link
Copy Markdown
Contributor Author

nguyenv commented May 20, 2025

Also closes SOMA-153

@nguyenv nguyenv requested a review from jp-dark May 20, 2025 21:17
@nguyenv nguyenv merged commit 6863821 into main May 20, 2025
18 checks passed
@nguyenv nguyenv deleted the viviannguyen/sc-63402/python-c-security-tiledbsoma-io-from-anndata branch May 20, 2025 22:13
@jp-dark
Copy link
Copy Markdown
Collaborator

jp-dark commented Jun 30, 2025

closes SOMA-96

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants