Anopheles refactor part 5 - AnophelesSnpData#393
Anopheles refactor part 5 - AnophelesSnpData#393alimanfoo merged 59 commits intomalariagen:masterfrom
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #393 +/- ##
==========================================
+ Coverage 95.79% 96.04% +0.25%
==========================================
Files 4 5 +1
Lines 689 1238 +549
==========================================
+ Hits 660 1189 +529
- Misses 29 49 +20 ☔ View full report in Codecov by Sentry. |
alimanfoo
left a comment
There was a problem hiding this comment.
A few notes on the changes in this PR...
| # any remote data in order to fail fast where possible. | ||
| - name: Run fast unit tests | ||
| run: poetry run pytest -v tests/anoph | ||
| run: poetry run pytest -v tests/anoph --typeguard-packages=malariagen_data,malariagen_data.anoph |
There was a problem hiding this comment.
The --typeguard-packages option triggers the pytest typeguard extension to inject runtime type checking everywhere that type annotations have been used. This helps to catch typing errors which are not picked up statically by mypy.
Note, however, that this is something separate from the @check_types decorator which performs parameter type checks for public methods.
| parse_multi_region, | ||
| parse_single_region, |
There was a problem hiding this comment.
The functions for parsing region parameter have been reworked to make sure it's clear when a function will accept multiple regions in a list, versus when only a single region is accepted.
| self, | ||
| sample, | ||
| region, | ||
| region: base_params.single_region, |
There was a problem hiding this comment.
Example of where we use an explicit parameter type to make clear that parameter value must be a single region.
| sample_indices: TypeAlias = Annotated[ | ||
| List[int], | ||
| """ | ||
| Advanced usage parameter. A list of indices of samples to select, | ||
| corresponding to the order in which the samples are found within the | ||
| sample metadata. Either provide this parameter or sample_query, not | ||
| both. | ||
| """, | ||
| ] |
There was a problem hiding this comment.
This parameter has been introduced to make typing easier when handling the case where samples are being selected via a list of indices rather than a pandas query.
| else: | ||
| raise TypeError( | ||
| f"Invalid type for sample_sets parameter; expected str, list or tuple; found: {sample_sets!r}" | ||
| ) | ||
|
|
There was a problem hiding this comment.
Handling type errors isn't required any more, because the @check_types decorator ensures that function parameters will conform to the type annotations.
| return details | ||
|
|
||
|
|
||
| def check_types(f): |
There was a problem hiding this comment.
N.B., we roll our own simple decorator here for performing runtime type checks on parameter values. See the docstring for explanation of why we don't use typeguard's @typechecked decorator.
There was a problem hiding this comment.
All test functions have been migrated to use simulated data where possible, to speed up the test suite.
There was a problem hiding this comment.
All test functions have been migrated to use simulated data where possible, to speed up the test suite.
There was a problem hiding this comment.
All test functions have been migrated to use simulated data where possible, to speed up the test suite.
|
Going to merge if CI passes, but happy to follow up any comments. Need to cut a release as the xarray concatenation problem is causing big slowdowns for trying to run large PCAs, we need a release with the fix in #395. |
Doh, actually forgot we already had a release with #395 in it. Ah well, probably good to get these changes in sooner. |
Here we continue working towards #366 by pulling out functions for accessing SNP data into a new
AnophelesSnpDatacomposable class.While we are here, we also introduce the use of the
typeguardpackage. Acheck_typesdecorator is added to public methods to provide runtime checking of parameter types, which then removes the need for us to write any type checking code. This resolves #358 (preferring typeguard over pydantic because the type checking behaviour is simpler and there is no coersion of types).Along the way, this PR finds and fixes a number of typing errors.
Also resolves #396 by upgrading scikit-allel.