Anopheles refactor part 2 - AnophelesGenomeSequenceData #381
    MAJOR_VERSION_NUMBER = 1
    MAJOR_VERSION_PATH = "v1.0"
Renamed for consistency with other variable names.
This module provides the new AnophelesGenomeSequenceData class.
    @property
    @abstractmethod
    def contigs(self):
        raise NotImplementedError("Must override contigs")

    @property
    @abstractmethod
    def _genome_fasta_path(self):
        raise NotImplementedError("Must override _genome_fasta_path")

    @property
    @abstractmethod
    def _genome_fai_path(self):
        raise NotImplementedError("Must override _genome_fai_path")

    @property
    @abstractmethod
    def _genome_zarr_path(self):
        raise NotImplementedError("Must override _genome_zarr_path")

    @property
    @abstractmethod
    def _genome_ref_id(self):
        raise NotImplementedError("Must override _genome_ref_id")

    @property
    @abstractmethod
    def _genome_ref_name(self):
        raise NotImplementedError("Must override _genome_ref_name")
Refactored to parent class.
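For readers following along, here is a minimal sketch of how a concrete subclass might satisfy these abstract properties. The class names `GenomeSequenceBase` and `ExampleSpecies`, the contig names, and the zarr path are all hypothetical and are not taken from this PR. Only two of the properties are shown, to keep it short.

```python
from abc import ABC, abstractmethod


class GenomeSequenceBase(ABC):
    """Hypothetical stand-in for the parent class, reduced to two properties."""

    @property
    @abstractmethod
    def contigs(self):
        raise NotImplementedError("Must override contigs")

    @property
    @abstractmethod
    def _genome_zarr_path(self):
        raise NotImplementedError("Must override _genome_zarr_path")


class ExampleSpecies(GenomeSequenceBase):
    """Hypothetical concrete subclass; contig names and path are made up."""

    @property
    def contigs(self):
        return ("contig_1", "contig_2")

    @property
    def _genome_zarr_path(self):
        return "reference/genome/example.zarr"


species = ExampleSpecies()
print(species.contigs)
```

Because the properties are marked with `@abstractmethod`, Python refuses to instantiate a subclass that forgets to override one of them, so a missing override shows up as a `TypeError` at construction time rather than later on first access.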
    @doc(
        summary="Open the reference genome zarr.",
        returns="Zarr hierarchy containing the reference genome sequence.",
    )
    def open_genome(self) -> zarr.hierarchy.Group:
        if self._cache_genome is None:
            path = f"{self._base_path}/{self._genome_zarr_path}"
            store = init_zarr_store(fs=self._fs, path=path)
            self._cache_genome = zarr.open_consolidated(store=store)
        return self._cache_genome

    def _genome_sequence_for_contig(self, *, contig, inline_array, chunks):
        """Obtain the genome sequence for a given contig as an array."""
        assert contig in self.contigs
        genome = self.open_genome()
        z = genome[contig]
        d = da_from_zarr(z, inline_array=inline_array, chunks=chunks)
        return d

    @doc(
        summary="Access the reference genome sequence.",
        returns="""
        An array of nucleotides giving the reference genome sequence for the
        given contig.
        """,
    )
    def genome_sequence(
        self,
        region: base_params.region,
        inline_array: base_params.inline_array = base_params.inline_array_default,
        chunks: base_params.chunks = base_params.chunks_default,
    ) -> da.Array:
        resolved_region: Region = self.resolve_region(region)
        del region

        # obtain complete sequence for the requested contig
        d = self._genome_sequence_for_contig(
            contig=resolved_region.contig, inline_array=inline_array, chunks=chunks
        )

        # deal with region start and stop
        if resolved_region.start:
            slice_start = resolved_region.start - 1
        else:
            slice_start = None
        if resolved_region.end:
            slice_stop = resolved_region.end
        else:
            slice_stop = None
        loc_region = slice(slice_start, slice_stop)

        return d[loc_region]
Refactored to parent class.
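The start/stop handling in `genome_sequence` converts 1-based, inclusive region coordinates into a 0-based, half-open Python slice. A small standalone sketch of that conversion is below. The helper name `region_to_slice` and the toy sequence are hypothetical, and a plain string stands in for the dask array.

```python
from typing import Optional


def region_to_slice(start: Optional[int], end: Optional[int]) -> slice:
    """Convert 1-based inclusive coordinates to a 0-based half-open slice."""
    # A missing start or end means "from the beginning" or "to the end".
    slice_start = start - 1 if start else None
    slice_stop = end if end else None
    return slice(slice_start, slice_stop)


sequence = "ACGTACGTAC"  # made-up 10-base contig
first_four = sequence[region_to_slice(1, 4)]    # bases 1 to 4 inclusive
tail = sequence[region_to_slice(8, None)]       # base 8 to the end
whole = sequence[region_to_slice(None, None)]   # entire contig
print(first_four, tail, whole)
```

Only the start is shifted by one: the inclusive 1-based end already equals the exclusive 0-based stop.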
    if hasattr(resource, "genome_features"):
        gene_annotation = resource.genome_features(attributes=["ID"])
        results = gene_annotation.query(f"ID == '{region}'")
        if not results.empty:
            # the region is a feature ID
            feature = results.squeeze()
            return Region(feature.contig, int(feature.start), int(feature.end))
Small modification here to support testing of the AnophelesGenomeSequenceData class.
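To illustrate why the `hasattr` guard helps, here is a simplified sketch. It is not the real `resolve_region`: the classes `SequenceOnly` and `WithFeatures`, the function `resolve`, and the dict-based feature lookup are all hypothetical stand-ins. The point is that a resource without `genome_features` can still resolve contig names.

```python
class SequenceOnly:
    """Hypothetical resource that provides contigs but no genome_features."""

    contigs = ("contig_1",)


class WithFeatures(SequenceOnly):
    """Hypothetical resource that also provides feature annotations."""

    def genome_features(self):
        # Made-up lookup: feature ID -> (contig, start, end)
        return {"gene_A": ("contig_1", 100, 200)}


def resolve(resource, region):
    # Only attempt a feature ID lookup if the resource supports it.
    if hasattr(resource, "genome_features"):
        hit = resource.genome_features().get(region)
        if hit is not None:
            return hit
    # Otherwise fall back to treating the region as a contig name.
    if region in resource.contigs:
        return (region, None, None)
    raise ValueError(f"Region {region!r} not recognised")


print(resolve(SequenceOnly(), "contig_1"))
print(resolve(WithFeatures(), "gene_A"))
```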
This new file defines test fixtures which are common to the base and genome_sequence test modules. Basically, we want a single place to create some test data.
Lots of refactoring here to use pytest fixtures, and to make use of common fixtures defined in conftest.py.
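As a rough illustration of the shared-fixture pattern, a `conftest.py` could look something like the sketch below. The fixture name `fake_genome`, the helper `build_fake_genome`, and the test data are hypothetical and are not the fixtures added in this PR. The test function would normally live in a separate test module and is shown here only for brevity.

```python
import pytest


def build_fake_genome():
    """Hypothetical helper: contig name -> made-up nucleotide sequence."""
    return {"contig_1": "ACGTACGT", "contig_2": "TTGACA"}


@pytest.fixture(scope="session")
def fake_genome():
    # Built once per test session and shared by every test that requests it.
    return build_fake_genome()


def test_contigs_present(fake_genome):
    # pytest injects the fixture by matching the argument name.
    assert set(fake_genome) == {"contig_1", "contig_2"}
```

Fixtures defined in `conftest.py` are discovered automatically by pytest, so test modules in the same directory can request them by name without importing anything.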
Poetry environment updated to include pytest-cases.
Codecov Report
@@ Coverage Diff @@
## master #381 +/- ##
=========================================
Coverage ? 92.02%
=========================================
Files ? 2
Lines ? 188
Branches ? 0
=========================================
Hits ? 173
Misses ? 15
Partials ? 0

Updating the poetry environment has bumped pandas from 1.5 to 2.0, which now generates errors around datetime dtype conversions when building the "quarter" column in the sample metadata. I've worked around this here by calculating the quarter directly.
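The exact workaround is not shown in this thread. One way to calculate a quarter directly from a month number, avoiding any datetime conversion, is sketched below. The function name `quarter_from_month` is hypothetical.

```python
def quarter_from_month(month: int) -> int:
    """Map a calendar month (1-12) to its quarter (1-4) with integer arithmetic."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return (month - 1) // 3 + 1


quarters = [quarter_from_month(m) for m in range(1, 13)]
print(quarters)
```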
Looks like the tests will need to be updated too, since they use the same method:

Thanks Lee 🙏
Here we continue working towards #366 by factoring out a class AnophelesGenomeSequenceData which assumes responsibility for providing access to a reference genome sequence.

Also here we reorganise the tests to create a new file conftest.py which provides test fixtures common to multiple test modules.

Also here we try out adding coverage reports via codecov.