Conversation
PR ready Pf8() PR ready Pf8()
|
Thanks @eselimnl, I'll take a look asap :) |
Credentials are supplied automatically via a secret, however this is disabled for PRs coming from a fork. Basically, it would be easier if you recreate this PR from a branch within the malariagen/malariagen-data-python repo, rather than from a fork, then all checks will run properly. I've given you write access to this repo so you should be able to do that now. |
| # S3 access: set up the filesystem. | ||
| if "s3://" in url: | ||
| try: | ||
| import s3fs | ||
| except ImportError: | ||
| raise ImportError("Amazon S3 support requires the 's3fs' package.") | ||
|
|
||
| # Configuration for S3 access, customize as needed. | ||
| config = { | ||
| "signature_version": "s3", | ||
| "s3": {"addressing_style": "virtual"}, | ||
| } | ||
|
|
||
| # Create an S3FileSystem with custom endpoint if specified. | ||
| kwargs["anon"] = kwargs.get("anon", True) # Default to anonymous access. | ||
| kwargs["endpoint_url"] = "https://cog.sanger.ac.uk" | ||
| kwargs["config_kwargs"] = kwargs.get("config_kwargs", config) | ||
|
|
||
| # Use s3fs to create the filesystem. | ||
| fs = s3fs.S3FileSystem(**kwargs) | ||
| path = url.split("s3://")[1] |
There was a problem hiding this comment.
It should be possible to simplify this a little. This is because fsspec should have the ability to detect S3 URLs and use s3fs automatically, rather than requiring you to import and instantiate directly. I.e., url_to_fs(url, **kwargs) should work for S3 just as well as for GCS.
Sorry about this, don't know what's going on there. Seems like a reasonable approach for now, but happy to investigate if there's a problem upstream in scikit-allel. |
|
Many thanks for the review @alimanfoo. You were right that |
|
Closing in favour of #659. |
Hi @alimanfoo, here is the draft PR for Pf8(). I couldn't add my GOOGLE_CREDENTIALS that is needed by the workflow, please advise if you would like me to do this.
Pf8(): The Pf8() function is built upon the
Pf7()template, and should not disrupt the functionality ofPf7()(verified via test_pf7_integration.py).Sanger S3 Configuration: Updates have been made in malariagen_data/util.py to support data access on the Sanger S3 storage. For privacy reasons, the bucket name is not included in the PR (such as in the tests/test_pf8_integration.py).
Variant/altlen Field Handling: We observed that the variant/altlen field contains a single value rather than one per ALT, despite the number of ALT being set to 6. The cause of this behavior is unclear, but it may relate to the comment “special case (for altlen, number depends on ALT)” in scikit-allel. Since it is possible to calculate this value from
REFandALT, we are not concerned about it now. Therefore, this field is excluded fromdata_varsafter reading the Pf8 zarr file.