Conversation
PR ready Pf8() PR ready Pf8()
|
Just FYI, tests are failing here as the github-actions service account doesn't have permissions for the pf8 bucket. |
|
That's extremely helpful, thanks @ahernank |
malariagen_data/util.py
Outdated
| # Create an S3FileSystem with custom endpoint if specified. | ||
| kwargs["anon"] = kwargs.get("anon", True) # Default to anonymous access. | ||
| kwargs["endpoint_url"] = "https://cog.sanger.ac.uk" | ||
| kwargs["config_kwargs"] = kwargs.get("config_kwargs", config) |
There was a problem hiding this comment.
| kwargs["config_kwargs"] = kwargs.get("config_kwargs", config) | |
| kwargs.setdefault("config_kwargs", config) |
malariagen_data/util.py
Outdated
| } | ||
|
|
||
| # Create an S3FileSystem with custom endpoint if specified. | ||
| kwargs["anon"] = kwargs.get("anon", True) # Default to anonymous access. |
There was a problem hiding this comment.
| kwargs["anon"] = kwargs.get("anon", True) # Default to anonymous access. | |
| kwargs.setdefault("anon", True) # Default to anonymous access. |
malariagen_data/util.py
Outdated
|
|
||
| # Create an S3FileSystem with custom endpoint if specified. | ||
| kwargs["anon"] = kwargs.get("anon", True) # Default to anonymous access. | ||
| kwargs["endpoint_url"] = "https://cog.sanger.ac.uk" |
There was a problem hiding this comment.
| kwargs["endpoint_url"] = "https://cog.sanger.ac.uk" | |
| kwargs.setdwfault("endpoint_url", "https://cog.sanger.ac.uk") |
malariagen_data/plasmodium.py
Outdated
| data_vars = self._add_extended_data(root, inline_array, chunks, data_vars) | ||
| data_vars.pop( | ||
| "variant_altlen", None | ||
| ) # variant_altlen is under investigation in Pf8.zarr |
There was a problem hiding this comment.
Perhaps it would be possible to handle this by removing the variable from the Pf8 config file, rather than here? In which case no changes to tests for pf7 or pv4 would be needed?
There was a problem hiding this comment.
Thanks, changed accordingly.
Thanks @alimanfoo, I have incorporated your suggestions. As you expected, I receive botocore.exceptions.NoCredentialsError when I tried |
|
Sorry @alimanfoo, debugging the chained URL issue took a while, but |
My apologies, I should've explained a bit better how chained URLs are handled in fsspec, at least as I understand it. In looking at this I've also realised the way this is done for GCS could be clearer. I've taken the liberty to push a commit here which consistifies handling of storage options for both GCS and S3 including the the case where URLs are chained. Would you mind taking a look and testing to see if this works for your S3 bucket? |
|
Hold that thought, looks like I've broken some tests, investigating... |
|
Bug fixed, ready for testing with S3. |
|
Many thanks for sorting this out, works perfectly fine with S3 in my testing environment. |
Cool, thanks @eselimnl. Happy for this to be merged? |
|
Yes, I am happy for this to be merged |
|
Thanks @eselimnl, merged! 🎉 Would you like us to make a release? |
|
Exciting! Yes please! |
|
Just to note here that the S3 and GCS buckets for Pf8 are not currently "publicly" accessible (i.e. not public/not part of the authentication process). |
|
I noticed an oversight in the testing environment: S3 access requires the |
This PR adds a new
Pf8API class for accessing the Pf8 data resource.Pf8(): The Pf8() class is built upon the
Pf7()template, and should not disrupt the functionality ofPf7()(verified via test_pf7_integration.py).Sanger S3 Configuration: Updates have been made in malariagen_data/util.py to support data access on the Sanger S3 storage. For privacy reasons, the bucket name is not included in the PR (such as in the tests/test_pf8_integration.py).
Variant/altlen Field Handling: We observed that the variant/altlen field contains a single value rather than one per ALT, despite the number of ALT being set to 6. The cause of this behavior is unclear, but it may relate to the comment “special case (for altlen, number depends on ALT)” in scikit-allel. Since it is possible to calculate this value from
REFandALT, we are not concerned about it now. Therefore, this field is excluded fromdata_varsafter reading the Pf8 zarr file.This PR was previously created from a fork, so comments made there will also be followed here.