
Create functions to access phenotypic data #789

@jonbrenas


The phenotypic data from the GAARD project is now available on GCS. I followed the same organisation as for the existing metadata, haplotypes and genotypes, so the data for the sample set 1237-VO-BJ-DJOGBENOU-VMF00050 can be found at gs://vo_agam_release_master_us_central1/v3.2/phenotypes/all/1237-VO-BJ-DJOGBENOU-VMF00050/phenotypes.csv.
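Under that organisation, the location of any sample set's phenotype file follows from the release and the sample set name. The sketch below is a hypothetical helper (`phenotype_csv_url` is not an existing function), and it assumes the layout `<bucket>/<release>/phenotypes/all/<sample_set>/phenotypes.csv`, inferred from the single example path above.

```python
# Hypothetical helper, not part of any existing API.
# Assumes the layout <bucket>/<release>/phenotypes/all/<sample_set>/phenotypes.csv,
# inferred from the one example path given in this issue.
DEFAULT_BUCKET = "gs://vo_agam_release_master_us_central1"


def phenotype_csv_url(
    sample_set: str, release: str = "v3.2", bucket: str = DEFAULT_BUCKET
) -> str:
    """Return the URL of the phenotype CSV for a single sample set."""
    return f"{bucket.rstrip('/')}/{release}/phenotypes/all/{sample_set}/phenotypes.csv"


if __name__ == "__main__":
    print(phenotype_csv_url("1237-VO-BJ-DJOGBENOU-VMF00050"))
```

With `gcsfs` installed, `pandas.read_csv` accepts `gs://` URLs directly, so a URL built this way could be passed straight to it.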

The idea is to create a function to access this data, similar to the ones that exist for genotypes or haplotypes (the haplotypes function can probably serve as a template). This function would probably take similar parameters, notably sample_sets, sample_queries and so on. Cohorts will need to be defined differently from elsewhere, since the insecticide used, the dose and the phenotype are likely to discriminate between samples more than time or geography do.
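A minimal sketch of that access pattern, assuming pandas. Everything here is hypothetical: `phenotype_data`, `phenotype_cohort_sizes` and the `load_sample_set` parameter are illustrative names, the column names `insecticide`, `dose` and `phenotype` are guesses (the CSV headers are not given in this issue), and the query is a single pandas query string. The loader is passed in as a function so the sketch runs without GCS access; in practice it could wrap `pd.read_csv` on each sample set's `phenotypes.csv`.

```python
from typing import Callable, Optional, Sequence, Union

import pandas as pd

# Hypothetical column names used to define cohorts; adjust to the real CSV headers.
COHORT_COLUMNS = ("insecticide", "dose", "phenotype")


def phenotype_data(
    sample_sets: Union[str, Sequence[str]],
    load_sample_set: Callable[[str], pd.DataFrame],
    sample_query: Optional[str] = None,
) -> pd.DataFrame:
    """Concatenate per-sample-set phenotype tables and optionally filter them."""
    if isinstance(sample_sets, str):
        sample_sets = [sample_sets]
    frames = []
    for sample_set in sample_sets:
        df = load_sample_set(sample_set)
        # Record which sample set each row came from.
        frames.append(df.assign(sample_set=sample_set))
    df = pd.concat(frames, ignore_index=True)
    if sample_query is not None:
        # pandas query syntax, e.g. "insecticide == 'deltamethrin' and dose == 1.0"
        df = df.query(sample_query).reset_index(drop=True)
    return df


def phenotype_cohort_sizes(
    df: pd.DataFrame, cohort_columns: Sequence[str] = COHORT_COLUMNS
) -> pd.Series:
    """Count samples per cohort, where a cohort is one insecticide/dose/phenotype combination."""
    return df.groupby(list(cohort_columns)).size()
```

Grouping on insecticide, dose and phenotype (rather than on time or location) reflects the cohort definition suggested above; the grouping columns are a parameter so that definition can change later.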

For @leehart: by definition, none of these samples are is_surveillance and all of them are unrestricted_use, so these new functions should not raise new issues for #716.
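If that assumption ever needs enforcing, a cheap guard could run over the sample metadata before the phenotype functions return. This is a hypothetical sketch: `check_phenotype_sample_flags` is an illustrative name, and it assumes the metadata has boolean `is_surveillance` and `unrestricted_use` columns plus a `sample_id` column.

```python
import pandas as pd


def check_phenotype_sample_flags(df: pd.DataFrame) -> None:
    """Raise ValueError if any sample is surveillance or restricted-use.

    Assumes boolean is_surveillance and unrestricted_use columns and a sample_id column.
    """
    is_surveillance = df["is_surveillance"].astype(bool)
    is_restricted = ~df["unrestricted_use"].astype(bool)
    offenders = df.loc[is_surveillance | is_restricted, "sample_id"].tolist()
    if offenders:
        raise ValueError(f"Unexpected surveillance or restricted-use samples: {offenders}")
```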

I think the main function should return an xarray Dataset (so that it can hold the metadata about samples and variants at the same time), but a function that returns a pandas DataFrame of sample metadata would be a good stepping stone. @mohamed-laarej, feel free to ask questions if anything is unclear.
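One way to get from the DataFrame stepping stone to a Dataset is to index the phenotype table by sample and let xarray align it with variant-level data on a shared sample dimension. This is a sketch under assumptions: the helper names are hypothetical, the table is assumed to have a unique `sample_id` column, and `samples` is an assumed dimension name for the variant data.

```python
import pandas as pd
import xarray as xr


def phenotypes_to_dataset(df: pd.DataFrame) -> xr.Dataset:
    """Turn a per-sample phenotype table into a Dataset with a 'samples' dimension."""
    indexed = df.set_index("sample_id")
    # The index name becomes the dimension (and coordinate) name.
    indexed.index.name = "samples"
    return xr.Dataset.from_dataframe(indexed)


def attach_phenotypes(variant_ds: xr.Dataset, pheno_ds: xr.Dataset) -> xr.Dataset:
    """Merge phenotype variables into variant data, keeping only samples present in both."""
    return xr.merge([variant_ds, pheno_ds], join="inner")
```

Aligning on sample labels rather than on row position means the two sources do not need to list samples in the same order.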

Labels

enhancement (New feature or request)
