Create functions to access phenotypic data #789
Description
The phenotypic data from the GAARD project is now available on GCS. I followed the same organisation as for the metadata, haplotypes and genotypes, so the data for the sample set 1237-VO-BJ-DJOGBENOU-VMF00050 can be found at gs://vo_agam_release_master_us_central1/v3.2/phenotypes/all/1237-VO-BJ-DJOGBENOU-VMF00050/phenotypes.csv.
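For illustration, here is a minimal sketch of how the path for a given sample set could be derived from this layout. The helper name `phenotype_csv_path` and the `release` default are hypothetical; only the bucket root and the `<release>/phenotypes/all/<sample_set>/phenotypes.csv` pattern come from the example path above.

```python
# Hypothetical helper: bucket root and layout are taken from the example path
# in this issue; the function name and defaults are illustrative only.
GCS_ROOT = "gs://vo_agam_release_master_us_central1"


def phenotype_csv_path(sample_set: str, release: str = "v3.2") -> str:
    """Return the GCS path of the phenotypes.csv file for one sample set.

    Mirrors the layout used for metadata, haplotypes and genotypes:
    <root>/<release>/phenotypes/all/<sample_set>/phenotypes.csv
    """
    return f"{GCS_ROOT}/{release}/phenotypes/all/{sample_set}/phenotypes.csv"


path = phenotype_csv_path("1237-VO-BJ-DJOGBENOU-VMF00050")
print(path)
# Reading the file would need pandas and gcsfs installed, plus read access
# to the bucket, e.g.:
# df = pandas.read_csv(path)
```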
The idea is to create a function to access this data, similar to the existing ones for genotypes or haplotypes (e.g., the `haplotypes` function can probably serve as an example). This function would probably take similar parameters, specifically `sample_sets`, `sample_query` and so on. The definition of cohorts will need to differ from the one used elsewhere, as the insecticide used, the dose and the phenotype are likely to be stronger discriminants than time or geography.
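A rough sketch of what such an accessor might look like, assuming the per-sample-set CSVs have already been loaded into DataFrames. The function name `phenotype_data` and the column names `insecticide`, `dose` and `phenotype` are assumptions, since the actual `phenotypes.csv` schema may differ. It shows selection by `sample_sets`, filtering with a pandas query string via `sample_query`, and cohort labels built from insecticide, dose and phenotype rather than time or place.

```python
import pandas as pd

# Hypothetical column names: the real phenotypes.csv schema may differ.
COHORT_COLUMNS = ["insecticide", "dose", "phenotype"]


def phenotype_data(tables, sample_sets, sample_query=None):
    """Sketch of a phenotype accessor with haplotypes()-like parameters.

    tables: dict mapping sample set name -> DataFrame (in practice each one
    would be loaded from that sample set's phenotypes.csv).
    """
    if isinstance(sample_sets, str):
        sample_sets = [sample_sets]
    # Concatenate the requested sample sets, recording where each row came from.
    df = pd.concat(
        [tables[s].assign(sample_set=s) for s in sample_sets], ignore_index=True
    )
    # Optional filtering with a pandas query string.
    if sample_query is not None:
        df = df.query(sample_query).reset_index(drop=True)
    # Cohorts keyed on insecticide, dose and phenotype instead of time/geography.
    df["cohort"] = df[COHORT_COLUMNS].astype(str).agg("_".join, axis=1)
    return df


tables = {
    "set_a": pd.DataFrame({
        "sample_id": ["a1", "a2", "a3"],
        "insecticide": ["deltamethrin", "deltamethrin", "pirimiphos-methyl"],
        "dose": [1, 1, 5],
        "phenotype": ["alive", "dead", "alive"],
    }),
}
df = phenotype_data(tables, "set_a", sample_query="insecticide == 'deltamethrin'")
print(df["cohort"].tolist())
```

Joining the three fields into a single string keeps cohorts easy to group on with `groupby("cohort")`; a real implementation might prefer separate columns or a categorical type.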
For @leehart: all samples are, by definition, not `is_surveillance` and are `unrestricted_use`, so these new functions should not raise new issues for #716.
I think the main function should return an xarray Dataset (so that it can hold the metadata about both samples and variants at the same time), but a function that returns a DataFrame of the sample metadata would be a good stepping stone. @mohamed-laarej, feel free to ask questions if anything is unclear.
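As a sketch of how the DataFrame stepping stone could feed into a Dataset, the hypothetical `phenotype_dataset` below attaches phenotype columns to a variant dataset along a `samples` dimension. The dimension name `samples`, the `sample_id` coordinate and the `sample_` prefix for new variables are assumptions for illustration, and the toy `gt` variable stands in for real genotype data.

```python
import numpy as np
import pandas as pd
import xarray as xr


def phenotype_dataset(df_pheno: pd.DataFrame, ds_variants: xr.Dataset) -> xr.Dataset:
    """Attach phenotype columns to a variant dataset along the samples dimension.

    Assumes (hypothetically) that ds_variants has a "samples" dimension with a
    "sample_id" coordinate, and that df_pheno has a "sample_id" column.
    """
    df = df_pheno.set_index("sample_id")
    ids = ds_variants["sample_id"].values
    # Keep only samples that have phenotype data, preserving dataset order.
    ds = ds_variants.isel(samples=np.flatnonzero(pd.Index(ids).isin(df.index)))
    kept = ds["sample_id"].values
    for col in df.columns:
        # Store each phenotype column as a sample-level variable.
        ds[f"sample_{col}"] = ("samples", df.loc[kept, col].to_numpy())
    return ds


# Toy variant dataset: 2 variants x 3 samples (values are placeholders).
ds_variants = xr.Dataset(
    {"gt": (("variants", "samples"), np.zeros((2, 3), dtype="i1"))},
    coords={"sample_id": ("samples", ["a1", "a2", "b1"])},
)
df_pheno = pd.DataFrame({"sample_id": ["a2", "a1"], "phenotype": ["dead", "alive"]})
ds = phenotype_dataset(df_pheno, ds_variants)
print(ds["sample_phenotype"].values.tolist())
```

Here sample `b1` has no phenotype record and is dropped, and the phenotypes are reordered to match the dataset rather than the CSV.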