…-data-python into BJH-docu-expansion
Looking at some of the functions that return DataFrames (e.g.,

I have a small issue with cohorts(). The docs say that it accepts as a parameter

Re: issue #548

There were some cases where the datasets were described (e.g.,

I am done with this first run-through. It is possible that new functions need a new doc too, but I feel like it is a good time to ask for a bit of a review.
leehart
left a comment
Thanks for all this work @jonbrenas !
It will be good to see how this actually manifests in the deployed docs. Did you establish a way to preview it? Then we can do another round of additions and tweaks, at some point.
alimanfoo
left a comment
Wow, so much work here @jonbrenas!
Added a couple of suggestions to fill in blanks.
Only other suggestion is to consistently use either **foo** or `foo` when referring to column/variable/dimension names. I'd have a small preference for `foo` as it's a bit less distracting when reading the docs via help().
Co-authored-by: Alistair Miles <[email protected]>
alimanfoo
left a comment
Noticed a couple of files that are still using **...
malariagen_data/anoph/aim_data.py
Outdated
| A dataset with 2 dimensions: **variants** the number of AIMs sites, and **alleles** which will always be 2, each representing one of the species. It contains 2 coordinates: | ||
| **variant_contig** has **variants** values and contains the chromosome arm of each AIM, and **variant_position** has **variants** values and contains the position of each AIM. It contains 1 data variable: | ||
| **variant_allele** has (**variants**, **allele**) values and contains the discriminating alleles for each AIM. |
malariagen_data/anoph/aim_data.py
Outdated
| A dataset with 4 dimensions: | ||
| **variants** the number of AIMs sites, | ||
| **samples** the number of samples, | ||
| **ploidy** the ploidy (2), | ||
| and **alleles** which will always be 2, each representing one of the species. It contains 3 coordinates: | ||
| **sample_id** has **samples** values and contains the identifier of each sample, | ||
| **variant_contig** has **variants** values and contains the chromosome arm of each AIM, | ||
| and **variant_position** has **variants** values and contains the position of each AIM. It contains 2 data variables: | ||
| **call_genotype** has (**variants**, **samples**, **ploidy**) values and contains both calls for each sample and each AIM, | ||
| **variant_allele** has (**variants**, **allele**) values and contains the discriminating alleles for each AIM. |
malariagen_data/anoph/base.py
Outdated
| returns="""A dataframe of sample sets, one row per sample set. It contains five columns: | ||
| **sample_set** is the name of the sample set, | ||
| **sample_count** is the number of samples the sample set contains, | ||
| **study_id** is the identifier for the study that generated the sample set, | ||
| **study_url** is the URL of the study on the MalariaGEN website, | ||
| **term_of_use_expiry** is the date when the terms of use expire, | ||
| **terms_of_use_url** is the URL of the terms of use, | ||
| **release** is the identifier of the release containing the sample set, | ||
| **unrestricted_use** whether the sample set can be without restriction (e.g., if the terms of use have expired). |
Yep, it took me 3 files to figure out how to do a find and replace on my Mac ;). I should have rechecked the files where I used another method.
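For reference, the same find and replace can be scripted rather than done by hand in an editor. This is only a minimal sketch, not what was actually used in this PR: the helper name `bold_to_backticks` is hypothetical, and it assumes that every `**name**` to convert is a plain identifier (letters, digits, underscores).

```python
import re

# Matches **name** where name is a plain identifier (assumption: column,
# variable and dimension names never contain spaces or punctuation).
BOLD_NAME = re.compile(r"\*\*([A-Za-z_][A-Za-z0-9_]*)\*\*")


def bold_to_backticks(text: str) -> str:
    """Replace every **name** with `name`, leaving all other text untouched."""
    return BOLD_NAME.sub(r"`\1`", text)


line = "**variant_allele** has (**variants**, **allele**) values"
print(bold_to_backticks(line))
```

Because the pattern requires an identifier directly between the two pairs of asterisks, an expression such as `2 ** 3 ** 4` is left alone.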
Nice one, thanks @jonbrenas 🙏
Resolves #662.
Quite a few functions are shown in the documentation as returning a DataFrame or a Dataset. It is not always obvious what the returned structure contains (e.g., how many columns the DataFrame has, how to interpret their contents, ...). I have given it a try for sample_sets() (one of the simplest functions), but I am not quite sure how to test what the result looks like. Ideally, I would have had the return be a string (roughly the current documentation) followed by a dict of the columns, but that is not an option supported by numpydoc_decorator (I guess I could ask its creator whether that can be changed or whether he has a better idea, @alimanfoo).
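One possible workaround, sketched here under assumptions rather than as a feature of numpydoc_decorator: keep the column descriptions in a dict and render them into a single string before handing it over as the `returns` text. The helper name `describe_columns` is hypothetical, and the two columns shown are just an excerpt for illustration.

```python
def describe_columns(summary: str, columns: dict[str, str]) -> str:
    """Render a summary plus a {column name: description} dict as one string.

    Hypothetical helper: it only builds text and does not depend on
    numpydoc_decorator in any way.
    """
    # The column count is derived from the dict, so it cannot drift away
    # from the list of columns that follows it.
    lines = [f"{summary} It contains {len(columns)} columns:"]
    items = list(columns.items())
    for i, (name, description) in enumerate(items):
        end = "." if i == len(items) - 1 else ","
        lines.append(f"`{name}` {description}{end}")
    return "\n".join(lines)


returns_text = describe_columns(
    "A dataframe of sample sets, one row per sample set.",
    {
        "sample_set": "is the name of the sample set",
        "sample_count": "is the number of samples the sample set contains",
    },
)
print(returns_text)
```

The resulting string could then be used wherever the `returns` text is currently written by hand, which keeps the stated number of columns in step with the descriptions.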