Skip to content

Expansion of the documentation#592

Merged
alimanfoo merged 39 commits intomasterfrom
BJH-docu-expansion
Dec 4, 2024
Merged

Expansion of the documentation#592
alimanfoo merged 39 commits intomasterfrom
BJH-docu-expansion

Conversation

@jonbrenas
Copy link
Copy Markdown
Collaborator

@jonbrenas jonbrenas commented Sep 2, 2024

Resolves #662.

Quite a few functions are shown in the documentation as returning a DataFrame or a Dataset. It is not always obvious what the returned structure contains (e.g., how many columns in the DataFrame, how to interpret their contents, ...). I have given it a try for sample_sets (one of the simplest functions) but I am not quite sure how to test what the result looks like. Ideally, I would have had the return be a String (roughly the current documentation) followed by a dict of the columns but that is not an option supported by numpydoc_decorator (I guess I could ask its creator if that can be changed or if he has a better idea @alimanfoo ).

@jonbrenas
Copy link
Copy Markdown
Collaborator Author

Looking at some of the functions that return DataFrames (e.g., sample_metadata()), the columns are dependent on whether it is the Ag3 or Af1 version that is run. They have different pages in the documentation but the same code definition and thus the same @doc text.

@leehart leehart requested a review from alimanfoo September 16, 2024 10:32
@leehart leehart marked this pull request as draft September 16, 2024 14:06
@jonbrenas
Copy link
Copy Markdown
Collaborator Author

I have s small issue with cohorts(). The docs say that it accepts as a parameter str | Mapping[str, str](because it is the standard type of a cohort) but it really only accepts the 6 pre-defined cohorts specifications.

@leehart
Copy link
Copy Markdown
Collaborator

leehart commented Nov 12, 2024

Re: issue #548

@jonbrenas
Copy link
Copy Markdown
Collaborator Author

There were some cases where the datasets were described (e.g., snp_allele_frequencies_advanced). I left them as is. They are likely to be stylistically different but I doubt I have been consistent through the last few months so there will probably be some work needed during review to harmonise everything.

@jonbrenas
Copy link
Copy Markdown
Collaborator Author

I am done with this first run through. It is possible that new functions need a new doc too but I feel like it is a good time to ask for a bit of a review.

@jonbrenas jonbrenas marked this pull request as ready for review November 22, 2024 18:36
@leehart leehart requested review from leehart and removed request for alimanfoo November 26, 2024 14:20
Copy link
Copy Markdown
Collaborator

@leehart leehart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all this work @jonbrenas !
It will be good to see how this actually manifests in the deployed docs. Did you establish a way to preview it? Then we can do another round of additions and tweaks, at some point.

Copy link
Copy Markdown
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, so much work here @jonbrenas!

Added a couple of suggestions to fill in blanks.

Only other suggestion is to consistently use either **foo** or `foo` when referring to column/variable/dimension names. I'd have a small preference for `foo` as it's a bit less distracting when reading the docs via help().

@alimanfoo alimanfoo added the BMGF-068808 Work supported by BMGF grant INV-068808 (MalariaGEN 2024-2027). label Dec 4, 2024
Copy link
Copy Markdown
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed a couple of files where still using **...

Comment on lines +69 to +71
A dataset with 2 dimensions: **variants** the number of AIMs sites, and **alleles** which will always be 2, each representing one of the species. It contains 2 coordinates:
**variant_contig** has **variants** values and contains the chromosome arm of each AIM, and **variant_position** has **variants** values and contains the position of each AIM. It contains 1 data variable:
**variant_allele** has (**variants**, **allele**) values and contains the discriminating alleles for each AIM.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap ** for `.

Comment on lines +120 to +129
A dataset with 4 dimensions:
**variants** the number of AIMs sites,
**samples** the number of samples,
**ploidy** the ploidy (2),
and **alleles** which will always be 2, each representing one of the species. It contains 3 coordinates:
**sample_id** has **samples** values and contains the identifier of each sample,
**variant_contig** has **variants** values and contains the chromosome arm of each AIM,
and **variant_position** has **variants** values and contains the position of each AIM. It contains 2 data variables:
**call_genotype** has (**variants**, **samples**, **ploidy**) values and contains both calls for each sample and each AIM,
**variant_allele** has (**variants**, **allele**) values and contains the discriminating alleles for each AIM.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap ** for `.

Comment on lines +400 to +408
returns="""A dataframe of sample sets, one row per sample set. It contains five columns:
**sample_set** is the name of the sample set,
**sample_count** is the number of samples the sample set contains,
**study_id** is the identifier for the study that generated the sample set,
**study_url** is the URL of the study on the MalariaGEN website,
**term_of_use_expiry** is the date when the terms of use expire,
**terms_of_use_url** is the URL of the terms of use,
**release** is the identifier of the release containing the sample set,
**unrestricted_use** whether the sample set can be without restriction (e.g., if the terms of use have expired).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap ** for `.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it took me 3 files to figure out how to do a find and replace on my Mac ;). I should have rechecked the files where I used another method.

@alimanfoo alimanfoo merged commit e213d1b into master Dec 4, 2024
@alimanfoo alimanfoo deleted the BJH-docu-expansion branch December 4, 2024 22:27
@alimanfoo
Copy link
Copy Markdown
Member

Nice one, thanks @jonbrenas 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

BMGF-068808 Work supported by BMGF grant INV-068808 (MalariaGEN 2024-2027).

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adding detailed descriptions of the DataFrames and Datasets returned by the various functions

3 participants