docs: expand documentation for sample_metadata() #933

Open
Tanisha127 wants to merge 16 commits into malariagen:master from Tanisha127:improve-sample-metadata-docs

Conversation

@Tanisha127
Contributor

This PR expands the documentation for sample_metadata() to provide:

  • A clearer summary of how metadata is assembled
  • Detailed parameter descriptions
  • An expanded description of the returned DataFrame
  • Explanation of merged metadata sources (general metadata, QC metadata, surveillance flags, AIM, cohorts)

Fixes #553.
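
As a rough illustration of the assembly described above, the sketch below joins several per-sample tables on a shared key with pandas. It is not the library's implementation: the toy tables, the `sample_id` join key, the helper name `assemble_metadata`, and every column name and value are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the separate metadata sources (all names and values are made up).
general = pd.DataFrame(
    {"sample_id": ["S1", "S2", "S3"], "country": ["A", "B", "B"], "year": [2012, 2014, 2014]}
)
qc = pd.DataFrame({"sample_id": ["S1", "S2", "S3"], "mean_cov": [30.1, 28.4, 35.0]})
aim = pd.DataFrame({"sample_id": ["S1", "S2"], "aim_species": ["gambiae", "coluzzii"]})
cohorts = pd.DataFrame(
    {"sample_id": ["S1", "S3"], "cohort_admin1_year": ["A-1_2012", "B-2_2014"]}
)


def assemble_metadata(general, *others, key="sample_id"):
    """Left-join each extra table onto the general metadata, one row per sample."""
    merged = general
    for other in others:
        # A left join keeps every sample from the general table; samples that
        # are absent from an optional source get NaN in that source's columns.
        merged = merged.merge(other, on=key, how="left", validate="one_to_one")
    return merged


df_samples = assemble_metadata(general, qc, aim, cohorts)
print(df_samples)
```

A left join is used here so that an optional source with partial coverage never drops samples; the real function may combine its sources differently.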

@Tanisha127
Contributor Author

Hi @jonbrenas and @ahernank,

I’ve opened PR #933 to expand the documentation for sample_metadata().
I’d appreciate your feedback when you have time.
Also, the workflows are awaiting maintainer approval for CI to run.

Thank you!

@Tanisha127
Contributor Author

Hi @jonbrenas,
All checks are now passing for this PR. Could you please review it when convenient?
Happy to make any changes if needed.
Thank you!

Collaborator

@jonbrenas left a comment

The "summary" and "returns" are good, the "parameters" are read from base_params and are thus already described elsewhere, so they are not needed here.

@Tanisha127 requested a review from jonbrenas on March 8, 2026, 05:03
@Tanisha127
Contributor Author

Hi @jonbrenas, I've removed the parameters section as requested — only summary and returns remain. Could you please take another look when convenient? Thank you!

@jonbrenas
Collaborator

Thanks @Tanisha127. When I run len(ag3.sample_metadata(sample_sets='3.0').columns), I get 58. How did you choose which columns to list? Why not all of them?

@Tanisha127
Contributor Author

Thanks @jonbrenas! I chose those columns because they seemed like the most commonly used ones for a typical analysis — sample identity, collection metadata, QC, and species assignment. However, I can see that only listing 18 out of 58 columns could be misleading or incomplete for users who need the full picture.
I can update the returns section to either:

  • List all 58 columns with descriptions
  • Group them by category (e.g. general metadata, QC, AIM, cohorts) with a note that the exact columns vary by sample set

Which approach would you prefer?
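
To make the second option concrete, here is a minimal sketch of grouping a DataFrame's columns by category using name prefixes. The prefix mapping, the helper name `group_columns`, and the column names in the toy frame are hypothetical; the actual categories and columns would need to be taken from the real output of sample_metadata().

```python
import pandas as pd

# Hypothetical category -> column-name-prefix mapping, for illustration only.
CATEGORY_PREFIXES = {
    "AIM": ("aim_",),
    "Cohorts": ("cohort_",),
    "QC": ("mean_cov", "frac_gen_cov"),
}


def group_columns(columns, prefixes=CATEGORY_PREFIXES, default="General"):
    """Assign each column to the first category whose prefix it matches."""
    groups = {default: []}
    for col in columns:
        for category, pfx in prefixes.items():
            # str.startswith accepts a tuple of candidate prefixes.
            if col.startswith(pfx):
                groups.setdefault(category, []).append(col)
                break
        else:
            # No prefix matched, so the column falls into the default category.
            groups[default].append(col)
    return groups


# Empty toy frame: only the column names matter for this sketch.
df = pd.DataFrame(
    columns=["sample_id", "country", "year", "mean_cov", "aim_species", "cohort_admin1_year"]
)
groups = group_columns(df.columns)
for category, cols in groups.items():
    print(f"{category} ({len(cols)}): {', '.join(cols)}")
```

Summing the group sizes and comparing the total with `len(df.columns)` is a quick way to confirm that no column was left undocumented.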

@jonbrenas
Collaborator

Hi @Tanisha127. I think grouping columns by category would be very convenient.

@Tanisha127
Contributor Author

Hi @jonbrenas, I've updated the returns section to group all columns by category. All checks are now passing. Could you please take another look when convenient? Thank you!

@jonbrenas
Collaborator

Thanks @Tanisha127. Why did you choose to make 'Surveillance flags' its own category? Are there cases (I am thinking about the AIMs columns, for example) where it is possible to be more precise on whether a column will be present or not?
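
One way to investigate the presence question empirically is to check a list of expected optional columns against the columns actually returned for a given sample set. The sketch below uses toy DataFrames; the helper name `split_by_presence` and the column names are assumptions, not the library's documented schema.

```python
import pandas as pd


def split_by_presence(df, expected):
    """Return (present, missing) lists for the expected column names."""
    present = [c for c in expected if c in df.columns]
    missing = [c for c in expected if c not in df.columns]
    return present, missing


# Toy frames standing in for metadata from two different sample sets.
df_with_aim = pd.DataFrame({"sample_id": ["S1"], "aim_species": ["gambiae"]})
df_without_aim = pd.DataFrame({"sample_id": ["S1"]})

# Hypothetical optional columns to check for.
optional = ["aim_species", "cohort_admin1_year"]

for name, frame in [("with AIM", df_with_aim), ("without AIM", df_without_aim)]:
    present, missing = split_by_presence(frame, optional)
    print(f"{name}: present={present}, missing={missing}")
```

Running such a check across sample sets would show which columns are always present and which depend on the data available.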

@Tanisha127
Contributor Author

Hi, could you please take another look when convenient? Thank you!

@jonbrenas
Collaborator

Thanks @Tanisha127, it is great! Is there a reason why 'quarter' is the only one with the details of what the value is when the data is missing?

@Tanisha127
Contributor Author

Hi @jonbrenas, I've removed the missing value detail from the quarter field to keep the documentation consistent with all other columns. I've also resolved the merge conflict with master. Could you please take another look when convenient? Thank you!

Development

Successfully merging this pull request may close these issues.

Consider adding more documentation for sample set metadata

2 participants