
PICRUSt2-v2.6.0#372

Merged
R-Wright-1 merged 32 commits into master from PICRUSt2-v2.6.0
Jan 27, 2025


@R-Wright-1
Contributor

This branch:

  • Includes the updated PICRUSt2-MPGA database that uses GTDB reference genomes and has separate phylogenetic trees and reference files for bacteria and archaea
  • Includes code that places each sequence into both the bacterial and archaeal phylogenetic trees, determines which tree is the best fit for each sequence based on NSTI, filters the tree files and NSTI files to only the sequences that are the best fit for each domain, carries out the predictions, and combines the per-domain files for metagenome prediction
  • Outputs the closest reference genome for each study sequence
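
As a rough illustration of the "best fit based on NSTI" step described above, the following is a minimal sketch, not the code in this branch. The function name `pick_best_domain`, the input layout (one dictionary of NSTI values per domain), and the tie-breaking rule (the first domain listed wins) are all assumptions made for the example.

```python
def pick_best_domain(nsti_by_domain):
    """Assign each sequence to the domain where its NSTI is lowest.

    nsti_by_domain: dict mapping a domain name to {sequence_id: NSTI}.
    Returns a dict mapping each domain name to a sorted list of the
    sequence IDs assigned to it. Hypothetical helper for illustration only.
    """
    best = {}  # sequence_id -> (lowest NSTI seen so far, domain)
    for domain, nsti_values in nsti_by_domain.items():
        for seq_id, nsti in nsti_values.items():
            # Strict '<' means ties go to the domain listed first (an assumption).
            if seq_id not in best or nsti < best[seq_id][0]:
                best[seq_id] = (nsti, domain)

    # Start every domain with an empty list so a domain with no
    # best-fit sequences is still present in the result.
    assigned = {domain: [] for domain in nsti_by_domain}
    for seq_id, (_, domain) in best.items():
        assigned[domain].append(seq_id)
    return {domain: sorted(ids) for domain, ids in assigned.items()}


example_nsti = {
    "bacteria": {"ASV1": 0.02, "ASV2": 1.5, "ASV3": 0.3},
    "archaea": {"ASV1": 1.1, "ASV2": 0.05, "ASV3": 0.3},
}
example_assignment = pick_best_domain(example_nsti)
```

Keeping an empty list for a domain that wins no sequences makes that case easy to detect before filtering the tree and NSTI files for that domain.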

To see a full overview of the new database and the changes made, see the Wiki page.

R-Wright-1 and others added 30 commits December 12, 2024 09:11
Add new reference files - separate files for bacteria and archaea
Previously, a check looked for overlap between the ASVs in the input FASTA and the feature table, but duplicated sequence IDs in the input FASTA could still cause downstream errors. This fix adds a check that no sequence ID in the input FASTA appears more than once.
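
A duplicate-ID check of this kind could look roughly like the sketch below. It is not the code added in this commit: the function names are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token on each header line.

```python
import io
from collections import Counter


def find_duplicate_fasta_ids(handle):
    """Return the sorted sequence IDs that occur more than once in a FASTA stream."""
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            header = line[1:].strip()
            seq_id = header.split()[0] if header else ""
            counts[seq_id] += 1
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


def check_no_duplicate_ids(handle):
    """Raise ValueError if any sequence ID appears more than once."""
    duplicates = find_duplicate_fasta_ids(handle)
    if duplicates:
        raise ValueError(
            "Duplicated sequence IDs in input FASTA: " + ", ".join(duplicates)
        )


# Usage with an in-memory FASTA; a real run would pass an open file handle.
example_fasta = io.StringIO(">ASV1\nACGT\n>ASV2\nGGCC\n>ASV1 second copy\nTTAA\n")
example_duplicates = find_duplicate_fasta_ids(example_fasta)
```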
- Zipped default file that was previously unzipped
- Added MetaCyc reaction mapping pathways (modified from HUMAnN3)
- Added a line to castor_hsp.R that ensures the maximum parsimony method runs without issues even if the tree has edge lengths of zero
Some new scripts have been added:
- default_split.py: locations of new default files for when we're running bacteria/archaea separately
- split_domains.py: functions for choosing the best domain for each sequence based on which has the lowest NSTI
- pick_best_domain.py: wrapper for picking the best domain to use for each sequence when we're running bacteria/archaea separately. Note that this would be run between hsp.py with the 16S/marker gene file and running hsp.py with any other trait files
- combine_domains.py: wrapper for combining functional predictions from hsp.py for when we're running multiple domains. This would be run before the metagenome_pipeline step
- Functions in util.py have been added for steps like reading in and pruning the tree files
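
To make the combining step concrete, here is a minimal sketch of merging per-domain prediction tables into one table. It does not reproduce combine_domains.py: the function name, the nested-dictionary layout, and the choice to fill functions missing from one domain with 0.0 are assumptions for illustration.

```python
def combine_domain_predictions(tables):
    """Merge per-domain prediction tables into a single table.

    tables: list of {sequence_id: {function_id: predicted_count}}, one per domain.
    Returns one dict with the same shape in which every sequence has a value
    for every function seen in any domain (missing values become 0.0).
    Hypothetical helper for illustration only.
    """
    all_functions = sorted(
        {func for table in tables for row in table.values() for func in row}
    )
    combined = {}
    for table in tables:
        for seq_id, row in table.items():
            # Each sequence should have been assigned to exactly one domain.
            if seq_id in combined:
                raise ValueError(f"{seq_id} appears in more than one domain table")
            combined[seq_id] = {func: row.get(func, 0.0) for func in all_functions}
    return combined


bacteria_predictions = {"ASV1": {"K00001": 2.0, "K00002": 1.0}}
archaea_predictions = {"ASV2": {"K00002": 3.0, "K00003": 1.0}}
combined_predictions = combine_domain_predictions(
    [bacteria_predictions, archaea_predictions]
)
```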
Update to add scripts for running both bacterial and archaeal predictions
Added requirement for ete3 to yaml file
Added check for when no sequences match with one of the domains
Note that this file still needs testing
Archaea reference files now work with SEPP
Added new -db flag to pathway_pipeline.py
@R-Wright-1 R-Wright-1 self-assigned this Jan 27, 2025
@R-Wright-1 R-Wright-1 merged commit 9ed5110 into master Jan 27, 2025
1 check passed
@R-Wright-1 R-Wright-1 deleted the PICRUSt2-v2.6.0 branch January 27, 2025 17:00