HuMSub Catalogue

Subspecies of the human gut microbiota carry implicit information for in-depth microbiome research

This repository provides all data and code needed to reproduce the results from our publication.

Repository Structure

The code is organized into:

Workflows (`/Workflows`)

Snakemake pipelines used in the study:

Quality-filtering – MAG filtering using GUNC and BUSCO
sourmash-compare – Subspecies delineation
use_catalog – Subspecies quantification from metagenomes
assign-subspecies – Assigning new genomes to HuMSub clusters
get_specific_sequences – Identifying subspecies-specific genes

Scripts (`/Scripts`)

Jupyter notebooks for:

Statistical and machine learning analyses
Figure generation

Notebooks are grouped by topic, matching the publication’s structure. Each includes all necessary data for re-execution.

HuMSub Catalogue Data

The catalogue is available as two prebuilt sourmash SBT indexes that differ in k-mer size:

| File | Use Case |
| --- | --- |
| `HuMSub_51_1000.sbt.zip` | General subspecies quantification (k=51) |
| `HuMSub_21_1000.sbt.zip` | Mastiff database queries (k=21) |

Download from Zenodo:
https://zenodo.org/records/15862096
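
As a minimal sketch of fetching one of the two indexes programmatically, the Python snippet below picks the file by k-mer size and builds a direct download URL. The `/files/<name>?download=1` URL pattern is an assumption about how Zenodo serves record files and is not stated in this README. The helper names and the `resources` destination directory are hypothetical. If the pattern does not hold, download the files manually from the record page.

```python
from pathlib import Path
from urllib.request import urlretrieve

ZENODO_RECORD = "15862096"  # record ID taken from the Zenodo URL above

# File names as listed above, keyed by k-mer size.
INDEX_BY_KSIZE = {
    51: "HuMSub_51_1000.sbt.zip",  # general subspecies quantification
    21: "HuMSub_21_1000.sbt.zip",  # Mastiff database queries
}


def zenodo_file_url(filename: str, record: str = ZENODO_RECORD) -> str:
    """Build a direct download URL (ASSUMED Zenodo URL pattern)."""
    return f"https://zenodo.org/records/{record}/files/{filename}?download=1"


def download_index(ksize: int, dest_dir: str = "resources") -> Path:
    """Download the index for the given k-mer size into dest_dir (hypothetical location)."""
    filename = INDEX_BY_KSIZE[ksize]
    dest = Path(dest_dir) / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():  # skip files that are already present
        urlretrieve(zenodo_file_url(filename), str(dest))
    return dest
```

Calling `download_index(51)` would fetch the k=51 index. Nothing is downloaded until the function is called.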

Simulated Metagenomic Data

For benchmarking and testing:

| File | Description |
| --- | --- |
| `humgut_samples.tar.gz` | Simulated paired-end reads from HumGut genomes |
| `new_samples.tar.gz` | Simulated paired-end reads from genomes outside of HumGut |

Corresponding taxonomic distributions are available in Scripts/benchmark/.
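
The archives need to be unpacked before use. The snippet below is a rough sketch of doing so with Python's standard `tarfile` module. The function name and the `test_data` destination are hypothetical, and the internal layout of the archives is not described here, so the function returns the member names for inspection.

```python
import tarfile
from pathlib import Path


def extract_samples(archive: str, dest_dir: str = "test_data") -> list[str]:
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python version offers it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For example, `extract_samples("humgut_samples.tar.gz")` would unpack the HumGut-derived reads into `test_data/`.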

Disclaimer

The HuMSub catalogue includes genomes from some non-gut-associated phyla (e.g., Elusimicrobiota, Eremiobacteriota, Patescibacteria) retained from the original HumGut reference.
Although these were not detected in CRC datasets, they were preserved for completeness based on genome quality scores. Use with caution in downstream interpretation.

Getting Started

1. Install Snakemake version 7 (https://snakemake.readthedocs.io).
2. Clone this repository (trajkovski-lab/humsub).
3. Edit the `config.yaml` of the workflow you want to run.
4. Run with:

snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --cores 4
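
When several workflows are run in sequence, the command above can be wrapped in a small Python helper. This is a sketch under assumptions: it supposes that each workflow lives in its own directory under `Workflows/` containing `workflow/Snakefile` and `config/config.yaml`, which this README implies but does not state. The function names are hypothetical. The `-n` flag asks Snakemake for a dry run that only lists the planned jobs.

```python
import subprocess
from pathlib import Path

# Workflow names as listed under Workflows; the per-workflow directory layout is ASSUMED.
WORKFLOWS = [
    "Quality-filtering",
    "sourmash-compare",
    "use_catalog",
    "assign-subspecies",
    "get_specific_sequences",
]


def snakemake_command(cores: int = 4, dry_run: bool = False) -> list[str]:
    """Return the Snakemake invocation from this README as an argument list."""
    cmd = [
        "snakemake",
        "-s", "workflow/Snakefile",
        "--configfile", "config/config.yaml",
        "--use-conda",
        "--cores", str(cores),
    ]
    if dry_run:
        cmd.append("-n")  # dry run: show planned jobs without executing them
    return cmd


def run_workflow(name: str, root: str = "Workflows", **kwargs) -> None:
    """Run one workflow from inside its (assumed) directory."""
    if name not in WORKFLOWS:
        raise ValueError(f"unknown workflow: {name}")
    subprocess.run(snakemake_command(**kwargs), cwd=Path(root) / name, check=True)
```

A dry run such as `run_workflow("use_catalog", dry_run=True)` is a cheap way to check the configuration before committing compute time.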

External Resources

This directory lists essential external files required to reproduce the results of the HuMSub study. These resources are hosted externally due to their size and licensing constraints.

1. HumGut Genome Metadata

File: All_genomes.tsv in Scripts/benchmark
Description: Metadata and download links for all genomes in the HumGut catalog
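
The column names of `All_genomes.tsv` are not documented here, so a reasonable first step is to inspect the header and a few rows. The sketch below does this with Python's standard `csv` module and assumes only that the file is tab-separated with a header line. The function name is hypothetical.

```python
import csv


def peek_tsv(path: str, n: int = 3):
    """Return the header fields and the first n rows of a tab-separated file."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        # zip stops after n rows without reading the rest of the file
        rows = [row for _, row in zip(range(n), reader)]
        return reader.fieldnames, rows
```

For example, `fields, rows = peek_tsv("Scripts/benchmark/All_genomes.tsv")` shows which column holds the download links.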

2. HuMSub Catalogue and Benchmark Data

Zenodo Record:
https://zenodo.org/records/15862096

This includes:

HuMSub_51_1000.sbt.zip – k=51, scaled=1000 (subspecies quantification)
HuMSub_21_1000.sbt.zip – k=21, scaled=1000 (Mastiff queries)
humgut_samples.tar.gz – simulated reads from HumGut genomes
new_samples.tar.gz – simulated reads from genomes outside HumGut

After download, place files into the appropriate subdirectories (e.g. resources/, test_data/).
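
A quick way to confirm that everything is in place is to check for the four files before launching a workflow. In the sketch below, the mapping of files to `resources/` and `test_data/` is an assumption based on the examples in the sentence above, and the function name is hypothetical. Adjust the mapping to whatever paths the workflow configs expect.

```python
from pathlib import Path

# ASSUMED placement of the Zenodo files; adapt to the paths in your config.yaml.
EXPECTED = {
    "resources": ["HuMSub_51_1000.sbt.zip", "HuMSub_21_1000.sbt.zip"],
    "test_data": ["humgut_samples.tar.gz", "new_samples.tar.gz"],
}


def missing_files(root: str = ".", expected: dict = EXPECTED) -> list[str]:
    """Return the relative paths of expected files that are absent under root."""
    root_path = Path(root)
    return [
        (Path(subdir) / name).as_posix()
        for subdir, names in expected.items()
        for name in names
        if not (root_path / subdir / name).exists()
    ]
```

An empty list from `missing_files()` means all four files were found.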

Note

Some files may be automatically downloaded by Snakemake rules if not found in the expected locations. Refer to the main README.md for pipeline instructions.
