UMseqflow

Welcome! UMseqflow is an easy-to-use, up-to-date Snakemake bioinformatics workflow that simplifies sequencing data analysis.

Structure

.
├── LICENSE
├── RNA
│   ├── Snakefile_rna_v04
│   ├── configs
│   │   ├── config.yaml
│   │   └── metadata.csv
│   └── rules
│       ├── common.smk        # input and output
│       ├── preprocessing.smk # QC and trimming
│       ├── hisat.smk         # HISAT2 + featureCounts
│       ├── salmon.smk        # Salmon quantification
│       └── multiqc.smk       # MultiQC for all steps
├── readme.md
└── setup.md

Currently, only the RNA mode is provided.

TODO

Major

WES mode
Package into Docker
Publish to GitHub
Fix reruns reported as "reason: Code has changed since last execution"

Minor

Revise trim mode
Upload reference files to the Ali netdisk
QC in the trimming step is slow (workaround: run trimming first, then the remaining steps)

Prepare

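First, prepare the analysis environment (conda environments and reference files) by following setup.md.

Then set up configs/config.yaml and configs/metadata.csv.

To build metadata.csv, first move the FASTQ files into the ./raw folder, or soft-link them there (see the soft-link command under Other useful scripts):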

$ ls raw
HB_1_1.fq.gz  HB_1_2.fq.gz  UH_1_1.fq.gz  UH_1_2.fq.gz

Then run the following to generate metadata.csv:

echo "sample" > metadata.csv
ls raw/*.fq.gz | sed 's|^raw/||; s/_[12]\.fq\.gz$//' | sort -u >> metadata.csv
mv metadata.csv configs/metadata.csv
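Before launching the workflow, it can help to confirm that every sample in metadata.csv really has both mates in raw/. The function below is a hypothetical helper, not part of UMseqflow; it assumes the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming shown in the raw/ listing above and a single `sample` column with a header line.

```shell
# Hypothetical helper (not part of UMseqflow): check that every sample in
# metadata.csv has both mates, <sample>_1.fq.gz and <sample>_2.fq.gz, in raw/.
check_pairs() {
  meta="${1:-configs/metadata.csv}"
  raw="${2:-raw}"
  missing=0
  {
    IFS= read -r _header                      # skip the "sample" header line
    while IFS= read -r sample || [ -n "$sample" ]; do
      [ -z "$sample" ] && continue            # ignore blank lines
      for mate in 1 2; do
        fq="$raw/${sample}_${mate}.fq.gz"
        # -e follows symlinks, so a broken link also counts as missing
        if [ ! -e "$fq" ]; then
          echo "missing: $fq" >&2
          missing=$((missing + 1))
        fi
      done
    done
  } < "$meta"
  [ "$missing" -eq 0 ]                        # status 0 only if nothing is missing
}
```

Usage: `check_pairs configs/metadata.csv raw && echo "all pairs present"`. Each missing file is printed to stderr, so one run lists every gap instead of stopping at the first.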

RNA

You can choose to run Salmon, HISAT2 + featureCounts, or both: leave a tool's index path empty to skip it, or set it to run that tool.

Run

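Dry run first to test the workflow without executing anything:

snakemake -np -s Snakefile_rna_v04

Then run it in the background (adjust --cores to your machine):

nohup snakemake -p --cores 42 -s Snakefile_rna_v04 &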
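The dry run and the background run can be chained so the real run only starts when the dry run succeeds. This is a minimal sketch, not part of UMseqflow: `run_workflow` and the default log name `snakemake.log` are hypothetical, and defaulting to `nproc` assumes GNU coreutils is available.

```shell
# Hypothetical wrapper (not part of UMseqflow): dry-run first, then launch
# the real run in the background with all output captured in a log file.
run_workflow() {
  snakefile="${1:-Snakefile_rna_v04}"
  cores="${2:-$(nproc)}"           # default: as many cores as nproc reports
  log="${3:-snakemake.log}"
  # 1. Dry run (-n): stop here if Snakemake cannot build the job graph
  if ! snakemake -n -s "$snakefile" > /dev/null; then
    echo "dry run failed; not starting the workflow" >&2
    return 1
  fi
  # 2. Real run; nohup keeps it alive after logout, output goes to $log
  nohup snakemake -p --cores "$cores" -s "$snakefile" > "$log" 2>&1 &
  echo "started PID $! (log: $log)"
}
```

Usage: `run_workflow Snakefile_rna_v04 42`, then follow progress with `tail -f snakemake.log`. Redirecting to a named log keeps the output of different runs apart instead of appending everything to nohup.out.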

You can visualize the pipeline DAG with Graphviz:

snakemake --dag -s Snakefile_rna_v04 | dot -Tpdf > workflow.pdf

snakemake --dag -s Snakefile_rna_v04 | dot -Tpng > workflow.png
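With many samples, `--dag` draws one node per job and the picture quickly becomes unreadable. Snakemake's `--rulegraph` option draws one node per rule instead. The function below is a hypothetical helper (not part of UMseqflow) that renders the rule graph in both formats; the output prefix `workflow` is an assumption.

```shell
# Hypothetical helper (not part of UMseqflow): render the rule graph as PDF and PNG.
render_dag() {
  snakefile="${1:-Snakefile_rna_v04}"
  prefix="${2:-workflow}"
  for fmt in pdf png; do
    # --rulegraph: one node per rule; --dag would draw one node per job
    snakemake --rulegraph -s "$snakefile" | dot -T"$fmt" > "$prefix.$fmt" || return 1
  done
}
```

Usage: `render_dag Snakefile_rna_v04 workflow` writes workflow.pdf and workflow.png.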

Other useful scripts

Soft-link all FASTQ files into the raw folder (run inside raw/):

find ../../X201SC24128617-Z01-F001/01.RawData -type f -name '*fq.gz' -exec ln -s {} . \;
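The command above creates links with relative targets, which resolve only because the links are created in the current directory. A sketch that resolves the source to an absolute path first keeps the links valid when DEST is somewhere else. `link_fastq` is a hypothetical helper, not part of UMseqflow; matching both `*.fq.gz` and `*.fastq.gz` is an assumption beyond the original `*fq.gz` pattern.

```shell
# Hypothetical helper (not part of UMseqflow): symlink every FASTQ under SRC into DEST.
link_fastq() {
  src="$1"
  dest="${2:-raw}"
  if [ ! -d "$src" ]; then
    echo "no such directory: $src" >&2
    return 1
  fi
  mkdir -p "$dest"
  # Absolute source path, so each link resolves no matter where DEST lives
  src_abs=$(cd "$src" && pwd)
  find "$src_abs" -type f \( -name '*.fq.gz' -o -name '*.fastq.gz' \) \
    -exec ln -s {} "$dest" \;
}
```

Usage: `link_fastq /path/to/01.RawData raw`, run from the directory that should contain raw/.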

Make metadata.csv from SRA-style file names (SRR*_1.fastq.gz, SRR*_2.fastq.gz):

echo "sample" > configs/metadata.csv && ls SRR*.fastq.gz | sed 's/_[12]\.fastq\.gz$//' | sort -u >> configs/metadata.csv
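Both metadata one-liners follow the same recipe: list the paired files, strip the mate suffix and extension, and deduplicate. A single function can cover both suffixes. `make_metadata` is a hypothetical helper, not part of UMseqflow, and assumes mates are named `<sample>_1` / `<sample>_2`.

```shell
# Hypothetical helper (not part of UMseqflow): build metadata.csv from paired
# FASTQ names <sample>_1 / <sample>_2 with .fq.gz or .fastq.gz suffixes.
make_metadata() {
  raw="${1:-raw}"
  out="${2:-configs/metadata.csv}"
  mkdir -p "$(dirname "$out")"
  {
    echo "sample"
    for f in "$raw"/*_[12].fq.gz "$raw"/*_[12].fastq.gz; do
      [ -e "$f" ] || [ -L "$f" ] || continue  # skip a glob that matched nothing
      f=${f##*/}                             # strip directory
      f=${f%.gz}; f=${f%.fq}; f=${f%.fastq}  # strip extension
      echo "${f%_[12]}"                      # strip mate suffix _1/_2
    done | sort -u
  } > "$out"
}
```

Usage: `make_metadata raw configs/metadata.csv`. Using shell parameter expansion instead of `ls | sed` avoids parsing `ls` output.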

Rename *.fastq.gz to *.fq.gz:

for file in *.fastq.gz; do mv "$file" "${file%.fastq.gz}.fq.gz"; done
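A plain `mv` in a loop will silently replace an existing .fq.gz file with the same name. A slightly more careful sketch skips such cases and reports them. `rename_fastq` is a hypothetical helper, not part of UMseqflow.

```shell
# Hypothetical helper (not part of UMseqflow): rename *.fastq.gz to *.fq.gz
# in DIR, refusing to overwrite an existing .fq.gz file.
rename_fastq() {
  dir="${1:-.}"
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || [ -L "$f" ] || continue   # no matches: glob left unexpanded
    new="${f%.fastq.gz}.fq.gz"
    if [ -e "$new" ] || [ -L "$new" ]; then
      echo "skip, target exists: $new" >&2
      continue
    fi
    mv "$f" "$new"
  done
}
```

Usage: `rename_fastq raw`. Skipped files are listed on stderr so they can be checked by hand.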

Milestones

250215

Published to GitHub. Welcome!

Contribution

You may also want to check out these projects, which I referenced:

Fred-White94/snakemake_rnaseq: A Snakemake pipeline to go from fastq mRNA sequencing files to raw and normalised counts (usable for downstream EDA and differential analysis)

tjbencomo/ngs-pipeline: Pipeline for Somatic Variant Calling with WES and WGS data

zhxiaokang/RASflow: RNA-Seq analysis workflow

Using oVarFlow, an automated variant-calling workflow based on the GATK4 best-practice method (Tencent Cloud Developer Community)

Tutorial: Snakemake for Biostatistics Quick-Start Tutorial

https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/

#snakemake 北野茶缸子

Or you can contact me: [email protected]

Cursor, GitHub Copilot, and Trae are good LLM-powered IDEs for writing your .smk files!

Why I chose Snakemake over alternatives

Nextflow, Galaxy, ...
