Welcome. UMseqflow is an easy-to-use, up-to-date Snakemake bioinformatics workflow that simplifies sequencing data analysis.
Structure:
.
├── LICENSE
├── RNA
│   ├── Snakefile_rna_v04
│   ├── configs
│   │   ├── config.yaml
│   │   └── metadata.csv
│   └── rules
│       ├── common.smk         # Input and output
│       ├── preprocessing.smk  # QC and trim
│       ├── hisat.smk          # hisat2+featurecount
│       ├── salmon.smk
│       └── multiqc.smk        # multiqc for all steps
├── readme.md
└── setup.md
Only RNA mode is provided currently.
To do:
- WES mode
- Package into Docker
- Publish to GitHub
- Fix the "reason: Code has changed since last execution" rerun trigger
- Revise trim mode
- Update the reference files on Ali netdisk
- QC in the trim part is slow (run trim, then run the other parts)
First, prepare the analysis environment by following setup.md, including conda and the reference files.
Then set up configs/config.yaml and configs/metadata.csv.
First, move the FASTQ files into the ./raw folder, or soft-link them there using a script from "Other useful scripts":
$ ls raw
HB_1_1.fq.gz HB_1_2.fq.gz UH_1_1.fq.gz UH_1_2.fq.gz
Then run the following to create metadata.csv:
echo "sample" > metadata.csv
ls raw/*.fq.gz | sed 's/_[12]\.fq\.gz$//' | sed 's|^raw/||' | sort | uniq >> metadata.csv
mv metadata.csv configs/metadata.csv
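Because the sample names are derived from file names, it can help to confirm that every listed sample really has both mates before starting a run. The following is a minimal sketch, not part of the workflow itself: `check_pairs` is a hypothetical helper, and it assumes the `<sample>_1.fq.gz` / `<sample>_2.fq.gz` naming shown above and a metadata file whose first line is the `sample` header.

```shell
# Hypothetical helper: report samples in the metadata file that lack a mate file.
# Assumes files are named <raw_dir>/<sample>_1.fq.gz and <raw_dir>/<sample>_2.fq.gz.
check_pairs() {
    metadata=$1   # e.g. configs/metadata.csv
    raw_dir=$2    # e.g. raw
    status=0
    {
        # Skip the "sample" header line.
        read -r _header || true
        # The "|| [ -n ... ]" part also handles a last line without a newline.
        while IFS= read -r sample || [ -n "$sample" ]; do
            [ -n "$sample" ] || continue
            for mate in 1 2; do
                fastq="$raw_dir/${sample}_${mate}.fq.gz"
                # -e follows symlinks, so a dangling soft link counts as missing.
                if [ ! -e "$fastq" ]; then
                    echo "missing: $fastq"
                    status=1
                fi
            done
        done
    } < "$metadata"
    return "$status"
}
```

A possible call is `check_pairs configs/metadata.csv raw`, which prints one line per missing file and returns a non-zero status if any are missing.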
You can choose whether to run salmon, hisat2+featurecount, or both by filling in or leaving empty the path to each tool's index.
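The "empty path means skip" rule can be pictured with a small sketch. This is an illustration only, not the workflow's actual code: `enabled_quantifiers` and its two arguments are hypothetical and do not correspond to the real key names in configs/config.yaml.

```shell
# Illustration only: which branches would run for a given pair of index paths.
# The function and argument names are hypothetical, not keys from config.yaml.
enabled_quantifiers() {
    salmon_index=$1
    hisat2_index=$2
    modes=""
    # A non-empty path enables the corresponding branch.
    if [ -n "$salmon_index" ]; then
        modes="salmon"
    fi
    if [ -n "$hisat2_index" ]; then
        # Add a separating space only when something is already listed.
        modes="${modes:+$modes }hisat2+featurecount"
    fi
    echo "${modes:-none}"
}
```

For example, calling it with a salmon index path and an empty second argument reports only the salmon branch.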
nohup snakemake -p --cores 42 -s Snakefile_rna_v04 &
# dry run for testing (nothing is executed)
snakemake -np -s Snakefile_rna_v04
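One way to combine the two commands above is to start the real run only when the dry run succeeds. The sketch below is an assumption about how you might wire this up, not something shipped with UMseqflow: `run_checked`, the `SNAKEMAKE` override variable, and the `dryrun.log` file name are all hypothetical; only the `-n`, `-p`, `--cores`, and `-s` flags come from the commands above.

```shell
# Hypothetical wrapper: do a dry run first, then the real run if it passes.
# SNAKEMAKE can point at another executable; it defaults to "snakemake".
run_checked() {
    snakefile=$1
    cores=$2
    smk=${SNAKEMAKE:-snakemake}
    # -n: dry run, -p: print the shell commands that would be executed.
    if ! "$smk" -np -s "$snakefile" > dryrun.log 2>&1; then
        echo "dry run failed, see dryrun.log" >&2
        return 1
    fi
    # Real run with the requested number of cores.
    "$smk" -p --cores "$cores" -s "$snakefile"
}
```

A possible call is `run_checked Snakefile_rna_v04 42`. Since `nohup` cannot start a shell function directly, put the function and the call in a script and launch that script with `nohup` if the run should survive logout.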
You can visualize the pipeline with Graphviz:
snakemake --dag -s RNA/Snakefile_rna_v04 | dot -Tpdf > workflow.pdf
snakemake --dag -s Snakefile_rna_v04 | dot -Tpng > workflow.png
Other useful scripts:
- Soft-link all FASTQ files into the raw folder
find ../../X201SC24128617-Z01-F001/01.RawData -type f -name '*fq.gz' -exec ln -s {} . \;
- Make a metadata file
echo "sample" > configs/metadata.csv && ls SRR* | sed 's/_[12]\.fastq\.gz$//' | sort | uniq >> configs/metadata.csv
- Rename .fastq.gz files to .fq.gz
for file in *.fastq.gz; do mv "$file" "${file/fastq/fq}"; done
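The rename loop above replaces the first occurrence of "fastq" anywhere in the file name, so a name that contains "fastq" before the extension would be changed in the wrong place. A slightly more defensive variant, offered as a sketch with the hypothetical helper name `rename_fastq_to_fq`, strips only the `.fastq.gz` suffix and skips the case where nothing matches:

```shell
# Hypothetical helper: rename <name>.fastq.gz to <name>.fq.gz inside one directory.
rename_fastq_to_fq() {
    dir=$1
    for file in "$dir"/*.fastq.gz; do
        # If nothing matches, the glob stays literal; skip it.
        # -L keeps soft links (even dangling ones) eligible for renaming.
        [ -e "$file" ] || [ -L "$file" ] || continue
        # Strip only the trailing .fastq.gz, then append .fq.gz.
        mv "$file" "${file%.fastq.gz}.fq.gz"
    done
}
```

For the current directory, the call would be `rename_fastq_to_fq .`.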
The project will be published to GitHub. Contributions are welcome!
You can also follow these projects, which I referenced:
tjbencomo/ngs-pipeline: Pipeline for Somatic Variant Calling with WES and WGS data
zhxiaokang/RASflow: RNA-Seq analysis workflow
Using oVarFlow, an automated workflow based on the GATK4 standard variant-calling method (Tencent Cloud Developer Community)
Tutorial: Snakemake for Biostatistics Quick-Start Tutorial
Or you can connect with me: [email protected]
Cursor, Copilot, and Trae are good LLM-powered IDEs for writing your .smk files!
Other workflow systems: Nextflow, Galaxy...
