mugpeng/UMseqflow

UMseqflow

Welcome. UMseqflow is an easy-to-use, up-to-date Snakemake bioinformatics workflow that simplifies sequencing data analysis.

Repository structure:

.
├── LICENSE
├── RNA
│   ├── Snakefile_rna_v04
│   ├── configs
│   │   ├── config.yaml
│   │   └── metadata.csv
│   └── rules
│       ├── common.smk # Input and output
│       ├── preprocessing.smk # QC and trim
│       ├── hisat.smk # hisat2 + featureCounts
│       ├── salmon.smk
│       └── multiqc.smk # MultiQC for all steps
├── readme.md
└── setup.md

Only RNA mode is provided currently.

TODO

Major

  • WES mode
  • Package into Docker
  • Publish to GitHub
  • Fix unwanted reruns reported as "reason: Code has changed since last execution"

Minor

  • Revise trim mode
  • Upload reference files to Ali netdisk
  • QC in the trim step is slow (run trim, then run the other parts)

Prepare

First, prepare the analysis environment by following setup.md, which covers conda and the reference files.

Then set up configs/config.yaml and configs/metadata.csv.

Move your FASTQ files into the ./raw folder, or soft-link them there (see the soft-link command near the end of this README):

$ ls raw
HB_1_1.fq.gz  HB_1_2.fq.gz  UH_1_1.fq.gz  UH_1_2.fq.gz
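
Before going further, it can help to confirm that every `_1.fq.gz` file has its `_2.fq.gz` mate. The following is only a sketch: `check_pairs` is a hypothetical helper name, not part of UMseqflow, and it assumes paired-end reads named as in the listing above.

```shell
# Hypothetical helper (not part of UMseqflow): report any *_1.fq.gz file
# in a directory that has no matching *_2.fq.gz file.
check_pairs() {
    dir=$1
    status=0
    for r1 in "$dir"/*_1.fq.gz; do
        [ -e "$r1" ] || continue            # glob matched nothing
        r2="${r1%_1.fq.gz}_2.fq.gz"         # swap the _1 suffix for _2
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            status=1
        fi
    done
    return $status
}
```

Calling `check_pairs raw` returns a non-zero status if any mate is missing, so it can be chained in front of later steps with `&&`.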

Then run the following to generate metadata.csv:

echo "sample" > metadata.csv
ls raw/*.fq.gz | sed 's/_[12]\.fq\.gz$//' | sed 's|^raw/||' | sort | uniq >> metadata.csv
mv metadata.csv configs/metadata.csv
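
To see what these commands produce without touching real data, here is a minimal sketch that replays them in a scratch directory. The scratch directory and the empty placeholder files are assumptions for illustration; the file names follow the `ls raw` listing above, and the dots in the `sed` pattern are escaped so they match literally.

```shell
# Replay the metadata step in a throwaway directory with empty placeholder files.
workdir=$(mktemp -d)
mkdir -p "$workdir/raw" "$workdir/configs"
cd "$workdir"
touch raw/HB_1_1.fq.gz raw/HB_1_2.fq.gz raw/UH_1_1.fq.gz raw/UH_1_2.fq.gz

# Header line, then one sample name per read pair.
echo "sample" > configs/metadata.csv
ls raw/*.fq.gz | sed 's/_[12]\.fq\.gz$//' | sed 's|^raw/||' | sort | uniq >> configs/metadata.csv

cat configs/metadata.csv
# sample
# HB_1
# UH_1
```

Each pair of read files collapses to a single sample name, so four files yield two rows under the `sample` header.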

RNA

You can choose to run salmon, hisat2+featureCounts, or both: set the index path for each tool you want to run, and leave it empty for a tool you want to skip.

Run

nohup snakemake -p --cores 42 -s Snakefile_rna_v04 &

snakemake -np -s Snakefile_rna_v04
# dry run, for testing
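
The two commands above can be combined so the background run only starts if the dry run succeeds. This is a sketch, not part of UMseqflow: `run_pipeline` and the log file names `dryrun.log` and `snakemake.log` are hypothetical, and only the snakemake flags already shown above are used.

```shell
# Hypothetical wrapper: dry run first, then launch the real run in the background.
run_pipeline() {
    snakefile=$1
    cores=$2
    # Abort early if the dry run fails (bad config, missing inputs, ...).
    snakemake -np -s "$snakefile" > dryrun.log 2>&1 || {
        echo "dry run failed, see dryrun.log" >&2
        return 1
    }
    # Keep the output in a named log instead of the default nohup.out.
    nohup snakemake -p --cores "$cores" -s "$snakefile" > snakemake.log 2>&1 &
    echo "started snakemake in background (PID $!)"
}
```

For example, `run_pipeline Snakefile_rna_v04 42` mirrors the commands above; progress can then be followed with `tail -f snakemake.log`.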

You can visualize the pipeline through graphviz:

snakemake --dag -s Snakefile_rna_v04 | dot -Tpdf > workflow.pdf

snakemake --dag -s Snakefile_rna_v04 | dot -Tpng > workflow.png
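
With many samples the full DAG can become hard to read. As a sketch, Snakemake's `--rulegraph` option draws one node per rule instead of one per job. The function name `render_rulegraph` and the output file name are hypothetical; the function assumes `snakemake` and Graphviz's `dot` are on your PATH and checks for both first.

```shell
# Hypothetical helper: draw the rule-level graph (one node per rule).
render_rulegraph() {
    snakefile=$1
    out=$2
    # Fail with a clear message if a required tool is missing.
    for tool in snakemake dot; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing tool: $tool" >&2
            return 1
        }
    done
    snakemake --rulegraph -s "$snakefile" | dot -Tpng > "$out"
}
```

Usage would look like `render_rulegraph Snakefile_rna_v04 rulegraph.png`.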

Other useful scripts

  • Soft-link all fq files into the raw folder (run inside raw)
find ../../X201SC24128617-Z01-F001/01.RawData -type f -name '*fq.gz' -exec ln -s {} . \;
  • Make a metadata file
echo "sample" > configs/metadata.csv && ls SRR* | sed 's/_[12]\.fastq\.gz$//' | sort | uniq >> configs/metadata.csv
  • Rename fastq.gz files to fq.gz
for file in *.fastq.gz; do mv "$file" "${file/fastq/fq}"; done
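
The link and rename steps can be tried safely on placeholder files first. The sketch below uses a scratch directory and made-up file names (`S1_1.fastq.gz`, `S1_2.fastq.gz`), which are assumptions for illustration only. It also uses the suffix-stripping form `${file%.fastq.gz}`, so only the extension changes even if a sample name happens to contain "fastq".

```shell
# Try the soft-link and rename steps on empty placeholder files.
scratch=$(mktemp -d)
mkdir -p "$scratch/01.RawData/S1" "$scratch/raw"
touch "$scratch/01.RawData/S1/S1_1.fastq.gz" "$scratch/01.RawData/S1/S1_2.fastq.gz"

cd "$scratch/raw"
# Link every FASTQ file found under the source tree into the current folder.
find "$scratch/01.RawData" -type f -name '*.fastq.gz' -exec ln -s {} . \;

# Rename the links: replace only the trailing .fastq.gz with .fq.gz.
for file in *.fastq.gz; do
    mv "$file" "${file%.fastq.gz}.fq.gz"
done

ls
# S1_1.fq.gz
# S1_2.fq.gz
```

Renaming a symbolic link changes only the link's name; the original files under `01.RawData` keep their names.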

Milestones

250215

Published to GitHub. Welcome!

Contribution

You can also follow these projects, which I referenced:

Fred-White94/snakemake_rnaseq: A Snakemake pipeline to go from fastq mRNA sequencing files to raw and normalised counts (usable for downstream EDA and differential analysis)

tjbencomo/ngs-pipeline: Pipeline for Somatic Variant Calling with WES and WGS data

zhxiaokang/RASflow: RNA-Seq analysis workflow

Using oVarFlow, an automated workflow based on the GATK4 standard variant-calling method - Tencent Cloud Developer Community - Tencent Cloud

Tutorial: Snakemake for Biostatistics Quick-Start Tutorial

https://carpentries-incubator.github.io/snakemake-novice-bioinformatics/

#snakemake 北野茶缸子

Or you can contact me: [email protected]

Cursor, Copilot, and Trae are good LLM IDEs for writing your smk files!

Why I am choosing Smk instead of others

nextflow, galaxy...

About

Simple way to automate your sequence manipulation by SNAKEMAKE
