1_RNA_preprocess

1. Download all the files to your folder

Your_folder
├── ME_index
├── Snakefile
├── cluster.json
├── sample2json.py
├── scHiCAR_RNA_18bp_barcode.txt.gz
├── fq  # move your raw fastq files to this folder
│   ├── RNA_example_R1_001.fastq.gz
│   └── RNA_example_R2_001.fastq.gz
└── script
    ├── barcode_hash_v2.py
    └── fq_barcode_correction_R1.py
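
Before continuing, it can help to confirm that your folder matches the tree above. The following is only a minimal sketch: the list of paths is copied from the tree, and the name `missing_paths` is made up for illustration.

```python
from pathlib import Path

# Paths copied from the directory tree above; extend the list as needed.
EXPECTED = [
    "ME_index",
    "Snakefile",
    "cluster.json",
    "sample2json.py",
    "scHiCAR_RNA_18bp_barcode.txt.gz",
    "fq",
    "script/barcode_hash_v2.py",
    "script/fq_barcode_correction_R1.py",
]


def missing_paths(root=".", expected=EXPECTED):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and folders are present.")
```

Run it from inside `Your_folder`; anything it reports as missing should be downloaded again before starting the pipeline.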

2. Create the samples.json file

python3 sample2json.py --fastq_dir fq
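
The exact structure of samples.json is defined by `sample2json.py` itself. The sketch below only illustrates the general idea of grouping FASTQ files by sample and read. It assumes Illumina-style file names such as `RNA_example_R1_001.fastq.gz`; the function name `group_fastqs` and the output layout are assumptions, not the script's actual behavior.

```python
import json
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming: <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample>.+)_(?P<read>R[12])_001\.fastq\.gz$")


def group_fastqs(fastq_dir):
    """Map sample name -> read (R1/R2) -> sorted list of FASTQ paths."""
    samples = defaultdict(lambda: defaultdict(list))
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = FASTQ_RE.match(path.name)
        if match:  # files that do not follow the assumed pattern are skipped
            samples[match["sample"]][match["read"]].append(str(path))
    return {sample: dict(reads) for sample, reads in samples.items()}


if __name__ == "__main__":
    print(json.dumps(group_fastqs("fq"), indent=4))
```

Comparing this output with the samples.json that `sample2json.py` writes is a quick way to spot FASTQ files that were not picked up because of an unexpected file name.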

3. Run the Snakemake pipeline (adjust the sbatch options, such as the partition passed with `-p`, to match your HPC environment)

Before running Snakemake, please make sure all required Python packages used in the .py files under the script folder are installed.
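
One way to check this is to list the modules imported by the scripts and test whether each can be found in the current environment. This is a minimal sketch that assumes the scripts use Python 3 syntax; the function names are made up for illustration.

```python
import ast
import importlib.util
from pathlib import Path


def imported_modules(script_dir="script"):
    """Collect top-level module names imported by the .py files in script_dir."""
    modules = set()
    for path in Path(script_dir).glob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


def missing_modules(modules):
    """Return the modules that cannot be found in the current environment."""
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)


if __name__ == "__main__":
    missing = missing_modules(imported_modules("script"))
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Two caveats: a module name is not always the name of the package that provides it, and scripts that import each other may be reported as missing unless you run the check from inside the `script` folder.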

snakemake --latency-wait 60 -p -j 99 --cluster-config cluster.json --cluster "sbatch -p common -J {cluster.job} --mem={cluster.mem} -N 1 -n {threads} -o {cluster.out} -e {cluster.err} " &> log &
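
The `--cluster` string above refers to `{cluster.job}`, `{cluster.mem}`, `{cluster.out}`, and `{cluster.err}`, so each of these keys has to be resolvable from cluster.json. The sketch below checks for them. It assumes cluster.json follows the usual Snakemake cluster-config layout, in which an optional `__default__` entry is overridden by per-rule entries; the function name is made up for illustration.

```python
import json

# Placeholders referenced in the --cluster string above.
REQUIRED_KEYS = {"job", "mem", "out", "err"}


def incomplete_entries(config):
    """Return {entry name: missing keys} after applying __default__ to each entry."""
    defaults = config.get("__default__", {})
    problems = {}
    for name, settings in config.items():
        merged = {**defaults, **settings}
        missing = REQUIRED_KEYS - merged.keys()
        if missing:
            problems[name] = sorted(missing)
    return problems


if __name__ == "__main__":
    with open("cluster.json") as handle:
        problems = incomplete_entries(json.load(handle))
    print(problems if problems else "No missing keys found in cluster.json")
```

The check only covers entries that are present in cluster.json. If there is no `__default__` entry, a rule that is not listed at all would still leave the placeholders unresolved.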

4. Align reads to the genome and generate a filtered matrix folder that includes the files `barcodes.tsv`, `features.tsv`, and `matrix.mtx`

gunzip -c scHiCAR_RNA_18bp_barcode.txt.gz > scHiCAR_RNA_18bp_barcode.txt

STAR --runMode alignReads \
--genomeDir PATH_TO_STAR_INDEX_folder \
--runThreadN 12 \
--outFileNamePrefix RNA_example \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM \
--soloType CB_UMI_Simple \
--soloFeatures GeneFull \
--soloCBwhitelist scHiCAR_RNA_18bp_barcode.txt \
--soloCBstart 1 \
--outSAMmapqUnique 255 \
--soloCBlen 18 \
--soloUMIstart 19 \
--soloUMIlen 16 \
--soloCBmatchWLtype Exact \
--soloUMIdedup 1MM_CR \
--soloStrand Forward \
--soloUMIfiltering - \
--readFilesIn 03_corrected_fq/RNA_example_all_L001_R2_001.fastq.gz 03_corrected_fq/RNA_example_all_L001_R1_001.fastq.gz \
--readFilesCommand zcat \
--genomeSAindexNbases 2 \
--soloBarcodeReadLength 0 \
--soloCellFilter EmptyDrops_CR \
--limitBAMsortRAM 200000000000 > log 2>&1 &
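
In this command, STARsolo takes the barcode read from the last file given to `--readFilesIn` (here the R1 file), while the first file (R2) holds the cDNA. The `--soloCB*` and `--soloUMI*` options give 1-based positions within the barcode read. The sketch below shows how those positions translate into slices of a read; the function name and the example sequence are made up for illustration.

```python
# 1-based positions taken from the STAR options above.
CB_START, CB_LEN = 1, 18     # --soloCBstart, --soloCBlen
UMI_START, UMI_LEN = 19, 16  # --soloUMIstart, --soloUMIlen


def split_barcode_read(seq):
    """Return (cell_barcode, umi) sliced from a barcode-read sequence."""
    if len(seq) < UMI_START - 1 + UMI_LEN:
        raise ValueError("read is shorter than cell barcode + UMI (34 bases)")
    cell_barcode = seq[CB_START - 1 : CB_START - 1 + CB_LEN]
    umi = seq[UMI_START - 1 : UMI_START - 1 + UMI_LEN]
    return cell_barcode, umi


if __name__ == "__main__":
    read = "A" * 18 + "C" * 16 + "TTTT"  # made-up sequence, not real data
    print(split_barcode_read(read))
```

Because `--soloCBmatchWLtype Exact` is used, a read is assigned to a cell only when its 18 bp cell barcode matches an entry of `scHiCAR_RNA_18bp_barcode.txt` exactly.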

The STAR output folder `RNA_exampleSolo.out/GeneFull/filtered` can be used directly for standard scRNA-seq downstream analysis, such as cell clustering and annotation with Seurat.
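
For a quick look at the filtered matrix before loading it into an analysis package, the three files can be read with the standard library alone. The sketch below sums the UMI counts per cell. It assumes uncompressed files in Matrix Market coordinate format, with features as rows, barcodes as columns, and a single count column; the function names are made up for illustration.

```python
import sys
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a text file without trailing newlines."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def umis_per_cell(matrix_dir):
    """Return {barcode: total UMI count} for a barcodes.tsv / matrix.mtx pair."""
    matrix_dir = Path(matrix_dir)
    barcodes = read_lines(matrix_dir / "barcodes.tsv")
    totals = dict.fromkeys(barcodes, 0)
    with open(matrix_dir / "matrix.mtx") as handle:
        # Skip the '%' header and comment lines, then read the size line.
        line = handle.readline()
        while line.startswith("%"):
            line = handle.readline()
        n_features, n_cells, n_entries = (int(x) for x in line.split())
        if n_cells != len(barcodes):
            raise ValueError("matrix.mtx and barcodes.tsv disagree on the cell count")
        for line in handle:
            if not line.strip():
                continue
            fields = line.split()  # feature index, cell index, count (1-based indices)
            totals[barcodes[int(fields[1]) - 1]] += int(fields[2])
    return totals


if __name__ == "__main__":
    counts = umis_per_cell(sys.argv[1])  # path to the filtered folder
    print(f"{len(counts)} cells, {sum(counts.values())} UMIs in total")
```

Pass the path of the filtered folder as the only argument. The number of cells should equal the number of lines in `barcodes.tsv`.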
