peanut calculates alignment metrics for a GAF file produced by GraphAligner by evaluating the CIGAR string of each alignment.
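As an illustration of what evaluating a CIGAR string can look like, here is a minimal Rust sketch. It is not taken from peanut's source: the function names `parse_cigar` and `add_matches` are hypothetical, and the handling of `X`, `I` and `D` follows the usual extended-CIGAR convention, which is an assumption about GraphAligner's output.

```rust
/// Parse a CIGAR string such as "5=1X3=2I" into (length, operation) pairs.
/// Returns None if the string is malformed.
fn parse_cigar(cigar: &str) -> Option<Vec<(usize, char)>> {
    let mut ops = Vec::new();
    let mut len = String::new();
    for c in cigar.chars() {
        if c.is_ascii_digit() {
            len.push(c);
        } else {
            // Every operation must be preceded by a run length.
            ops.push((len.parse::<usize>().ok()?, c));
            len.clear();
        }
    }
    // Trailing digits without an operation are an error.
    if len.is_empty() { Some(ops) } else { None }
}

/// Record the sequence matches of one alignment on its query.
/// `coverage[i]` counts how many alignments match query base `i`.
fn add_matches(coverage: &mut [u32], query_start: usize, ops: &[(usize, char)]) {
    let mut pos = query_start;
    for &(len, op) in ops {
        match op {
            // '=' and 'E' are the match symbols named in this README.
            '=' | 'E' => {
                for slot in coverage.iter_mut().skip(pos).take(len) {
                    *slot += 1;
                }
                pos += len;
            }
            // Assumption: mismatches and insertions consume query bases
            // without counting as sequence matches.
            'X' | 'I' => pos += len,
            // Assumption: deletions and unknown operations leave the
            // query position unchanged in this sketch.
            _ => {}
        }
    }
}
```

Calling `add_matches` once per GAF record on a per-query coverage vector gives, for each query base, the number of alignments that match it. Counting a base once regardless of how many alignments cover it then falls out naturally.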
peanut outputs four metrics:
- qsc: #E is the number of sequence matches (= or E symbol) in the GAF file. Nucleotide positions with sequence matches in multiple alignments are counted only once.
- uniq: uniq_#E is the number of unique sequence matches in the GAF file.
- multi: multi_#E is the number of multiple sequence matches in the GAF file. Nucleotide positions with more than one multiple sequence match are counted only once.
- nonaln: nonaln_#E is the number of non-sequence matches in the GAF file.

In all four definitions, query_lens is the total length of all queries in the GAF file, in nucleotides.
Optionally, peanut writes the nonaln query regions to a BED file.
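The definitions above suggest that each metric is a count divided by query_lens. The following Rust sketch works under that assumption; it is not peanut's actual implementation. The `Metrics` struct and the `metrics` function are hypothetical names. Treating "unique" as "matched by exactly one alignment" and "multiple" as "matched by two or more" is an interpretation of the text.

```rust
/// The four ratios, computed from per-base match coverage.
#[derive(Debug, PartialEq)]
struct Metrics {
    qsc: f64,
    uniq: f64,
    multi: f64,
    nonaln: f64,
}

/// `coverage_per_query[q][i]` is the number of alignments with a sequence
/// match at base `i` of query `q`. Returns None if there are no query bases.
fn metrics(coverage_per_query: &[Vec<u32>]) -> Option<Metrics> {
    // query_lens: total length of all queries, in nucleotides.
    let query_lens: usize = coverage_per_query.iter().map(|c| c.len()).sum();
    if query_lens == 0 {
        return None;
    }
    let (mut uniq, mut multi) = (0usize, 0usize);
    for &n in coverage_per_query.iter().flatten() {
        match n {
            0 => {}            // no sequence match at this base
            1 => uniq += 1,    // matched by exactly one alignment
            _ => multi += 1,   // matched by several; counted once
        }
    }
    let total = query_lens as f64;
    let matched = uniq + multi;
    Some(Metrics {
        qsc: matched as f64 / total,
        uniq: uniq as f64 / total,
        multi: multi as f64 / total,
        nonaln: (query_lens - matched) as f64 / total,
    })
}
```

For two queries with coverage `[1, 1, 2, 0]` and `[0, 1, 1, 1]`, this yields qsc 0.75, uniq 0.625, multi 0.125 and nonaln 0.25.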
git clone https://github.com/pangenome/rs-peanut.git
cd rs-peanut
cargo build --release
peanut requires a GAF file as input, passed with -g.
./target/release/peanut -g aln.gaf
The output is written to stdout in a tab-delimited format.
0.992910744238371 0.9926967987671109 0.00021394547126006352 0.007089255761628998
The four numbers are, in order, qsc, uniq, multi, and nonaln.
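For downstream scripts, the output line can be read back into four numbers. The Rust sketch below uses hypothetical helper names (`parse_output`, `is_consistent`). The consistency check reflects what the sample line shows: qsc equals uniq + multi, and qsc + nonaln equals 1, up to rounding. Whether this holds for every input is an assumption.

```rust
/// Parse one line of peanut output into [qsc, uniq, multi, nonaln].
/// Returns None unless the line holds exactly four numeric fields.
fn parse_output(line: &str) -> Option<[f64; 4]> {
    // split_whitespace accepts tabs as well as spaces.
    let mut fields = line.split_whitespace().map(|f| f.parse::<f64>());
    let mut out = [0.0; 4];
    for slot in out.iter_mut() {
        *slot = fields.next()?.ok()?;
    }
    // Reject lines with extra fields.
    if fields.next().is_some() {
        return None;
    }
    Some(out)
}

/// Check the relations observed in the sample output, within `eps`.
fn is_consistent(m: [f64; 4], eps: f64) -> bool {
    let [qsc, uniq, multi, nonaln] = m;
    (qsc - (uniq + multi)).abs() < eps && (qsc + nonaln - 1.0).abs() < eps
}
```

A tolerance such as `1e-9` is enough for the sample line, where the differences are far below that.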
- Add query sequence alignment match mismatch (qsamm).
- Describe qsc.
- Remove non-helping metrics qsamm and qsm.
- Add 3 new metrics: number of unique query base alignments, number of multiple query base alignments, and number of nonaln query bases.
So far, peanut has not been tested with GAF files that do not originate from GraphAligner.