ISC 211
Introduction to
Bioinformatics
Lecture 7 – Scoring Matrix
Dr. Athira B
Asst. Professor, CSE
IIIT Kottayam
Scoring Matrix
A scoring matrix contains a set of values quantifying the likelihood of
one residue being substituted by another in a sequence
alignment.
A simple scheme:
A positive value or high score is given for a match,
and a negative value or low score for a mismatch or a gap.
This assignment is based on the assumption that the
frequencies of mutation are equal for all bases.
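The simple scheme can be sketched in Python as below. The score values (+1 match, -1 mismatch, -2 gap) and the function name `score_alignment` are illustrative assumptions, not values given in the lecture.

```python
def score_alignment(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Score two already-aligned sequences column by column.

    The default scores are assumed for illustration only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            total += gap        # a gap in either sequence
        elif a == b:
            total += match      # identical residues
        else:
            total += mismatch   # any substitution, all treated equally
    return total


print(score_alignment("ACGT-A", "ACCTGA"))  # 4 matches, 1 mismatch, 1 gap -> 1
```

Every mismatch gets the same penalty here, which is exactly the equal-mutation-frequency assumption stated above.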
Scoring matrix for aligning DNA sequences
Transitions: substitutions in which a purine is replaced by
another purine (A ↔ G) or a pyrimidine is replaced by another
pyrimidine (C ↔ T)
Transversions: substitutions between a purine and a pyrimidine
(A or G ↔ C or T)
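A minimal Python sketch of how this distinction can feed a DNA scoring matrix. It assumes, as is commonly done, that transversions are penalized more heavily than transitions; the scores in `SCORES` and the names used are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(x, y):
    """Classify a pair of bases as match, transition, or transversion."""
    if x == y:
        return "match"
    pair = {x, y}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Assumed scores: transversions cost more than transitions.
SCORES = {"match": 1, "transition": -1, "transversion": -2}

# 4 x 4 scoring matrix stored as a dictionary keyed by base pair.
dna_matrix = {
    (x, y): SCORES[substitution_type(x, y)] for x in "ACGT" for y in "ACGT"
}

print(substitution_type("A", "G"), dna_matrix[("A", "G")])
print(substitution_type("A", "C"), dna_matrix[("A", "C")])
```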
Scoring Matrices for Amino Acids
An amino-acid scoring matrix is a 20 × 20 table indexed by
amino acids, so that position (X, Y) in the table gives the
score of aligning amino acid X with amino acid Y
Identity matrix – exact matches receive one score and non-exact
matches a different score (1 on the diagonal, 0 everywhere else).
Mutation data matrix – a scoring matrix compiled from
observed protein mutation rates: some mutations are observed
more often than others (PAM, BLOSUM).
Physical properties matrix – amino acids with similar biophysical
properties receive a high score.
Genetic code matrix – amino acids are scored based on similarities in
their coding triplets (codons).
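As an illustration of the simplest of these, the identity matrix can be built in a few lines of Python. The variable names are hypothetical; the one-letter codes are the 20 standard amino acids.

```python
# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# identity[X][Y] is 1 when X and Y are the same amino acid, 0 otherwise.
identity = {
    x: {y: 1 if x == y else 0 for y in AMINO_ACIDS} for x in AMINO_ACIDS
}

print(identity["A"]["A"], identity["A"]["R"])
```

The other matrix types keep the same 20 × 20 layout and differ only in how the entries are filled in.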
PAM Matrix
Point Accepted Mutation (PAM) matrix, proposed by
Margaret Dayhoff (known as the mother of
bioinformatics)
A PAM matrix is a matrix in which each column and
each row represents one of the 20 amino acids. PAM
matrices are regularly used as substitution matrices
to score sequence alignments for proteins.
PAM Matrix computation
Step-1: Construct a Multiple Sequence Alignment (MSA) for the
dataset (a collection of amino acid sequences)
Step-2: Construct a phylogenetic tree from the MSA; this gives an
idea of how the mutations occurred
Step-3: For each amino acid residue, compute the frequency of
substitutions with every other residue, Fij
FGA = count of G → A and A → G substitutions = 3
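A rough Python sketch of this counting step. The list `events` is invented purely for illustration (it is not the lecture's dataset); it is chosen so that the G/A count comes out as 3, as in the example above. Substitutions are counted in both directions together.

```python
from collections import Counter

# Hypothetical substitution events read off a phylogenetic tree,
# written as (ancestral residue, derived residue). Not real data.
events = [("G", "A"), ("A", "G"), ("G", "A"), ("A", "D"), ("L", "I"), ("S", "T")]


def substitution_counts(events):
    """Count substitutions per unordered residue pair (X -> Y and Y -> X together)."""
    counts = Counter()
    for x, y in events:
        counts[frozenset((x, y))] += 1
    return counts


F = substitution_counts(events)
print(F[frozenset(("G", "A"))])  # 3 for this made-up list
```

Using an unordered pair as the key reflects that the direction of a change usually cannot be told apart, so G → A and A → G are pooled.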
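Step-4: Compute the relative mutability m_j of each residue j. For residue A:
$$m_A = \frac{\text{number of changes involving A}}{\text{total mutations} \times 2 \times f_A \times 100} = \frac{4}{6 \times 2 \times 0.159 \times 100} = 0.0209$$
where
number of changes involving A, $\sum_i F_{iA}$ = 4
total number of mutations in the tree = 6
relative frequency of A, $f_A$ = 0.159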
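The arithmetic of this step can be checked with a short Python sketch. It assumes the relative mutability is the number of changes involving the residue divided by (total mutations × 2 × residue frequency × 100), which is the form that reproduces 0.0209 from the values 4, 6 and 0.159; the function and argument names are hypothetical.

```python
def relative_mutability(changes_involving_j, total_mutations, freq_j):
    """Relative mutability of residue j under the assumed formula.

    The factor 2 counts each mutation for both residues involved;
    the factor 100 normalizes per 100 residues.
    """
    return changes_involving_j / (total_mutations * 2 * freq_j * 100)


m_A = relative_mutability(changes_involving_j=4, total_mutations=6, freq_j=0.159)
print(f"{m_A:.4f}")  # about 0.021; the slide truncates it to 0.0209
```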
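Step-5: Compute the mutation probability, Mij
$$M_{ij} = \frac{m_j \, F_{ij}}{\sum_i F_{ij}}$$
$$M_{GA} = \frac{0.0209 \times 3}{4} = 0.0156$$
where $F_{GA} = 3$ and $\sum_i F_{iA} = 4$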
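A small Python check of this step, assuming the mutation probability is the relative mutability of the original residue multiplied by the fraction of its changes that go to the new residue (0.0209 × 3 / 4). The function name is hypothetical.

```python
def mutation_probability(m_j, f_ij, changes_involving_j):
    """Probability that residue j is replaced by residue i (assumed form)."""
    return m_j * f_ij / changes_involving_j


# Values from the worked example: m_A = 0.0209, F_GA = 3, changes involving A = 4.
M_GA = mutation_probability(m_j=0.0209, f_ij=3, changes_involving_j=4)
print(f"{M_GA:.4f}")  # close to the 0.0156 quoted in the example
```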
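Step-6: Each non-diagonal entry is calculated as a log-odds score:
$$R_{ij} = \log_{10}\left(\frac{M_{ij}}{f_i}\right)$$
$$R_{GA} = \log_{10}\left(\frac{0.0156}{0.1587}\right) = -1.01$$
where $f_i = f_G$ = (10 G residues) / (63 total residues) = 0.1587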
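The log-odds value can be reproduced in Python, assuming a base-10 logarithm of the mutation probability divided by the frequency of the replacing residue (this assumption matches the quoted result of -1.01). The function name is hypothetical.

```python
import math


def log_odds(m_ij, f_i):
    """Log-odds score: log10 of mutation probability over residue frequency."""
    return math.log10(m_ij / f_i)


f_G = 10 / 63                 # 10 G residues out of 63 total residues
R_GA = log_odds(0.0156, f_G)
print(round(R_GA, 2))         # -1.01
```

A negative score means the G/A substitution is observed less often than would be expected by chance given how common G is.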
Step-7: For diagonal entries, first set $M_{jj} = 1 - m_j$ (the probability
that residue j does not change), then repeat step 6 to obtain $R_{jj}$.
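A hedged sketch of the diagonal case in Python. It assumes the diagonal mutation probability is 1 minus the relative mutability (the probability that the residue stays unchanged), followed by the same base-10 log-odds as in step 6. The value for A is computed here from the example's numbers (0.0209 and 0.159) and is not quoted in the lecture.

```python
import math


def diagonal_log_odds(m_j, f_j):
    """Log-odds score for residue j remaining unchanged (assumed form)."""
    m_jj = 1.0 - m_j              # probability that j is conserved
    return math.log10(m_jj / f_j)


R_AA = diagonal_log_odds(m_j=0.0209, f_j=0.159)
print(round(R_AA, 2))
```

The score is positive because a residue aligned with itself is far more likely under the mutation model than by chance.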
What is the probability that a residue of type j will
be replaced by i in M?
The answer can be obtained from the (i, j) entry of the PAM-1
matrix: Mij gives the probability, and Rij is the corresponding log-odds score.
Reading Assignment
Understand BLOSUM matrices
PAM-1, PAM-250, PAM-1000