Mass Spectrometric Peptide
Identification Using
MASCOT
Dr. David Wishart
University of Alberta, Edmonton, Canada
MS Proteomics Applications
• Protein identification/confirmation
• Protein sample purity determination
• Detection of post-translational modifications
• Detection of amino acid substitutions
• Determination of disulfide bonds (# & status)
• De novo peptide sequencing
• Monitoring protein folding (H/D exchange)
• Monitoring protein-ligand complexes/struct.
• 3D Structure determination
Protein Identification
• 2D-GE + MALDI-MS
– Peptide Mass Fingerprinting (PMF)
• 2D-GE + MS-MS
– MS Peptide Sequencing/Fragment Ion Searching
• Multidimensional LC + MS-MS
– ICAT Methods (isotope labelling)
– MudPIT (Multidimensional Protein Ident. Tech.)
• 1D-GE + LC + MS-MS
• De Novo Peptide Sequencing
All require computers to process & analyze data
What is MASCOT?
• A (very) popular web-based tool from
Matrix Science ([Link]) for
performing rapid, accurate, on-line MS
analysis of peptides and proteins
• Supports 3 kinds of analyses
– Peptide Mass Fingerprinting (PMF)
– Sequence (tag) querying
– MS/MS Ion searches
Matrix Science Website
Mascot Home Page
[Link]
Why Mascot?
• Among the first to offer free web-based
services for both PMF and MS/MS
• First to use probability-based scoring
(PBS) or “Expect” values to rank matches
and hits (significant improvement over all
other scoring methods)
• Easy-to-use interface, fast, reliable, up-to-
date databases, accurate – a common
industry standard
Two Mascot Choices
• Matrix Science offers two choices for
users:
• #1) A free, open access web-based
system for occasional (1-10) queries
per day (this is what we’ll use)
• #2) A locally installed version for
heavy use or high throughput MS and
MS/MS labs (100’s of queries/day)
Local Mascot Server
• License cost is ~$4000 per CPU
• Single or dual processor Pentium 4,
Xeon, Athlon, Opteron chips (300 MHz
takes 200s/search, 3 GHz takes 20s)
• 2 Gbytes of RAM (key to performance)
• 120 Gbytes of Hard Disk (IDE) space
to store all desired databases
• Can run on Windows or Linux (same)
Local Mascot
• Allows you to customize your databases and
to customize the frequency of database
uploads
• Mascot Distiller – generates peak lists from
just about any instrument (converts
everything to a Mascot Generic Format “MGF” file)
• Mascot Daemon – allows you to do batch
searches “press submit and go home” also
allows monitoring of data flow on MS
instrument and autoprocessing of that data
Mascot Databases &
General Disk Needs
Example #1 Peptide Mass
Fingerprinting (PMF)
2D-GE + MALDI (PMF)
[Figure: spots (p53, Trx, G6PDH) are punched from the 2D gel and digested with trypsin before MALDI analysis]
Peptide Mass Fingerprinting
• Used to identify protein spots on gels or
protein peaks from an HPLC run
• Depends on the fact that if a protein is cut up
or fragmented in a known way, the resulting
fragments (and resulting masses) are unique
enough to identify the protein
• Requires a database of known sequences
• Uses software to compare observed masses
with masses calculated from database
Principles of Fingerprinting
>Protein 1 (Mass M+H: 4842.05)
Sequence: acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
Tryptic fragments: acedfhsak, dfqeasdfpk, ivtmeeewendadnfek, qwfe
>Protein 2 (Mass M+H: 4842.05)
Sequence: acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
Tryptic fragments: acek, dfhsadfqeasdfpk, ivtmeeewenk, dadnfeqwfe
>Protein 3 (Mass M+H: 4842.05)
Sequence: acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
Tryptic fragments: acedfhsadfqek, asdfpk, ivtmeeewendak, dnfeqwfe
Principles of Fingerprinting
[Figure: sequence, mass (M+H 4842.05) and tryptic mass spectrum for each of Proteins 1-3]
Protease Cleavage Rules
Trypsin        XXX[KR]--[!P]XXX
Chymotrypsin   XX[FYW]--[!P]XXX
Lys C          XXXXXK--XXXXX
Asp N endo     XXXXX--DXXXXX
CNBr           XXXXXM--XXXXX
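The trypsin rule above (cut after K or R unless the next residue is P) can be sketched as a regular-expression split. This is a minimal illustration, not Mascot's own digestion code; the function name `trypsin_digest` is made up for the example, and splitting on a zero-width pattern assumes Python 3.7 or later.

```python
import re

def trypsin_digest(sequence):
    """Split a protein after every K or R that is not followed by P."""
    # Zero-width pattern: preceded by K/R, not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    # A trailing K/R leaves an empty string at the end; drop it.
    return [p for p in pieces if p]

# Protein 1 from the fingerprinting slides.
print(trypsin_digest("acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe"))
```

An R-P bond (as in EYGRPFLIIK) is left intact by the negative lookahead.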
Why Trypsin?
• Robust, stable enzyme
• Works over a range of pH values & Temp.
• Quite specific and consistent in cleavage
• Cuts frequently to produce “ideal” MW peptides
• Inexpensive, easily available/purified
• Does produce “autolysis” peaks (which can be
used in MS calibrations)
– 1045.56, 1106.03, 1126.03, 1940.94, 2211.10, 2225.12,
2283.18, 2299.18
Calculating Peptide Masses
• Sum the monoisotopic residue masses
• Add mass of H2O (18.01056)
• Add mass of H+ (1.00728) to get M+H
• If Met is oxidized add 15.99491
• If Cys has acrylamide adduct add 71.0371
• If Cys is iodoacetylated add 58.0071
• Other modifications are listed at
– [Link]
• Only consider peptides with masses > 400
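The recipe above can be written as a short function. This is a sketch under stated assumptions: the residue masses are the monoisotopic values tabulated in these slides, the proton mass is taken as 1.00728 (the value that reproduces the fragment masses shown elsewhere in the deck), and the names `MONO` and `mh_mass` are illustrative only.

```python
# Monoisotopic residue masses (from the residue mass table in these slides).
MONO = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "I": 113.08407,
    "L": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.04264, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER = 18.01056
PROTON = 1.00728       # assumed proton mass
MET_OXIDATION = 15.99491

def mh_mass(peptide, oxidized_met=0):
    """Monoisotopic M+H: residues + H2O + H+ (+ optional Met oxidations)."""
    residues = sum(MONO[aa] for aa in peptide.upper())
    return residues + WATER + PROTON + MET_OXIDATION * oxidized_met

print(round(mh_mass("acedfhsak"), 4))
print(round(mh_mass("qwfe"), 4))
```

Cysteine adducts could be handled the same way as the Met oxidation term, by adding the listed mass once per modified residue.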
Post-Translational
Modifications (PTM)
Masses in MS
• Monoisotopic mass is the mass determined using the masses of the most abundant isotopes
• Average mass is the abundance weighted mass of all isotopic components
Amino Acid Residue Masses
Monoisotopic Mass
Glycine 57.02147 Aspartic acid 115.02695
Alanine 71.03712 Glutamine 128.05858
Serine 87.03203 Lysine 128.09497
Proline 97.05277 Glutamic acid 129.04264
Valine 99.06842 Methionine 131.04049
Threonine 101.04768 Histidine 137.05891
Cysteine 103.00919 Phenylalanine 147.06842
Isoleucine 113.08407 Arginine 156.10112
Leucine 113.08407 Tyrosine 163.06333
Asparagine 114.04293 Tryptophan 186.07932
Amino Acid Residue Masses
Average Mass
Glycine 57.0520 Aspartic acid 115.0886
Alanine 71.0788 Glutamine 128.1308
Serine 87.0782 Lysine 128.1742
Proline 97.1167 Glutamic acid 129.1155
Valine 99.1326 Methionine 131.1986
Threonine 101.1051 Histidine 137.1412
Cysteine 103.1448 Phenylalanine 147.1766
Isoleucine 113.1595 Arginine 156.1876
Leucine 113.1595 Tyrosine 163.1760
Asparagine 114.1039 Tryptophan 186.2133
Preparing a Peptide Mass
Fingerprint Database
• Take a protein sequence database (Swiss-
Prot or nr-GenBank)
• Determine cleavage sites and identify
resulting peptides for each protein entry
• Calculate the mass (M+H) for each peptide
• Sort the masses from lowest to highest
• Have a pointer for each calculated mass to
each protein accession number in databank
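The five steps above can be sketched end to end for the three toy proteins used in these slides. Everything here is illustrative: the accession numbers are the made-up ones from the slides, the helper names are hypothetical, the proton mass of 1.00728 and the 400 Da cut-off are assumptions, and a real database build would also handle modifications and missed cleavages.

```python
import re

MONO = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "I": 113.08407,
    "L": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.04264, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER_PLUS_PROTON = 18.01056 + 1.00728  # assumed proton mass

def mh_mass(peptide):
    return sum(MONO[aa] for aa in peptide) + WATER_PLUS_PROTON

def tryptic_peptides(sequence):
    # Cut after K/R unless followed by P (Python 3.7+ zero-width split).
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if p]

def build_pmf_index(proteins, min_mass=400.0):
    """Return (mass, accession, peptide) tuples sorted by mass."""
    index = []
    for accession, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            mass = mh_mass(peptide)
            if mass > min_mass:
                index.append((round(mass, 4), accession, peptide))
    index.sort()
    return index

PROTEINS = {
    "P12345": "acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe",
    "P21234": "acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe",
    "P89212": "acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe",
}
for mass, accession, peptide in build_pmf_index(PROTEINS):
    print(f"{mass:10.4f}  {accession}  {peptide}")
```

Keeping the accession next to each mass is the "pointer" from the last step; a sorted list also allows binary search when the database is large.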
Building A PMF Database
Sequence DB and calculated tryptic fragments:
>P12345 acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
acedfhsak, dfqeasdfpk, ivtmeeewendadnfek, qwfe
>P21234 acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
acek, dfhsadfqeasdfpk, ivtmeeewenk, dadnfeqwfe
>P89212 acedfhsadfqekasdfpkivtmeeewendakdnfeqwfe
acedfhsadfqek, asdfpk, ivtmeeewendak, dnfeqwfe
Mass list (sorted, each mass pointing to its accession):
450.2017 (P21234)
609.2667 (P12345)
664.3300 (P89212)
1007.4251 (P12345)
1114.4416 (P89212)
1183.5266 (P12345)
1300.5116 (P21234)
1407.6462 (P21234)
1526.6211 (P89212)
1593.7101 (P89212)
1740.7501 (P21234)
2098.8909 (P12345)
The Fingerprint (PMF) Algorithm
• Take a mass spectrum of a trypsin-
cleaved protein (from gel or HPLC peak)
• Identify as many masses as possible in
spectrum (avoid autolysis peaks)
• Compare query masses with database
masses and calculate # of matches or
matching score (based on length and
mass difference)
• Rank hits and return top scoring entry –
this is the protein of interest
Query (MALDI) Spectrum
[Figure: MALDI spectrum, m/z axis 500-2500, with peaks at 450, 609, 698, 1007, 1199, 1940 (trypsin), 2098 and 2211 (trypsin)]
Query vs. Database
Query masses:
450.2201, 609.3667, 698.3100, 1007.5391, 1199.4916, 2098.9909
Database mass list:
450.2017 (P21234)
609.2667 (P12345)
664.3300 (P89212)
1007.4251 (P12345)
1114.4416 (P89212)
1183.5266 (P12345)
1300.5116 (P21234)
1407.6462 (P21234)
1526.6211 (P89212)
1593.7101 (P89212)
1740.7501 (P21234)
2098.8909 (P12345)
Results:
2 unknown masses
1 hit on P21234
3 hits on P12345
Conclude the query protein is P12345
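The comparison on this slide can be sketched as a simple hit count. The masses are copied from the slide; the 0.2 Da tolerance is an assumption chosen so that the slide's matches (which differ by up to about 0.11 Da) are accepted, and counting hits is the crudest possible score, not Mascot's.

```python
from collections import Counter

DATABASE = [
    (450.2017, "P21234"), (609.2667, "P12345"), (664.3300, "P89212"),
    (1007.4251, "P12345"), (1114.4416, "P89212"), (1183.5266, "P12345"),
    (1300.5116, "P21234"), (1407.6462, "P21234"), (1526.6211, "P89212"),
    (1593.7101, "P89212"), (1740.7501, "P21234"), (2098.8909, "P12345"),
]
QUERY = [450.2201, 609.3667, 698.3100, 1007.5391, 1199.4916, 2098.9909]

def match_masses(query, database, tol_da=0.2):
    """Count database hits per accession; collect unmatched query masses."""
    hits, unknown = Counter(), []
    for q in query:
        matched = [acc for mass, acc in database if abs(mass - q) <= tol_da]
        if matched:
            hits.update(matched)
        else:
            unknown.append(q)
    return hits, unknown

hits, unknown = match_masses(QUERY, DATABASE)
print(hits.most_common())  # accessions ranked by number of hits
print(unknown)             # query masses with no database match
```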
What You Need To Do PMF
• A list of query masses (as many as possible)
• Protease(s) used or cleavage reagents
• Databases to search (SWProt, NR, Organism)
• Estimated mass and pI of protein spot (opt)
• Cysteine (or other) modifications
• Minimum number of hits for significance
• Mass tolerance (100 ppm = 1000.0 ± 0.1 Da)
• A PMF website (Prowl, ProFound, Mascot, etc.)
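The mass tolerance item above (100 ppm = 1000.0 ± 0.1 Da) is a relative window, so its width in Da grows with the mass. A small sketch, with hypothetical helper names:

```python
def ppm_window(mass, ppm):
    """Return the (low, high) mass limits for a tolerance given in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta

def within_ppm(observed, theoretical, ppm):
    """True if the observed mass lies inside the ppm window."""
    return abs(observed - theoretical) <= theoretical * ppm / 1e6

low, high = ppm_window(1000.0, 100)
print(low, high)
print(within_ppm(2000.15, 2000.0, 100))  # window is 0.2 Da wide at 2000 Da
```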
PMF on the Web
• Mascot
• ProFound
• MOWSE
• PeptideSearch
• PeptIdent
Mascot – PMF Query
[Link]
Exercise #1
• Analysis of a yeast protein (75 kDa)
treated with iodoacetamide,
trypsinized and subjected to MALDI-TOF
• Go to “Worked Example 1” in your
notes to follow instructions
• Access your PMF data at:
[Link]
listed as [Link]
What Are Missed Cleavages?
>Protein 1
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
Tryptic Fragments (no missed cleavage)
acedfhsak (1007.4251)
dfqeasdfpk (1183.5266)
ivtmeeewendadnfek (2098.8909)
qwfe (609.2667)
Tryptic Fragments (1 missed cleavage)
acedfhsak (1007.4251)
dfqeasdfpk (1183.5266)
ivtmeeewendadnfek (2098.8909)
qwfe (609.2667)
acedfhsakdfqeasdfpk (2171.9338)
ivtmeeewendadnfekqwfe (2689.1398)
dfqeasdfpkivtmeeewendadnfek (3263.3997)
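A peptide with one missed cleavage is simply two neighbouring fully cleaved peptides joined together. The sketch below generalises this to any number of missed cleavages; the function name and the `missed_cleavages` parameter are illustrative, and the split again assumes Python 3.7 or later.

```python
import re

def digest(sequence, missed_cleavages=0):
    """Tryptic peptides allowing up to `missed_cleavages` skipped sites."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if p]
    peptides = []
    for n in range(missed_cleavages + 1):
        # Join n + 1 consecutive fully cleaved pieces.
        for i in range(len(pieces) - n):
            peptides.append("".join(pieces[i:i + n + 1]))
    return peptides

for peptide in digest("acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe", 1):
    print(peptide)
```

For Protein 1 this gives the four fully cleaved peptides plus the three joined neighbours, seven peptides in total.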
Mascot Databases
MASCOT Scoring
Why Probability-Based
Scoring?
• Will explain PBS later…
• Offers a simple numerical (and graphical)
assessment of whether a result is significant
• More reliable/accurate than simple mass or
peptide cut-off techniques
• Allows both MS and PMF data to be scored the
same way
• Scores from different searches or different
databases can be easily & directly compared
Mascot Scoring
• The statistics of peptide fragment
matching in MS (or PMF) is very similar to
the statistics used in BLAST
• The scoring probability appears to follow
an extreme value distribution
• High scoring segment pairs (in BLAST)
are analogous to high scoring mass
matches in Mascot
• Mascot scoring system is based on the
MOWSE scoring system
MOWSE
• MOlecular Weight SEarch
• Scoring system based on peptide
frequency distribution from the OWL
non redundant protein Database
Pappin DJC, Hojrup P, and Bleasby AJ (1993) Rapid
identification of proteins by peptide-mass
fingerprinting. Curr. Biol. 3:327-332
MOWSE
>Protein 1 (Mass M+H: 4842.05)
Sequence: acedfhsakdfqeasdfpkivtmeeewendadnfekqwfe
Tryptic fragments: acedfhsak, dfqeasdfpk, ivtmeeewendadnfek, qwfe
>Protein 2 (Mass M+H: 4842.05)
Sequence: acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfe
Tryptic fragments: acek, dfhsadfqeasdfpk, ivtmeeewenk, dadnfeqwfe
>Protein 3 (Mass M+H: 14563.36)
Sequence: MASMGTLAFDEYGRPFLIIKDQDRKSRLMGLEALKSHIMAAKAVANTMRTSLGPNGLDKMMVDKDGDVTVTNDGATILSMMDVDHQIAKLMVELSKSQDDEIGDGTTGVVVLAGALLEEAEQLLDRGIHPIRIAD
Tryptic fragments:
SQDDEIGDGTTGVVVLAGALLEEAEQLLDR
DGDVTVTNDGATILSMMDVDHQIAK
MASMGTLAFDEYGRPFLIIK
TSLGPNGLDK
LMGLEALK
LMVELSK
AVANTMR
SHIMAAK
GIHPIR
MMVDK
DQDR
MOWSE
1. Group Proteins into 10 kDa ‘bins’.
0-10 kDa:
>Protein 1 (4954.13)
acedfhsakdfqeasdfpkivtmeeewendadnfekqwfel
>Protein 2 (5672.48)
acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfekqwfei
10-20 kDa:
>Protein 3 (14563.36)
MASMGTLAFDEYGRPFLIIKDQDRKSRLMGLEALKSHIMAAKAVANTMRTSLGPNGLDKMMVDKDGDVTVTNDGATILSMMDVDHQIAKLMVELSKSQDDEIGDGTTGVVVLAGALLEEAEQLLDRGIHPIRIAD
MOWSE
2. For each protein, place fragments into 100 Da bins.
>Protein 1 acedfhsakdfqeasdfpkivtmeeewendadnfekqwfel
Mol. Wt.    Fragment
2098.8909   IVTMEEEWENDADNFEK
1183.5266   DFQEASDFPK
1007.4251   ACEDFHSAK
722.3508    QWFEL
>Protein 2 acekdfhsadfqeasdfpkivtmeeewenkdadnfeqwfekqwfei
Mol. Wt.    Fragment
1740.7500   DFHSADFQEASDFPK
1407.6460   IVTMEEEWENK
1456.6127   DADNFEQWFEK
722.3508    QWFEI
Bin         Fragment
2000-2100   IVTMEEEWENDADNFEK
1900-2000
1800-1900
1700-1800   DFHSADFQEASDFPK
1600-1700
1500-1600
1400-1500   IVTMEEEWENK, DADNFEQWFE
1300-1400
1200-1300
1100-1200   DFQEASDFPK
1000-1100   ACEDFHSAK
900-1000
800-900
700-800     QWFEL, QWFEI
600-700
500-600
400-500
MOWSE
The MOWSE frequency distribution plot looks like this:
MOWSE
3. Divide the number of fragments for each bin by the total
number of fragments for each 10 kDa protein interval
Bin Fragment Total Frequency
2000-2100 IVTMEEEWENDADNFEK 1 0.125
1900-2000 0 0.000
1800-1900 0 0.000
1700-1800 DFHSADFQEASDFPK 1 0.125
1600-1700 0 0.000
1500-1600 0 0.000
1400-1500 IVTMEEEWENK, DADNFEQWFE 2 0.250
1300-1400 0 0.000
1200-1300 0 0.000
1100-1200 DFQEASDFPK 1 0.125
1000-1100 ACEDFHSAK 1 0.125
900-1000 0 0.000
800-900 0 0.000
700-800 QWFEL, QWFEI 2 0.250
600-700 0 0.000
500-600 0 0.000
400-500 0 0.000
MOWSE
4. For each 10 kD interval, normalize to the largest
bin value
Bin Fragment Total Frequency Normalized
2000-2100 IVTMEEEWENDADNFEK 1 0.125 0.5
1900-2000 0 0.000 0
1800-1900 0 0.000 0
1700-1800 DFHSADFQEASDFPK 1 0.125 0.5
1600-1700 0 0.000 0
1500-1600 0 0.000 0
1400-1500 IVTMEEEWENK, DADNFEQWFE 2 0.250 1
1300-1400 0 0.000 0
1200-1300 0 0.000 0
1100-1200 DFQEASDFPK 1 0.125 0.5
1000-1100 ACEDFHSAK 1 0.125 0.5
900-1000 0 0.000 0
800-900 0 0.000 0
700-800 QWFEL, QWFEI 2 0.250 1
600-700 0 0.000 0
500-600 0 0.000 0
400-500 0 0.000 0
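Steps 2 to 4 (bin, divide by the total, normalize to the largest bin) can be sketched for the eight fragment masses listed for Proteins 1 and 2. This is an illustration of the arithmetic only, not the MOWSE implementation; the function name is hypothetical, and bins are keyed by their lower edge, so 722.3508 lands in the bin starting at 700.

```python
from collections import Counter

# Fragment masses of Proteins 1 and 2, as listed on the slides.
FRAGMENT_MASSES = [
    2098.8909, 1183.5266, 1007.4251, 722.3508,
    1740.7500, 1407.6460, 1456.6127, 722.3508,
]

def normalized_frequencies(masses, bin_width=100):
    """Map each bin's lower edge to its frequency relative to the fullest bin."""
    counts = Counter(int(m // bin_width) * bin_width for m in masses)
    total = sum(counts.values())
    freq = {b: c / total for b, c in counts.items()}       # step 3
    top = max(freq.values())
    return {b: f / top for b, f in freq.items()}           # step 4

norm = normalized_frequencies(FRAGMENT_MASSES)
for lower_edge in sorted(norm, reverse=True):
    print(f"{lower_edge}-{lower_edge + 100}: {norm[lower_edge]}")
```

Empty bins are simply absent from the dictionary; a lookup with `norm.get(edge, 0)` treats them as zero.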
MOWSE
5. Compare spectrum masses against fragment mass
list for each protein in the database. Retrieve the
frequency score for each match and multiply.
Spectrum mass   Fragment(s) in matched bin    Normalized
1740.7500       DFHSADFQEASDFPK               0.5
1456.6127       IVTMEEEWENK, DADNFEQWFE       1
722.3508        QWFEL, QWFEI                  1
Product: 0.5 x 1 x 1 = 0.5
MOWSE
6. Invert and multiply, and normalize to an 'average'
protein of 50 kDa (50 000 Da):
PN = product of distribution frequency scores = 0.5 x 1 x 1 = 0.5
H = 'Hit' protein MW = 5672.48
Score = 50 000 / (PN x H) = 50 000 / (0.5 x 5672.48) = 17.63
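The final MOWSE step is a one-line calculation. The sketch below reproduces the worked numbers from this slide; the function and parameter names are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
from math import prod

def mowse_score(matched_frequencies, protein_mw, reference_mw=50_000):
    """Score = reference MW / (product of matched frequencies x protein MW)."""
    pn = prod(matched_frequencies)
    return reference_mw / (pn * protein_mw)

# Three matches with normalized frequencies 0.5, 1 and 1 against Protein 2.
score = mowse_score([0.5, 1, 1], 5672.48)
print(round(score, 2))
```

Because the product of frequencies sits in the denominator, matches in sparsely populated bins (low frequency) raise the score more than matches in crowded bins.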
MOWSE
• Takes into account relative abundance
of peptides in the database when
calculating scores
• Protein size is compensated for
• The model consists of numerous
spaces separated by 100 Da (the average
aa mass)
• Does not provide a measure of
confidence for the prediction
MASCOT
• Probability-based MOWSE scoring
• The probability that the observed
match between experimental data and a
protein sequence is a random event is
approximately calculated for each
protein in the sequence database.
• Probability model details not published
Perkins DN, Pappin DJC, Creasy DM, and Cottrell JS (1999) Probability-based
protein identification by searching sequence databases using mass spectrometry
data. Electrophoresis 20:3551-3567.
Mascot/Mowse Scoring
• The Mascot Score is given as S = -10*Log10(P),
where P is the probability that the observed
match is a random event
• Try to aim for probabilities where P<0.05 (less
than a 5% chance the peptide mass match is
random)
• With today’s databases, Mascot scores
greater than 76 are significant (p<0.05)
• We show in the Mascot Lab that a score's
statistical significance is a complex function
of database size, mass window tolerance, etc.
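The score definition converts directly between S and P. The threshold function below is only a rough sketch of why database size matters: it assumes a simple multiple-testing correction in which a 5% error rate is shared across N candidate entries. That model, and all function names, are assumptions for illustration; Mascot's actual probability model is not published.

```python
from math import log10

def mascot_score(p):
    """S = -10 * log10(P)."""
    return -10 * log10(p)

def probability(score):
    """Invert the score back to a probability."""
    return 10 ** (-score / 10)

def significance_threshold(n_candidates, alpha=0.05):
    """Score needed so that alpha is shared across n_candidates (assumed model)."""
    return -10 * log10(alpha / n_candidates)

print(mascot_score(1e-12))                 # score for P = 1e-12
print(probability(76))                     # P for a score of 76
print(significance_threshold(2_000_000))   # grows with database size
```

Under this assumed model a tenfold larger database raises the threshold by 10 score units.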
Mascot Scoring
– The Mascot Score is given as S = -10*Log10(P), where P is
the probability that the observed match is a random event
– The significance of that result depends on the size of the
database being searched. Mascot shades in green the
insignificant hits using a P=0.05 cutoff
– In this example, scores less than 74 are insignificant
– A Mascot score of 120 corresponds to P = 1 x 10^-12
Advantages of PMF
• Uses a “robust” & inexpensive form of MS
(MALDI)
• Doesn’t require too much sample optimization
• Can be done by a moderately skilled operator
(don’t need to be an MS expert)
• Widely supported by web servers
• Improves as DB’s get larger & instrumentation
gets better
• Very amenable to high throughput robotics
(up to 500 samples a day)
Limitations With PMF
• Requires that the protein of interest
already be in a sequence database
• Spurious or missing critical mass peaks
always lead to problems
• Mass resolution/accuracy is critical, best
to have <20 ppm mass accuracy
• Generally found to only be about 40%
effective in positively identifying gel spots
Example #2 MS/MS
Identification of a Protein
from a Peptide Mixture
MS-MS for Protein ID
• Proteins are isolated (from gel or HPLC)
and subjected to tryptic digestion
• Peptides are sent through ionizer and into
a collision cell where the doubly charged
ions are selected and fragmented through
collision-induced dissociation (CID)
• The resulting singly charged ions
(daughter ions) are analyzed to determine
the sequence or to ID the parent peptide
Why Trypsin for MS-MS?
• CID of peptides less than 2-3 kD is most
reliable for MS-MS studies – The
frequency of tryptic cleavage guarantees
that most peptides will be of this size
• Trypsin cleaves on the C-terminal side of
arginine and lysine. By putting the basic
residues at the C-terminus, peptides
fragment in a more predictable manner
throughout the length of the peptide
Why Double Charges?
• Easiest spectra to interpret are those
obtained from doubly-charged peptide
precursors, where the resulting fragment
ions are mostly singly-charged
• Doubly-charged precursors also fragment
such that most of the peptide bonds break
with comparable frequency, such that one
is more likely to derive a complete
sequence
MS-MS & Peptide Fragments
• When peptides are admitted
to a collision cell the peptide usually
fragments at the weakest bond (the
peptide bond, but some CH-NH and CH-
CO breakage also occurs)
• Collision conditions have to be optimized
for each peptide
• Two main types of daughter ions are
produced -- “b” ions and “y” ions
MS-MS Peptide Fragmentation
        yn-1     yn-2        y1
    R1       R2       R3          Rn
H2N-CH-CO-NH-CH-CO-NH-CH-CO…CO-NH-CH-CO2H
        b1       b2          bn-1
[Figure: MS-MS spectrum (signal vs. m/z) with peaks labelled b1, y1, b2, y2, b3, y3, b4, y4, b5, y5]
MS-MS Peptide Fragmentation
Ala-Gly-His-Leu-….Phe-Glu-Cys-Tyr
[Figure: MS-MS spectrum (signal vs. m/z) with peaks labelled b1, y1, b2, y2, b3, y3, b4, y4, b5, y5]
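The b and y ion ladders sketched on these slides can be computed from the residue masses. This example assumes singly charged fragments, a proton mass of 1.00728, and the standard conventions that a b ion is the N-terminal residues plus a proton and a y ion is the C-terminal residues plus water plus a proton; the function name is illustrative.

```python
MONO = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "I": 113.08407,
    "L": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.04264, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
WATER = 18.01056
PROTON = 1.00728  # assumed proton mass

def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [MONO[aa] for aa in peptide.upper()]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), read from the N-terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), read from the C-terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions

b_ions, y_ions = fragment_ions("ACEDFHSAK")
print([round(m, 3) for m in b_ions])
print([round(m, 3) for m in y_ions])
```

Each complementary pair b(i) and y(n-i) sums to the peptide's M+H plus one proton, which is a handy consistency check when assigning peaks.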
Different MS-MS Instruments
Yield Different Spectra
• A typical QTOF or triple quad MS-MS
spectrum of a tryptic peptide contains a
continuous series of y-type ions. The b-type
ions are usually seen only at lower masses
below the precursor m/z value
• Ion trap CID data of tryptic peptides is
different in that one often finds a
continuous series of both b-type and y-type
ions throughout the spectrum
MS/MS – The Movie
(Kathleen Binns)
• [Link]
[Link]
Protein ID by MS-MS
• Peptide fragments from target protein are
sequenced by MS-MS using a variety of
algorithms (SEQUEST, Mascot) or via
manual methods
• The peptide fragment sequences are sent
to BLAST to be queried against a protein
sequence database
• The protein having the highest number of
sequence matches is ID’d as the target
MS-MS & Proteomics
Advantages
• Provides precise sequence-specific data
• More informative than PMF methods (>90%)
• Can be used for de novo sequencing (not entirely dependent on databases)
• Can be used to ID post-translational modifications
Disadvantages
• Requires more handling, refinement and sample manipulation
• Requires more expensive and complicated equipment
• Requires high level expertise
• Slower, not generally high throughput
Mascot – MS/MS Query
[Link]
Exercise #2
• Analysis of a human nuclear protein
(65 kDa) treated with iodoacetamide
and trypsinized followed by MS/MS
• Go to “Worked Example 2” in your
notes to follow instructions
• Access your MS/MS data at:
[Link]
listed as [Link]
Mascot and MS/MS formats
• For MS/MS work, the data file must
contain 1 or more sets of MS/MS data
• Supported sets include:
– Finnigan (.ASC)
– Micromass (.PKL)
– Sequest (.DTA)
– PerSeptive (.PKS)
– Sciex API III
– Mascot Generic Format (.MGF)
Mascot Generic Format (MGF)
COM=10 pmol digest of Sample X15
ITOL=1
ITOLU=Da
MODS=Met Ox,Cys B propionamide
MASS=Monoisotopic
USERNAME=Lou Scene
USEREMAIL=leu@[Link]
CHARGE=2+ and 3+
BEGIN IONS
TITLE=Peak 1
PEPMASS=983.6
846.60 73
846.80 44
847.60 67
(PEPMASS is the parent ion mass (2+); each following line gives a daughter ion mass and its intensity)
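A file in this layout is easy to read with a few lines of code. The sketch below is a minimal reader, not an official parser: it assumes `KEY=value` lines, `BEGIN IONS` / `END IONS` delimiters around each spectrum (the closing line is part of the format but is not shown on the slide excerpt), and whitespace-separated mass and intensity pairs. The function name and the returned dictionary layout are made up for the example.

```python
def parse_mgf(text):
    """Return (header_params, spectra) from MGF-style text."""
    header, spectra, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            spectra.append(current)
            current = None
        elif "=" in line:
            key, value = line.split("=", 1)
            target = header if current is None else current["params"]
            target[key] = value
        elif current is not None:
            fields = line.split()
            intensity = float(fields[1]) if len(fields) > 1 else None
            current["peaks"].append((float(fields[0]), intensity))
    if current is not None:          # tolerate a missing END IONS
        spectra.append(current)
    return header, spectra

SAMPLE = """COM=10 pmol digest of Sample X15
ITOL=1
ITOLU=Da
MASS=Monoisotopic
CHARGE=2+ and 3+
BEGIN IONS
TITLE=Peak 1
PEPMASS=983.6
846.60 73
846.80 44
847.60 67
END IONS
"""
header, spectra = parse_mgf(SAMPLE)
print(header["ITOLU"], spectra[0]["params"]["PEPMASS"], spectra[0]["peaks"])
```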
Example #3 A “Hard”
MS/MS Problem
Exercise #3
• Analysis of a novel neuropeptide
hormone induced by music/sound
• No known or suspected PTMs
• Ion trap MS-MS spectrum – What is
it? What’s the sequence?
• Access your MS/MS data at:
[Link]
listed as [Link]
MS/MS Spectrum of
Neurosensin
What Do You Find?
Protocols for MS-MS
Sequencing
• Usually can’t tell a “b” ion from a “y” ion
• Assume the lowest mass visible in the
spectrum is a lysine or arginine (the y1 ion),
because trypsin cuts after a
lysine or arginine
• This y1 mass should be 147.113 for lysine
or 175.119 for arginine (the y1 ion is
calculated by adding 19.018 u, i.e. three
hydrogens and one oxygen, to the residue
masses of lysine and arginine)
MS-MS Sequencing
• Using the mass tables, look to the right of y1
and see if you can find another prominent
peak that is equal to y1 + AA where AA is the
residue mass for any of the 20 amino acids.
This is the y2 ion
• Proceed in a rightward direction, identifying
other yn ions that differ by an AA residue
mass (don’t expect to find all)
• The yn series produces a “reverse” sequence
• Watch for possible dipeptide peaks that may
fool you
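The manual procedure (find y1, then step rightward by one residue mass at a time) can be sketched as a greedy ladder walk. This is a toy under strong assumptions: the peak list is synthetic and contains only y ions, the 0.02 u tolerance is arbitrary, and the function name is hypothetical. Real spectra need handling of b ions, noise, gaps and the dipeptide ambiguities mentioned above.

```python
MONO = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "I": 113.08407,
    "L": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.04264, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
Y1 = {"K": 147.113, "R": 175.119}  # y1 masses quoted on the slide

def read_y_series(peaks, tol=0.02):
    """Greedy y-ion walk; returns residues in N-to-C order."""
    peaks = sorted(peaks)
    start = next(((p, aa) for p in peaks for aa, m in Y1.items()
                  if abs(p - m) <= tol), None)
    if start is None:
        return []
    current, ladder = start[0], [start[1]]
    extended = True
    while extended:
        extended = False
        for p in peaks:
            if p <= current:
                continue
            # Residues whose mass explains the gap to this peak.
            hits = sorted(aa for aa, m in MONO.items()
                          if abs(p - current - m) <= tol)
            if hits:
                ladder.append("/".join(hits))  # e.g. "I/L" when ambiguous
                current = p
                extended = True
                break
    return ladder[::-1]  # the y series reads the sequence in reverse

# Synthetic y ions for the made-up peptide GASPK.
print(read_y_series([147.113, 244.166, 331.198, 402.235, 459.256]))
```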
Things To Remember
• Gly + Gly = 114.043 u and Asn = 114.043 u
• Ala + Gly = 128.059 u and Gln = 128.059 u
and Lys = 128.095 u
• Gly + Val = 156.090 u and Arg = 156.101 u
• Ala + Asp = Glu + Gly = 186.064 u and Trp =
186.079 u
• Ser + Val = 186.100 u and Trp = 186.079 u
• Leu = Ile = 113.084 u
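The coincidences listed above can be found mechanically by comparing every pair of residue masses with every single residue mass. A sketch, with a hypothetical function name and an assumed 0.05 u tolerance (wide enough to include the Ala + Gly vs. Lys case):

```python
from itertools import combinations_with_replacement

MONO = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "I": 113.08407,
    "L": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.04264, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}

def isobaric_pairs(tol=0.05):
    """List (pair, residue, mass difference) where two residues mimic one."""
    found = []
    for a, b in combinations_with_replacement(sorted(MONO), 2):
        pair_mass = MONO[a] + MONO[b]
        for aa, m in MONO.items():
            if abs(pair_mass - m) <= tol:
                found.append((f"{a}+{b}", aa, round(pair_mass - m, 3)))
    return found

for pair, residue, diff in isobaric_pairs():
    print(f"{pair} vs {residue}: {diff:+.3f} u")
```

Tightening `tol` to the instrument's real mass accuracy shows which of these pairs can actually be told apart.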
MS-MS Sequencing
• Use the remaining “unassigned” peaks to
see if you can construct a “b” ion series
• The highest mass peak corresponds to the
parent ion or parent minus 147 (K) or 175 (R)
• The “b” ions give the “normal” sequence
• Both forward (b ion) and backward (y ion)
sequences should be consistent
• Use the resulting sequence tag to search the
databases using BLAST (remember to use a
high Expect value ~ 100) to see if the
sequence matches something
Conclusions
• Mascot is an excellent FREE resource
for doing PMF and MS/MS searches of
proteins
• Understanding the scoring scheme
and importance of database size (and
mass tolerance) is critical to using
Mascot optimally
• Not everything can be done on Mascot