Introduction to Bioinformatics
Online Course: IBT
Multiple Sequence Alignment
Building Multiple Sequence Alignment
Lec1 Building a Multiple Sequence Alignment
Introduction to Bioinformatics Online Course:IBT
Multiple Sequence Alignment| Prof. Ahmed M. Alzohairy
Learning Outcomes
1- Understanding why multiple sequence alignment is useful for scientists
2- Identifying situations where multiple alignments do not help
3- Main criteria for building a multiple sequence alignment
4- Main applications of multiple sequence alignments
5- What kinds of sequences are you looking for?
6- Tips for naming sequences
7- Tips for an MSA that is difficult to interpret
8- Comparing sequences you cannot align
In the coming lectures we will learn
1- Gathering the sequences you need to make a multiple sequence alignment
2- Differences between some well-known multiple sequence alignment programs:
COBALT (Constraint-based Multiple Alignment Tool; new)
ClustalW (everybody uses it)
MUSCLE (very fast)
T-Coffee (accurate; combines sequences and structures)
3- Creating and comparing multiple sequence alignments with -
Comparing sequences you cannot align
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
Building a Multiple Sequence Alignment (1)
- “In many ways, multiple sequence alignments are to bioinformatics what Swiss knives are to MacGyver.”
- “Building multiple sequence alignments is far from an exact science. In fact, it’s more art than science, requiring that you use everything you know in bioinformatics and in biology.”
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
Identifying situations where multiple
alignments do not help
• They do not work well for assembling the sequence pieces in a sequencing project.
• They do not help if you want to turn an EST cluster into a gene sequence.
• They cannot be built when the sequence you’re interested in has no homologue in any of the sequence databases (in this case, you can use functional criteria and conduct a pattern search).
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
Building informative alignments
1- Gather your sequences
2- Compute a multiple sequence alignment
3- Evaluate the quality of your alignment
4- Interpret your MSA
5- Keep the sequences for further analysis
Mansour A, Jaime A. Teixeira da Silva, Gábor Gyulai (2009) Assessment of molecular (dis)similarity:
The role of multiple sequence alignments (MSA) programs in biological research. Genes, Genomes and
Genomics 3 (Special Issue 1): 23-30. Print ISSN 1749-0383 (Bioinformatics SI).
What are we looking for with an MSA?
“The idea behind a multiple alignment is to put
amino acids or nucleotides in the same column
because they’re similar according to some criterion.
You can use four major criteria to build a multiple
alignment of sequences that all have different
properties.”
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
Main Criteria for Building a Multiple Sequence Alignment
1- Structural similarity
Amino acids that play the same role in each structure are
in the same column. Structure-superposition programs
are the only ones that use this criterion.
2- Evolutionary similarity
Amino acids or nucleotides related to the same amino acid
(or nucleotide) in the common ancestor of all the
sequences are put in the same column. No automatic
program explicitly uses this criterion, but they all try to
deliver an alignment that respects it.
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
3- Functional similarity
Amino acids or nucleotides with the same function are in
the same column. No automatic program explicitly uses
this criterion, but if the information is available, you can
force some programs to respect it — or you can edit your
alignment manually.
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
4- Sequence similarity
“Amino acids in the same column are those that yield an alignment with
maximum similarity. Most programs use sequence similarity because it is
the easiest criterion. When the sequences are closely related, their
structural, evolutionary, and functional similarities are equivalent to
sequence similarity”.
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
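One common way to put a number on "maximum similarity" is a sum-of-pairs score: every pair of residues in every column is scored and the results are added up. The sketch below is only an illustration under assumed scoring values (match = 1, mismatch = -1, gap = -2); real programs use substitution matrices and more elaborate gap penalties, and the function name and toy sequences are hypothetical.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an alignment column by column over every pair of sequences.

    The scoring values are illustrative assumptions, not the scheme of any
    particular alignment program.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # two gaps facing each other carry no information
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


# Two candidate alignments of the same three toy sequences.
candidate_1 = ["ACGT-A", "ACGTTA", "ACG-TA"]
candidate_2 = ["ACGTA-", "ACGTTA", "-ACGTA"]
print(sum_of_pairs(candidate_1), sum_of_pairs(candidate_2))
```

Under this scoring, the candidate with the higher total is the one that places more identical residues in the same columns.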
Main Applications of Multiple Sequence Alignments
Extrapolation
“A good multiple alignment can help convince you that an uncharacterized
sequence is really a member of a protein family. Alignments that include
Swiss-Prot sequences are the most informative. Use the ExPASy BLAST
server (at www.expasy.ch/tools/blast/) to gather and align them”.
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
Phylogenetic Analysis
“If you carefully choose the sequences you include in your multiple
alignment, you can reconstruct the history of these proteins. Use the Pasteur Phylip
server at bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html.”
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
Pattern identification
“By discovering very conserved positions, you can identify a
region that is characteristic of a function (in proteins or in
nucleic-acid sequences). Use the Weblogo server
http://weblogo.berkeley.edu/logo.cgi”
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
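The idea of "very conserved positions" can be made concrete by measuring the information content of each column, which is the quantity a sequence logo draws as letter height. The following is a minimal sketch for DNA (alphabet size 4, so at most 2 bits per column); it ignores gaps and omits any small-sample correction a logo server may apply. The function names, the threshold, and the toy sites are assumptions for illustration.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=4):
    """Return the information content (in bits) of each alignment column."""
    max_bits = math.log2(alphabet_size)
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]  # gaps are ignored
        if not residues:
            scores.append(0.0)
            continue
        counts = Counter(residues)
        entropy = -sum(
            (n / len(residues)) * math.log2(n / len(residues))
            for n in counts.values()
        )
        scores.append(max_bits - entropy)
    return scores


def conserved_positions(alignment, min_bits=1.0):
    """Return the 0-based indices of columns with at least min_bits of information."""
    return [i for i, s in enumerate(column_information(alignment)) if s >= min_bits]


# Toy aligned DNA sites (made up for illustration).
sites = ["ACGT", "ACGA", "ACCC", "ATTG"]
print(column_information(sites))
print(conserved_positions(sites))
```

A fully conserved DNA column scores 2 bits, and a column in which all four bases are equally frequent scores 0 bits.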
Domain identification
“It is possible to turn a multiple sequence alignment into a profile that describes a
protein family or a protein domain (PSSM). You can use this profile to scan
databases for new members of the family. Use PROSITE (http://prosite.expasy.org/)”
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
DNA regulatory elements
“You can turn a DNA multiple alignment of a binding site into a weight matrix
and scan other DNA sequences for potentially similar binding
sites. Use the Gibbs sampler to identify these sites:
http://ccmbweb.ccv.brown.edu/gibbs/gibbs.html”
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
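As a rough illustration of turning aligned binding sites into a weight matrix and scanning with it, the sketch below builds a log-odds matrix and slides it along a sequence. It assumes ungapped sites of equal length, a uniform background of 0.25 per base, and a pseudocount of 1; these choices, the function names, and the toy sequences are illustrative assumptions. This is not the Gibbs sampler, which discovers the sites themselves.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned, ungapped DNA sites."""
    matrix = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + pseudocount * len(BASES)
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence (A, C, G, T only) against the matrix."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, matrix):
    """Return the (start, score) pair with the highest score."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])


# Toy aligned binding sites and a toy target sequence (made up).
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_weight_matrix(sites)
print(best_hit("GGCCTATAATGCGC", pwm))
```

In practice a score threshold decides which windows count as candidate sites; the sketch reports only the single best window.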
Structure prediction
“A good multiple alignment can give you an almost perfect prediction
of your protein secondary structure for both proteins and RNA.
Sometimes it can also help in the building of a 3-D model”.
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
nsSNP analysis
“Various gene alleles often have different amino-acid sequences.
Multiple alignments can help you predict whether a Non-
Synonymous Single-Nucleotide Polymorphism is likely to be
harmful. See the SIFT site for more details: http://sift.jcvi.org/”
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
PCR analysis
“A good multiple alignment can help you identify the less
degenerated portions of a protein family, in order to fish out new
members by PCR (polymerase chain reaction). If this is what you
want to do, you can use the following site:
blocks.fhcrc.org/codehop.html”
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
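One simplified way to look for the "less degenerated" portions of a protein alignment is to search for fully conserved windows whose amino acids are encoded by few codons, since a primer against such a stretch needs fewer variants. The sketch below assumes the standard genetic code and uses a hypothetical helper name and toy sequences; it is not the method used by the CODEHOP site.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "M": 1, "W": 1,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "I": 3,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "L": 6, "R": 6, "S": 6,
}


def least_degenerate_window(alignment, width):
    """Find the fully conserved window with the fewest possible coding sequences.

    Returns (start, peptide, degeneracy), or None if no window of the given
    width is conserved in every sequence.
    """
    best = None
    n_cols = len(alignment[0])
    for start in range(n_cols - width + 1):
        columns = [{row[i] for row in alignment} for i in range(start, start + width)]
        if any(len(col) != 1 for col in columns):
            continue  # at least one column varies
        peptide = "".join(next(iter(col)) for col in columns)
        if any(aa not in CODON_COUNT for aa in peptide):
            continue  # gaps or non-standard residues
        degeneracy = prod(CODON_COUNT[aa] for aa in peptide)
        if best is None or degeneracy < best[2]:
            best = (start, peptide, degeneracy)
    return best


# Toy protein alignment (made up); column 3 is variable.
alignment = ["LSRAMWDKQL", "LSRGMWDKQL", "LSRAMWDKQL"]
print(least_degenerate_window(alignment, width=4))
```

The degeneracy is the number of distinct DNA sequences that could encode the conserved peptide, so lower is better for primer design.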
What are the kinds of sequences you’re looking for?
Always bear in mind that in evolution:
1- Important amino acids (or nucleotides) are NOT allowed to mutate. For instance, the active sites of enzymes are highly conserved.
2- Less-important residues change more easily, sometimes randomly and sometimes in order to adapt a function.
Claverie J, Notredame C (2007). Bioinformatics for Dummies (2nd Edn). Wiley publishing, Inc. 436 pp.
Tips for Naming sequences
Never use white spaces
Do not use special symbols
Never use names longer than 15 characters
Never give the same name to two different sequences
Mansour A, Jaime A. Teixeira da Silva, Gábor Gyulai (2009) Assessment of molecular (dis)similarity:
The role of multiple sequence alignments (MSA) programs in biological research. Genes, Genomes and
Genomics 3 (Special Issue 1): 23-30. Print ISSN 1749-0383 (Bioinformatics SI).
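The naming tips can be checked automatically before the sequences are submitted to an alignment program. The sketch below is one possible check; it assumes that "special symbols" means anything other than letters, digits, and underscores, which is an interpretation rather than a rule stated in the source. The function name and example names are hypothetical.

```python
import re

MAX_NAME_LENGTH = 15  # limit taken from the naming tips


def check_names(names):
    """Return a dict mapping each problematic name to the rules it breaks."""
    problems = {}
    seen = set()
    for name in names:
        issues = []
        if re.search(r"\s", name):
            issues.append("white space")
        # Assumption: only letters, digits, and underscores are safe.
        if re.search(r"[^A-Za-z0-9_\s]", name):
            issues.append("special symbol")
        if len(name) > MAX_NAME_LENGTH:
            issues.append(f"longer than {MAX_NAME_LENGTH} characters")
        if name in seen:
            issues.append("duplicate name")
        seen.add(name)
        if issues:
            problems[name] = issues
    return problems


names = ["HSF1_HUMAN", "HSF1 MOUSE", "HSF1|RAT", "HSF1_ARABIDOPSIS_THALIANA", "HSF1_HUMAN"]
for name, issues in check_names(names).items():
    print(name, "->", ", ".join(issues))
```

An empty result means every name passes all four tips.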
Tips for an MSA that is difficult to interpret
1 Remove insertions/deletions
2 Redo the MSA with the smaller set
3 Keep trimming to interpret
Mansour A, Jaime A. Teixeira da Silva, Gábor Gyulai (2009) Assessment of molecular (dis)similarity:
The role of multiple sequence alignments (MSA) programs in biological research. Genes, Genomes and
Genomics 3 (Special Issue 1): 23-30. Print ISSN 1749-0383 (Bioinformatics SI).
Enhancing your Alignment
Remove gaps
Remove extremities
Keep informative blocks
Mansour A, Jaime A. Teixeira da Silva, Gábor Gyulai (2009) Assessment of molecular (dis)similarity:
The role of multiple sequence alignments (MSA) programs in biological research. Genes, Genomes and
Genomics 3 (Special Issue 1): 23-30. Print ISSN 1749-0383 (Bioinformatics SI).
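Removing gaps and extremities can be sketched as two small column filters. The version below assumes the alignment is a list of equal-length strings with "-" as the gap character; the function names, the gap-fraction threshold, and the toy alignment are illustrative assumptions, not the behaviour of a specific trimming tool.

```python
def trim_extremities(alignment):
    """Remove leading and trailing columns that contain at least one gap."""
    columns = list(zip(*alignment))
    start = 0
    while start < len(columns) and "-" in columns[start]:
        start += 1
    end = len(columns)
    while end > start and "-" in columns[end - 1]:
        end -= 1
    return [row[start:end] for row in alignment]


def remove_gappy_columns(alignment, max_gap_fraction=0.0):
    """Keep only the columns whose fraction of gaps is at most max_gap_fraction."""
    n_seqs = len(alignment)
    keep = [
        i for i, column in enumerate(zip(*alignment))
        if column.count("-") / n_seqs <= max_gap_fraction
    ]
    return ["".join(row[i] for i in keep) for row in alignment]


# Toy alignment (made up) with ragged ends and internal gaps.
alignment = ["--ACG-TAC-", "MKACGTTACG", "-KAC--TAC-"]
print(trim_extremities(alignment))
print(remove_gappy_columns(alignment))
```

With the default threshold of 0.0, only gap-free columns survive, which leaves the most informative blocks at the cost of discarding every insertion or deletion.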
(to try in your own time)
Searching sequences on the ExPASy server
Only to retrieve protein sequences in FASTA format
Example: Heat shock factor 1 (HSF1)
Choose http://www.uniprot.org/
(to try in your own time)
Select the sequences you want
This is the most delicate part of the process.
You can use the following guidelines:
- Select the top sequence.
- For a first analysis, select ten sequences or fewer.
- For each sequence, check that it is similar to the query sequence along its entire length.
(to try in your own time)
Methods to export your sequences
• FASTA: Generates a file that contains your
sequences in FASTA format.
• ClustalW, Tcoffee, and MAFFT: These are MSA
packages running on the EMBnet server.
• Reduce Redundancy: This option will extract the
most meaningful sequences from your dataset.
• Pratt: Will search for conserved motifs in your
sequences without aligning them
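For reference, a FASTA file is plain text: each record starts with a ">" header line, followed by one or more lines of sequence. The sketch below reads and writes that layout with the standard library only; the function names and the toy records are made up for illustration and are not real database entries.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # sequence lines are joined into one string
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def format_fasta(records, width=60):
    """Write (header, sequence) tuples as FASTA text, wrapping at width."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# Toy records (made up).
text = ">seq_a demo\nMKTAYIAK\nQRQISFVK\n>seq_b\nMKT\n"
records = parse_fasta(text)
print(records)
print(format_fasta(records, width=8))
```

Saving several records one after another in a single text file gives the multi-sequence FASTA input that alignment programs expect.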
Practical
(to try in your own time)
- Go to https://www.expasy.org/proteomics
- Search for HSF1
- Click on (UniProtKB)
- Retrieve your protein sequences (e.g. heat shock factor 1, “HSF1”) from different organisms
- This will take you to http://www.uniprot.org/uniprot/?query=HSF1&sort=score
- Select your organism (Human, Rat, Mouse, Arabidopsis, Chicken, Pig)
- Click Download (Download Selected) then (Go)
- Save it in FASTA format in one text file.
- Align the sequences using Clustal Omega
- Check the gene-based phylogenetic tree
- Add one more, unrelated sequence (an outgroup)
- Check how the gene-based phylogenetic tree changes
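Before looking at the tree, a quick sanity check on the aligned sequences is a table of pairwise percent identities: an unrelated outgroup should show clearly lower identity to every other sequence. The sketch below assumes the alignment is already available as a dict of name to aligned string; the names and toy sequences are made up and are not real HSF1 sequences. Percent identity is only a rough guide and does not replace the tree itself.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_table(aligned):
    """Return {(name_1, name_2): percent identity} for every pair of sequences."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


# Toy aligned sequences (made up).
aligned = {
    "seq_a": "MKTAYIAK",
    "seq_b": "MKTAYLAK",
    "outgroup": "MQ-GHWAK",
}
for pair, identity in identity_table(aligned).items():
    print(pair, round(identity, 1))
```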