Experiment 2
PAIR WISE AND MULTIPLE SEQUENCE ALIGNMENTS OF SEQUENCES AND
DETERMINATION OF SIGNATURE SEQUENCES
1) Pair Wise Alignments
Introduction
Pair wise alignment is a similarity searching method in which two sequences
(protein/nucleotide) are aligned one below the other and the percentages of similarity and
identity between the two sequences are analyzed. Identity is a quantitative measure of how
closely two sequences are related to one another, based on the total number of exact
matches in an alignment. Similarity is a measure of how two sequences are related to each
other, based on the total number of identities and conserved substitutions in an alignment.
Finding the similarity across the full extent of the sequences is called global alignment, and
aligning the region with the highest density of matches in two sequences is called local
alignment. Introducing gaps (scored with gap opening and gap extension penalties) helps to
improve the alignment quality. Pair wise alignment generally relies on an algorithm and a
scoring matrix to align the two sequences one below the other. The Needleman-Wunsch
algorithm is used for global alignment and the Smith-Waterman algorithm is used for local
alignment. BLOSUM matrices are popular scoring matrices used in the process.
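As an illustration of the global alignment idea described above, the following is a minimal Python sketch of Needleman-Wunsch scoring. The function name and the simple match/mismatch/gap values are assumptions chosen for readability; it does not use a BLOSUM matrix or separate gap opening and extension penalties, so its scores will not match those reported by EMBOSS.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Assumed toy scoring: +1 match, -1 mismatch, -2 per gap position.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is the best of: align two residues, or gap in one sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # residue aligned to residue
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Global alignment: the score covers the full length of both sequences.
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

Because the score is read from the bottom-right cell, every residue of both sequences contributes, which is what distinguishes global from local alignment.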
The EMBOSS Align tool, hosted on the EMBL-EBI home page, is a suitable platform for
performing pair wise alignment between sequences.
Objectives
1. To perform pair wise alignment by comparing two molecular sequences
(protein/nucleotide) using EMBOSS ALIGN tool.
2. To analyze the important parameters affecting the alignment.
Components required
Bioinformatics Tools and Database : GenBank, EMBOSS Align
Procedures
1. The query sequences of Bacillus thuringiensis and Geobacillus icigianus were retrieved
from NCBI. The sequences were pasted into Notepad in FASTA format.
[Link]
2. EBI home page [Link] was opened.
3. The EMBOSS Align tool was opened by using: Tools > Sequences alignment > Align
4. The two query sequences were pasted in the text area provided on the web page. The
options such as matrix, method, gap penalty, type of molecule (DNA/Protein) etc. were
checked.
5. The global/local alignment option was selected in the method box and the page was
saved. The run button was clicked.
6. The result was analyzed by checking the density of exact matches (vertical line) and
similarity (colon and single dot). The percentages of identity, similarity and gaps were checked.
7. The webpage was saved and all the results were displayed.
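The check in step 6 can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, the three input strings are an invented toy alignment rather than output from this experiment, and counting both colon and dot as similar follows the description in step 6, which is an assumption about how the tool reports similarity.

```python
def alignment_stats(seq_a, markup, seq_b, similar_symbols=":."):
    """Compute percent identity, similarity and gaps from one aligned block.

    seq_a, seq_b : aligned sequences, with '-' marking gap positions
    markup       : the line between them ('|' exact match; the symbols in
                   similar_symbols are ASSUMED to mark similar residues)
    """
    if not (len(seq_a) == len(markup) == len(seq_b)):
        raise ValueError("all three lines must have the same length")

    length = len(markup)
    identical = markup.count("|")
    # Similarity includes the identities plus the conserved substitutions.
    similar = identical + sum(markup.count(s) for s in similar_symbols)
    gaps = sum(1 for x, y in zip(seq_a, seq_b) if x == "-" or y == "-")

    def percent(count):
        return 100.0 * count / length if length else 0.0

    return {
        "identity": percent(identical),
        "similarity": percent(similar),
        "gaps": percent(gaps),
    }


# Toy alignment (invented for illustration, not a result of this experiment).
stats = alignment_stats("MKT-LV", "|:| .|", "MRTAIV")
print(stats)
```

All three percentages are taken over the alignment length, including gap columns, so adding gaps lowers the identity and similarity percentages even when the number of matches stays the same.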
Results
The pair wise alignment of the sequences from Bacillus thuringiensis and Geobacillus icigianus
was successfully performed using the EMBOSS Align tool and the important parameters affecting
the alignment were analyzed. All the results were displayed.
>EEM36206.1 Pseudouridine synthase [Bacillus thuringiensis serovar thuringiensis str. T01001]
METKKKGEWCEITVPAKWNGISIESVLKVEWEIPKKLLHQLRMEKGVTVNGEQRRWNELLKENDKLQVHM
FAEEEYGVDPEYGELHVVYEDDHVLIVNKPEKMDTHPAEKGGTGTLANLVAFHFQMQGLEAKVRHIHRLD
KDTTGGVVFAKHRIAGAIMDRLLMERKIKRTYAALVEGKVKGKQGTIDAAIGRDRHHATRRRISPKGNQA
ITYYKVEKYFKKQNTTFVTLQLETGRTHQIRVHMSHNGNPLVGDVLYGGQTKYMSSQALHAMKINFLHPI
TKEAIEVDVPFPTKLDNKIREFQKENA
>KFX34290.1 lipase [Geobacillus icigianus]
MKCCRVLFLLLGLWFVFGLSVAGGRAEAAASRANDAPIVLLHGFTGWGRDEMVGFKYWGGVRGDIEQWLN
DNGYRTYTLAVGPLSSNWDRACEAYAQLVGGTVDYGAAHAAKHGHARFGRTYPGLLPELKQGGRIHIIAH
SQGGQTARLLVSLLENGSQEEREYAKAHNVSLSPLFEGGHRFVLSVTTIATPHDGTTLVNMVDFTDRFFD
LQKAVLKTAAVASNVPYTDSVYDFKLDQWGLRRQPGESFDHYVERLKRSPVWTSTDTARYDLSIPGAEAL
NRWVQASPHTYYLSFSTERTEQGAWTGNHYPELGMTAFSAVVCAPFLGSYRNPALGVDDRWLENDGIVNT
VSMNGPKRGSSDRIVPYDGTIRKGVWNDMGTYNVDHLEVIGVDPNPLFPIRSFYLRLAEQLAGLRP
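FASTA records such as the two above can be read into plain strings before any further analysis. The following is a small Python sketch with a hypothetical function name and an invented two-record example; it assumes each header sits on a single line beginning with '>' and that sequence lines may be wrapped.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header (without '>') to sequence.

    Assumes each header is on one line; wrapped sequence lines are joined.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Invented toy input, not the sequences used in this experiment.
example = ">seq1 toy protein\nMETKK\nKGEW\n>seq2 toy protein\nMKCCR\n"
for name, sequence in parse_fasta(example).items():
    print(name, len(sequence))
```

Checking the length of each parsed sequence is a quick way to confirm that nothing was lost when the records were copied into the text area.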
Discussion
The result for the two sequences was analyzed by checking the density of exact matches
(vertical line) and similarity (colon and single dot). The percentages of identity, similarity and
gaps were checked. Identity is the proportion of aligned positions with identical amino acids
or nucleotides. Gaps mark positions of insertion or deletion (indels), while similarity is the
proportion of positions whose residues are identical or share similar biochemical properties.
Pairwise alignment methods are concerned with finding the best local or global alignments of
protein or DNA sequences. The most important application of pairwise alignment is the
identification of sequences of unknown structure or function. Another important use is the
study of molecular evolution. Pairwise alignments can only be used between two sequences at
a time, but they are efficient to calculate. The three primary methods of producing pairwise
alignments are dot-matrix methods, dynamic programming and word methods.
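Of the three methods named above, the dot-matrix method is the simplest to illustrate. The Python sketch below uses a hypothetical function name and no window or threshold filtering, which real dot-plot tools normally apply. It marks every position where a residue of one sequence equals a residue of the other.

```python
def dot_plot(a, b):
    """Return a dot matrix as a list of strings, one row per residue of a.

    '*' marks identical residues, '.' marks non-identical residues.
    No window/threshold filtering is applied (a simplifying assumption).
    """
    return ["".join("*" if x == y else "." for y in b) for x in a]


# Toy sequences invented for illustration.
for row in dot_plot("ACGT", "ACGT"):
    print(row)
```

Runs of '*' along a diagonal indicate regions where the two sequences match consecutively, which is how similar regions are spotted by eye in a dot plot.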
In this experiment, the EMBOSS Align tool was used to perform sequence alignment.
EMBOSS is a free, open source software analysis package specially developed for the
needs of the molecular biology (e.g. EMBnet) user community. The software automatically
copes with data in a variety of formats and even allows transparent retrieval of sequence data
from the web. Also, as extensive libraries are provided with the package, it is a platform to
allow other scientists to develop and release software in true open source spirit. EMBOSS
also integrates a range of currently available packages and tools for sequence analysis into a
seamless whole. EMBOSS breaks the historical trend towards commercial software packages.
The EMBOSS suite provides a comprehensive set of sequence analysis programs,
provides a set of core software libraries (AJAX and NUCLEUS), integrates other publicly
available packages, encourages the use of EMBOSS in sequence analysis training,
encourages developers elsewhere to use the EMBOSS libraries and supports all common
Unix platforms including Linux, Digital Unix, Irix and Solaris. Within EMBOSS over 150
programs (applications) can be found. The areas covered include sequence alignment,
rapid database searching with sequence patterns, protein motif identification (including
domain analysis), EST analysis, nucleotide sequence pattern analysis (for example to
identify CpG islands), simple and species-specific repeat identification, codon usage
analysis for small genomes, rapid identification of sequence patterns in large scale
sequence sets, presentation tools for publication and much more.
Conclusion
At the end of the experiment, the students were able to perform pair wise alignment by
comparing two molecular sequences (protein/nucleotide) using the EMBOSS Align tool. The
students analyzed the important parameters affecting the alignment, such as the density of
exact matches (vertical line) and similarity (colon and single dot), and the percentages of
identity, similarity and gaps.
Questions
1. What do you mean by global alignment? What is the algorithm required to find the
global alignment of the sequences?
Sequence comparison along the entire length of two sequences being aligned.
Best for highly similar sequences of similar length
As the degree of sequence similarity declines, global alignment methods tend to miss
relationships
Needleman-Wunsch algorithm
2. What do you mean by local alignment? What is the algorithm required to find the local
alignment of the sequences?
Sequence comparison intended to find the most similar regions in two sequences being
aligned
Regions outside the area of local alignment are excluded
More than one local alignment could be generated
Best for sequences that share some similarity or for sequences of different lengths
Smith-Waterman algorithm
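The local alignment idea in this answer can be sketched in Python as below. The function name and the simple match/mismatch/gap values are assumptions for illustration; no scoring matrix or separate gap opening and extension penalties are used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Assumed toy scoring: +2 match, -1 mismatch, -2 per gap position.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # The floor of 0 lets an alignment restart anywhere, so regions
            # outside the best-matching segment do not lower the score.
            curr[j] = max(0, prev[j - 1] + pair, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```

Unlike the global case, the result is the highest value found anywhere in the table rather than the value in the last cell, which is why regions outside the local alignment are excluded.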
3. What are the major matrices used for global alignment?
BLOSUM matrices (for example BLOSUM62)
PAM matrices
4. What are the important parameters affecting the quality of pair wise alignment?
Scoring scheme:
Match and mismatch scores (scoring matrix), which determine identity and similarity
Gap penalties (gap opening and gap extension) for indels
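To show why the gap settings matter, here is a small Python sketch of an affine gap penalty. The function name is hypothetical, and both the default values and the convention that the opening penalty covers the first gap position are assumptions; the values actually used in a run should be read from the tool's options page.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.5):
    """Penalty for one contiguous gap of the given length.

    ASSUMED convention: the first gap position costs gap_open and each
    further position costs gap_extend.
    """
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


# One gap of five positions versus five separate single-position gaps.
one_long_gap = affine_gap_penalty(5)
five_short_gaps = 5 * affine_gap_penalty(1)
print(one_long_gap, five_short_gaps)
```

Under these assumed values a single long gap is penalized far less than many scattered gaps, so raising or lowering either penalty changes which alignment scores best.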
5. What do you mean by dynamic programming?
A general algorithm design technique for solving problems defined by or formulated as
recurrences with overlapping subinstances. It was invented by the American mathematician
Richard Bellman in the 1950s to solve optimization problems.
Set up a recurrence relating a solution of a larger instance to solutions of some smaller
instances
Solve each smaller instance once
Record the solutions in a table
Extract the solution to the initial instance from that table
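The four steps listed in this answer can be seen in a short Python sketch. The longest common subsequence problem is used here only as a familiar stand-in (it is not the alignment algorithm used in this experiment), and the function name is hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.

    Recurrence: table[i][j] depends only on smaller prefixes of a and b.
    """
    # Table of solutions to the smaller instances (prefix lengths i and j).
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]

    # Each smaller instance is solved once and recorded in the table.
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # The solution to the initial instance is read from the table.
    return table[len(a)][len(b)]


print(lcs_length("AGGTAB", "GXTXAYB"))
```

Global and local alignment algorithms fill a table in the same way, with the recurrence replaced by match, mismatch and gap scores.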
References
Bayih. (2015). Dynamic programming. Retrieved from
[Link]
EMBOSS Needle < Pairwise Sequence Alignment < EMBL-EBI. (2018). Retrieved from
[Link]
Protein BLAST: search protein databases using a protein query. (2018). Retrieved from
[Link]
PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome
William, G. (2003). What is EMBOSS? Retrieved from
[Link]