Bioinformatics Tools
Chapter December 2012
2 authors:
Dr. Kunwar Singh Vaisla Jagmohan Rana
Uttarakhand Technical University Hemwati Nandan Bahuguna Garhwal University
All content following this page was uploaded by Dr. Kunwar Singh Vaisla on 01 June 2014.
BIOINFORMATICS
Tools & Applications
Editor-in-Chief
Dr. J.M.S Rana, Ph.D.
Director,
Uttarakhand State Biotechnology Department,
Government of Uttarakhand
Uttarakhand State Biotechnology Department
(Ministry of Science & Technology and Biotechnology)
Government of Uttarakhand
2012
Editor:
Dr. Kunwar Singh Vaisla, Ph.D.
Associate Professor,
Department of Computer Science & Engineering
BT Kumaon Institute of Technology
Dwarahat 263653, District Almora (Uttarakhand)
[email protected]
Co-Editor:
Mr. Rajendra Bharti,
Associate Professor,
Department of Computer Science & Engineering
BT Kumaon Institute of Technology
Dwarahat 263653, District Almora (Uttarakhand)
BIOINFORMATICS - Tools & Applications
ISBN: 978-81-923296-3-5
copyright 2012, Rana
All rights are reserved. Utilization of this publication, other than stated, is prohibited
without prior written permission of the editor-in-chief.
Disclaimer :
This document may be freely reviewed, abstracted, reproduced and translated in part
or in whole, but not for sale or for any commercial purpose. All reasonable precautions
have been taken by the editorial board and authors to verify the information contained in
this publication. However, the responsibility for the interpretation of the material and
warranty of any kind, either expressed or implied, lies with the reader.
The document is being published by USBD for free of cost distribution among
trainees/participants/students.
A Joint Publication of:
Uttarakhand State Biotechnology Department (Ministry of Science &
Technology and Biotechnology), Government of Uttarakhand
And
Department of Computer Science & Engineering
BT Kumaon Institute of Technology
Dwarahat 263653, District Almora (Uttarakhand)
Printed by:
Charu Printers, Dehradun
Phone : 0135-2727591
(Dr. C. Raman Suri)
Chief Scientist
Head-Nanobiotechnology
PREFACE
The unprecedented growth of computer science and
biotechnology has rapidly spawned both specialized research groups and
industries. Bioinformatics generally relates to biological
molecules and therefore requires knowledge and understanding of
biochemistry, molecular biology, biophysics and mathematical
modeling, to name a few fields. Thus, bioinformatics is a
multi-disciplinary field, a convergence of computer science, information
technology, biotechnology and related disciplines, encompassing the
analysis and interpretation of biological data, the modeling of biological
processes and the development of related algorithms and statistics.
Bioinformatics aims to solve important biological problems using
computational techniques and applications. Prominent examples include
gene-protein interactions, protein structure prediction,
molecular networks, structures and sequences, and genomics and
proteomics. Knowledge of bioinformatics helps tremendously in the
analysis and interpretation of various types of biological data, including
nucleotide and amino-acid sequences, protein domains and folding, and
protein structures. It is bioinformatics that made it possible to
discover all human genes, determine the complete sequence of the
approximately 3 billion base pairs of the human genome, and make
them accessible for further biological studies. Research in this field
therefore requires close collaboration among multi-disciplinary teams
of researchers in computer science, statistics, physics, engineering,
life sciences, medical sciences and their interfaces.
This compendium, Bioinformatics: Tools and Applications,
comprises 12 articles, chiefly introductory in nature but with
insight into some important research challenges. The articles have been
selected for their fundamental character rather than for the
rigor or thoroughness of the techniques deployed. The compendium
represents the output of the lectures delivered in the short-term
training program "Bioinformatics Tools and Its Applications", sponsored
by the Uttarakhand State Biotechnology Department (USBD) and organized
at B.T. Kumaon Institute of Technology, Dwarahat (Almora), during
16-28 July 2012.
We are indebted to the Hon'ble Chief Minister/Chairman,
Uttarakhand Biotechnology Board, and the Hon'ble Minister (Science &
Technology and Biotechnology)/Vice Chairman, Uttarakhand
Biotechnology Board, Govt. of Uttarakhand, for their interest, care and
blessings for the overall development of biotechnology in the State. We
also extend our sincere thanks to the Secretary and Additional Secretary,
Ministry of Science & Technology and Biotechnology, Govt. of
Uttarakhand, for their valuable guidance and support.
We are highly thankful to all the learned resource persons of the
training program for their great work and their interest in providing the
articles for this compendium. I would like to place on record my sincere
gratitude to Mr Surendra Singh Negi, Hon'ble Cabinet
Minister, Science & Technology and Biotechnology, Government of
Uttarakhand, for sending us his valuable Message for the compendium.
I express my heartfelt thanks, indeed gratefulness, to Dr C.
Raman Suri, Chief Scientist & Head, Nanobiotechnology Division, CSIR
- Institute of Microbial Technology, Chandigarh, for his unflinching
cooperation and for very kindly writing the Foreword of this
compendium. We are highly obliged and grateful to the reviewers of this
compendium for critically reviewing the manuscript.
Finally, I wish to express my special thanks to Dr. Kunwar Singh
Vaisla and Dr Rajendra Kumar Bharati, BT Kumaon Institute of
Technology, Dwarahat (Almora), for their competent and comprehensive
editorial work. The views expressed in the compendium are solely
those of the authors. Despite the careful efforts made in compiling the
compendium, any remaining errors may kindly be pardoned and brought to
our notice for further improvement.
We hope the book will be useful to students and academicians
working in the field of Bioinformatics, Biotechnology, and other related
fields of studies.
(J.M.S.Rana)
INDEX
Ch No. Topic Page No.
1 Message 3
2 Foreword 5
3 Preface 7
4 Introduction To Bioinformatics 11
Jag Mohan Singh Rana & Kunwar Singh Vaisla
5 Introduction Of Molecular Biology 19
T. Kalaivani & C. Rajasekaran
6 Nucleic Acid And Protein Sequences 28
I. Arnold Emerson & C. Immanuel Selvaraj
7 Biological And Bioinformatics Databases And Data Mining 34
A. K. Mishra
8 Mathematical Modeling & Simulation 58
Sanjay Kumar Agarwal
9 Sequence Analysis And Computational Models For Biological Data 68
C. Immanuel Selvaraj & I. Arnold Emerson
10 Phylogenetic Analysis 74
Kumar Sachin
11 Sequence Analysis Languages And Tools 94
C. K. Jain
12 Proteomics 113
C. Rajasekaran & T. Kalaivani
13 DNA Representation 134
Ms. Archana Verma
14 Biological Sequence Compression 143
Mr. Rajendra Bharti
15 Bioinformatics Tools 158
Kunwar Singh Vaisla & Jag Mohan Singh Rana
16 About Authors 176
Bioinformatics Tools
Kunwar Singh Vaisla & Jag Mohan Singh Rana
1. BLAST
The Basic Local Alignment Search Tool (BLAST) finds regions of local
similarity between sequences. The program compares nucleotide or
protein sequences to sequence databases and calculates the statistical
significance of matches. BLAST can be used to infer functional and
evolutionary relationships between sequences as well as help identify
members of gene families.
Examples of BLAST usage
BLAST can be used for a lot of different purposes. A few of them are
mentioned below.
Looking for species. If you are sequencing DNA from unknown
species, BLAST may help identify the correct species or
homologous species.
Looking for domains. If you BLAST a protein sequence (or a
translated nucleotide sequence) BLAST will look for known
domains in the query sequence.
Looking at phylogeny. You can use the BLAST web pages to
generate a phylogenetic tree of the BLAST result.
Mapping DNA to a known chromosome. If you are sequencing a
gene from a known species but have no idea of the chromosome
location, BLAST can help you. BLAST will show you the position
of the query sequence in relation to the hit sequences.
Annotations. BLAST can also be used to map annotations from
one organism to another or look for common genes in two related
species.
Searching for homology
Most research projects that involve sequencing DNA or protein need to
obtain biological information about the newly sequenced, and possibly
unknown, sequence. If the researchers have no prior information about
the sequence and its biological content, valuable information can often
be obtained using BLAST. The BLAST algorithm searches for homologous
sequences in predefined, annotated databases of the user's choice. In
this way the researcher can quickly and easily learn about gene or
protein function and find evolutionary relations between the newly
sequenced DNA and well-established data.
After the BLAST search the user receives a report listing the
homologous sequences found and their local alignments to the query
sequence.
158 Bioinformatics Tools & Its Applications
How does BLAST work?
BLAST identifies homologous sequences using a heuristic method
that first finds short matches between two sequences; thus, the
method does not take the entire sequence space into account. BLAST
then attempts to build local alignments outward from these
initial matches. This also means that BLAST does not guarantee the
optimal alignment, so some sequence hits may be missed. To find
optimal alignments, the Smith-Waterman algorithm should be
used (see below). In the following, the BLAST algorithm is described in
more detail.
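For comparison with the heuristic, the exhaustive Smith-Waterman approach mentioned above can be sketched in a few lines. This is a minimal illustration only: the function name and the scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen for the example, not values taken from BLAST or from this chapter.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions (simple match/mismatch
    scores and a linear gap penalty), not a substitution matrix.
    """
    prev = [0] * (len(b) + 1)          # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACACACTA", "AGCACACA"))
```

Every cell of the matrix is filled, which is why this method finds the optimal local alignment but takes time proportional to the product of the two sequence lengths.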
Seeding
When finding a match between a query sequence and a hit sequence,
the starting point is the words that the two sequences have in common.
A word is simply a fixed number of consecutive letters. For blastp the
default word size is 3 (W = 3). If a query sequence contains QWRTG, the
searched words are QWR, WRT and RTG. See figure 1 for an illustration
of words in a protein sequence.
Figure 1: Generation of exact BLAST words with a word size of W = 3
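The word generation step can be written as a short sliding-window function. The function name is hypothetical; the word size default follows the blastp value W = 3 given above.

```python
def blast_words(query, w=3):
    """Return every overlapping word of length w in the query sequence."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


print(blast_words("QWRTG"))  # ['QWR', 'WRT', 'RTG']
```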
During the initial BLAST seeding, the algorithm finds all common words
between the query sequence and the hit sequence(s). Only regions with
a word hit are used to build an alignment. BLAST starts by
making words for the entire query sequence (see figure 1). For each word
in the query sequence, a list of neighborhood words that reach
the threshold T is also generated.
A neighborhood word is a word that scores at least T when compared
with the query word using a selected scoring matrix (see figure 2). The
default scoring matrix for blastp is BLOSUM62. The combined list of exact
words and neighborhood words is then matched against the database
sequences.
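The neighborhood idea can be sketched as follows. To keep the example short it uses a toy scoring table over a four-letter alphabet rather than the full BLOSUM62 matrix: the listed scores, the flat mismatch value, the reduced alphabet and the function names are all assumptions made for illustration.

```python
from itertools import product

# Toy substitution scores (assumption: a handful of illustrative values,
# not the complete BLOSUM62 matrix used by blastp).
TOY_SCORES = {
    ("P", "P"): 7, ("Q", "Q"): 5, ("G", "G"): 6, ("E", "E"): 5,
    ("Q", "E"): 2,
}
MISMATCH = -2  # assumed flat score for any pair not listed above


def pair_score(a, b):
    """Look up a symmetric residue-pair score in the toy table."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), MISMATCH))


def neighborhood(word, alphabet, threshold):
    """Return (candidate, score) pairs scoring at least threshold against word."""
    hits = []
    for letters in product(alphabet, repeat=len(word)):
        score = sum(pair_score(x, y) for x, y in zip(word, letters))
        if score >= threshold:
            hits.append(("".join(letters), score))
    return sorted(hits, key=lambda hit: -hit[1])


print(neighborhood("PQG", "PQGE", threshold=13))
# [('PQG', 18), ('PEG', 15)]
```

With a real 20-letter alphabet the same loop enumerates 8,000 candidate words per query word, which is why raising T shrinks the seed list so effectively.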
Figure 2: Neighborhood BLAST words based on the BLOSUM62 matrix. Only words
that score at least the threshold T = 13 are included in the initial seeding.
After the initial finding of words (seeding), the BLAST algorithm extends
the (only 3 residues long) alignment in both directions (see figure 3).
Each time the alignment is extended, the alignment score is
increased or decreased. When the alignment score drops below a
predefined threshold, the extension of the alignment stops. This
ensures that the alignment is not extended into regions where only very
poor alignment between the query and hit sequence is possible. If the
obtained alignment receives a score above a certain threshold, it is
included in the final BLAST result.
Figure 3: Blast aligning in both directions. The initial word match is marked green.
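A rough sketch of the extension step, in one direction only, is given below. It assumes an ungapped extension with simple match/mismatch scores and a drop-off rule that stops once the running score has fallen more than `x_drop` below the best score seen; the function name, the parameter names and the scoring values are assumptions for illustration, and extension to the left would mirror the same loop.

```python
def extend_right(query, subject, q_end, s_end, seed_score,
                 x_drop=2, match=1, mismatch=-1):
    """Extend an ungapped seed to the right.

    q_end and s_end are the positions just past the seed in each sequence.
    Returns (best_score, residues_added_at_best_score).
    """
    best = score = seed_score
    best_len = 0
    i = 0
    while q_end + i < len(query) and s_end + i < len(subject):
        score += match if query[q_end + i] == subject[s_end + i] else mismatch
        i += 1
        if score > best:
            best, best_len = score, i
        elif best - score > x_drop:
            break  # score has dropped too far: stop extending
    return best, best_len


# Seed "ACG" (score 3) at the start of both sequences.
print(extend_right("ACGTACGTTT", "ACGTACGAAA", 3, 3, seed_score=3))
# (7, 4)
```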
By tweaking the word size W and the neighborhood word threshold T, it
is possible to limit the search space. E.g. by increasing T, the number of
neighboring words will drop and thus limit the search space as shown in
figure 4.
Figure 4: Each dot represents a word match. Increasing the threshold of T limits the
search space significantly.
This will increase the speed of BLAST significantly but may result in loss
of sensitivity. Increasing the word size W will also increase the speed
but again with a loss of sensitivity.
How to use BLAST:
The Advanced BLAST page has many parameters that you can adjust,
and the outcome of a BLAST search depends on the parameters you
use.
A) Types of BLAST programs
There are five different blast programs, which can be distinguished by
the type of the query sequence (DNA or protein) and the type of the
subject database:
BLASTP compares an amino acid query sequence against a protein
sequence database;
BLASTN compares a nucleotide query sequence against a nucleotide
sequence database;
BLASTX compares the six-frame conceptual translation products of
a nucleotide query sequence (both strands) against a protein
sequence database;
TBLASTN compares a protein query sequence against a nucleotide
sequence database dynamically translated in all six reading frames
(both strands).
TBLASTX compares the six-frame translations of a nucleotide query
sequence against the six-frame translations of a nucleotide
sequence database.
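The five programs can be summarised as a small lookup table keyed by query type and database type. The dictionary layout and the function name are assumptions made for this sketch; the program descriptions follow the list above.

```python
# program: (query type, database type, what is translated in six frames)
BLAST_PROGRAMS = {
    "blastp":  ("protein",    "protein",    "nothing"),
    "blastn":  ("nucleotide", "nucleotide", "nothing"),
    "blastx":  ("nucleotide", "protein",    "query"),
    "tblastn": ("protein",    "nucleotide", "database"),
    "tblastx": ("nucleotide", "nucleotide", "query and database"),
}


def candidate_programs(query_type, db_type):
    """Return the BLAST programs that accept this query/database pairing."""
    return sorted(
        name for name, (q, d, _) in BLAST_PROGRAMS.items()
        if q == query_type and d == db_type
    )


print(candidate_programs("nucleotide", "protein"))     # ['blastx']
print(candidate_programs("nucleotide", "nucleotide"))  # ['blastn', 'tblastx']
```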
B) Subject Databases
There are many databases to use as subject databases. One of the most
commonly used is the nr database, a collection of "non-redundant"
sequences from GenBank and other sequence databanks.
C) Sequence input
BLAST accepts the query sequence in FASTA format or as an accession
number (GI number).
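For readers unfamiliar with FASTA format, each record is a header line starting with ">" followed by one or more lines of sequence. A minimal reader might look like the sketch below; the function name and the example records are hypothetical, and the identifier is assumed to be the first whitespace-delimited token of the header.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {identifier: sequence} dictionary."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""   # identifier = first token
            records[current] = []
        elif current is not None:
            records[current].append(line)           # sequence may span lines
    return {name: "".join(parts) for name, parts in records.items()}


example = ">seq1 example protein\nQWRTG\nQWR\n>seq2\nACGT\n"
print(parse_fasta(example))  # {'seq1': 'QWRTGQWR', 'seq2': 'ACGT'}
```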
D) Parameters to adjust
EXPECT value: The statistical significance threshold for reporting
matches against database sequences. The default value is 10, meaning
that 10 matches are expected to be found merely by chance. If the
statistical significance ascribed to a match is greater than the EXPECT
threshold, the match is not reported. Increasing the EXPECT value
makes the program report less significant matches.
FILTER (Low-complexity): Masks off segments of the query sequence that
have low compositional complexity (i.e. regions of biased composition,
such as short-period repeats).
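The effect of the EXPECT threshold can be illustrated with a small filter. The hit names and E-values below are invented purely for the example, and the function name is hypothetical; the rule itself (drop any match whose value is greater than the threshold) is the one stated above.

```python
def reportable_hits(hits, expect=10.0):
    """Keep hits whose E-value does not exceed the EXPECT threshold.

    hits is an iterable of (name, e_value) pairs; the result is sorted
    from most to least significant.
    """
    kept = [(name, e) for name, e in hits if e <= expect]
    return sorted(kept, key=lambda hit: hit[1])


# Hypothetical hits, for illustration only.
hits = [("hitB", 0.5), ("hitC", 25.0), ("hitA", 1e-30)]
print(reportable_hits(hits))               # hitC is dropped at the default of 10
print(reportable_hits(hits, expect=0.01))  # a stricter threshold keeps only hitA
```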
2. FASTA
FASTA (pronounced fast-ay) is a heuristic for finding significant
matches between a query string q and a database string d. FASTA's
general strategy is to find the most significant diagonals in the dot-plot
or dynamic programming matrix. The performance of the algorithm is
influenced by a word-size parameter k, usually 6 for DNA and 2 for
amino acids. The algorithm consists of four phases.
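As a rough illustration of the diagonal idea only (not of the individual phases), the sketch below indexes every k-letter word of d, looks up each word of q, and counts how many word matches fall on each diagonal i - j of the dot-plot. The function name and the example strings are assumptions for illustration.

```python
from collections import Counter, defaultdict


def diagonal_hits(q, d, k):
    """Count k-word matches between q and d on each dot-plot diagonal (i - j)."""
    index = defaultdict(list)
    for j in range(len(d) - k + 1):
        index[d[j:j + k]].append(j)        # positions of each word in d
    diagonals = Counter()
    for i in range(len(q) - k + 1):
        for j in index.get(q[i:i + k], ()):
            diagonals[i - j] += 1          # same offset = same diagonal
    return diagonals


counts = diagonal_hits("ACGTAC", "TTACGTAC", k=2)
print(counts.most_common(1))  # [(-2, 5)]
```

The diagonal with the most word hits (here offset -2, where q matches d shifted by two positions) is the natural candidate for a high-scoring alignment.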
3. ClustalW
Multiple alignments of protein sequences are important tools in
studying sequences. The basic information they provide is identification
of conserved sequence regions. This is very useful in designing
experiments to test and modify the function of specific proteins, in
predicting the function and structure of proteins, and in identifying new
members of protein families. Sequences can be aligned across their
entire length (global alignment) or only in certain regions (local
alignment). This is true for pairwise and multiple alignments. Global
alignments need to use gaps (representing insertions/deletions) while
local alignments can avoid them, aligning regions between gaps.
ClustalW2 is a fully automatic program for global multiple alignment of
DNA and protein sequences. The alignment is progressive and considers
the sequence redundancy. Trees can also be calculated from multiple
alignments. The program has some adjustable parameters with
reasonable defaults. ClustalW is a general purpose global multiple
sequence alignment program for DNA or proteins. It produces
biologically meaningful multiple sequence alignments of divergent
sequences. It calculates the best match for the selected sequences, and
lines them up so that the identities, similarities and differences can be
seen. Evolutionary relationships can be seen via viewing Cladograms or
Phylograms.
ClustalW starts by finding the score of the pairwise alignment
between each pair of sequences, using a scoring function that is
appropriate for proteins. For example, since the core of a protein has
fewer insertions and deletions, and hydrophobic regions are more likely
than hydrophilic regions to be in the core, the scoring function has a
lower gap penalty in hydrophilic regions than in hydrophobic regions.
Now that it has all the scores, ClustalW can construct a tree by merging
pairs of sequences with a higher alignment score before merging pairs
with a lower score. However, it is not always correct to do this, as the two
nodes with the highest alignment score, and thus the smallest distance
in the correct tree, may still be merged with other nodes in the correct
tree before they are merged with each other.
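The greedy merging described above can be sketched as follows. This is a simplified stand-in, not ClustalW's own tree-building code: it repeatedly merges the two clusters with the highest average pairwise score, and the function name, the averaging rule and the example scores are all assumptions for illustration.

```python
from itertools import combinations


def build_guide_tree(pair_scores):
    """Greedily merge the highest-scoring clusters into a nested-tuple tree.

    pair_scores maps (name_a, name_b) to a pairwise alignment score and
    must contain every pair of sequences exactly once.
    """
    scores = {frozenset(pair): s for pair, s in pair_scores.items()}
    names = sorted({name for pair in pair_scores for name in pair})
    clusters = [(name, [name]) for name in names]   # (subtree, member names)

    def average_score(c1, c2):
        values = [scores[frozenset((a, b))] for a in c1[1] for b in c2[1]]
        return sum(values) / len(values)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_score(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical pairwise scores for four sequences.
example = {("A", "B"): 90, ("A", "C"): 40, ("B", "C"): 50,
           ("A", "D"): 10, ("B", "D"): 20, ("C", "D"): 30}
print(build_guide_tree(example))  # ('D', ('C', ('A', 'B')))
```

As the text notes, joining the best-scoring pair first is not guaranteed to reproduce the correct tree; the sketch shares that limitation.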
4. RASMOL
RasMol is a molecular graphics program intended for the visualisation
of proteins, nucleic acids and small molecules. The program is aimed at
display, teaching and generation of publication quality images. RasMol
runs on a wide range of architectures and operating systems including
Microsoft Windows, Apple Macintosh, UNIX and VMS systems. UNIX
and VMS versions require an 8, 24 or 32 bit colour X Windows display
(X11R4 or later). The X Windows version of RasMol provides optional
support for a hardware dials box and accelerated shared memory
communication (via the XInput and MIT-SHM extensions) if available on
the current X Server. The program reads in a molecule coordinate file
and interactively displays the molecule on the screen in a variety of
colour schemes and molecule representations. Currently available
representations include depth-cued wireframes, 'Dreiding' sticks,
spacefilling (CPK) spheres, ball and stick, solid and strand biomolecular
ribbons, atom labels and dot surfaces. Up to 5 molecules may be loaded
and displayed at once. Any one or all of the molecules may be rotated
and translated. Supported input file formats
include Protein Data Bank (PDB), Tripos Associates' Alchemy and Sybyl
Mol2 formats, Molecular Design Limited's (MDL) Mol file format,
Minnesota Supercomputer Center's (MSC) XYZ (XMol) format,
CHARMm format, CIF format and mmCIF format files. If connectivity
information is not contained in the file this is calculated automatically.
The loaded molecule can be shown as wireframe bonds, cylinder
'Dreiding' stick bonds, alpha-carbon trace, space-filling (CPK) spheres,
macromolecular ribbons (either smooth shaded solid ribbons or parallel
strands), hydrogen bonding and dot surface representations. Atoms
may also be labelled with arbitrary text strings. Alternate conformers
and multiple NMR models may be specially coloured and identified in
atom labels. Different parts of the molecule may be represented and
coloured independently of the rest of the molecule or displayed in
several representations simultaneously. The displayed molecule may be
rotated, translated, zoomed and z-clipped (slabbed) interactively using
either the mouse, the scroll bars, the command line or an attached dial
box. RasMol can read a prepared list of commands from a 'script' file (or
via inter-process communication) to allow a given image or viewpoint to
be restored quickly. RasMol can also create a script file containing the
commands required to regenerate the current image. Finally, the
rendered image may be written out in a variety of formats including
either raster or vector PostScript, GIF, PPM, BMP, PICT, Sun rasterfile
or as a MolScript input script or Kinemage. The RasMol help facility can
be accessed by typing "help <topic>" or "help <topic> <subtopic>" from
the command line. A complete list of RasMol commands may be
displayed by typing "help commands". A single question mark may also
be used to abbreviate the keyword "help". Please type "help notices" for
important notices.
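Since RasMol can replay commands from a script file, such a file can be generated programmatically. The sketch below builds a short script as text. The command names are recalled from the RasMol command set and should be checked against "help commands" in the installed version; the function name and the file names are hypothetical.

```python
def rasmol_script(pdb_path, image_path):
    """Return the text of a small RasMol script (one command per line).

    Assumption: the commands below exist as written in the RasMol version
    in use; verify with RasMol's own "help commands" before relying on them.
    """
    commands = [
        f"load pdb {pdb_path}",     # read a PDB coordinate file
        "wireframe off",            # hide the default wireframe bonds
        "ribbons",                  # show macromolecular ribbons
        "colour structure",         # colour by secondary structure
        "rotate y 45",              # rotate the view about the y axis
        f"write gif {image_path}",  # save the rendered image
    ]
    return "\n".join(commands) + "\n"


print(rasmol_script("protein.pdb", "protein.gif"))
```

The returned text can be saved to a file and passed to RasMol's script mechanism described above.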
5. Swiss-PdbViewer
Swiss-PdbViewer is an application that provides a user-friendly
interface for analyzing several proteins at the same time. The
proteins can be superimposed in order to deduce structural alignments
and compare their active sites or any other relevant parts. Amino acid
mutations, H-bonds, angles and distances between atoms are easy to
obtain thanks to the intuitive graphic and menu interface. Moreover,
Swiss-PdbViewer is tightly linked to Swiss-Model, an automated
homology modeling server developed within the Swiss Institute of
Bioinformatics (SIB) in collaboration between GlaxoSmithKline R&D
and the Structural Bioinformatics Group at the Biozentrum in Basel.
Working with these two programs greatly reduces the amount of work
necessary to generate models, as it is possible to thread a protein
primary sequence onto a 3D template and get immediate feedback on
how well the threaded protein will be accepted by the reference
structure before submitting a request to build missing loops and refine
sidechain packing.
Swiss-PdbViewer can also read electron density maps, and provides
various tools to build into the density. In addition, various modeling
tools are integrated and command files for popular energy minimization
packages can be generated.
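The kind of measurement such viewers report, for example the distance between two atoms, can be reproduced from the coordinate file itself. The sketch below reads the fixed-column x, y, z fields of PDB ATOM records and computes a Euclidean distance. The helper that builds example records, the function names and the coordinates are hypothetical and exist only to make the example self-contained.

```python
import math


def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Hypothetical helper: format a simplified fixed-column PDB ATOM record."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom(line):
    """Extract name, residue, chain and coordinates from an ATOM record."""
    return {
        "name": line[12:16].strip(),        # columns 13-16
        "res": line[17:20].strip(),         # columns 18-20
        "chain": line[21],                  # column 22
        "resseq": int(line[22:26]),         # columns 23-26
        "xyz": (float(line[30:38]),         # columns 31-38
                float(line[38:46]),         # columns 39-46
                float(line[46:54])),        # columns 47-54
    }


def distance(atom_a, atom_b):
    """Euclidean distance between two parsed atoms, in the file's units."""
    return math.dist(atom_a["xyz"], atom_b["xyz"])


a = parse_atom(atom_line(1, "N", "THR", "A", 1, 0.0, 0.0, 0.0))
b = parse_atom(atom_line(2, "CA", "THR", "A", 1, 3.0, 4.0, 0.0))
print(distance(a, b))  # 5.0
```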
6. Other Bioinformatics Tools
Molecular Sequence Alignment Tools
A sequence alignment is an arrangement of one DNA, RNA or protein
sequence on top of another in which residues in the same position are
presumed to have a common evolutionary origin. This
method is used to identify regions of similarity that may be a
consequence of functional, structural, or evolutionary relationships
between two or more sequences.
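Several of the tools listed in this section report a % identity. One common way to compute it from two aligned rows is sketched below; it assumes that gaps are written as "-" and that identity is taken over all alignment columns, which is only one of several conventions in use. The function name is hypothetical.

```python
def percent_identity(row_a, row_b):
    """Percent of alignment columns in which both rows hold the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Ignore columns that are gaps in both rows.
    columns = [(x, y) for x, y in zip(row_a, row_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


print(percent_identity("AC-GT", "ACTGA"))  # 60.0
```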
LALIGN - Finds multiple matching sub-segments in two
sequences and reports the % identity for the different
sub-segments of the sequences.
(http://www.ch.embnet.org/software/LALIGN_form.html)
GraphAlign - Presents the % identity between two protein, RNA or
DNA molecules in graphical and numerical form.
(darwin.nmsu.edu/cgi-bin/graph_align.cgi)
GeneOrder 2.0 - An ideal tool for the alignment of
small GenBank genome sequences (up to 0.25 Mb). A newer
version, GeneOrder 3.0, is available. There are two display
formats: graphical and tabular.
(http://binf.gmu.edu:8080/GeneOrder2.0/)
CoreGenes - Designed to analyze two to five genomes
simultaneously; it also generates a table of related genes, i.e.
orthologs and putative orthologs. It has a limit of 0.35 Mb. An
updated version, CoreGenes 2.0, is available.
(http://binf.gmu.edu:8080/CoreGenes1.0/)
WebACT - This is the web version of ACT (Artemis Comparison
Tool) a DNA sequence comparison viewer based on Artemis.
(http://www.webact.org/WebACT/generate)
BASys - The Bacterial Annotation System, a tool that supports
automated and in-depth annotation of bacterial genomic
sequences. (http://basys.ca/basys/cgi/submit.pl)
ORF - Works very well and quickly with phage-sized genomes.
It also offers a choice of Glimmer, ZCurve or GeneMark
predictions coupled with GenBank or FASTA-formatted output.
(http://bioinformatics.biol.rug.nl/websoftware/orf/orf_start.php)
MICheck - Stands for MIcrobial genome Checker. It enables
rapid verification of sets of annotated genes and frameshifts in
previously published bacterial genomes, or genomes for which
the user has a *.gbk file.
(http://www.genoscope.cns.fr/agc/tools/micheck/Form/form.php)
LTR_Finder - An efficient program for finding full-length
LTR retrotransposons in genome sequences. The input file
size is limited to 50 MB. (http://tlife.fudan.edu.cn/ltr_finder/)
ToPLign - Access to this tool requires a user login. For
pairwise alignment it offers a variety of output formats.
(http://www.biosolveit.de/software/)
SUPERMATCHER - Part of the EMBOSS group of
programs. This tool uses 10 and 0.5 as the default values for the
gap opening and gap extension penalties, respectively.
(http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::supermatcher)
MATCHER - Also part of EMBOSS. It finds the best local
alignments between two sequences.
(http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::matcher)
zPicture - A DNA or genome alignment and visualization tool
based on the blastz alignment program. Alignments can be
automatically submitted to rVista 2.0 to identify evolutionarily
conserved transcription factor binding sites.
(http://zpicture.dcode.org/)
ClustalW - It is a Multiple Sequence Alignment search tool. It
provides one with a number of options for data presentation,
homology matrices and presentation of phylogenetic trees.
(http://www.ebi.ac.uk/Tools/msa/clustalw2/)
DbClustal - DbClustal aligns sequences from a BlastP database
search with one query sequence. The alignment algorithm is
based on ClustalW modified to incorporate local alignment data
in the form of anchor points between pairs of sequences.
(http://bips.u-strasbg.fr/PipeAlign/jump_to.cgi?DbClustal+noid)
PROBCONS - A novel tool for generating multiple alignments of
protein sequences. Using a combination of probabilistic
modeling and consistency-based alignment techniques,
PROBCONS has achieved the highest accuracies of all alignment
methods to date. (http://probcons.stanford.edu/index.html)
PRALINE - A multiple sequence alignment program with many
options to optimize the information for each of the input
sequences. (http://www.ibi.vu.nl/programs/pralinewww/)
DiAlign - DIALIGN is a novel program for multiple alignments
developed by Burkhard Morgenstern et al. While standard
alignment methods rely on comparing single residues and
imposing gap penalties, DIALIGN constructs pairwise and
multiple alignments by comparing whole segments of the
sequences.
(http://bibiserv.techfak.uni-bielefeld.de/dialign/submission.html)
T-COFFEE - T-COFFEE is more accurate than ClustalW for
sequences with less than 30% identity. Output is available in aln,
GCG/MSF and PHYLIP formats.
(http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi)
MARNA - Stands for Multiple Alignments of RNAs. A multiple
sequence alignment of RNAs taking into consideration both the
primary sequence and the secondary structure.
(www.bio.inf.uni-jena.de/Software/MARNA/)
SCAN2 - Provides a color-coded graphical alignment
of genome-length DNAs in Java. In the top panel, regions of high
sequence identity are presented in red.
(http://linux1.softberry.com/berry.phtml?topic=index&group=programs&subgroup=scanh)
JDotter - A Java Dot Plot Viewer, a dot matrix plotter for Java.
Produces diagrams similar to those of the programs mentioned above,
but with better control over the output.
(http://athena.bioc.uvic.ca/workbench.php?tool=jdotter&db=)
YASS - Performs DNA local alignments with results in dotplot and
tabular form. (http://bioinfo.lifl.fr/yass/yass.php)
Dotlet - (http://myhits.isb-sib.ch/cgi-bin/dotlet)
multi-zPicture - Provides nice dotplot graphs and dynamic
visualization on multiple sequence alignment. zPicture
alignments can be automatically submitted to rVista to identify
conserved transcription factor binding sites.
(http://zpicture.dcode.org/)
CoreGenes 3 - Tallies the total number of genes in common
between the two genomes being compared; displays the percent
value of genes in common with a specific genome; determines the
unique genes contained in a pair of proteomes.
(http://binf.gmu.edu:8080/CoreGenes3.0/)
PipeAlign - Offers an integrated approach to protein family
analysis through a cascade of five different sequence analysis
programs. Reference: F. Plewniak et al. 2003. Nucleic Acids
Research, 31: 3829-3832. (http://bips.u-strasbg.fr/PipeAlign/)
EMB BLAST - Very convenient since it permits one to
specifically search databases such as prokaryote,
bacteriophage, fungal, & 16S rRNA using BLASTN, and specific
bacterial genomes or SwissProt using BLASTX or BLASTN.
(http://www.ch.embnet.org/software/aBLAST.html)
ParAlign - Employs a heuristic method for sequence alignment.
In essence, ParAlign is about as sensitive as Smith-Waterman
but runs at the speed of BLAST. It presents nice graphics.
(http://www.paralign.org/)
VISTA - VISualization Tools for Alignments - this URL allows one
to align two genome-length sequences.
(http://genome.lbl.gov/vista/mvista/submit.shtml)
Batch BLAST - This tool was developed by
Michael V. Graves for DNA or protein BLAST sequence analysis
against the NCBI databases. It allows one to submit a file that
contains multiple sequences and then organizes the results
by each individual sequence contained in the file.
(http://greengene.uml.edu/programs/NCBI_Blast.html)
PSI-BLAST or PHI-BLAST - PSI-BLAST stands for Position-Specific
Iterative BLAST; it creates a profile after the initial search,
which is then used for subsequent searches.
(http://www.ncbi.nlm.nih.gov/blast/Blast.cgi)
FFAS03 - It is used for pairwise alignment. Presentation of result
file is very nice but server can be very slow. (Reference:
Jaroszewski et al. 2005. Nucl. Acids Res. 33: W284-W288).
(http://ffas.ljcrf.edu/ffas-cgi/cgi/pair_aln.pl?ses=)
RNA Structure Prediction Tools
The study of RNA structure has developed a distinct set of
computational tools designed explicitly for RNA applications.
Frequently, different regions of the same RNA strand fold together via
base-pair interactions to make complicated secondary and tertiary
structures that are essential for different biological functions.
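To make the idea of folding through base pairing concrete, the sketch below counts the maximum number of nested base pairs a sequence can form, in the style of the Nussinov dynamic programming algorithm. This is a deliberately simple stand-in: the tools listed below use energy models rather than pair counting, and the function name, the allowed pairs (including G-U) and the minimum hairpin loop of three unpaired bases are assumptions for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style pair counting)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]       # dp[i][j]: best count for seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]            # case 1: base j stays unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:   # case 2: base j pairs with k
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAACCC"))  # 3
```

Here the three G-C pairs close a hairpin around the unpaired AAA loop.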
MirAlign: MiRAlign is designed to find new miRNAs using both
sequence information and structural characteristics of known
miRNAs. It detects new miRNAs based on both sequence and
s t r u c t u r e a l i g n m e n t .
(http://bioinfo.au.tsinghua.edu.cn/miralign/)
ProMir: ProMiR (ver 1.0): Probabilistic co-learning model for
miRNA gene finding. It combines both sequential and structural
characteristics for miRNA genes in a probabilistic framework,
and simultaneously decides whether or not a miRNA gene is
present. (http://bi.snu.ac.kr/ProMiR/download.htm)
Mirscan: Mirscan is intended to identify the most microRNA-like
hairpin sequences from an arbitrary set of candidates.
(http://genes.mit.edu/mirscan/)
ERPIN (Easy RNA Profile IdentificatioN): ERPIN is a
software program for RNA motif identification. ERPIN reads a
sequence alignment and secondary structure, and
automatically infers a statistical "secondary structure profile"
(SSP). An original dynamic programming algorithm then
matches this SSP onto any target database, finding solutions
and their associated scores.
(http://rna.igmors.u-psud.fr/download/)
MiPred: It classifies real and pseudo microRNA precursors
using a random forest prediction model with combined features.
RNAmicro: It recognizes miRNA precursors in comparative
genomics data. It is available as a standalone program and as
a web-based program.
(http://www.tbi.univie.ac.at/~jana/software/RNAmicro.html)
Mireval: It is a comprehensive tool, easy to use and very
informative. (http://tagc.univ-mrs.fr/mireval/)
MIRFINDER: A computational pipeline called MIRFINDER
identifies conserved hairpin structures in the genomes of A.
thaliana and Oryza sativa and subsequently applies several
filters, based on core features derived from known miRNAs.
(http://www.bioinformatics.org/mirfinder/)
mfold software: mfold for RNA folding was developed in the late
1980s. It is a web server for nucleic acid folding and
hybridization prediction. The core algorithm predicts a
minimum free energy, ΔG, as well as minimum free energies for
foldings that must contain any particular base pair.
(http://mobyle.pasteur.fr/cgi-bin/portal.py?#forms::mfold)
RNAsoft: RNAsoft is a collection of online services for the
computational prediction and design of RNA/DNA structure.
The three programs currently available in RNAsoft are PairFold,
CombFold and RNA Designer. (http://www.rnasoft.ca/)
RNAz: RNAz is used for predicting structurally conserved and
thermodynamically stable RNA secondary structures in multiple
sequence alignments. It makes use of a support vector machine
classification procedure, which estimates a class probability
that can serve as a convenient overall score.
(http://rna.tbi.univie.ac.at/cgi-bin/RNAz.cgi)
VIENNA RNA package: The Vienna RNA package has a C code library
and many stand-alone programs for the prediction and
comparison of RNA secondary structures. The most commonly used
function in the package is RNA secondary structure prediction
through energy minimization.
(http://www.tbi.univie.ac.at/~ivo/RNA/)
XRNA: XRNA is secondary structure display software. XRNA is a
Java based collection of tools for the creation, annotation and
display of RNA secondary structure diagrams.
(http://rna.ucsc.edu/rnacenter/xrna/)
CONTRAFOLD: This method is based on conditional log-linear
models. It integrates most of the thermodynamic features into
its models and gives the highest single-sequence accuracy values
reported so far. (http://contra.stanford.edu/contralign/server.html)
RNAshapes: It creates a map of the structure in a tree-like
domain. It delivers the thermodynamically best structure for
each sequence that has a common shape.
(http://bibiserv.techfak.uni-bielefeld.de/rnashapes/submission.html)
Alifold: Alifold predicts consensus secondary structures for a
set of aligned RNA or DNA sequences. It is offered by the Vienna
RNA package. (http://rna.tbi.univie.ac.at/cgi-bin/RNAalifold.cgi)
Evofold: EvoFold is a comparative method for identifying
functional RNA structures in multiple-sequence alignments. It is
based on a probabilistic model-construction called phylo-SCFG
and exploits the characteristic differences of the substitution
process in stem-pairing and unpaired regions to make its
predictions. (http://users.soe.ucsc.edu/~jsp/EvoFold/)
Xrate: Xrate is an interpreter for phylo-grammars (a
phylo-grammar is a stochastic grammar whose terminals are
alignment columns, typically generated by evolving a
continuous-time Markov process on a phylogenetic tree).
(http://biowiki.org/XrateSoftware)
MSARi: MSARi is a program for detecting conservation of RNA
secondary structure. It searches orthologous nucleotide
sequences for statistically significant variations conserving a
candidate secondary structure.
(http://groups.csail.mit.edu/cb/MSARi/)
DEQOR: This program uses a scoring matrix, which is based on
the parameters of the siRNA design for evaluating the inhibitory
potency of siRNAs. (http://deqor.mpi-cbg.de/deqor_new/input.html)
BLOCK-It RNAi designer: This program designs synthetic siRNA,
stealth RNA and shRNA. It utilizes a highly effective, proprietary
algorithm and can be used for efficiently designing different
kinds of RNAi molecules.
(https://rnaidesigner.invitrogen.com/rnaiexpress/)
SFold: Sfold is based on a patent-pending algorithm for RNA
folding, target accessibility prediction, and rational design of
RNA-targeting nucleic acids.
(http://sfold.wadsworth.org/cgi-bin/sirna.pl)
Dharmacon siRNA Design Center: The siDESIGN Center is an
advanced, user-friendly siRNA design tool, which significantly
improves the likelihood of identifying functional siRNA.
(http://www.dharmacon.com/designcenter/designcenterpage.aspx)
ERPIN (Easy RNA Profile Identification): It is a program for RNA
motif search. It does not require users to write complex
descriptors before running the program; instead, it reads a
sequence alignment and secondary structure and infers a
statistical secondary structure profile.
(http://rna.igmors.u-psud.fr/download/)
INFERNAL: It is a program that searches DNA databases for RNA
structure and sequence similarities. It implements a
covariance model, which is a special type of stochastic
context-free grammar. (http://infernal.janelia.org/)
PHMMTS: It is an extended version of the pair hidden Markov
model. This program performs a local alignment of an unfolded
sequence against a folded structure.
(http://phmmts.dna.bio.keio.ac.jp/search.html)
RSEARCH: This program searches a database for homologous
RNAs. It uses an SCFG algorithm to score secondary structure as
well as primary sequence alignment.
RaveNnA: This software speeds up CM (covariance model) searches
by implementing various techniques. One such technique is
ML-heuristic, in which a heuristic method is used to speed up
the CM. (http://bliss.biology.yale.edu/~zasha/ravenna/)
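Several entries above (mfold, the Vienna RNA package) predict secondary structure by minimizing free energy with dynamic programming. Reproducing their energy models is beyond a short example, so the sketch below substitutes a much simpler, related technique: Nussinov-style base-pair maximization, which uses the same kind of recursion over subsequences but counts base pairs instead of summing energies. The function name, the allowed pair set (Watson-Crick plus G-U wobble) and the minimum hairpin loop length of 3 are assumptions made for illustration.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence.

    This is Nussinov-style pair maximization, a simplified stand-in for
    energy minimization; it is not the algorithm of mfold or RNAfold.
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]              # case 1: base j stays unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    # case 2: base j pairs with base k
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAACCC"))
```

Energy-based tools replace the "+ 1" per pair with stacking, loop and bulge energy terms, which is why their predictions are generally more realistic than a plain pair count.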
Protein Structure Prediction Tools
Protein structure prediction is one of the most important methods in
this developing area of science, and it is often described as a holy
grail of modern biology. It aims to predict the three-dimensional
structure of a protein from its amino acid sequence, i.e. to predict its
secondary, tertiary, and quaternary structure from its primary structure.
MolSurfer: A Macromolecular Interface Navigator. It is a
Java-based program and can be used to study protein-protein and
protein-DNA/RNA interfaces. The 2D projections of the
computed interface aid visualization of complicated interfacial
geometries in 3D.
(http://projects.villa-bosch.de/dbase/molsurfer/)
InterPreTS prediction: It combines three-dimensional
information of protein complexes, from their database of
interacting domains, with an empirical scoring system to assess
the fit of any potential protein interacting pair with a known
three-dimensional structure. (Reference: P. Aloy & R.B. Russell
(2003) Bioinformatics. 19, 161-162).
(http://speedy.embl-heidelberg.de/people/patrick/interprets/index.html)
STRING: STRING stands for Search Tool for the Retrieval of
Interacting Genes/Proteins. It draws its interaction evidence from
high-throughput experimental data, from the mining of databases
and literature, and from predictions based on genomic context
analysis. It assembles them in a common reference set, and
presents evidence in a consistent and intuitive web interface.
(http://string.embl.de)
YASPIN: It is built on three individual web servers: cons-PPISP,
PINUP, and Promate. It is known as the meta web server and is
used for protein-protein interaction and site prediction.
(http://www.ibi.vu.nl/programs/yaspinwww/ )
JPred: It is a consensus method for protein secondary structure
prediction, based upon the PHD, Predator, DSC, NNSSP, Zpred and
Mulpred programs.
(http://www.compbio.dundee.ac.uk/www-jpred/ )
TMpred: Prediction of Transmembrane Regions and Orientation -
ISREC (http://www.ch.embnet.org/software/TMPRED_form.html)
TMHMM: Prediction of transmembrane helices in proteins
(http://www.cbs.dtu.dk/services/TMHMM-2.0/)
DAS: Transmembrane Prediction Server
(http://www.sbc.su.se/~miklos/DAS/ )
SPLIT: Transmembrane Protein Topology Prediction Server. It
provides a modified hydrophobic moment index and clear, colorful
output including beta preference. (http://split.pmfst.hr/split/4/ )
OCTOPUS: This tool uses a novel combination of hidden Markov
models and artificial neural networks. It predicts the correct
topology for 94% of the dataset of 124 sequences with known
structures. (http://octopus.cbr.su.se/)
TMRPres2D: Stands for TransMembrane protein Re-Presentation
in 2 Dimensions tool. It is a Java-based tool and
takes data from a variety of protein folding servers and creates
uniform, two-dimensional, high analysis graphical images/
models of alpha-helical or beta-barrel transmembrane proteins.
(http://bioinformatics.biol.uoa.gr/TMRPres2D/)
PRED-TMBB: It is based on a hidden Markov model method and is
capable of predicting and discriminating beta-barrel outer
membrane proteins. It gives one the opportunity to download a
custom image plot or a 2D representation.
(http://bioinformatics.biol.uoa.gr/PRED-TMBB/)
TMB-Hunt: Stands for TransMembrane Barrel-Hunt and is based on
amino acid composition. It provides a color-coded score (and
E-value) for an individual protein or a series of proteins.
(http://bmbpcu36.leeds.ac.uk/~andy/betaBarrel/AACompPred/aaTMB_Hunt.cgi)
Coils: It helps in prediction of Coiled Coil Regions in Proteins
(http://www.ch.embnet.org/software/COILS_form.html)
3D-PSSM: A Fast, Web-based Method for protein fold
recognition. As an input it uses 1D and 3D sequence profiles
coupled with secondary structure and solvation potential
information. (http://www.sbg.bio.ic.ac.uk/~3dpssm/)
PHYRE: Protein Homology/analogY Recognition Engine. It helps
in the prediction of the 3D structure of proteins.
(http://www.sbg.bio.ic.ac.uk/~phyre/)
CPHModels: It consists of the following tools: Sowhat: A neural
network based method to predict contacts between C-alpha
atoms from the amino acid sequence. RedHom: A tool to find a
subset with low sequence similarity in a database. Databases:
Subsets of the Brookhaven Protein Data Bank (PDB) database
with low sequence similarity produced using the RedHom tool.
(http://www.cbs.dtu.dk/services/CPHmodels/)
SWISS-MODEL: An automated comparative protein modelling
server. It requires a viewer such as DeepView - Swiss-PdbViewer,
Rasmol, Cn3D v3.0 or WebMol Java PDB.
(http://swissmodel.expasy.org/)
LOOPP: (Learning, Observing and Outputting Protein Patterns) A
fold recognition program based on the collection of numerous
signals, merging them into a single score, and generating atomic
coordinates based on an alignment into a homologue template
structure. (http://cbsuapps.tc.cornell.edu/loopp.aspx)
3D-JIGSAW: Used in homology modelling. Save email results as
*.pdb and view with Rasmol etc. (www.bmm.icnet.uk)
InterProSurf: It predicts interacting amino acid residues in
proteins that are most likely to interact with other proteins, given
the 3D structures of subunits of a protein complex.
(http://curie.utmb.edu/)
GeNMR: (GEnerate NMR structure) - generates 3D protein
structures using NOE-derived distance restraints and NMR
chemical shifts. (http://www.genmr.ca/)
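Many of the transmembrane prediction servers listed above rest, at least in part, on the observation that membrane-spanning helices are stretches of hydrophobic residues. As a hedged illustration of that idea only, the sketch below computes a sliding-window hydropathy profile using the Kyte-Doolittle scale and flags windows whose average exceeds a threshold. The function names are hypothetical, and the window length (19) and threshold (1.6) are commonly quoted values for this scale, used here as assumptions; none of the listed servers is claimed to work exactly this way.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the average hydropathy of every full window along seq."""
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Return start positions (0-based) of windows above the threshold.

    The window and threshold defaults are illustrative assumptions.
    """
    profile = hydropathy_profile(seq, window)
    return [i for i, avg in enumerate(profile) if avg >= threshold]


if __name__ == "__main__":
    toy = "K" * 10 + "L" * 25 + "K" * 10   # artificial test sequence
    print(candidate_tm_windows(toy))
```

The toy sequence is artificial; with real proteins, overlapping flagged windows would normally be merged into candidate segments and then checked with one of the dedicated servers.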
Microarray Analysis Tools
Microarray analysis is used in the interpretation of the data generated
from experiments on RNA, DNA and protein microarrays. It enables
researchers to investigate the expression of a large number of genes,
in many cases an organism's entire genome, in a single experiment.
Microarrays are a significant advance both because they may contain a
very large number of genes and because they are very small in size.
CIMminer: It creates color-coded Clustered Image Maps (CIMs),
also known as heat maps, which are used to represent
high-dimensional data sets such as gene expression profiles.
CIMs were introduced in the mid-1990s for data on drug activity,
target expression, gene expression, and proteomic profiles.
(http://discover.nci.nih.gov/cimminer/)
SpliceMiner: This tool provides an intuitive, non-redundant
display of a gene's splice variants and may be searched by gene
symbol, chromosomal position, or probe sequence. A
high-throughput interface is available for batch processing of
large numbers of queries.
(http://www.tigerteamconsulting.com/SpliceMiner/intro.jsp/)
MatchMiner: Translates among gene identifier types for lists of
hundreds or thousands of genes. MatchMiner can also find the
intersection of two lists of genes specified by different identifiers
(http://discover.nci.nih.gov/matchminer/indxe.jsp/ )
SmudgeMiner: It highlights regional biases and other artifacts
on Affymetrix and other microarrays to enable quality
assessment. (http://discover.nci.nih.gov/affytools/ )
AffyProbeMiner: AffyProbeMiner re-defines chip definition
files (CDFs) for Affymetrix chips, taking into account the most
recent genomic sequence information. Pre-computed CDFs for
several chips are available for download.
(http://gauss.dbb.georgetown.edu/liblab/affyprobeminer/ )
SpliceCenter: A user-friendly set of tools for bench biologists.
It helps to determine the impact of gene splice variation on
common molecular biology technologies including RT-PCR, RNAi,
expression microarrays, and peptide-based assays.
(http://www.tigerteamconsulting.com/SpliceCenter/SpliceOverview.jsp/ )
CellMiner: A database and query tools for molecular profile
information on the NCI 60 human cancer cell lines and the
DU145/RC0.1 prostate cancer cell line pair.
(http://discover.nci.nih.gov/cellminer/ )
LeFEminer: Helps in the interpretation of gene microarray data.
LeFEminer uses independently generated gene categories
defined by GO, KEGG or another analogous resource. Because of its
intensive computational requirements, LeFEminer is supported by
the NIH's Advanced Biomedical Computing Facility (ABCC).
(http://discover.nci.nih.gov/lefe/ )
GoMiner: GoMiner batch-processes and organizes lists
of thousands or tens of thousands of genes and provides two
fluent, robust visualizations of the genes in the framework of the
Gene Ontology hierarchy.
(http://discover.nci.nih.gov/gominer/index.jsp/ )
High-Throughput GoMiner: High-Throughput GoMiner has the
capabilities of GoMiner and a number of others. It automates the
analysis of multiple microarrays and integrates results across all
of the microarrays, and will be useful in a wide range of
applications, including the study of time-courses, comparison of
multiple gene knock-outs or knock-downs, screening of large
numbers of chemical derivatives generated from a promising
lead compound and evaluation of multiple drug treatments.
(http://discover.nci.nih.gov/gominer/htgm.jsp/ )
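Tools that organize gene lists by Gene Ontology category typically need some measure of whether a category is over-represented among the genes of interest. One common way to express this is a one-sided hypergeometric test, sketched below. This is a generic illustration: the function and parameter names are hypothetical, and the text above does not state which statistic GoMiner or High-Throughput GoMiner actually computes.

```python
from math import comb


def enrichment_p_value(total_genes, category_genes, selected_genes, overlap):
    """One-sided hypergeometric p-value for category over-representation.

    total_genes    : number of genes on the array (N)
    category_genes : genes on the array annotated to the category (K)
    selected_genes : size of the gene list of interest (n)
    overlap        : genes in the list that belong to the category (k)

    Returns P(X >= overlap) when selected_genes are drawn at random
    without replacement from total_genes.
    """
    max_overlap = min(category_genes, selected_genes)
    favourable = sum(
        comb(category_genes, i)
        * comb(total_genes - category_genes, selected_genes - i)
        for i in range(overlap, max_overlap + 1)
    )
    return favourable / comb(total_genes, selected_genes)


if __name__ == "__main__":
    # Toy numbers: 10 genes on the array, 5 in the category,
    # 5 selected, all 5 of them in the category.
    print(enrichment_p_value(10, 5, 5, 5))
```

In practice many categories are tested at once, so the resulting p-values would also need a multiple-testing correction before being interpreted.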
References
1. Altschul, S. F. and Gish, W. (1996). Local alignment statistics.
Methods Enzymol, 266:460-480.
2. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J.
(1990). Basic local alignment search tool. J Mol Biol, 215(3):403-410.
3. McGinnis, S. and Madden, T. L. (2004). BLAST: at the core of a
powerful and diverse set of sequence analysis tools. Nucleic Acids
Res, 32(Web Server issue):W20-W25.
4. Smith, T. F. and Waterman, M. S. (1981). Identification of common
molecular subsequences. J Mol Biol, 147(1):195-197.
5. Wootton, J. C. and Federhen, S. (1993). Statistics of local
complexity in amino acid sequences and sequence databases.
Computers in Chemistry, 17:149-163.
6. David W. Mount, Bioinformatics, Cold Spring Harbor Laboratory
Press
7. Jean-Michel Claverie & Cedric Notredame, Bioinformatics for
Dummies, Wiley Publishing
8. Guex, N. and Peitsch, M.C. (1997). SWISS-MODEL and the Swiss-
PdbViewer: An environment for comparative protein modeling.
Electrophoresis 18, 2714-2723.
ABOUT AUTHORS
S. No.  Author Name                  Author Designation and Affiliation
1       Dr. Jag Mohan Singh Rana     Director, Uttarakhand State Biotechnology Department, Government of Uttarakhand, Dehradun
2       Dr. Kunwar Singh Vaisla      Associate Professor, Department of Computer Science & Engineering, BT Kumaon Institute of Technology, Dwarahat 263653, District Almora, Uttarakhand
3       Dr. T. Kalaivani             Associate Professor, School of Bio Sciences and Technology, VIT University, Vellore 632014 Tamilnadu
4       Dr. C. Rajasekaran           Associate Professor, School of Bio Sciences and Technology, VIT University, Vellore 632014 Tamilnadu
5       Dr. I. Arnold Emerson        Associate Professor, School of Bio Sciences and Technology, VIT University, Vellore 632014 Tamilnadu
6       Dr. C. Immanuel Selvaraj     Associate Professor, School of Bio Sciences and Technology, VIT University, Vellore 632014 Tamilnadu
7       Dr. A. K. Mishra             Sr. Scientist, Bioinformatics Center, USI, IARI, New Delhi-12
8       Dr. Sanjay Kumar Agarwal     Associate Professor, Department of Stat-Math, Dolphin PG Institute, Manduwala, Dehradun
9       Dr. Kumar Sachin             Associate Professor, Sri Bhagwan Singh Post Graduate Institute, Balawala, Dehradun
10      Dr. Chakresh Kumar Jain      Professor, Jaypee Institute Information Technology, Noida
11      Ms. Archana Verma            Assistant Professor, Department of Computer Science & Engineering, BT Kumaon Institute of Technology, Dwarahat 263653, District Almora (Uttarakhand)
12      Mr. Rajendra Bharti          Assistant Professor, Department of Computer Science & Engineering, BT Kumaon Institute of Technology, Dwarahat 263653, District Almora (Uttarakhand)