Prof. Dr. Md. Ariful Islam
Dept. of Microbiology
Bioinformatics-I
• Introduction and Biological Databases (I): Bioinformatics - definition, goal, history
and scope, major areas of application and limitations, major databases, types of
databases, pitfalls of biological databases, information retrieval from
biological databases, nucleotide database searching, retrieval of specific gene from
database, protein database searching, global bioinformatics centers and servers.
• Sequence Alignment (II): Pairwise sequence alignment, sequence similarity versus
sequence identity, alignment methods, scoring matrices, statistical significance
of sequence alignment, database similarity searching, heuristic database
searching, Basic Local Alignment Search Tool (BLAST), Multiple Sequence
Alignment, ClustalW/ClustalX, protein motifs and domain prediction, identification of
motifs and domains in multiple sequences.
• Structural bioinformatics (V): Protein structure basics, amino acids, peptide
formation, dihedral angles, hierarchy, secondary structures, tertiary
structures, determination of protein three-dimensional structure, protein
structure database, protein structure visualization, comparison and
classification, protein secondary structure prediction, protein tertiary structure
prediction, RNA structure prediction, types of RNA structures, RNA secondary
structure prediction methods.
Introduction and Biological Databases
What is Bioinformatics?
Bioinformatics is an interdisciplinary research area at the interface between
computer science and biological science.
Goals
• Better understanding of a living cell and how it functions at the molecular level.
• Provide new insights into molecular sequences and structural data and provide a “global” perspective of the cell.
• Living fossils: DNA records the history of life and, due to the central dogma, analysing sequence data gives a better understanding of the function of a cell.
• Solving functional problems using sequence and sometimes structural approaches.
Scopes
Bioinformatics consists of two subfields:
• the development of computational tools and databases, and
• the application of these tools and databases in generating biological knowledge to better understand living systems.
Bioinformatics has not only become essential for basic genomic and molecular biology research but is having a major impact on many areas of biotechnology and biomedical sciences.
Limitations
• Bioinformatics and experimental biology are independent, but
complementary, activities. Bioinformatics results need to
be consistent with experimental biology
• Quality of bioinformatics predictions depends on quality of the data
and sophistication of the algorithms
• Relying completely on the information is dangerous if the information is
inaccurate
• You need to know the biology behind it!
• IT IS A GOOD PRACTICE TO USE MULTIPLE PROGRAMS, IF THEY
ARE AVAILABLE, AND PERFORM MULTIPLE EVALUATIONS.
New Themes
• There is no doubt that bioinformatics is a field that holds great potential for
revolutionizing biological research in the coming decades.
• In addition to providing more reliable and more rigorous computational tools for
sequence, structural and functional analysis, THE MAJOR CHALLENGE FOR FUTURE
BIOINFORMATICS DEVELOPMENT IS TO DEVELOP TOOLS FOR ELUCIDATION OF THE
FUNCTIONS AND INTERACTIONS OF ALL GENE PRODUCTS IN A CELL.
• Systems biology!
Databases
• Databases are fundamental to modern biological
research, especially to genomic studies.
• The goal of a biological database is twofold:
• information retrieval and
• knowledge discovery.
Electronic Databases
• Electronic databases can be constructed either as
• flat files,
• relational, or
• object-oriented.
• Flat files are simple text files and lack any form of organization to
facilitate information retrieval by computers.
• Relational databases organize data as tables and
search information among tables with shared features.
• Object-oriented databases organize data as objects and associate
the objects according to hierarchical relationships.
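To make the three storage models concrete, the sketch below stores the same two records in each form and looks one up. It is only an illustration: the accessions, organisms, sequences, table names, and class names are all invented for this example and do not come from any real database.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical records (accession, organism, sequence); not real entries.
RECORDS = [
    ("ACC0001", "Escherichia coli", "ATGGCTAAA"),
    ("ACC0002", "Bacillus subtilis", "ATGCCGTTT"),
]

# 1. Flat file: plain text, one record per line. Nothing helps the computer
#    find a record, so retrieval means scanning every line.
flat_file = "\n".join("|".join(rec) for rec in RECORDS)

def flat_lookup(text, accession):
    for line in text.splitlines():
        acc, organism, seq = line.split("|")
        if acc == accession:
            return organism, seq
    return None

# 2. Relational: data live in tables that are linked by a shared column
#    (organism_id here), and a query joins the tables on that column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute("CREATE TABLE sequence (accession TEXT PRIMARY KEY, organism_id INTEGER, seq TEXT)")
for acc, organism, seq in RECORDS:
    conn.execute("INSERT OR IGNORE INTO organism (name) VALUES (?)", (organism,))
    (org_id,) = conn.execute(
        "SELECT organism_id FROM organism WHERE name = ?", (organism,)
    ).fetchone()
    conn.execute("INSERT INTO sequence VALUES (?, ?, ?)", (acc, org_id, seq))

def relational_lookup(accession):
    return conn.execute(
        "SELECT o.name, s.seq FROM sequence s "
        "JOIN organism o ON s.organism_id = o.organism_id "
        "WHERE s.accession = ?",
        (accession,),
    ).fetchone()

# 3. Object-oriented: data are objects, associated through a hierarchy
#    (each Organism object owns its SequenceEntry objects).
@dataclass
class SequenceEntry:
    accession: str
    seq: str

@dataclass
class Organism:
    name: str
    entries: list = field(default_factory=list)

organisms = {}
for acc, organism, seq in RECORDS:
    organisms.setdefault(organism, Organism(organism)).entries.append(SequenceEntry(acc, seq))

def object_lookup(accession):
    for org in organisms.values():
        for entry in org.entries:
            if entry.accession == accession:
                return org.name, entry.seq
    return None
```

All three lookups return the same organism and sequence for a given accession; what differs is how much structure the storage itself provides to support the search.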
Biological databases
Biological databases encompass all three types (flat file, relational, and
object-oriented).
• Based on their content, biological databases are divided into
• primary,
• secondary, and
• specialized databases.
• Primary databases simply archive sequence or structure information;
• Secondary databases include further analysis on the sequences
or structures.
• Specialized databases cater to a particular research interest.
Properties of Biological databases
• Biological databases need to be interconnected so that entries in
one database can be cross-linked to related entries in another
database.
• NCBI databases accessible through Entrez are among the most
integrated databases.
• Effective information retrieval involves the use of Boolean operators.
• Entrez has additional user-friendly features to help conduct complex
searches. One such option is to use Limits, Preview/Index, and History
to narrow down the search space.
• Alternatively, one can use NCBI-specific field qualifiers to conduct searches.
• To retrieve sequence information from NCBI GenBank, an understanding of
the format of GenBank sequence files is necessary.
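The last three points can be sketched in a few lines of Python. The first part combines field-qualified terms with Boolean operators into an Entrez search term and wraps it in an E-utilities ESearch URL; the second part reads a few fields from a GenBank-style flat-file record. The helper names, the gene and organism used in the query, and the example record (accession EXAMPLE01 and its sequence) are assumptions made up for illustration. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_term(gene, organism, exclude_title_word=None):
    """Combine field-qualified terms with Boolean operators (AND, NOT)."""
    term = f"{gene}[Gene Name] AND {organism}[Organism]"
    if exclude_title_word:
        term += f" NOT {exclude_title_word}[Title]"
    return term

def build_esearch_url(term, db="nucleotide", retmax=5):
    """Return an ESearch URL for the term (no request is sent here)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})

# Invented record in GenBank flat-file layout: keyword lines, then ORIGIN,
# then numbered sequence lines, terminated by "//".
EXAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   BCT 01-JAN-2000
DEFINITION  Hypothetical example record for illustration.
ACCESSION   EXAMPLE01
ORIGIN
        1 atggctaaac cgtttgaatt ctaa
//
"""

def parse_genbank(text):
    """Minimal reader: LOCUS, DEFINITION, ACCESSION and the sequence only.

    Continuation lines and the FEATURES table are ignored in this sketch.
    """
    record = {"sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if in_origin:
            # Drop the leading position number, keep the blocks of bases.
            record["sequence"] += "".join(line.split()[1:]).upper()
        elif line.startswith("ORIGIN"):
            in_origin = True
        else:
            key, _, value = line.partition(" ")
            if key in ("LOCUS", "DEFINITION", "ACCESSION"):
                record[key.lower()] = value.strip()
    return record

term = build_term("recA", "Escherichia coli", exclude_title_word="partial")
url = build_esearch_url(term)
record = parse_genbank(EXAMPLE_RECORD)
```

For real work a maintained parser is a safer choice than a hand-written reader, since actual GenBank records contain multi-line fields and a feature table that this sketch skips.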
Limitation
• It is also important to bear in mind that sequence data in these
databases are less than perfect. There are sequence and
annotation errors.
• Biological databases are also plagued by redundancy problems.
• There are various solutions to correct annotation and reduce
redundancy, for example, merging redundant sequences into
a single entry or storing highly redundant sequences in a
separate database.
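The idea of merging redundant sequences into a single entry can be sketched as follows. This is a simplified illustration that treats only exactly identical sequences (ignoring letter case) as redundant; the accessions and sequences are invented, and real non-redundant resources may apply more elaborate criteria.

```python
def merge_redundant(entries):
    """Collapse entries with identical sequences into one merged entry.

    entries: iterable of (accession, sequence) pairs.
    Returns a list of dicts, one per distinct sequence, each keeping
    every accession that carried that sequence.
    """
    merged = {}
    for accession, seq in entries:
        key = seq.upper()                      # compare case-insensitively
        merged.setdefault(key, []).append(accession)
    return [{"sequence": seq, "accessions": accs} for seq, accs in merged.items()]

# Hypothetical submissions: the first and third carry the same sequence.
submissions = [
    ("ACC0001", "ATGGCTAAA"),
    ("ACC0002", "ATGCCGTTT"),
    ("ACC0003", "atggctaaa"),
]
non_redundant = merge_redundant(submissions)
```

Keeping all original accessions in the merged entry means a search by any of them still finds the sequence after the duplicates are removed.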
Biological Databases Available Via the WWW
1 Meta databases
2 Model organism databases
3 Nucleic acid databases
  3.1 DNA databases
  3.2 Gene expression databases (mostly microarray data)
  3.3 Phenotype databases
  3.4 RNA databases
4 Amino acid / protein databases
  4.1 Protein sequence databases
  4.2 Protein structure databases
  4.3 Protein model databases
  4.4 Protein-protein and other molecular interactions
  4.5 Protein expression databases
5 Signal transduction pathway databases
6 Metabolic pathway and protein function databases
7 Additional databases
  7.1 Exosomal databases
  7.2 Mathematical model databases
  7.3 Taxonomic databases
  7.4 Radiologic databases
  7.5 Antimicrobial resistance databases
8 Specialized databases
Information Retrieval from Biological Databases
Let’s go