Introduction To Databases

The document provides an introduction to bioinformatics, defining it as the use of computational techniques to manage biological data, with a primary focus on sequence analysis. It outlines the aims, applications, and importance of biological databases, and describes primary and secondary databases with examples such as GenBank, EMBL, and Swiss-Prot. It also discusses gene annotation and the role of accession numbers in identifying sequences within databases.

Uploaded by

Saptak Saha

Introduction to bioinformatics (databases)

Course Code:- BioTech 202

Course Title:- Introductory Bioinformatics
Credit- 3(2+1)
By:
Course Instructor
Ashutosh Ranjan (Guest Faculty)
Department of Basic Sciences and Language, CBS & H
What is bioinformatics?

In biology, bioinformatics is defined as "the use of computers to store, retrieve, analyze or predict the composition or structure of biomolecules". Bioinformatics is the application of computational techniques and information technology to the organization and management of biological data. Classical bioinformatics deals primarily with sequence analysis.
Aims of bioinformatics

❑ Development of databases containing all biological information.
❑ Development of better tools for data designing, annotation and mining.
❑ Design and development of drugs by using simulation software.
❑ Design and development of software tools for protein structure prediction and functional annotation.
❑ Creation and development of software to improve tools for analyzing sequences for their function and similarity with other sequences.
Applications of bioinformatics

❑ Crop development
❑ Drought resistance
❑ Medicine
❑ Biotechnology
❑ Forensic analysis
❑ Drug designing
❑ Weather analysis
❑ Gene therapy
❑ Veterinary science
Biological databases

Biological data are complex, exception-ridden, vast and incomplete. Therefore, several databases have been created and interpreted to ensure unambiguous results. A collection of biological data arranged in computer-readable form, which speeds up search and retrieval and is convenient to use, is called a biological database. A good database must contain up-to-date information.
Importance of biological databases

A range of information such as biological sequences, structures, binding sites, metabolic interactions, molecular action, functional relationships, protein families, motifs and homologs can be retrieved from biological databases. The main purpose of a biological database is to store and manage biological data and information in computer-readable form.
Types of biological database

Top-level types: primary database, secondary database, derived database.

Categories shown in the diagram: protein sequence database, nucleotide sequence database, protein structure database, domain and motif database, structure database, gene expression database, metabolic pathway database, specialized database.

Examples shown in the diagram: Swiss-Prot, PIR, GenPept, GenBank, DDBJ, EMBL, PDB, EBI-MSD, MMDB, SCOP, CATH.
Primary database vs. secondary database

❑ A primary database contains only sequence or structural information.
❑ Databases derived from the analysis or treatment of primary data are secondary databases. They are very important for inferring protein function.
Examples of some primary biological databases
GenBank

❑ One of the fastest growing repositories of known nucleotide sequences, GenBank (Genetic Sequence Databank) has a flat file structure. Each record is an ASCII text file, readable by both humans and computers. Besides sequence data, GenBank files contain information such as accession numbers, gene names and phylogenetic classification.
❑ This database has been developed and is maintained at the NCBI, Bethesda, MD, USA, as a part of the International Nucleotide Sequence Database Collaboration (INSDC).
❑ It is an open access sequence database.
❑ It coordinates with individual laboratories and other sequence databases such as EMBL and DDBJ.

❑ It is an annotated collection of all nucleotide sequences that are available to the public.
❑ The nucleotide database was divided into three databases at NCBI: the Core Nucleotide database, Expressed Sequence Tag (EST) and Genome Survey Sequence (GSS).
❑ The Core Nucleotide database holds most of the nucleotide sequences in use. It includes all nucleotide records that are not in the EST and GSS databases.
❑ Sequences can be submitted to GenBank using the BankIt, Sequin and tbl2asn tools.
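
As a rough illustration of why a flat file is readable by both humans and programs, the sketch below pulls a few header fields and the sequence out of a GenBank-style record. The sample record is a shortened, made-up example in GenBank layout (the locus name, accession and sequence are placeholders and are not meant to refer to a real entry), and `parse_record` is an illustrative helper, not an NCBI tool.

```python
# A shortened, hypothetical record laid out like a GenBank flat file.
SAMPLE_RECORD = """\
LOCUS       DEMO0001                  24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical demonstration sequence.
ACCESSION   X00001
SOURCE      synthetic construct
  ORGANISM  synthetic construct
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""

# Header keywords this sketch extracts (a real record has many more).
WANTED = {"LOCUS", "DEFINITION", "ACCESSION", "ORGANISM"}


def parse_record(text):
    """Return selected header fields and the sequence from one record."""
    fields = {}
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Sequence lines: a position number followed by blocks of bases.
            sequence_parts.extend(line.split()[1:])
            continue
        if line.startswith("ORIGIN"):  # the sequence starts on the next line
            in_sequence = True
            continue
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) == 2 and parts[0] in WANTED:
            fields[parts[0]] = parts[1].strip()
    fields["SEQUENCE"] = "".join(sequence_parts).upper()
    return fields


record = parse_record(SAMPLE_RECORD)
print(record["ACCESSION"], len(record["SEQUENCE"]))
```

The same text can be read by eye or split on keywords by a program, which is the practical meaning of a flat file structure. For real work, an established parser is a safer choice than a hand-written one like this.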
EMBL (European Molecular Biology Laboratory)

❑ A comprehensive database of DNA and RNA sequences, the EMBL nucleotide sequence database is collected from the scientific literature and patent offices, and from direct submissions by researchers. EMBL has been prepared in collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ).
❑ It was established in 1980.
❑ It is maintained by EBI (European Bioinformatics Institute).

Swiss-Prot

❑ This is a curated protein sequence database that offers a high level of integration with other databases and has a very low level of redundancy. Swiss-Prot strives to provide protein sequences with a high level of annotation (for instance, the description of protein function, domain structure and post-translational modifications).
❑ It was established in 1986 and has been maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library.
❑ TrEMBL is a computer-annotated supplement of Swiss-Prot that contains all translations of EMBL nucleotide sequence entries that are not yet integrated in Swiss-Prot.
❑ Currently, Swiss-Prot has 0.5 million and TrEMBL has 7.6 million sequences.
Protein Information Resource (PIR)

❑ PIR is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. PIR now offers a wide variety of resources, mainly oriented to assisting the propagation and consistency of protein annotations, such as PIRSF, iProClass and iProLINK.
Examples of some secondary biological databases
Motif Databases

❑ A protein sequence motif is a set of conserved amino acid residues that are important for protein function and are located within a certain distance of one another. These motifs usually provide clues to the functions of otherwise uncharacterized proteins.
❑ The PROSITE database consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them.
❑ PRINTS is a database of protein fingerprints. A fingerprint is a group of conserved motifs used to characterize a protein family.
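
To make the idea of a motif pattern concrete, the sketch below translates a PROSITE-style pattern into a Python regular expression and scans a sequence with it. The helper names `prosite_to_regex` and `find_motif` and the test sequence are hypothetical, and only the basic pattern syntax is handled (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` as terminus anchors outside brackets). The example pattern `N-{P}-[ST]-{P}` is the well-known N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a regular expression."""
    pattern = pattern.rstrip(".")  # patterns are often written with a final dot
    regex_parts = []
    for element in pattern.split("-"):
        # Excluded residues: {P} becomes [^P]. This runs before the
        # repeat conversion so the two uses of braces never collide.
        element = element.replace("{", "[^").replace("}", "]")
        # Repeats: x(2,4) becomes x{2,4}.
        element = element.replace("(", "{").replace(")", "}")
        # Any residue: lowercase x becomes a regex dot.
        element = element.replace("x", ".")
        # Sequence terminus anchors.
        element = element.replace("<", "^").replace(">", "$")
        regex_parts.append(element)
    return "".join(regex_parts)


def find_motif(pattern, sequence):
    """Return (0-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("N-{P}-[ST]-{P}", "MKNATLP"))
```

A match only indicates that the residues fit the pattern; whether the site is functional still has to be judged from the documentation entry and other evidence.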
Domain Database

❑ A protein domain is an independently folded, structurally compact unit that forms a stable three-dimensional structure and shows a certain level of evolutionary conservation. Typically, a conserved domain contains one or more motifs.
❑ ProDom is a protein domain database automatically generated from the Swiss-Prot and TrEMBL sequence databases.
❑ SMART is a highly reliable and sensitive tool for domain identification.
❑ COG is a database and a convenient tool for motif and domain identification.
3D Structure databases

❑ PDB (Protein Data Bank) is the main primary database for 3D structures of biological macromolecules determined by X-ray crystallography and NMR. It also accepts experimental data used to determine the structures and homology models.
❑ SCOP (Structural Classification of Proteins database) classifies protein 3D structures in a hierarchical scheme of structure classes. All the protein structures in PDB are classified here, and new structures are deposited in PDB.
❑ The CATH database (Class, Architecture, Topology, Homologous superfamily) contains a hierarchical classification of protein domain structures.
Protein Data Bank

❑ PDB (Protein Data Bank) is a repository for 3D structural data of proteins and nucleic acids obtained by X-ray crystallography or NMR spectroscopy.
❑ The Research Collaboratory for Structural Bioinformatics (RCSB) PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function and disease.
Annotation of genes

❑ In molecular biology, the genome is the basic genetic material of an organism and typically consists of DNA. A genome includes genes (coding regions) and non-coding regions. The coding regions are of particular interest because they actively influence basic life processes. Genes contain the biological information required to build and maintain an organism. Gene annotation can be defined simply as the process of making a nucleotide sequence meaningful.
❑ Gene annotation involves taking the raw DNA sequence produced by genome sequencing projects and adding the layers of analysis and interpretation necessary to extract biologically significant information and place the derived details in context. Annotation is the process by which pertinent information about these raw DNA sequences is added to the databases.
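
One simple layer of analysis that can be added to a raw DNA sequence is the location of candidate coding regions. The sketch below is a deliberately minimal illustration of that idea, not a real annotation pipeline: it scans only the forward strand for open reading frames that run from an `ATG` start codon to the first in-frame stop codon. The function name `find_orfs`, the `min_codons` threshold and the test sequence are all assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Find forward-strand open reading frames (ORFs).

    Returns a list of (start, end, frame) tuples with 0-based start,
    exclusive end (the stop codon is included) and reading frame 0-2.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it has enough codons (stop included).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("GGATGAAACCCTAGTT"))
```

Each reported interval is a piece of derived information attached to the raw sequence. Real annotation adds many more layers, such as reverse-strand genes, introns, regulatory regions and functional descriptions.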
Accession number

Accession numbers are unique identifiers that permanently identify sequences in the database.
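
Because accession numbers follow fixed layouts, a program can check whether a string has a plausible shape before sending it to a database. The sketch below assumes the classic nucleotide accession layouts (one letter followed by five digits, or two letters followed by six digits) with an optional `.version` suffix. Other layouts exist and are not covered, and the function name and example identifiers are placeholders chosen for illustration.

```python
import re

# Assumed layouts: 1 letter + 5 digits, or 2 letters + 6 digits,
# optionally followed by a version suffix such as ".2".
NUCLEOTIDE_ACCESSION = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?")


def looks_like_nucleotide_accession(identifier):
    """Return True if the whole string matches an assumed accession layout."""
    return NUCLEOTIDE_ACCESSION.fullmatch(identifier) is not None


for candidate in ["X00001", "AB123456", "AB123456.2", "12345", "ABC123"]:
    print(candidate, looks_like_nucleotide_accession(candidate))
```

A shape check like this says nothing about whether the record exists; only a lookup in the database can confirm that.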
