
Search Sequence Database

Submitted to:
Dr. Samrah
Submitted by:
Group no#2 (M)
BS Zoology 2021-2025

Institute Of Zoology
Bahauddin Zakariya University, Mlt.
Search Sequence Database

Biological Sequence Database

Biological sequence databases are digital libraries that store and organize biological sequences,
such as DNA, RNA, and protein sequences. These databases are crucial resources for researchers
in various fields of biology, enabling them to access, analyze, and compare sequence data.

Key Characteristics:
o Digital Repositories: They exist as computerized systems capable of storing and
managing vast amounts of sequence information.
o Types of Sequences: They primarily hold nucleotide sequences (DNA and RNA) and
amino acid sequences (proteins). Some may also include other polymer sequences.
o Accessibility: Most major biological sequence databases are publicly accessible via the
internet, making them an indispensable tool for the global scientific community.
o Search and Analysis Tools: They typically provide tools and interfaces that allow users to
search for specific sequences, perform sequence alignments, and conduct other
bioinformatic analyses.

Types of Biological Sequence Database

Primary databases contain original sequence or structure data submitted by researchers, while
curated databases add manual or computational annotation on top of such data. Examples include:

 GenBank (National Center for Biotechnology Information - NCBI, USA) for nucleotide
sequences.
 ENA (European Nucleotide Archive, maintained by EMBL-EBI, the European Molecular
Biology Laboratory - European Bioinformatics Institute, Europe) for nucleotide sequences.
 DDBJ (DNA Data Bank of Japan, Japan) for nucleotide sequences.
 Protein Data Bank (PDB) for 3D structural data of proteins and nucleic acids.
 UniProtKB/Swiss-Prot (part of UniProt) for high-quality, manually annotated protein
sequences.
 TrEMBL (part of UniProt) for computationally annotated protein sequences.
Tools for Sequence Searching
Bioinformatics tools help researchers find similar sequences in large databases. Key tools
include:
o BLAST
o FASTA
o HMMER

1. BLAST
BLAST, which stands for Basic Local Alignment Search Tool, is a
fundamental and widely used algorithm and program in bioinformatics. Its primary purpose is to
compare a query biological sequence (DNA, RNA, or protein) against a large database of
sequences to identify regions of local similarity.

Core Function
BLAST takes a query sequence and searches a database for sequences that have similar
segments. It doesn't try to find a perfect, end-to-end match of the entire query sequence. Instead,
it focuses on identifying local alignments, which are regions of significant similarity within the
sequences.
How it works

 Query Segmentation: The query sequence is broken down into short "words" of a specific
length (e.g., 3 amino acids for proteins, 11 nucleotides for DNA).
 Database Searching for Word Matches: The algorithm quickly scans the database for
exact or near-exact matches to these query words. These matches are called "seeds" or
"hits."
 Extending the Matches: Once a seed is found, BLAST extends the alignment in both
directions along the query and database sequences. It tries to extend the alignment as long
as the similarity score remains above a certain threshold. Gaps (insertions or deletions)
can be introduced during this extension to improve the alignment score.
 Scoring the Alignments: Each alignment is assigned a score based on the similarity of the
aligned residues (nucleotides or amino acids) and any gaps introduced. Higher scores
indicate greater similarity.
 Statistical Significance: BLAST calculates the statistical significance of each alignment.
This is often expressed as an E-value (Expect value), which represents the number of
alignments with a score equal to or greater than the observed score that are expected to
occur by chance in a database of that size. A low E-value (close to zero) suggests that the
alignment is unlikely to be due to random chance and is therefore more significant.
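
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not
the actual BLAST implementation: it uses exact word matches only, extends seeds without gaps,
and uses a plain match/mismatch score. The function names, the X-drop cutoff, and the default
values of K and lam are assumptions chosen for the example. The E-value uses the standard form
E = K * m * n * exp(-lambda * S), where m is the query length and n the database length.

```python
import math
from collections import defaultdict


def index_words(seq, w):
    """Map every length-w word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend(query, subject, qi, si, w, match=1, mismatch=-1, x_drop=3):
    """Ungapped extension of a seed starting at query[qi] / subject[si].

    Extension in each direction stops once the running score falls
    x_drop below the best score seen so far (hypothetical cutoff).
    Returns (score, query_start, query_end_exclusive, subject_start).
    """
    score = best = w * match
    best_right = 0
    k = 0
    while qi + w + k < len(query) and si + w + k < len(subject):
        score += match if query[qi + w + k] == subject[si + w + k] else mismatch
        k += 1
        if score > best:
            best, best_right = score, k
        elif best - score >= x_drop:
            break
    # Continue to the left, starting from the best score reached so far.
    score = best
    best_left = 0
    k = 0
    while qi - k - 1 >= 0 and si - k - 1 >= 0:
        score += match if query[qi - k - 1] == subject[si - k - 1] else mismatch
        k += 1
        if score > best:
            best, best_left = score, k
        elif best - score >= x_drop:
            break
    return best, qi - best_left, qi + w + best_right, si - best_left


def search(query, subject, w=4):
    """Find exact word seeds, extend each one, and return alignments by score."""
    index = index_words(subject, w)
    hits = set()
    for qi in range(len(query) - w + 1):
        for si in index.get(query[qi:qi + w], []):
            hits.add(extend(query, subject, qi, si, w))
    return sorted(hits, reverse=True)


def expect_value(score, query_len, db_len, K=0.1, lam=0.3):
    """E = K * m * n * exp(-lambda * S); K and lam here are placeholder values."""
    return K * query_len * db_len * math.exp(-lam * score)


hsps = search("TTACGTACGGA", "GGGACGTACGCCC")
print(hsps[0])  # (7, 2, 9, 3)
```

Several seeds on the same diagonal extend to the same alignment, which is why the results are
collected in a set. The E-value grows in proportion to the database length and shrinks
exponentially with the score, which matches the interpretation given above.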

Types of BLAST
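BLAST is a family of programs. The main variants differ in the type of query and database they
compare:
 BLASTn: Compares a nucleotide query sequence against a nucleotide database.
 BLASTp: Compares a protein query sequence against a protein database.
 BLASTx: Compares a nucleotide query sequence translated in all six reading frames
against a protein database. This is useful for finding potential protein-coding regions in a
new nucleotide sequence.
 tBLASTn: Compares a protein query sequence against a nucleotide database translated in
all six reading frames.
 tBLASTx: Compares the six-frame translation of a nucleotide query against the six-frame
translation of a nucleotide database.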
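
To make "translated in all six reading frames" concrete, here is a minimal Python sketch. It
assumes the standard genetic code and an unambiguous DNA alphabet (A, C, G, T). The function
names are illustrative and are not part of any BLAST program.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
print(frames[1])   # MA*
print(frames[-1])  # LGH
```

Each of the six protein strings can then be compared against a protein database, which is how a
translated search can reveal a coding region whose frame and strand are not known in advance.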

Why is BLAST important?

 Identifying Unknown Sequences: Determining the identity or potential function of a
newly sequenced DNA, RNA, or protein by finding similar sequences with known
functions.
 Finding Homologous Sequences: Identifying genes or proteins in different organisms that
share a common evolutionary ancestor. This helps in understanding evolutionary
relationships and conserved functions.
 Gene Annotation: In newly sequenced genomes, BLAST can help locate and identify
genes by comparing genomic sequences to databases of known genes.
 Protein Function Prediction: If a newly discovered protein sequence is similar to a protein
with a known function, BLAST can provide clues about its potential role.
 Drug Target Identification: Comparing pathogen sequences to human sequences can help
identify unique pathogen-specific targets for drug development.

2. FASTA
The name "FASTA" refers both to a suite of sequence alignment software and to the simple,
text-based format that this software introduced. The FASTA format is widely used in
bioinformatics to represent nucleotide or amino acid sequences and is a standard way to store
and share biological sequence data. The description below focuses on the format.

Structure of a FASTA File

A FASTA file can contain one or more sequences. Each sequence in the file has two main parts:
i. The Header Line (Definition Line):
 It always begins with a greater-than symbol (>).
 Immediately following the ">" is a sequence identifier (ID). This is a unique name or
code for the sequence.
 After the ID, there can be an optional description or annotation of the sequence. This
information is usually separated from the ID by a space.
 The entire header line is typically kept to a single line of text (ideally less than 80
characters).
ii. The Sequence Lines:
 These lines immediately follow the header line.
 They contain the actual sequence data, using single-letter codes to represent
nucleotides (A, C, G, T, and sometimes U for RNA, or ambiguous bases like N) or
amino acids (using standard single-letter abbreviations).
 The sequence can span multiple lines.
 It's a common convention to break the sequence into lines of a certain length (e.g.,
60-80 characters) for readability, but this is not a strict requirement of the format.
 There should be no extra formatting or spaces within the sequence lines themselves.
Example of a FASTA File (DNA Sequence):
>gi|12345|ref|NC_000001.10| Human chromosome 1, complete sequence
GATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC
GATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC
GATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC
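
Because the format is so simple, a basic reader fits in a few lines of Python. The sketch below
follows the rules listed above (header line starting with ">", ID up to the first space, sequence
possibly spanning several lines). The function name parse_fasta and the second sample record are
illustrative assumptions, and the reader does not validate the sequence characters.

```python
def parse_fasta(lines):
    """Parse FASTA text lines into a list of (id, description, sequence) tuples."""
    records = []
    seq_id, description, chunks = None, "", []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                records.append((seq_id, description, "".join(chunks)))
            header = line[1:].split(None, 1)
            seq_id = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)  # sequence data may span several lines
    if seq_id is not None:
        records.append((seq_id, description, "".join(chunks)))
    return records


example = """>gi|12345|ref|NC_000001.10| Human chromosome 1, complete sequence
GATCGATCGATC
GATCGATCGATC
>seq2 hypothetical second record
ACGT
""".splitlines()

for seq_id, description, sequence in parse_fasta(example):
    print(seq_id, len(sequence))
```

For real work, established libraries handle large files and edge cases more robustly than a
hand-written reader.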

Key Characteristics and Importance:

 Simplicity: The FASTA format is very straightforward and easy to read and parse by both
humans and computer programs.
 Universality: It has become a near-universal standard in bioinformatics. Most sequence
analysis tools, databases, and software packages recognize and use FASTA format.
 Flexibility: It can represent both nucleotide and amino acid sequences.
 Interoperability: Its plain text nature makes it easy to work with using standard text
editors and scripting languages.
 Input for Bioinformatics Tools: FASTA files are commonly used as input for a wide
range of bioinformatics analyses, including sequence alignment (like BLAST and
FASTA software), phylogenetic analysis, and genome assembly.
 Data Exchange: It's a standard format for exchanging sequence data between researchers
and databases.

3. HMMER
HMMER is a powerful and widely used software suite in bioinformatics for
sequence analysis using profile Hidden Markov Models (profile HMMs). Developed by Sean
Eddy and his lab, HMMER is designed to find homologous protein or nucleotide sequences and
to perform sequence alignments. It's particularly adept at detecting remote homologs – sequences
that are evolutionarily related but may have low sequence similarity, making them difficult to
identify with simpler methods like BLAST.

How HMMER works

The general workflow with HMMER involves these steps:

 Building a Profile HMM (hmmbuild): Starting with a well-curated multiple sequence
alignment of a protein family (or a set of related nucleotide sequences), HMMER's
hmmbuild program constructs a profile HMM that statistically describes the family.
 Searching Databases (hmmsearch, phmmer, hmmscan): Once a profile HMM is built,
HMMER provides tools to search sequence databases for sequences that are likely to be
members of the family represented by the HMM.
   o hmmsearch: Takes a profile HMM as a query and searches it against a database of
   individual protein sequences (nhmmer is the counterpart for nucleotide data).
   o phmmer: Takes a single protein sequence as a query and searches it against a database
   of protein sequences, much like a BLASTp search.
   o hmmscan: Takes one or more query sequences and searches them against a database
   of profile HMMs (like Pfam). This is useful for identifying which families a given
   sequence might belong to.
 Aligning Sequences to a Profile (hmmalign): HMMER can align a set of sequences to an
existing profile HMM using the hmmalign program. This produces a multiple sequence
alignment that is consistent with the family model.
 Iterative Searching (jackhmmer): For even greater sensitivity in finding remote
homologs, HMMER offers an iterative search tool. jackhmmer performs iterative
searches of a sequence database using a query sequence, building a profile HMM from
the hits in each round to improve subsequent searches. PSI-BLAST, which belongs to the
BLAST suite rather than to HMMER, is a comparable tool that uses position-specific
scoring matrices instead of HMMs.
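
The idea of building a statistical profile from an alignment and then scoring sequences against
it can be illustrated with a deliberately simplified Python sketch. This is not HMMER's
algorithm: it builds only per-column log-odds scores (closer to a position-specific scoring
matrix) and omits the insert and delete states, transition probabilities, and statistics of a
real profile HMM. The alphabet, uniform background, pseudocount, and function names are
assumptions made for the example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet; a protein profile would use 20 letters


def build_profile(msa, pseudocount=1.0):
    """Turn equal-length aligned sequences into per-column log-odds scores."""
    background = 1.0 / len(ALPHABET)  # uniform background (assumption)
    profile = []
    for column in zip(*msa):
        counts = Counter(c for c in column if c in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background)
            for a in ALPHABET
        })
    return profile


def best_window(profile, seq):
    """Slide the profile along seq and return (score, start) of the best window."""
    width = len(profile)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col.get(ch, 0.0) for col, ch in zip(profile, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


msa = ["ACGT", "ACGT", "ACGA", "ACGT"]
profile = build_profile(msa)
score, start = best_window(profile, "TTTACGTTT")
print(start, round(score, 2))
```

Columns that are strongly conserved in the alignment contribute large positive scores for the
conserved residue and negative scores for others, which is the intuition behind the sensitivity
of profile-based searches.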
Key Features and Significance of HMMER:

 High Sensitivity: Profile HMMs are excellent at detecting distantly related sequences that
might be missed by other methods.
 Probabilistic Framework: The underlying probabilistic models provide a more robust way
to handle sequence variation.
 Structure-Aware Alignments: Alignments generated with HMMER tend to be more
biologically meaningful as they are based on the conserved patterns captured in the
profile HMM.
 Widely Used: HMMER is the foundation for many important protein family databases
like Pfam and InterPro.
 Versatile: It can be used for both protein and nucleotide sequence analysis (with
specialized tools like nhmmer for DNA homology search).
 Efficient: Modern versions of HMMER (HMMER3) are significantly faster than earlier
versions, making large-scale database searches feasible.

Conclusion
Searching sequence databases is a foundational skill in bioinformatics and life sciences. Tools
like BLAST enable scientists to find similar sequences, annotate genes, and understand genetic
relationships across species. These techniques are crucial in genomics, evolutionary biology,
drug discovery, and diagnostics. With growing data, efficient search methods will remain at the
core of biological research and innovation.

References
https://blast.ncbi.nlm.nih.gov/Blast.cgi
https://www.ncbi.nlm.nih.gov/genbank/
https://www.uniprot.org/
