Fasta& Blasta

Uploaded by

Bhavana Manimala
Copyright © All Rights Reserved

FASTA - Fast Alignment Search Tool - All

FASTA is a DNA and protein sequence alignment software package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985. The original FASTP program was designed for protein sequence similarity searching. FASTA added the ability to do DNA:DNA searches, translated protein:DNA searches, and ordered or unordered peptide searches, and also provided a more sophisticated shuffling program for evaluating statistical significance. There are several programs in this package that allow the alignment of protein sequences and DNA sequences.
FASTA is pronounced "fast A", and stands for "FAST-All".
FASTA is a "word"-based method. It looks for matching "words", or sequence patterns, called "k-tuples". It then builds a local alignment based on these word matches. It matches identical words from each list and then creates diagonals by joining adjacent matches.
The scoring is done using PAM/BLOSUM matrices.
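The word-matching and diagonal idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FASTA implementation: the function names, the example sequences, and the choice of ktup = 2 are all hypothetical, and only identical k-tuples are counted (no PAM/BLOSUM re-scoring).

```python
from collections import defaultdict

def ktuple_positions(seq, ktup):
    """Map each k-tuple in seq to the 0-based positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        table[seq[i:i + ktup]].append(i)
    return table

def diagonal_hits(query, target, ktup=2):
    """Count identical k-tuple matches on each diagonal.

    A diagonal is identified by (target position - query position);
    matches that share a diagonal lie on the same ungapped alignment.
    """
    table = ktuple_positions(query, ktup)
    diagonals = defaultdict(int)
    for j in range(len(target) - ktup + 1):
        for i in table.get(target[j:j + ktup], []):
            diagonals[j - i] += 1
    return dict(diagonals)

# Hypothetical example sequences, chosen only for illustration.
hits = diagonal_hits("ACGTACGT", "TTACGTAC", ktup=2)
best_diagonal = max(hits, key=hits.get)
print(hits, best_diagonal)
```

The diagonal with the most hits is the natural candidate for an initial region; a fuller version would re-score such regions with a substitution matrix.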
The algorithm has 4 stages:
1) Finding initial regions in search sequence
2) Re-score to find top 10 initial regions (init1)
3) Attempt to join initial regions together (initn)
4) Optimize around initial region to find best fit (opt)

Using a look-up table (generally implemented as a fast hash table, "#"), locate all identities between 2 DNA or amino acid sequences.

Example of hash table use for a protein sequence

Sequence number / Position
1 2 3 4 5 6 7 8 9 10
1 W R W T WT
2 W K W T LR R
SEQ - location
F-1
L-2
W-3, 6, 9
R- 4
T-5, 8
S-7
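A look-up table of this kind can be built with a dictionary. The sketch below is illustrative only: the function name is hypothetical, and the string FLWRTWSTW is an assumption read off the location entries listed above (positions 1 to 9), since the sequence rows in the scanned table are incomplete.

```python
from collections import defaultdict

def residue_lookup(seq):
    """Return a dict mapping each residue to its 1-based positions in seq."""
    table = defaultdict(list)
    for pos, residue in enumerate(seq, start=1):
        table[residue].append(pos)
    return dict(table)

# Assumed sequence, reconstructed from the location entries above.
lookup = residue_lookup("FLWRTWSTW")
for residue, positions in lookup.items():
    print(residue, "-", ", ".join(map(str, positions)))
```

Looking up a residue from the second sequence in this table gives every position where an identity occurs, without rescanning the first sequence.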
FASTA locates all identities between 2 DNA or amino acid sequences using the above look-up table (generally called a fast hash table, "#") and generates a Ktup value.
Then these Ktup values are re-scored to find the top 10 initial regions, and the best scoring region amongst them is taken as "init1". Later it checks to see if there are any overlapping and non-overlapping regions to create an initn (new) score, which is used to rank the library sequences. Finally, using a Needleman-Wunsch algorithm, it compares these scores and gives the best possible alignment.
FASTA and BLAST
The number of DNA and protein sequences in public databases is very large.
Searching a database involves aligning the query sequence to each sequence in the database to find significant local alignment.
BLAST and FASTA are two similarity searching programs that identify homologous DNA sequences and proteins based on the excess sequence similarity.
They provide facilities for comparing DNA and protein sequences with the existing DNA and protein databases.
They are two major heuristic algorithms for performing database searches.

Working of FASTA and BLAST
FASTA and BLAST are software tools used in bioinformatics. Both BLAST and FASTA use a heuristic word method for fast pairwise sequence alignment.
It works by finding short stretches of identical or nearly identical letters in two sequences.
These short strings of characters are called words.
The basic assumption is that two related sequences must have at least one word in common.
By first identifying word matches, a longer alignment can be obtained by extending similarity regions from the words.
Once regions of high sequence similarity are found, adjacent high-scoring regions can be joined into a full alignment.
The main difference between BLAST and FASTA is that BLAST is mostly involved in finding ungapped, locally optimal sequence alignments, whereas FASTA is involved in finding similarities between less similar sequences.
BLAST (Basic Local Alignment Search Tool)
The BLAST program was developed by Stephen Altschul of NCBI in 1990 and has since become one of the most popular programs for sequence analysis.
BLAST uses heuristics to align a query sequence with all sequences in a database.
The objective is to find high-scoring ungapped segments among related sequences. The existence of such segments above a given threshold indicates pairwise similarity beyond random chance, which helps to discriminate related sequences from unrelated sequences in a database.



Note that increasing the T score limits the amount of space available to search, decreasing the number of neighborhood words, while at the same time speeding up the process of BLAST.

Variants of BLAST
BLAST-N: compares nucleotide sequences with nucleotide sequences
BLAST-P: compares protein sequences with protein sequences
BLAST-X: compares nucleotide sequences (translated in all six frames) against protein sequences
TBLAST-N: compares protein sequences against the six-frame translations of nucleotide sequences
TBLAST-X: compares the six-frame translations of a nucleotide sequence against the six-frame translations of nucleotide sequences
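The "six frames" used by the translated variants are the three reading frames of the sequence plus the three reading frames of its reverse complement. The following sketch only extracts those frames as nucleotide strings; codon-to-amino-acid translation is left out, and the function names and the example sequence are hypothetical.

```python
# Translation table for complementing an uppercase DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]

def six_frames(dna):
    """Return the six reading frames, each trimmed to whole codons.

    Frames 1-3 come from the given strand (offsets 0, 1, 2);
    frames 4-6 come from the reverse complement.
    """
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            usable = len(strand) - offset
            usable -= usable % 3  # drop a trailing partial codon
            frames.append(strand[offset:offset + usable])
    return frames

frames = six_frames("ATGAAA")
print(frames)
```

Each frame would then be translated codon by codon before being compared with protein sequences.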

FASTA

FASTA stands for "fast-all" or "FastA".

It was the first database similarity search tool developed, preceding the development of BLAST.

FASTA is another sequence alignment tool which is used to search similarities between
sequences of DNA and proteins.
FASTA uses a "hashing" strategy to find matches for a short stretch of identical residues with a length of k. The string of residues is known as ktuples or ktups, which are equivalent to words in BLAST, but are normally shorter than the words.
Typically, a ktup is composed of two residues for protein sequences and six residues for DNA sequences.
The query sequence is thus broken down into sequence patterns or words known as k-tuples, and the target sequences are searched for these k-tuples in order to find the similarities between the two.
FASTA is a fine tool for similarity searches.
These methods are not guaranteed to find the optimal alignment or true homologs, but are 50-100 times faster than dynamic programming.



BLAST is popular as a bioinformatics tool due to its ability to identify regions of local similarity between two sequences quickly. BLAST calculates an expectation value, which estimates the number of matches between two sequences. It uses the local alignment of sequences.
Using a heuristic method, BLAST finds similar sequences by locating short matches between the two sequences. This process of finding similar sequences is called seeding. It is after this first match that BLAST begins to make local alignments. While attempting to find similarity in sequences, sets of common letters, known as words, are very important.
For example, suppose that the sequence contains a short stretch of letters. If a BLAST was being conducted under normal conditions, the word size would be 3 letters. In this case, using the given stretch of letters, the searched words would be GLK, LKF, KFA.
The heuristic algorithm of BLAST locates all common three-letter words between the sequence of interest and the hit sequence or sequences from the database. This result will then be used to build an alignment.
After making words for the sequence of interest, the rest of the words are also assembled.
These words must satisfy a requirement of having a score of at least the threshold T when compared by using a scoring matrix.
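The word list and the threshold test can be sketched as follows. Several things here are assumptions made for illustration: the stretch GLKFA is simply the string implied by the three words GLK, LKF, KFA; the function names are hypothetical; and `toy_score` (+5 for an identical residue, -2 otherwise) stands in for a real scoring matrix.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def words(seq, w=3):
    """Return all overlapping words of length w in seq."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def toy_score(a, b, match=5, mismatch=-2):
    """Hypothetical scoring: +5 per identical residue, -2 otherwise."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))

def neighborhood(word, threshold):
    """Return every same-length word scoring at least threshold against word."""
    return [
        "".join(candidate)
        for candidate in product(AMINO_ACIDS, repeat=len(word))
        if toy_score(word, candidate) >= threshold
    ]

query_words = words("GLKFA")
loose = neighborhood("GLK", threshold=8)    # allows one mismatch
strict = neighborhood("GLK", threshold=15)  # identical word only
print(query_words, len(loose), len(strict))
```

Raising the threshold from 8 to 15 shrinks the list of accepted words for GLK from 58 to 1, which is why a higher T means fewer words to look up and a faster, less sensitive search.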
One commonly used scoring matrix for BLAST searches is BLOSUM62, although the
optimal scoring matrix depends on sequence similarity.
Once both words and neighborhood words are assembled and compiled, they are compared to the sequences in the database in order to find matches. The threshold score T determines whether or not a particular word will be included in the alignment.
Once seeding has been conducted, the alignment, which is only 3 residues long, is extended in both directions by the algorithm used by BLAST.
Each extension impacts the score of the alignment by either increasing or decreasing it. If this score is higher than a pre-determined T, the alignment will be included in the results given by BLAST. However, if this score is lower than this pre-determined T, the alignment will cease to extend, preventing the areas of poor alignment from being included in the BLAST results.
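Ungapped extension of a seed can be sketched as below. This is not BLAST's actual code: the function name, the +1/-1 scoring, and the stopping rule (stop once the running score falls more than `max_drop` below the best score seen so far) are assumptions, used here as a simplified stand-in for the cutoff described above.

```python
def extend_seed(query, subject, q_start, s_start, word_len=3,
                match=1, mismatch=-1, max_drop=2):
    """Extend an ungapped seed in both directions.

    Returns (q_begin, q_end, score): the half-open range of the query
    covered by the best extension found, and that extension's score.
    """
    def pair_score(i, j):
        return match if query[i] == subject[j] else mismatch

    best = sum(pair_score(q_start + k, s_start + k) for k in range(word_len))
    left = right = 0

    # Extend to the right of the seed.
    running, step = best, 0
    while (q_start + word_len + step < len(query)
           and s_start + word_len + step < len(subject)):
        running += pair_score(q_start + word_len + step,
                              s_start + word_len + step)
        step += 1
        if running > best:
            best, right = running, step
        elif best - running > max_drop:
            break

    # Extend to the left of the seed, starting from the best score so far.
    running, step = best, 0
    while q_start - step - 1 >= 0 and s_start - step - 1 >= 0:
        running += pair_score(q_start - step - 1, s_start - step - 1)
        step += 1
        if running > best:
            best, left = running, step
        elif best - running > max_drop:
            break

    return q_start - left, q_start + word_len + right, best

# Hypothetical sequences; the seed GLK starts at index 2 in both.
begin, end, score = extend_seed("AAGLKFAWW", "CCGLKFACC", 2, 2)
print("AAGLKFAWW"[begin:end], score)
```

In this example the seed grows to the right over the two further matching residues and gains nothing on the left, so the mismatching flanks stay out of the reported segment.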
