Sequence Alignment and Homology Analysis

The document discusses sequence alignment and homology. It defines a biological sequence as an ordered collection of symbols in an alphabet, and notes that sequence analysis can provide information about a sequence's function and relationships to other molecules. Sequence analysis methods include pairwise alignment, multiple alignment, motif searches, phylogeny, and homology searches using tools like BLAST to compare a query sequence to database sequences and evaluate hits.

Sequence alignments (similarity & homology)

What is a sequence?
- In bioinformatics, a biological sequence is simply a word.
- A word is an ordered collection of symbols from an alphabet.
- Only the primary structure is taken into account.
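The "word over an alphabet" definition can be made concrete with a small check. This is a minimal sketch; the two alphabets below (four DNA bases, twenty standard amino acids) and the helper name `is_word_over` are assumptions chosen for illustration, not something the slides specify.

```python
# Illustrative alphabets (assumptions, not taken from the slides).
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def is_word_over(sequence: str, alphabet: set) -> bool:
    """Return True if every symbol of `sequence` belongs to `alphabet`."""
    return all(symbol in alphabet for symbol in sequence.upper())


print(is_word_over("ACTTAGGC", DNA_ALPHABET))       # a DNA word
print(is_word_over("MKFLFLILS", PROTEIN_ALPHABET))  # a protein word
print(is_word_over("MKFLFLILS", DNA_ALPHABET))      # not a DNA word
```

Only the order and identity of the symbols are used, which matches the remark that only the primary structure is considered.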

Bremen

P. Thbault- Alignment sequence

The sequence is represented in a given format


Sequence analysis, what for?

- A sequence contains information about:
  - function,
  - relationships with other molecules.
- A sequence reflects physico-chemical constraints due to:
  - the environment (water, lipid),
  - molecular evolution.

The objective is to predict important information about the function of a macromolecule from its sequence alone.

[Figure: overview diagram linking Sequence, Databases, Homologous sequences search, Pairwise alignment, Multiple alignment, Motifs search, Phylogeny and Genome annotation]

Text search
- Search for terms/criteria within sequence annotations: function, keywords, organisms, features
- Generic sites:
  - Entrez: NCBI server (http://www.ncbi.nlm.nih.gov/Entrez/)
  - SRS: available at different sites
- Specialized sites:
  - SGD (all about yeast: http://genome-www.stanford.edu/Saccharomyces/)

Homology search
- Goal: search for sequence similarities in order to infer structural or functional information
- Method: sequence alignment or dot matrix
- Definitions:
  - Similarity: a measure of how alike two sequences are
  - Homology:
    - is a hypothesis based on sequence similarity
    - states that 2 sequences are derived from a common ancestor

Evolution of sequences
- The concept of homology relates to the mechanisms of molecular evolution
- Principles:
  - homologous sequences are derived from a common ancestor whose sequence is not available (unfortunately!)
  - at the molecular level, the events of evolution are substitutions, insertions and deletions
  - selection pressure acts at the structural or functional level on genes or their products: this pressure guides sequence evolution

Information inference and evolution

- Most bioinformatics methods rely on transferring information from known sequences to new sequences: reasoning by inference
- This inference relies on evolutionary events such as:
  - speciation (one ancestral species => different species)
  - gene duplication
  - merging / splitting of genes (which leads to the domain composition of genes and proteins)

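Common ancestor

[Figure: evolutionary tree along a time axis. Starting from a common ancestor, a speciation event gives P1 and P2 in species 1 and species 2, and a duplication gives P2a and P2b. Genes related by speciation are labelled orthologous; genes related by duplication are labelled paralogous.]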

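Functional inference

[Figure: P1 (species 1) and P2 (species 2) derive from a common ancestor P through speciation over time. Function F of P2 is known from experimental work and stored in a database. Question: what is the function of P1? Software to compare sequences detects the homology between P1 and P2, and function F is deduced for P1 from homology.]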

Ex. 1: trypsin, human & chicken (~80 % id.)

TRY3_CHICK MKFLFLILSCLGAAVAFPGGADDDKIVGGYTCPEHSVPYQVSLNSGYHFCGGSLINSQWV
TRY3_HUMAN MN-PFLILAFVGAAVAVPFDDDDKIVGGYTCEENSLPYQVSLNSGSHFCGGSLISEQWV
           *: ****: :*****.* *********** *:*:********* ********..***

TRY3_CHICK LSAAHCYKSRIQVRLGEYNIDVQEDSEVVRSSSVIIRHPKYSSITLNNDIMLIKLASAVE
TRY3_HUMAN VSAAHCYKTRIQVRLGEHNIKVLEGNEQFINAAKIIRHPKYNRDTLDNDIMLIKLSSPAV
           :*******:********:**.* *..* . .:: *******. **:********:*..

TRY3_CHICK YSADIQPIALPSSCAKAGTECLISGWGNTLSNGYNYPELLQCLNAPILSDQECQEAYPGD
TRY3_HUMAN INARVSTISLPTAPPAAGTECLISGWGNTLSFGADYPDELKCLDAPVLREAECKASCPGK
           .* :..*:**:: . *************** * :**: *:**:**:* : **: : **.

TRY3_CHICK ITSNMICVGFLEGGKDSCQGDSGGPVVCNGELQGIVSWGIGCALKGYPGVYTKVCNYVDW
TRY3_HUMAN ITNSMFCVGFLEGGKDSWKRDSGGPVVCNGQLQGVVSWGHGCAWKNRPGVYTKVYNYVDW
           **..*:*********** **********:***:**** *** *. ******* *****
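The "% id." figure quoted in the example titles is a percent identity. A minimal sketch of how it can be computed from two aligned rows follows. The convention used here (identical columns divided by the alignment length, gap columns counted as non-identical) is an assumption, since several conventions exist; the function name `percent_identity` is hypothetical. The two rows are one 60-column block of the chicken/human alignment, so the printed value describes that block only, not the whole protein.

```python
def percent_identity(row1: str, row2: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Assumed convention: denominator = alignment length, and a column
    containing a gap ('-') never counts as identical.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    if not row1:
        raise ValueError("empty alignment")
    identical = sum(1 for a, b in zip(row1, row2) if a == b and a != "-")
    return 100.0 * identical / len(row1)


# One block of the trypsin alignment (chicken vs human).
chick = "LSAAHCYKSRIQVRLGEYNIDVQEDSEVVRSSSVIIRHPKYSSITLNNDIMLIKLASAVE"
human = "VSAAHCYKTRIQVRLGEHNIKVLEGNEQFINAAKIIRHPKYNRDTLDNDIMLIKLSSPAV"
print(f"{percent_identity(chick, human):.1f} % identity over this block")
```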

Ex. 2: trypsin, human & mosquito (~30 % id.)

TRY3_ANOGA MISNKIAILLAVLVVAVACAQARVALKHRSVQALPRFLPRPQYDVGHRIVGGFEIDVSET
TRY3_HUMAN --------MNPFLILAFVGAA--V--------AVP------FDDDDKIVGGYTCEENSL
           : ..*::*.. * * *:* :* ..:****:

TRY3_ANOGA PYQVSLQYFNSHRCGGSVLNSKWILTAAHCTVNLQPSSLAVRLGSS--RHASGGTVVRV
TRY3_HUMAN PYQVSLN-SGSHFCGGSLISEQWVVSAAHC---YKTRIQVRLGEHNIKVLEGNEQFINA
           ******: .** ****::..:*:::**** : : ****. : .*. .:..

TRY3_ANOGA ARVLEHPNYDDSTIDYDFSLMELETELTFSDVVQPVSLPEQDEAVEDGTMTTVSGWGNTQ
TRY3_HUMAN AKIIRHPKYNRDTLDNDIMLIKLSSPAVINARVSTISLPTAPPAA-GTECLISGWGNTL
           *:::.**:*: .*:* *: *::*.: .:. *..:*** *. ** : .. :******

TRY3_ANOGA SAAESNAILRAANIPTVNQKECTIAYSSSGGITDRMLCAGYKRGGKDACQGDSGGPLVV
TRY3_HUMAN SFGADYPDELKCLDAPVLREAECKA-SCPGKITNSMFCVGFLEGGKDSWKRDSGGPVVC
           * .*: *:. : *.:.: **. *..* **: *:*.*: .****: : *****:*

TRY3_ANOGA DGKLVGVVSWGFGCAMPGYPGVYARVAVVRNWVRENSGA--

The trypsin case

- Highly conserved sequence
- Strong structural constraints: 3 disulfide bonds (Cys-Cys)
- Sequence similarity is consistent with the phylogenetic distances between species
- Functional identity has been proven experimentally

Difficulties
- Over time, mutations accumulate until the similarity between sequences disappears: homology is no longer detectable = false negatives
- Some mechanisms, independent of evolution, produce artifactual similarities (low-complexity regions): similarity but no homology = false positives

Modular composition of proteins

- Many proteins appear as combinations of domains
- Domains can be repeated, and can be present in different proteins in various orders
- Similarity (and homology!) between proteins can thus be partial: this complicates the alignment and affects functional inference (a common domain might not be enough to imply a common function)

Example of modular proteins

F12:  F2 - E - F1 - E - K - catalytic
PLAT: F1 - E - K - K - catalytic

F12 & PLAT are 2 proteins involved in blood coagulation (the catalytic domain has a serine protease activity). Domains frequently correspond to exons.
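Partial similarity between modular proteins can be illustrated by comparing domain lists. This is only a sketch: the domain orders below are read off the figure residue on this slide and should be treated as an assumption, and `shared_domains` is a hypothetical helper name.

```python
# Domain architectures as read from the slide (order is an assumption).
F12 = ["F2", "E", "F1", "E", "K", "catalytic"]
PLAT = ["F1", "E", "K", "K", "catalytic"]


def shared_domains(protein_a, protein_b):
    """Return the set of domain types present in both proteins."""
    return set(protein_a) & set(protein_b)


print("shared:", sorted(shared_domains(F12, PLAT)))
print("only in F12:", sorted(set(F12) - set(PLAT)))
```

Sharing domain types says nothing about their order or copy number, which is one reason a common domain alone does not guarantee a common function.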


How to compare 2 sequences?

- Based on a graphical view -> dot matrix approach
- Based on a sequence view -> alignment approach

"Dot matrix" view

[Figure: dot matrix with Protein 1 on one axis and Protein 2 on the other]
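A dot matrix can be sketched in a few lines: a cell is marked whenever the symbol of one sequence equals the symbol of the other. The version below is the simplest possible form (no sliding window, no threshold), which is an assumption about what the slide shows; the function names are hypothetical.

```python
def dot_matrix(seq1: str, seq2: str):
    """Boolean matrix: cell [i][j] is True when seq1[i] == seq2[j]."""
    return [[a == b for b in seq2] for a in seq1]


def render(seq1: str, seq2: str) -> str:
    """Text rendering: seq1 down the side, seq2 across the top."""
    matrix = dot_matrix(seq1, seq2)
    lines = ["  " + " ".join(seq2)]
    for symbol, row in zip(seq1, matrix):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(symbol + " " + cells)
    return "\n".join(lines)


print(render("ACTTAGGC", "AGTGGC"))
```

Runs of marks along a diagonal correspond to stretches where the two sequences match without gaps.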


Alignment of 2 sequences
- Problem: a huge number of possible alignments
- Which one?

A C - T T A G G C
A - G T - G G C
* * * * *

A C T T A G G C
- A G T G G C
* * * *

A C T T A G G C
A G T - G G C
* * * * *

Alignment of 2 sequences
- Evaluation
- Similarity criteria

4 matches, 2 mismatches, 2 gaps:
A C T T A G G C
- A G T G G C
* * * *

5 matches, 0 mismatches, 4 gaps:
A C - T T A G G C
A - G T - G G C
* * * * *

5 matches, 1 mismatch, 2 gaps:
A C T T A G G C
A G T - G G C
* * * * *
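Counting matches, mismatches and gaps is mechanical once the two rows have the same length. The sketch below uses the two sequences from this slide, ACTTAGGC and AGTGGC; the gap placement in the second row is an illustrative assumption, and `count_events` is a hypothetical name.

```python
def count_events(row1: str, row2: str):
    """Return (matches, mismatches, gaps) for two aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gaps += 1          # a column holding a gap in either row
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


# Illustrative gapped rows (gap positions are an assumption).
print(count_events("ACTTAGGC", "A-GT-GGC"))  # (5, 1, 2)
```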

Score systems
- Alignment score = sum of the scores at each position
- Different events: indel / substitution / identity
  - Example scores: match = +2, mismatch = -1, gap = -2
- Not all substitutions are equivalent:
  - DNA: transitions / transversions
  - Proteins: physico-chemical properties, models of evolution => substitution matrix
- Gap penalties:
  - linear, log
  - opening / extension
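The scoring scheme can be written as a short function that sums a score per column. This is a sketch under stated assumptions: the default values are the example scores from the slide (match +2, mismatch -1, gap -2), and the opening/extension handling uses one common convention in which the first column of a gap run costs `gap_open` and each following column costs `gap_extend`. With both set to the same value the penalty is linear. The function name is hypothetical.

```python
def alignment_score(row1, row2, match=2, mismatch=-1,
                    gap_open=-2, gap_extend=-2):
    """Sum of per-column scores for two aligned rows of equal length."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    score = 0
    previous_gap_row = None  # which row the current gap run is in, if any
    for a, b in zip(row1, row2):
        if a == "-" and b == "-":
            raise ValueError("a column cannot hold two gaps")
        if a == "-" or b == "-":
            gap_row = 1 if a == "-" else 2
            # Extending an existing run in the same row, or opening a new one.
            score += gap_extend if gap_row == previous_gap_row else gap_open
            previous_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            previous_gap_row = None
    return score


# Linear gaps: 5 matches, 1 mismatch, 2 gaps -> 5*2 - 1 - 2*2 = 5
print(alignment_score("ACTTAGGC", "A-GT-GGC"))
# Opening/extension penalties (values here are illustrative assumptions).
print(alignment_score("ACTTAGGC", "A--T-GGC", gap_open=-5, gap_extend=-1))
```

Separating opening from extension makes one long gap cheaper than several short ones, which is usually closer to how insertions and deletions occur.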

Matrix of substitutions (aa)

[Figure: BLOSUM62 substitution matrix]

- Matches: always > 0, but with different scores
- Mismatches:
  - < 0: penalty
  - = 0: neutral
  - > 0: nearly like a match
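The three kinds of mismatch score can be seen on a handful of BLOSUM62 entries. The dictionary below is a small hand-copied excerpt, not the full matrix, and the helper names are hypothetical; a real analysis would load the complete matrix from a library or file.

```python
# Small excerpt of BLOSUM62 (keys are alphabetically sorted residue pairs).
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("C", "C"): 9, ("W", "W"): 11,   # matches, different scores
    ("I", "L"): 2, ("D", "E"): 2, ("K", "R"): 2,    # similar residues
    ("A", "G"): 0,                                  # neutral
    ("G", "W"): -2, ("D", "L"): -4,                 # penalized
}


def substitution_score(a: str, b: str) -> int:
    """Look up a pair in the excerpt; the matrix is symmetric."""
    return BLOSUM62_EXCERPT[tuple(sorted((a, b)))]


def describe(a: str, b: str) -> str:
    """Classify a pair using the categories listed on the slide."""
    score = substitution_score(a, b)
    if a == b:
        return "match"
    if score < 0:
        return "penalty"
    if score == 0:
        return "neutral"
    return "nearly like a match"


for pair in [("W", "W"), ("A", "A"), ("L", "I"), ("A", "G"), ("L", "D")]:
    print(pair, substitution_score(*pair), describe(*pair))
```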

Algorithms
- How to find the best alignment?
- Exact algorithms:
  - Dynamic programming (Needleman & Wunsch, Smith & Waterman)
  - Time-consuming when searching databases
- Heuristics = no guarantee of finding the optimal solution
  - Blast, Fasta

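The dynamic programming idea behind Needleman & Wunsch can be sketched as follows. This is a minimal version with a linear gap penalty and the example scores used earlier (match +2, mismatch -1, gap -2), which are assumptions for illustration; it returns one optimal global alignment among possibly several.

```python
def needleman_wunsch(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming (linear gap penalty)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two symbols
                              score[i - 1][j] + gap,      # gap in seq2
                              score[i][j - 1] + gap)      # gap in seq1
    # Traceback from the bottom-right corner to the top-left corner.
    aligned1, aligned2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            aligned1.append(seq1[i - 1]); aligned2.append(seq2[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned1.append(seq1[i - 1]); aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-"); aligned2.append(seq2[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(aligned1)), "".join(reversed(aligned2))


best, row1, row2 = needleman_wunsch("ACTTAGGC", "AGTGGC")
print(best)
print(row1)
print(row2)
```

The table has (len(seq1)+1) x (len(seq2)+1) cells, so time and memory grow with the product of the two lengths. That cost is why exact methods become slow against a whole database.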
Global or local?
- 2 types of alignment:
  - global: total length (Needleman & Wunsch, Fasta)
  - local: by pieces (Smith & Waterman, Blast)
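A local alignment differs from a global one in two places: cell scores are never allowed to drop below zero, and the traceback starts at the best cell rather than at the corner. The sketch below shows this for Smith & Waterman with a linear gap penalty; the scores (match +2, mismatch -1, gap -2) and the test sequences are illustrative assumptions.

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Best local alignment by dynamic programming (linear gap penalty)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(0,                          # restart the alignment
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Traceback from the best cell until a zero is reached.
    aligned1, aligned2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            aligned1.append(seq1[i - 1]); aligned2.append(seq2[j - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned1.append(seq1[i - 1]); aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-"); aligned2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(aligned1)), "".join(reversed(aligned2))


# A shared piece (ACGT) surrounded by unrelated flanks.
print(smith_waterman("TTTACGTTTT", "GGACGTGG"))  # (8, 'ACGT', 'ACGT')
```

Only the shared piece is reported, which is the behaviour wanted when two proteins have a single domain in common.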

Comparing a sequence with those of a database

The goal is to compare a query sequence with all the subject sequences of the database.
[Figure: a query sequence compared against a sequence database]

For each sequence of the database, the program tries to find the best alignment.

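Blast hit evaluation

- Statistical evaluation: could the hit have occurred at random?
- E-value:

  $E = K \cdot m \cdot n \cdot e^{-\lambda S}$

  - $S$: score of the alignment (the raw score in this form of the equation; the "bit-score" is its normalized version, $S' = (\lambda S - \ln K) / \ln 2$)
  - $K$, $\lambda$: parameters (score system, sequence composition)
  - $m$, $n$: lengths of the sequences (or size of the database)
- $E$ = number of alignments expected in the database with a score at least equal to $S$ under the random hypothesis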
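The E-value formula is easy to evaluate numerically. The sketch below assumes that S is a raw alignment score, and the values chosen for K, lambda, m and n are illustrative assumptions, not parameters reported by any particular Blast run. It also shows the standard relation between the raw score and the bit score, under which E = m * n * 2^(-bit score).

```python
import math


def e_value(raw_score, m, n, K, lam):
    """E = K * m * n * exp(-lambda * S), with S a raw alignment score."""
    return K * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, K, lam):
    """Normalized score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


# Illustrative parameters (assumptions for this example only).
K, LAM = 0.134, 0.318
m, n = 250, 5_000_000   # query length, database size in residues

for s in (40, 80, 120):
    print(s, round(bit_score(s, K, LAM), 1), e_value(s, m, n, K, LAM))
```

Because the score sits in an exponent, a modest increase in score lowers E by orders of magnitude, while doubling the database size only doubles E.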

Blast Tools

Using Blast
- Questions:
  - Which database?
    - General (GenBank, UniProt)
    - Specialized (EST, limited to one organism, family of proteins, etc.)
  - Nucleic or protein sequences?
  - Are the default parameters suitable?
  - Interpretation of the results:
    - Which maximum E-value? No simple rule:
      - E < 1e-10 => clear homology
      - 1e-10 < E < 1e-5 => maybe?
      - 1e-5 < E => not significant enough
    - But also: size, % identity, % gaps => examine the alignments
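The rule of thumb for E-values can be written as a small classifier. The thresholds are the ones given on the slide; how the exact boundary values are assigned is an assumption, since the slide uses strict inequalities. The function name is hypothetical, and the result is only a first filter before looking at the alignment itself.

```python
def interpret_e_value(e: float) -> str:
    """Map an E-value to the slide's three rough categories."""
    if e < 0:
        raise ValueError("an E-value cannot be negative")
    if e < 1e-10:
        return "clear homology"
    if e < 1e-5:       # boundary handling is an assumption
        return "maybe"
    return "not significant enough"


for e in (1e-50, 1e-7, 0.01):
    print(e, "=>", interpret_e_value(e))
```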

Blast programs
Program   Query seq.    Database      Comment
blastp    proteins      proteins
blastn    nucleotides   nucleotides
blastx    nucleotides   proteins      Translation of the query seq.
tblastn   proteins      nucleotides   Translation of the database
tblastx   nucleotides   nucleotides   Translation of the query seq. and the database
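The choice of program follows from the type of the query and the type of the database. The lookup below is a sketch of that decision; the dictionary layout and the function name `candidate_programs` are hypothetical, and only the five program names come from Blast itself.

```python
# program -> (query type, database type, what gets translated)
BLAST_PROGRAMS = {
    "blastp": ("protein", "protein", "nothing"),
    "blastn": ("nucleotide", "nucleotide", "nothing"),
    "blastx": ("nucleotide", "protein", "query"),
    "tblastn": ("protein", "nucleotide", "database"),
    "tblastx": ("nucleotide", "nucleotide", "query and database"),
}


def candidate_programs(query_type: str, database_type: str):
    """List the programs that accept this query/database combination."""
    return [name for name, (query, database, _) in BLAST_PROGRAMS.items()
            if (query, database) == (query_type, database_type)]


print(candidate_programs("protein", "protein"))
print(candidate_programs("nucleotide", "nucleotide"))
```

For a nucleotide query against a nucleotide database two programs are possible: blastn compares the nucleotides directly, while tblastx compares their translations.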
