Applying Hidden Markov Models to
Bioinformatics
Conor Buckley
Outline
What are Hidden Markov Models?
Why are they a good tool for Bioinformatics?
Applications in Bioinformatics
History of Hidden Markov Models
HMMs were first described in a series of statistical
papers by Leonard E. Baum and other authors in the
second half of the 1960s. One of the first applications
of HMMs was speech recognition, starting in the
mid-1970s. They are commonly used in speech
recognition systems to help determine the words
represented by captured sound waveforms.
In the second half of the 1980s, HMMs began to be
applied to the analysis of biological sequences, in
particular DNA.
Since then, they have become ubiquitous in the field of bioinformatics.
Source: http://en.wikipedia.org/wiki/Hidden_Markov_model#History
What are Hidden Markov Models?
HMM: A formal foundation for making probabilistic
models of linear sequence 'labeling' problems.
They provide a conceptual toolkit for building
complex models just by drawing an intuitive picture.
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
What are Hidden Markov Models?
Machine learning approach in bioinformatics
Machine learning algorithms are presented with
training data, which are used to derive important
insights about the (often hidden) parameters.
Once an algorithm has been trained, it can apply these
insights to the analysis of a test sample.
As the amount of training data increases, the accuracy
of the machine learning algorithm typically increases
as well.
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
Hidden Markov Models
Has N states, called S1, S2, ..., SN
There are discrete timesteps: t=0, t=1, ...
[Diagram: N=3 states S1, S2, S3, shown at timestep t=0]
Source: http://www.autonlab.org/tutorials/hmm.html
Hidden Markov Models
Has N states, called S1, S2, ..., SN
There are discrete timesteps: t=0, t=1, ...
At each timestep, the system is in exactly one
of the available states.
[Diagram: N=3 states S1, S2, S3, shown at timestep t=0]
Hidden Markov Models
[Diagram: an HMM unrolled over time slices as a Bayesian network, with states S1, S2, S3]
Bayesian Network Image:
http://en.wikipedia.org/wiki/File:Hmm_temporal_bayesian_net.svg
A Markov Chain
Bayes' Theorem
• (statistics) a theorem describing how the conditional probability of a set
of possible causes for a given observed event can be computed from
knowledge of the probability of each cause and the conditional
probability of the outcome of each cause
- http://wordnetweb.princeton.edu/perl/webwn?s=bayes%27%20theorem
Building a Markov Chain
Concrete Example
Two friends, Alice and Bob, live far apart from each other and talk
together daily over the telephone about what they did that day.
Bob is only interested in three activities: walking in the park, shopping, and cleaning
his apartment.
The choice of what to do is determined exclusively by the weather on a given day.
Alice has no definite information about the weather where Bob lives, but she knows
general trends.
Based on what Bob tells her he did each day, Alice tries to guess what the weather
must have been like.
Alice believes that the weather operates as a discrete Markov chain. There are two
states, "Rainy" and "Sunny", but she cannot observe them directly, that is, they are
hidden from her.
On each day, there is a certain chance that Bob will perform one of the following
activities, depending on the weather: "walk", "shop", or "clean". Since Bob tells
Alice about his activities, those are the observations.
Source: Wikipedia.org
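A minimal sketch of this model in Python. The probability values below are the ones usually quoted alongside this Wikipedia example; treat them as illustrative assumptions rather than part of the slides.

# Hidden states and observations for the Alice-and-Bob weather example.
states = ("Rainy", "Sunny")
observations = ("walk", "shop", "clean")

# Alice's prior belief about the weather on the first day.
start_p = {"Rainy": 0.6, "Sunny": 0.4}

# P(tomorrow's weather | today's weather): the hidden Markov chain.
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}

# P(Bob's activity | weather): the emission probabilities.
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}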
Hidden Markov Models
Building a Markov Chain
What now?
* Find the most probable sequence of hidden states given the observed sequence
Viterbi's algorithm
A dynamic programming algorithm for finding the most
likely sequence of hidden states, called the Viterbi path,
that results in a sequence of observed events.
http://pcarvalho.com/forward_viterbi/
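A sketch of the Viterbi algorithm in Python, assuming the states, start_p, trans_p and emit_p dictionaries from the weather sketch above:

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) for the most likely hidden-state path."""
    # V[t][s]: probability of the best partial path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back-pointers for reconstructing the path

    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path probability.
            prev = max(states, key=lambda p: V[t - 1][p] * trans_p[p][s])
            V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev

    # Trace back from the most probable final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return V[-1][last], path[::-1]

prob, path = viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p)
print(path, prob)  # with the assumed numbers: ['Sunny', 'Rainy', 'Rainy'] 0.01344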
Viterbi Results
Bioinformatics Example
Assume we are given a DNA sequence that begins in
an exon, contains one 5' splice site and ends in an
intron
Identify where the switch from exon to intron occurs
Where is the splice site?
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
Bioinformatics Example
In order for us to guess, the sequences of exons, splice
sites and introns must have different statistical
properties.
Let's say...
Exons have a uniform base composition on average
A/C/T/G: 25% for each base
Introns are A/T rich
A/T: 40% for each
C/G: 10% for each
5' Splice site consensus nucleotide is almost always a
G...
G: 95%
A: 5%
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
Bioinformatics Example
We can build a Hidden Markov Model
We have three states
"E" for Exon
"5" for 5' SS
"I" for Intron
Each state has its own emission probabilities, which
model the base composition of exons, introns, and the
consensus G at the 5' SS
Each state also has transition probabilities (arrows)
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
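A sketch of this three-state model in Python. The emission probabilities come from the previous slide; the transition probabilities, which control how long the model stays in E or I, are assumptions in the spirit of the toy model in the source article:

# States: "E" = exon, "5" = 5' splice site, "I" = intron.
states = ("E", "5", "I")

# Emission probabilities: exon uniform, 5' SS mostly G, intron A/T rich.
emit_p = {
    "E": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "5": {"A": 0.05, "C": 0.00, "G": 0.95, "T": 0.00},
    "I": {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40},
}

# Transition probabilities (the arrows); "end" terminates the sequence.
trans_p = {
    "start": {"E": 1.0},
    "E": {"E": 0.9, "5": 0.1},
    "5": {"I": 1.0},
    "I": {"I": 0.9, "end": 0.1},
}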
HMM: A Bioinformatics Visual
We can use HMMs to generate a sequence
When we visit a state, we emit a nucleotide based on that state's
emission probability distribution
We also choose the next state to visit according to the current
state's transition probability distribution.
We generate two strings of information
Observed Sequence
Underlying State Path
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
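A sketch of this generative process, assuming the trans_p and emit_p dictionaries from the splice-site sketch above:

import random

def pick(dist):
    """Sample one outcome from a {outcome: probability} dictionary."""
    outcomes, weights = zip(*dist.items())
    return random.choices(outcomes, weights=weights)[0]

def generate(trans_p, emit_p):
    """Sample an (observed sequence, hidden state path) pair from the HMM."""
    seq, path = [], []
    state = pick(trans_p["start"])
    while state != "end":
        path.append(state)
        seq.append(pick(emit_p[state]))  # emit a nucleotide from this state
        state = pick(trans_p[state])     # choose the next state to visit
    return "".join(seq), "".join(path)

observed, hidden = generate(trans_p, emit_p)  # the two strings of information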
HMM: A Bioinformatics Visual
The state path is a Markov Chain
Since we're only given the observed sequence, this underlying
state path is a hidden Markov Chain
Therefore...
We can apply Bayesian Probability
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
HMM: A Bioinformatics Visual
S – Observed sequence
π – State Path
Θ – Parameters
The probability P(S, π | HMM, Θ) is the product of all emission
probabilities and transition probabilities.
Let's look at an example...
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
HMM: A Bioinformatics Visual
There are 27 transitions and 26 emissions.
Multiply all 53 probabilities together (and take the log, since these are
small numbers) and you'll calculate log P(S, π|HMM, Θ) = -41.22
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
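A sketch that reproduces this calculation, assuming the model dictionaries above together with the example sequence and state path used in the source article (both are assumptions here, since the slide shows only the totals):

import math

def log_joint(seq, path, trans_p, emit_p):
    """ln P(S, pi | HMM, Theta): sum of log emission and transition probs."""
    logp = math.log(trans_p["start"][path[0]])           # first transition
    for i, (base, state) in enumerate(zip(seq, path)):
        logp += math.log(emit_p[state][base])            # 26 emissions
        if i + 1 < len(seq):
            logp += math.log(trans_p[state][path[i + 1]])
    return logp + math.log(trans_p[path[-1]]["end"])     # final transition

seq  = "CTTCATGTGAAAGCAGACGTAAGTCA"
path = "EEEEEEEEEEEEEEEEEE5IIIIIII"
print(round(log_joint(seq, path, trans_p, emit_p), 2))  # -41.22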
HMM: A Bioinformatics Visual
The model parameters and overall sequence scores are all
probabilities
Therefore we can use Bayesian probability theory to manipulate these
numbers in standard, powerful ways, including optimizing parameters and
interpreting the significance of scores.
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
HMM: A Bioinformatics Visual
Posterior Decoding:
An alternative state path puts the SS on the 6th G instead of the 5th (log
probabilities of -41.71 versus -41.22)
How confident are we that the fifth G is the right choice?
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
HMM: A Bioinformatics Visual
We can calculate our confidence directly.
The probability that nucleotide i was emitted by state k is the sum of the probabilities of
all the state paths that use state k to generate i, normalized by the sum over all possible
state paths
Result: We get a probability of 46% that the best-scoring fifth G is correct and 28% that the
sixth G position is correct.
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
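A sketch of this computation via the forward-backward algorithm, again assuming the model dictionaries and the example sequence from the sketches above; the exact percentages depend on those assumed parameters:

def posterior(seq, k, i, states, trans_p, emit_p):
    """P(position i was emitted by state k | seq), by forward-backward."""
    n = len(seq)
    # Forward: fwd[t][s] = P(seq[0..t], state at t is s)
    fwd = [{s: trans_p["start"].get(s, 0) * emit_p[s][seq[0]] for s in states}]
    for t in range(1, n):
        fwd.append({s: emit_p[s][seq[t]] *
                       sum(fwd[t - 1][p] * trans_p[p].get(s, 0) for p in states)
                    for s in states})
    # Backward: bwd[t][s] = P(seq[t+1..], end | state at t is s)
    bwd = [{} for _ in range(n)]
    bwd[n - 1] = {s: trans_p[s].get("end", 0) for s in states}
    for t in range(n - 2, -1, -1):
        bwd[t] = {s: sum(trans_p[s].get(q, 0) * emit_p[q][seq[t + 1]] * bwd[t + 1][q]
                         for q in states)
                  for s in states}
    total = sum(fwd[n - 1][s] * trans_p[s].get("end", 0) for s in states)
    return fwd[i][k] * bwd[i][k] / total

# Positions 18 and 22 (0-indexed) are the fifth and sixth G in the sequence;
# with the assumed parameters these come out near 0.46 and 0.28.
print(posterior(seq, "5", 18, states, trans_p, emit_p))
print(posterior(seq, "5", 22, states, trans_p, emit_p))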
Further Possibilities
The toy model provided by the article is a simple
example
But we can go further: we could add a more realistic
consensus, GTRAGT, at the 5' splice site
We could put a row of six HMM states in place of the '5'
state to model a six-base ungapped consensus motif, as sketched below
The possibilities are nearly unlimited
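A sketch of what the replacement emission tables might look like for a six-state GTRAGT motif (R = A or G); the numbers here are purely illustrative assumptions:

# One state per motif position; each strongly prefers its consensus base.
# Position 3 is "R" (a purine), so it splits its weight between A and G.
motif_emit = [
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},  # G
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},  # T
    {"A": 0.45, "C": 0.02, "G": 0.51, "T": 0.02},  # R
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},  # A
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},  # G
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},  # T
]
# Each motif state moves to the next with probability 1 (an ungapped motif).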
The catch
HMMs don't deal well with correlations between
nucleotides, because they assume that each emitted
nucleotide depends only on one underlying state.
Example of a bad use for HMMs:
Conserved RNA base pairs, which induce long-range
pairwise correlations; one position might be any
nucleotide, but its base-paired partner must be
complementary.
An HMM state path has no way of 'remembering' what a
distant state generated.
Source: http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
Credits
http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html#B1
http://en.wikipedia.org/wiki/Viterbi_algorithm
http://en.wikipedia.org/wiki/Hidden_Markov_model
http://en.wikipedia.org/wiki/Bayesian_network
http://www.daimi.au.dk/~bromille/PHM/Storm.pdf
Questions?