DeepBind: Genomics Data Analysis

DeepBind is a deep learning model that learns sequence specificities from large datasets of protein binding sequences. It generates Position Weight Matrix models for over 500 transcription factors and 194 RNA binding proteins. DeepBind outperforms other non-deep learning methods on tasks like predicting DNA and RNA binding. It can also identify variants that affect protein binding and help understand alternative splicing regulation. Overall, DeepBind demonstrates the ability of deep learning to discover sequence motifs from huge amounts of protein binding data.

Uploaded by

Oky Hermansyah

DeepBind

6.874 - Pranam Chatterjee

Why do we care?
● Regulatory processes
○ Transcription
○ Alternative Splicing
○ Disease correlation

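● Sequence specificity

Position Weight Matrix
Steps:
1. Get the PFM (position frequency matrix) by counting occurrences of each nucleotide at each position.
2. Divide each count by the total # of sequences.
3. Formally, given a set X of N aligned sequences of length i, the entry for nucleotide k at position j is $M_{k,j} = \frac{1}{N}\sum_{n=1}^{N} \mathbb{1}(X_{n,j} = k)$.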
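The counting and normalizing steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture or the paper: the function names and the toy sequences are made up for the example.

```python
from collections import Counter

ALPHABET = "ACGT"


def position_frequency_matrix(seqs):
    """Step 1: count each nucleotide at each position of the aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # One Counter per column; missing nucleotides count as 0.
    columns = [Counter(s[j] for s in seqs) for j in range(length)]
    return {base: [col[base] for col in columns] for base in ALPHABET}


def position_probability_matrix(seqs):
    """Step 2: divide every count by the number of sequences N."""
    n = len(seqs)
    pfm = position_frequency_matrix(seqs)
    return {base: [count / n for count in row] for base, row in pfm.items()}


# Toy aligned sequences (hypothetical, for illustration only).
aligned = ["ACGT", "ACGA", "TCGT", "ACTT"]
pfm = position_frequency_matrix(aligned)
ppm = position_probability_matrix(aligned)
print(pfm["A"])  # counts of A per position
print(ppm["A"])  # fraction of sequences with A per position
```

Each column of the normalized matrix sums to 1. Many tools go one step further and convert these fractions to log-odds against a background distribution; that step is not on the slide and is left out here.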
Data Issues
● Different forms of data
○ Specificity coefficient
■ Protein Binding Microarrays
■ RNAcompete arrays
○ Ranked Lists of Bound Sequences
■ ChIP-Seq
○ High Affinity Sequence List
■ HT-SELEX
● Large Quantities of Data
○ 10,000-100,000 sequences (1 EXPERIMENT)
● Additional Biases/Limitations
○ e.g., hyper-ChIPable regions of genome
○ Need to filter
DeepBind Claims
● Apply to both microarray and sequencing data
● Generalize well across technologies
● Tolerate noise and mislabeled data
● Can learn from millions of sequences through parallel implementation on a graphics processing unit (GPU)
● Train models and tune parameters automatically
● Can discover new patterns without location information

Alipanahi, et al., Nature Biotechnology, 2015.

Overview of DeepBind

Alipanahi, et al., Nature Biotechnology, 2015.

Training Procedure

[Figure: training procedure. Annotations: pooling by MAX or MEAN; output is the BINDING SCORE.]
Alipanahi, et al., Nature Biotechnology, 2015.
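To make the pipeline behind this slide concrete, here is a rough pure-Python sketch of a single forward pass: one-hot encode the sequence, scan it with motif detectors (convolution), rectify, pool with MAX or MEAN, and combine the pooled features into a binding score. Everything here is an assumption for illustration: the function names, the hand-written detector, and the single linear output layer are not taken from the DeepBind code, and training (how the detector weights are learned) is left out entirely.

```python
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode each base as a 4-element indicator row."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq]


def convolve(encoded, detector):
    """Slide one motif detector (width x 4 weights) along the sequence.

    Assumes the sequence is at least as long as the detector.
    """
    width = len(detector)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for k in range(4):
                total += encoded[start + offset][k] * detector[offset][k]
        scores.append(total)
    return scores


def rectify(scores, threshold=0.0):
    """Keep only responses above the threshold."""
    return [max(0.0, s - threshold) for s in scores]


def pool(scores, mode="max"):
    """Summarize the per-position responses as MAX or MEAN."""
    if mode == "max":
        return max(scores)
    if mode == "mean":
        return sum(scores) / len(scores)
    raise ValueError("mode must be 'max' or 'mean'")


def binding_score(seq, detectors, weights, bias=0.0, mode="max"):
    """Pooled detector features combined by a single linear layer (a simplification)."""
    encoded = one_hot(seq)
    features = [pool(rectify(convolve(encoded, d)), mode) for d in detectors]
    return bias + sum(w * f for w, f in zip(weights, features))


# Hypothetical hand-written detector that responds to the motif "TGA".
tga_detector = [
    [0.0, 0.0, 0.0, 1.0],  # T
    [0.0, 0.0, 1.0, 0.0],  # G
    [1.0, 0.0, 0.0, 0.0],  # A
]
score_max = binding_score("ACTGAC", [tga_detector], [2.0], bias=-1.0, mode="max")
score_mean = binding_score("ACTGAC", [tga_detector], [2.0], bias=-1.0, mode="mean")
print(score_max, score_mean)
```

MAX pooling reports whether the motif occurs anywhere in the sequence, while MEAN pooling dilutes a single strong hit across all positions, which is why the two scores differ on the same input.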
Calibration and Testing Procedure

12 terabases of data!!!
Alipanahi, et al., Nature Biotechnology, 2015.
Let’s unpack that...
● Thousands of PBM, RNAcompete, ChIP-Seq, and HT-SELEX experiments
● Create 927 DeepBind models
● 538 Transcription Factors
● 194 RNA-binding Proteins (RBPs)

(This took 4+ years, btw)

Alipanahi, et al., Nature Biotechnology, 2015.

How well does it work?
● Test on PBM data from DREAM5 TF-DNA Motif Recognition Challenge
● 86 different mouse transcription factors
● 2 array designs (~40,000 probes each)
○ Cover all possible 10-mers; each non-palindromic 8-mer appears 32x
● Train on probe intensities, predict on held-out test array design

Example Competing Algorithms (26 in total)

● FeatureREDUCE (biophysical PWM/k-mer)
● BEEML-PBM (weighted regression)
● RankMotif++ (probabilistic)
● PFM models (position frequency matrices)

None of these are deep-learning-based!
Alipanahi, et al., Nature Biotechnology, 2015.
Metrics
● Area Under the Curve (AUC)
○ Area under the ROC curve, which plots the model's true positive rate as a function of its false positive rate
○ Tells us how well the model ranks actual positives above negatives
○ Higher AUC means a better-performing model (0.5 is chance, 1 is perfect)

● Pearson Correlation
○ Measures linear correlation between predicted and measured probe intensities
○ Ranges from -1 to 1; values closer to 1 indicate a better-performing model
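Both metrics are easy to compute by hand on a toy example. The sketch below uses the rank interpretation of AUC (the fraction of positive/negative pairs in which the positive gets the higher score, ties counting half), which equals the area under the ROC curve. The labels, scores, and function names are made up for illustration.

```python
import math


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy data (hypothetical): 1 = bound, 0 = not bound.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)

# Toy predicted vs. measured intensities with a perfect linear relationship.
r = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(auc, r)
```

The pairwise loop is quadratic in the number of examples, which is fine for a demonstration; a sort-based version would be the usual choice for arrays with tens of thousands of probes.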
Quantitative Performance Against Other Methods

Alipanahi, et al., Nature Biotechnology, 2015.

Do in vitro models accurately identify in vivo bound sequences?
● 506 ENCODE ChIP-Seq data sets
● In vivo laboratory biases
○ Cell-type specificities
○ Nucleosome interactions
○ Chromatin remodeling, etc.
● 137 transcription factors
● Performed better than other non-deep learning methods based on AUC
● Can generalize to other data acquisition methods
Alipanahi, et al., Nature Biotechnology, 2015.
First place goes to... DEEPBIND!

Why are RBP sequence specificities difficult to predict?
● Usually bind to ssRNA
● More flexible than DNA
● Can fold into stable secondary structures
● Recognition motif is highly flexible
○ Multiple domains needed for binding
● RNA structure also affects binding

Rhee, et al., Nucleic Acids Research, 2008.

Identifying Damaging Genetic Variants

● How to do this?
● MUTATION MAPS!
○ Importance of each base
○ Effect of each mutation on binding score
● Illustrates effect of point mutations on binding affinity

Alipanahi, et al., Nature Biotechnology, 2015.
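A mutation map can be sketched as an exhaustive in-silico mutagenesis loop: substitute every alternative base at every position and record how the binding score changes. In the sketch below, `toy_score` is a hypothetical stand-in for a trained model, and the plain score difference is an assumption; the paper may scale or normalize these differences.

```python
ALPHABET = "ACGT"


def mutation_map(seq, score_fn):
    """Return {(position, alt_base): score(mutant) - score(reference)}."""
    reference = score_fn(seq)
    deltas = {}
    for pos, ref_base in enumerate(seq):
        for alt in ALPHABET:
            if alt == ref_base:
                continue  # skip the unmutated base
            mutant = seq[:pos] + alt + seq[pos + 1:]
            deltas[(pos, alt)] = score_fn(mutant) - reference
    return deltas


def toy_score(seq, motif="TGA"):
    """Hypothetical scorer: highest number of motif matches in any window."""
    best = 0
    for start in range(len(seq) - len(motif) + 1):
        window = seq[start:start + len(motif)]
        matches = sum(1 for a, b in zip(window, motif) if a == b)
        best = max(best, matches)
    return float(best)


deltas = mutation_map("ACTGAC", toy_score)

# Importance of each base: the largest drop any substitution causes there.
importance = [
    min(delta for (pos, _), delta in deltas.items() if pos == i)
    for i in range(len("ACTGAC"))
]
print(importance)
```

Positions inside the "TGA" site lose score under any substitution, while flanking positions are unaffected, which is the pattern a mutation map is meant to make visible.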

Mutation in MYC Enhancer Weakens TCF7L2 Binding Site

Alipanahi, et al., Nature Biotechnology, 2015.

SNP in Globin Cluster Creates GATA1 Binding Site

Alipanahi, et al., Nature Biotechnology, 2015.

DeepFind: an aggregate model
● What’s the point? To provide collective contexts.
● I.e., true TF binding sites are likely to be located with other TF binding sites
● AUC ~ 0.76
● Predicts deleterious SNVs in promoters

Alipanahi, et al., Nature Biotechnology, 2015.

One more application: Alternative Splicing
● AS generates transcriptional diversity
● RBPs regulate splicing
● Binding scores at exon junctions regulated by splicing regulators
● Consistent with experimental CLIP-seq data and known binding profiles of RBPs

Alipanahi, et al., Nature Biotechnology, 2015.

Prediction of Nova Regulation Mechanism
DeepBind Motif Learning
Key Takeaways
● GOAL: given regions experimentally determined to be bound by proteins,
what is the model describing bound sequences?
● Sequences/Binding Scores -> CNN -> binding scores for novel sequences
● Generates weighted ensembles of PWMs and mutation maps
● 927 DeepBind models generated (538 TFs, 194 RBPs)
● Identified RNA-binding sites involved in splicing regulation
● Identified disease-associated variants that affect TF binding

CHECK IT OUT YOURSELF: [Link]

Shortcomings and Future Work
● Comparisons with only non-deep learning models
● Not much better than non-deep learning models
● Assumes one motif in each probe
● Non-coding factors/variants ignored
● Does not account for positional dynamics of probe sequences -> DeeperBind
● How about epigenetic regulation of binding to sequences? -> DeepSEA

Cool name bro

Any Questions?
