Data Science Course Overview

Biological Data Science


Dr Athanasios Tsanas (‘Thanasis’)

Associate Prof. in Data Science

Usher Institute, Medical School, University of Edinburgh

Career timeline (figure): 2001-2007, 2007-2008, 2008-2019, 2017-present
Labels: [Link]. + [Link]. Engineering; [Link]. Signal Processing; [Link]. Maths, Edinburgh; Post-doc: Eng.; Lecturer: SBS; Usher Institute, Medical School
Photo: Tidal Generation: Rolls-Royce EMEC 500kW
© A. Tsanas, 2020
• Lecture notes
• Papers/reports in the ‘Reading material’ of each lecture: these are part of the exam material
• “An introduction to statistical learning” by G. James, D. Witten, T. Hastie, R. Tibshirani. It is freely available as a PDF from the authors’ website: [Link][Link]/~gareth/ISL/ISLR%20First%20[Link].
• 10-credit course (total hours = 100)
• Coursework: 50% + Exam: 50%

Lectures
• Thanasis Tsanas
• 10 hours of lectures

R Labs
• Sjoerd Beentjes
• 10 hours R labs
• Characterising raw data (data mining)
• Probability distributions
• Statistical associations
• Statistical mapping (learning models)
• Model validation and assessment
• R programming (labs with Stuart)

• Understand the setting and the complications of using and analysing biomedical data
• Understand the differences between first-principle and data-driven models
• Data mining and feature extraction
• Statistical learning models & validation
• High-dimensional data implications
• Write clean, modular R code
Day 1 • Introduction and overview; reminder of basic concepts
Day 2 • Data collection and sampling

Day 3 • Data mining: signal/image processing and information extraction

Day 4 • Data visualization: density estimation, statistical descriptors

Day 5 • Exploratory analysis: hypothesis testing and quantifying relationships

Day 6 • Feature selection and feature transformation

Day 7 • Statistical machine learning and model validation

Day 8 • Statistical machine learning and model validation

Day 9 • Practical examples: bringing things together

Day 10 • Revision and exam preparation

Data visualization (density estimation, scatter plots)
→ Exploratory analysis (statistical associations)
→ Feature selection or transformation (e.g. PCA)
→ Statistical mapping (regression/classification)

Day 1 part 2
Talk in the language of the clinicians
• Understand what they need and the terms the experts in the domain use

Understand the physiology
• Domain dependent
• You will have to read biology/physiology books and articles
Monitor Parkinson’s disease using voice
• Before looking into the data, talk with the domain experts
• Understand the underlying physiology
Understanding heart physiology and circulatory system
First-principle models (differential equations)
• Mechanistic insight ☺
• Difficult to match data ☹

Data-driven models (statistics)
• Less interpretable ☹
• Better predictions ☺
Day 1 part 3
• Usefulness
– Often occurs, e.g. heights, IQ, returns, errors
– Can be used to approximate other distributions
– Central Limit Theorem - distribution of averages

• Structure: $X \sim \mathcal{N}(\mu, \sigma^2)$
– Continuous, bell shape
– Two parameters, μ and σ
– Analytic formula: $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
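
As a quick check of the analytic formula above, the density can be written out by hand and compared with R's built-in `dnorm()`. This is only a minimal sketch: the function name `normal_pdf` and the grid of test points are illustrative choices, not part of the lecture material.

```r
# Normal density written out term by term from the analytic formula.
# normal_pdf is a hypothetical helper name used only for this illustration.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-3, 3, by = 0.5)                 # arbitrary grid of test points
manual  <- normal_pdf(x, mu = 0, sigma = 1)
builtin <- dnorm(x, mean = 0, sd = 1)     # base R normal density

# TRUE if the hand-written formula agrees with dnorm() to numerical tolerance
print(all.equal(manual, builtin))
```

In practice `dnorm()` would be used directly; writing the formula out is only useful for seeing how μ and σ enter it.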
$Z \sim \mathcal{N}(0, 1^2)$
• Mean μ = 0
• Standard deviation σ = 1

• Tabulation
- a necessity ☹
- given for Z ≥ 0
- tables are not all the same (area in tail, area from mean)

Examples
P(Z >1.96) = 0.025
P(-1.96 < Z < 1.96) = 0.95 ⇒ P(-2 < Z < 2) ≈ 0.95
P(-1<Z<1) = 0.68
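
The tabulated examples above can also be reproduced without a table, using the standard normal cumulative distribution function `pnorm()` in base R. The variable names below are illustrative only.

```r
# Tail and interval probabilities for the standard normal Z ~ N(0, 1).
p_upper      <- pnorm(1.96, lower.tail = FALSE)   # P(Z > 1.96), about 0.025
p_within_196 <- pnorm(1.96) - pnorm(-1.96)        # P(-1.96 < Z < 1.96), about 0.95
p_within_2   <- pnorm(2) - pnorm(-2)              # P(-2 < Z < 2), about 0.954
p_within_1   <- pnorm(1) - pnorm(-1)              # P(-1 < Z < 1), about 0.683

print(round(c(p_upper, p_within_196, p_within_2, p_within_1), 4))
```

Note that `pnorm()` returns the area to the left of its argument by default, so `lower.tail = FALSE` is needed to get the area in the upper tail.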
• Each table value is the area beyond the point that lies Z standard deviations from the mean.
• The Z-value of X is the number of standard deviations that X lies from the mean:

$$Z = \frac{X - \mu}{\sigma}$$

• The time required for a certain drug to have an effect is normally distributed with a mean of 30 minutes and a standard deviation of 9 minutes. What is the probability that the drug takes more than 42 minutes to have an effect on a random patient?

[Figure: normal curve with the mean (30) and the threshold (42) marked]

• P(time ≥ 42) = P(Z ≥ (42 − 30)/9) = P(Z ≥ 1.33) = 0.0918
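
The worked example above can be checked in R. The sketch below computes the probability twice: once with the Z-value rounded to 1.33, as a printed table would require, and once without rounding. The variable names are illustrative.

```r
mu        <- 30   # mean time to effect (minutes)
sigma     <- 9    # standard deviation (minutes)
threshold <- 42   # time of interest (minutes)

z <- (threshold - mu) / sigma                  # Z-value, 1.333...

# Table-style answer: Z rounded to two decimals, as in a printed Z-table
p_table <- pnorm(round(z, 2), lower.tail = FALSE)

# Same probability without rounding Z
p_exact <- pnorm(threshold, mean = mu, sd = sigma, lower.tail = FALSE)

print(round(c(z = z, p_table = p_table, p_exact = p_exact), 4))
```

The two answers differ slightly in the fourth decimal place (about 0.0918 versus about 0.0912) purely because of the rounding of Z.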

• About 95% of the area lies within 2σ of μ.
• About 99.7% of the area lies within 3σ of μ.

[Figure: normal curve with 1σ, 2σ and 3σ marked on either side of the mean]

• You could go back to the Z-table and look up the Z-value that corresponds to a very low tail probability, but the table is not detailed enough for that
• In practice, with about 1,000,000 points the maximum and minimum values lie about 5σ from μ (there are detailed analytical formulas for this computation)
• Draw random data in R to verify the concept
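
One way to carry out the suggested check is sketched below: draw 1,000,000 standard normal values and look at the extremes. The seed is an arbitrary choice made only so that the draw is reproducible, and the exact extremes will differ from seed to seed.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
n <- 1e6
x <- rnorm(n)        # standard normal draws, so values are already in units of sigma

print(range(x))      # minimum and maximum of the sample

# Expected number of the n draws that fall more than 5 sigma from the mean
expected_beyond_5 <- n * 2 * pnorm(5, lower.tail = FALSE)
print(expected_beyond_5)

# Number actually observed beyond 5 sigma in this sample
print(sum(abs(x) > 5))
```

With a million draws fewer than one point is expected beyond 5σ, which is consistent with the extremes landing at roughly 5σ.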

• Large supporting community
• Free of charge
• Widely used in academia + industry
• Install RStudio
• R packages

Convention to attract your attention!

• Assign values to variables: a = 5; b = 3
• Simple arithmetic: c = a + b
• Write comments using the “#” character

“<-” is more commonly used than “=” for assignment
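
The basics above look as follows when written with the `<-` convention. The name `c_sum` is used here instead of `c` only to avoid masking R's built-in `c()` function; that renaming is a choice made for this sketch, not something the slides prescribe.

```r
a <- 5            # assign 5 to a ("a = 5" also works at the top level)
b <- 3            # assign 3 to b
c_sum <- a + b    # simple arithmetic

# Everything after "#" on a line is a comment and is ignored by R
print(c_sum)      # 8
```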

• for (variable in sequence) {something happens}
• Example: repeat something 100 times:

for (i in 1:100) {something happens as a function of i, e.g. checking the entries of a vector}
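
A concrete version of such a loop is sketched below. The vector `v` and the check being performed (counting negative entries) are hypothetical and chosen only to illustrate the pattern.

```r
v <- c(4, -1, 7, 0, -3)      # hypothetical vector whose entries are checked
n_negative <- 0

# seq_along(v) gives 1, 2, ..., length(v) and is safe even for an empty vector
for (i in seq_along(v)) {
  if (v[i] < 0) {
    n_negative <- n_negative + 1
  }
}

print(n_negative)            # 2
```

For a simple count like this, the vectorised form `sum(v < 0)` gives the same answer without an explicit loop; the loop form is shown because it generalises to more involved checks.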

• if (condition) {something happens}
• if (condition) {something happens} else {something else happens}
• Example: check if the variable a is null

if([Link](a))
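
The null check can be sketched as follows, assuming the elided call on the slide is base R's `is.null()`. The wrapper function `describe` and its return strings are hypothetical, added only to make the example runnable.

```r
# describe() is a hypothetical helper, not part of the course material.
describe <- function(a) {
  if (is.null(a)) {          # assumption: is.null() is the call intended on the slide
    "a is NULL"
  } else {
    "a has a value"
  }
}

print(describe(NULL))
print(describe(5))
```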

• switch(value, case1 = output1, case2 = output2)
• If the value matches none of the listed cases and no default is given, switch returns NULL

• Example: switch amongst multiple countries

a <- switch(country[i], "United-States" = 1, "Equador" = 2)  # you can populate this with more countries
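
A runnable version of the `switch` example is sketched below. The contents of `country` are hypothetical; the third entry is deliberately not listed among the cases in order to show the NULL result, and the final line shows the base R convention of supplying an unnamed last argument as a default.

```r
country <- c("United-States", "Equador", "France")   # hypothetical input vector
codes <- vector("list", length(country))

for (i in seq_along(country)) {
  a <- switch(country[i], "United-States" = 1, "Equador" = 2)
  codes[i] <- list(a)   # wrapping in list() stores a NULL result instead of dropping the element
}

print(codes)            # third element is NULL because "France" matches no case

# An unnamed final argument acts as the default when nothing matches
with_default <- switch("France", "United-States" = 1, "Equador" = 2, NA_real_)
print(with_default)
```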
• No specific text
• Refresh your understanding of the normal distribution
• Refresh your understanding of probabilities
• Use my tutorial document to download and
