DAT405 – Part 2: Statistical methods in Data Science and AI
DAT405/DIT407, LP4 2022-2023, Module 4
[Overview map – Statistical methods in data science and AI: descriptive statistics (tendency/dispersion, histograms/boxplots); estimation (least squares, maximum likelihood, Bayesian estimation, EM algorithm, confidence intervals, jackknife, bootstrap, MCMC); data analysis and exploration (parametric, non-parametric, sampling, hypothesis testing); classification, clustering and regression (Bayesian, LDA, MCMC, HMMs, principal component analysis (PCA), mixture models, factor analysis, linear/ridge/logistic regression, SVMs, kNN, Bayesian networks, neural networks, Markov networks, Gaussian processes).]
Syllabus – part 2
• Module 4: Bayesian statistics and graphical models
• Lecture 7: Bayesian statistics
• Lecture 8: Graphical models
• Module 5: Markov models, kernel methods and decision trees
• Lecture 9: Markov models, reinforcement learning
• Lecture 10: Kernel methods and decision trees

Module 4: Bayesian statistics
Probability theory and statistics – a quick refresher

Probability versus statistics

Probability – predict the likelihood of a future event: given the model, predict the data.

Statistics – estimate the frequency of a past event: given the data, predict the model.
Sample space, events and random experiments
• A random experiment is a process that produces random outcomes.
• The sample space is the set of all possible outcomes in an experiment.
• An event is an outcome, or a subset of possible outcomes, of an experiment.
Example: roll a die
• Sample space: S = {1, 2, …, 6}, i.e. 6 outcomes
• Events:
  • "At least 3" = {3, 4, 5, 6}
  • "Six" = {6}
  • "Odd" = {1, 3, 5}
• Probabilities:
  P(at least 3) = 4/6
  P(six) = 1/6
  P(odd) = 3/6
Venn diagrams of set operations

Union: A ∪ B        Intersection: A ∩ B        Mutually exclusive: A ∩ B = ∅

[Three Venn diagrams over the sample space S, one for each operation.]
Combining events
• Now assume we want to combine two events:
  A = "at least 3", B = "odd number"
• Union
  A ∪ B = {3, 4, 5, 6} ∪ {1, 3, 5} = {1, 3, 4, 5, 6}
  P(A ∪ B) = 5/6
• Intersection
  A ∩ B = {3, 4, 5, 6} ∩ {1, 3, 5} = {3, 5}
  P(A ∩ B) = 2/6
Conditional probability
• The conditional probability of an event A, given the knowledge that event B occurred, is
  P(A|B) = P(A ∩ B) / P(B) = P(A, B) / P(B)
• Note also
  P(A, B) = P(A|B) P(B) = P(B|A) P(A)
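As a quick sanity check of these definitions, here is a minimal Python sketch (assuming a fair die, so all outcomes in S are equally likely) that reproduces the union, intersection and conditional probabilities above with plain set operations:

```python
from fractions import Fraction

# Fair six-sided die: every outcome in the sample space S is equally likely.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) under a uniform distribution on S."""
    return Fraction(len(event & S), len(S))

A = {3, 4, 5, 6}   # "at least 3"
B = {1, 3, 5}      # "odd number"

print(prob(A | B))            # union: 5/6
print(prob(A & B))            # intersection: 2/6 = 1/3
print(prob(A & B) / prob(B))  # conditional P(A|B): 2/3
```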
Thomas Bayes
(1701 – 1761)

• Developed the idea of using probability to represent uncertainty about beliefs
• Most importantly: gave a method for updating beliefs given new evidence

Bayes’ rule
• Bayes’ rule
  P(A|B) = P(B|A) P(A) / P(B)
Proof:
• A ∩ B = B ∩ A ⇒ P(A ∩ B) = P(B ∩ A)
• P(A ∩ B) = P(A|B) P(B)
• P(B ∩ A) = P(B|A) P(A)
⇒ P(A|B) P(B) = P(B|A) P(A), then divide by P(B)
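As a concrete check, take the die events from before: A = "at least 3" and B = "odd". Then P(B|A) = (2/6)/(4/6) = 2/4, P(A) = 4/6 and P(B) = 3/6, so Bayes’ rule gives P(A|B) = (2/4)(4/6)/(3/6) = 2/3, which agrees with computing P(A ∩ B)/P(B) = (2/6)/(3/6) directly.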
Bayes’ rule interpretation

  P(A|B) = P(B|A) P(A) / P(B)

  posterior = likelihood × prior / normalizer

We have prior information P(A) about event A, and then update the posterior probability P(A|B) as more information/data B is obtained.
Bayes’ rule interpretation

Prior:
Before making the observation, you think the probability of your hypothesis is P(A).

Posterior:
After making the observation B, you think the probability of your hypothesis is P(A|B).
Example: spam or ham?
• Suppose I get an email with the word ”invest”. Is it more likely to be spam or ham?
• Hard to estimate P(spam|”invest”). Bayes’ rule will help!
• Estimated proportions of emails sent to me that are spam or ham:
  • P(spam) = 0.4
  • P(ham) = 0.6
• Proportions of emails containing the word ”invest” (easier to estimate!):
  • P(”invest”|spam) = 0.05
  • P(”invest”|ham) = 0.01
Example (cont.)

P(spam|”invest”) = P(”invest”|spam) P(spam) / P(”invest”) = 0.05 ⋅ 0.4 / P(”invest”) = 0.02 / P(”invest”)

P(ham|”invest”) = P(”invest”|ham) P(ham) / P(”invest”) = 0.01 ⋅ 0.6 / P(”invest”) = 0.006 / P(”invest”)

⇒ P(spam|”invest”) > P(ham|”invest”)

We didn’t have to estimate P(”invest”)!
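The comparison is easy to reproduce in code. Below is a minimal sketch (the dictionary layout is just one way to organize it): since P("invest") is the same in both expressions, comparing the numerators is enough, and normalizing them afterwards recovers the full posteriors.

```python
# Spam-or-ham with the numbers from the slides.
priors = {"spam": 0.4, "ham": 0.6}          # P(class)
likelihoods = {"spam": 0.05, "ham": 0.01}   # P("invest" | class)

# Unnormalized numerators P("invest"|class) * P(class): enough for the comparison.
scores = {c: likelihoods[c] * priors[c] for c in priors}
print(scores)                       # {'spam': 0.02, 'ham': 0.006}
print(max(scores, key=scores.get))  # 'spam'

# Normalizing by their sum (= P("invest")) gives the actual posteriors.
evidence = sum(scores.values())
print({c: round(s / evidence, 3) for c, s in scores.items()})  # spam ~0.769, ham ~0.231
```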
Mutually exclusive and exhaustive events
Events E₁, E₂, …, Eₙ are
• mutually exclusive (also called pairwise disjoint) if they cannot occur simultaneously:
  Eᵢ ∩ Eⱼ = ∅ for i ≠ j
• exhaustive if they cover the sample space S:
  E₁ ∪ E₂ ∪ ⋯ ∪ Eₙ = S
A collection of events that is both mutually exclusive and exhaustive is also called a partition of the sample space.
Law of total probability
• For mutually exclusive and exhaustive events E₁, E₂, …, Eₙ we get, for any other event B,
  P(B) = Σᵢ P(B ∩ Eᵢ) = Σᵢ P(B|Eᵢ) P(Eᵢ)   (sum over i = 1, …, n)

Example: P(apple) = P(green apple) + P(red apple) + P(other apples)
Bayes’ rule – extended
• Bayes’ rule
  P(A|B) = P(B|A) P(A) / P(B)
• For mutually exclusive and exhaustive events E₁, E₂, …, Eₙ we get
  P(A|B) = P(B|A) P(A) / P(B) = P(B|A) P(A) / Σᵢ P(B|Eᵢ) P(Eᵢ)
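In code, the extended rule amounts to normalizing prior-times-likelihood over the partition. A small sketch (the function name and dictionary representation are illustrative, not from the lecture):

```python
def posteriors(priors, likelihoods):
    """Posterior P(E_i | B) for a partition E_1, ..., E_n.

    priors:      {event: P(E_i)}, summing to 1
    likelihoods: {event: P(B | E_i)}
    The denominator sum_i P(B|E_i) P(E_i) equals P(B) by the law of total probability.
    """
    evidence = sum(likelihoods[e] * priors[e] for e in priors)
    return {e: likelihoods[e] * priors[e] / evidence for e in priors}

# Reusing the spam/ham numbers, now properly normalized:
print(posteriors({"spam": 0.4, "ham": 0.6}, {"spam": 0.05, "ham": 0.01}))
```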
Example: applying Bayes’ rule
• Assume that 15 out of 10,000 individuals in a population have a certain disease D.
• The test is not perfect: when testing for the disease
  • an ill person always tests positive
  • a healthy person tests positive with probability 0.002
• Given that you tested positive, what is the probability that you have the disease?
Example (cont.)
Bayes’ rule:  P(ill|positive) = P(positive|ill) P(ill) / P(positive)
• We have
  • P(ill) = 0.0015 and P(healthy) = 1 − 0.0015 = 0.9985
  • P(positive|ill) = 1,  P(positive|healthy) = 0.002
  • P(positive) = P(positive|ill) P(ill) + P(positive|healthy) P(healthy)
Hence
  P(ill|positive) = P(positive|ill) P(ill) / P(positive) = 1 ⋅ 0.0015 / (1 ⋅ 0.0015 + 0.002 ⋅ 0.9985) ≈ 0.43

Would you call this test good?
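A short script reproducing the calculation (numbers taken from the slide):

```python
# Disease-test example: posterior probability of being ill given a positive test.
p_ill = 15 / 10_000                  # prior P(ill) = 0.0015
p_healthy = 1 - p_ill                # 0.9985
p_pos_given_ill = 1.0                # an ill person always tests positive
p_pos_given_healthy = 0.002          # false-positive rate

# Law of total probability for the denominator P(positive).
p_pos = p_pos_given_ill * p_ill + p_pos_given_healthy * p_healthy

print(round(p_pos_given_ill * p_ill / p_pos, 2))   # P(ill | positive) ~ 0.43
```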
Is Steve a librarian or a farmer?

Two options:
• Steve is a librarian, or
• Steve is a farmer

What do you think?

Description of Steve

Study by Kahneman and Tversky

What people in the study answered

Background fact
There were about 20 times as many farmers as librarians in the US at that time.

Did you consider the librarian vs farmer ratio? Most people in the study didn’t.
Background fact
There were about 20 times as many farmers as librarians in Sweden in 2017.

[Bar chart comparing the numbers of librarians, crop farmers, animal farmers and mixed farmers; data source linked on the slide.]
Is Steve a librarian or a farmer?
• Let
• D = description (of Steve)
• L = librarian
• F = farmer
• We would like to know
P(L|D) = ?

Is Steve a librarian or a farmer?
• Bayes’ rule
  P(L|D) = P(L) P(D|L) / P(D)
Estimate the prior P(L) – without using the information/data D

[Icon array visualizing the entire population of librarians and farmers.]

P(L) ≈ 1/21 ≈ 5%
Add the info D – our data

Mark the individuals that match the description D.
Estimate the likelihoods P(D|L) and P(D|F)

Estimates:
P(D|L) = 40%
P(D|F) = 10%
Use Bayes’ rule to compute the posteriors

P(L|D) = 4/(4 + 20) ≈ 17%
P(F|D) = 20/(4 + 20) ≈ 83%

So it is about 5 times more likely that Steve is a farmer.
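The same numbers in code, as a minimal sketch of the full Bayes computation (prior from the 20:1 ratio, likelihoods from the estimates above):

```python
# Librarian-or-farmer with the estimates from the slides.
p_L, p_F = 1 / 21, 20 / 21                 # prior: about 20 farmers per librarian
p_D_given_L, p_D_given_F = 0.40, 0.10      # estimated likelihoods of the description D

evidence = p_D_given_L * p_L + p_D_given_F * p_F          # P(D)
p_L_given_D = p_D_given_L * p_L / evidence
print(round(p_L_given_D, 2), round(1 - p_L_given_D, 2))   # ~0.17 and ~0.83
```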
What we did

[Three panels: the proportions of L and F in the population; restricting to the individuals satisfying D; the proportion of L among those satisfying D.]
General case in one slide

Random variables and
probability distributions

Random variables and probability distributions
• A random variable is a function of the outcomes in a random experiment: X: S → ℝ.
• It assumes values according to a probability distribution: P(a ≤ X ≤ b) = ?
• Discrete r.v.: takes a finite or countable number of values; P(X = a) > 0.
• Continuous r.v.: takes all real values in given intervals; P(X = a) = 0 and
  P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx,  where f is the probability density.
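To make the discrete/continuous distinction concrete, here is a short sketch using scipy.stats (assuming SciPy is installed; the particular distributions are just examples):

```python
from scipy import stats

# Discrete: X ~ Bin(n=10, p=0.5). Individual values have positive probability.
X = stats.binom(n=10, p=0.5)
print(X.pmf(3))               # P(X = 3) > 0
print(X.cdf(6) - X.cdf(2))    # P(3 <= X <= 6), a finite sum of pmf values

# Continuous: Y ~ N(0, 1). Single points have probability zero; intervals do not.
Y = stats.norm(loc=0, scale=1)
print(Y.cdf(1) - Y.cdf(-1))   # P(-1 <= Y <= 1) = integral of the density, ~0.68
print(Y.pdf(1))               # density at a point, not a probability; P(Y = 1) = 0
```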
Probability distributions
• Typically depend on one or more parameters
• Common discrete distributions
• Uniform: U(a, b)
• Binomial: Bin(n, p)
• Geometric: Geo(p)
• Hypergeometric: HGeo(N, K, n)
• Poisson: Poi(λ)
• Negative binomial: NB(r, p)
Probability distributions
• Common continuous distributions
• Uniform: U[a, b]
• Normal (Gaussian): N(μ, σ²)
• Student’s t: tₙ₋₁
• Exponential: Exp(λ)
• Chi-square: χ²ₙ₋₁
• Beta: Beta(α, β)
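These named families map directly onto scipy.stats objects, though SciPy's parameterizations do not always match the textbook ones (e.g. Exp(λ) uses scale = 1/λ). A small sketch, assuming SciPy is available:

```python
from scipy import stats

# A few of the distributions above, sampled with fixed seeds for reproducibility.
samples = {
    "Bin(10, 0.3)": stats.binom(n=10, p=0.3).rvs(size=10_000, random_state=0),
    "Poi(2)":       stats.poisson(mu=2).rvs(size=10_000, random_state=0),
    "N(5, 2^2)":    stats.norm(loc=5, scale=2).rvs(size=10_000, random_state=0),
    "Exp(0.5)":     stats.expon(scale=1 / 0.5).rvs(size=10_000, random_state=0),
    "Beta(2, 5)":   stats.beta(a=2, b=5).rvs(size=10_000, random_state=0),
}
for name, x in samples.items():
    print(f"{name:12s} sample mean = {x.mean():.2f}")   # close to each theoretical mean
```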
Expected Value and Variance
Two key characteristics of a random variable

• Expected value:
• Mean value of random variable
• Variance:
• Measure of how far, on average, the random variable is from its mean.
Expected Value and Variance
• For a discrete random variable X the expected value is the weighted average of the possible outcomes:
  μ = E[X] = Σᵢ xᵢ · P(X = xᵢ)
• Intuitively, it measures the value you can expect to get on average in some random experiment.
  • E.g. rolling a fair die once: 1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 21/6 = 3.5.
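A one-line check of the weighted-average formula for the fair die (exact arithmetic via fractions):

```python
from fractions import Fraction

# E[X] = sum_i x_i * P(X = x_i) for a fair die, where each P(X = x_i) = 1/6.
E = sum(x * Fraction(1, 6) for x in range(1, 7))
print(E, float(E))   # 7/2 3.5
```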
Expected Value and Variance
• For a discrete random variable X the variance is the weighted average of the squared distance to the mean:
  Var[X] = Σᵢ (xᵢ − μ)² · P(X = xᵢ)
• Intuitively, it measures how spread out the values of the random variable are.
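And the same kind of check for the variance of the fair die, plus a quick Monte Carlo comparison (the sample size is arbitrary):

```python
from fractions import Fraction
import random

# Var[X] = sum_i (x_i - mu)^2 * P(X = x_i) for a fair die.
mu = Fraction(7, 2)
var = sum((x - mu) ** 2 * Fraction(1, 6) for x in range(1, 7))
print(var, float(var))                                   # 35/12 ~ 2.92

# Monte Carlo sanity check: the empirical variance of many rolls is close to 35/12.
rolls = [random.randint(1, 6) for _ in range(100_000)]
m = sum(rolls) / len(rolls)
print(sum((r - m) ** 2 for r in rolls) / len(rolls))     # ~ 2.92
```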
Statistical inference
Estimation and analysis of these parameters in random samples to draw conclusions about the underlying population.

Two main paradigms:
• Frequentism
• Bayesianism
Classical or frequentist probability theory:
• Probabilities are relative frequencies of the event in a large number of trials.

Bayesian probability theory:
• Probabilities are reasonable expectations of an event, quantifying personal beliefs and prior information, including the degree of certainty in those beliefs.
Frequentism versus Bayesianism

Frequentism
+ Objective
+ Trade-off between errors
+ Design controls bias
+ Long, prosperous history
− p-value depends on design
− Ad-hoc notions of ”data more extreme”
− Fully specified designs needed ahead of time

Bayesianism
+ More natural
+ Logically rigorous
+ Can explore different priors
+ Data can be added
− Prior is subjective
− Assigning probabilities to hypotheses
