Datasets can be standardized using the mean
and standard deviation
• Each data point minus the mean, divided by the standard deviation:
  (x − mean) / SD
• Standardization allows for comparison of data with different units and
variances
• “My test 1 score was 12 points higher than the class average (mean 72, SD 8).
My test 2 score was also 12 points higher than the class average (mean 72, SD
4)”
• I scored 1.5 SD higher and 3 SD higher than the class average on tests 1 and 2,
respectively.
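The two test scores from the example can be standardized in a couple of lines of R (the raw score of 84 is just mean + 12 from the slide):

```r
# Standardize the two test scores: both are 12 points above a mean of 72
z1 <- (84 - 72) / 8   # test 1: SD 8
z2 <- (84 - 72) / 4   # test 2: SD 4
z1  # 1.5 SDs above the class average
z2  # 3 SDs above the class average
```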
When data are assumed to be normal, standardization becomes
even more powerful because the total area under the curve
(AUC) is equal to 1
• You can know the probability of
values more extreme than a
certain criterion
• For instance, the probability of a
value being more than 3
standard deviations above the
mean is only 0.001 (0.1%)
• p(>0sd)=0.500
• p(>1sd)=0.159
• p(>2sd)=0.023
• p(>3sd)=0.001
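The tail probabilities listed above can be reproduced with R's pnorm(), which returns the AUC to the left of a given z-score:

```r
# Probability of a value falling more than 0, 1, 2, or 3 SDs above the mean
p <- 1 - pnorm(c(0, 1, 2, 3))
round(p, 3)  # 0.500 0.159 0.023 0.001
```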
Inferential statistics starts with descriptive
statistics
                     Population   Sample
Mean                     μ          X̄
Standard deviation       σ          s
The cost of using a sample = 1 degree of freedom
• Population variance (𝜎 2 ) is just the average of SS (i.e., SS divided by
N). But we can almost never calculate 𝜎 2 , so instead we estimate 𝜎 2
by calculating a sample’s variance (𝑠 2 ).
• Luckily, the equation for 𝑠 is almost identical to 𝜎:
  s = √( Σᵢ₌₁ᴺ (xᵢ − X̄)² / (n − 1) )
• The denominator represents the degrees of freedom
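A quick simulation (a sketch; the population and sample size here are arbitrary) shows why the denominator is n − 1 rather than n: dividing SS by n systematically underestimates σ²:

```r
set.seed(1)          # for reproducibility
sigma2 <- 4          # true population variance (SD = 2)
n <- 5               # small samples exaggerate the bias
reps <- 10000

biased   <- numeric(reps)
unbiased <- numeric(reps)
for (i in 1:reps) {
  x <- rnorm(n, mean = 0, sd = 2)
  ss <- sum((x - mean(x))^2)     # sum of squares around the sample mean
  biased[i]   <- ss / n          # divides by n: too small on average
  unbiased[i] <- ss / (n - 1)    # divides by df: unbiased estimate of sigma2
}
mean(biased)    # noticeably below 4
mean(unbiased)  # close to 4
```

The biased version comes up short because the sample mean is itself estimated from the data, which eats one degree of freedom.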
Degrees of freedom
• Describes the number of
scores in a sample that are
independent and free to
vary.
•https://www.youtube.com/watch?v=wsvfasNpU2s
Degrees of Freedom: Real-World Scenario
Equations for sample statistics:
• Mean:
  X̄ = Σᵢ₌₁ᴺ xᵢ / n = (sum of all observations) / (# of observations)
• Variance:
  s² = Σᵢ₌₁ᴺ (xᵢ − X̄)² / (n − 1) = (sum of squares) / (degrees of freedom)
• Standard deviation:
  s = √(s²)
• Degrees of freedom = # of observations minus # of estimates used
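These equations map directly onto R's built-in functions; a small check on a made-up vector:

```r
x <- c(2, 4, 6, 8, 10)
n <- length(x)

m <- sum(x) / n                 # mean: sum of observations / n
v <- sum((x - m)^2) / (n - 1)   # variance: SS / degrees of freedom
s <- sqrt(v)                    # standard deviation

m  # 6
v  # 10; matches var(x), which also divides by n - 1
```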
Does a sample represent the population?
• We want our data to be normally distributed so that we can use things
like the standard normal distribution to calculate probabilities.
• We can assess the data’s normality using Q-Q plots.
• https://youtu.be/okjYjClSjOg
• We hope the sample’s distribution reflects the underlying population’s
distribution (i.e., that it’s normal)
• The central limit theorem allows us to be fairly relaxed about how
strictly the data must meet the definition of normal.
Central Limit Theorem
https://gallery.shinyapps.io/CLT_mean/
If you have a population with mean μ and standard deviation σ and
take sufficiently large random samples from the population with
replacement, then the distribution of the sample means will be
approximately normally distributed.
Central Limit Theorem
• The sampling distribution of sample means will have…
• the same mean as the population
• a standard deviation (i.e., the standard error) that gets smaller as the sample
size increases
• a shape that becomes more and more normal as the sample size increases
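The three properties above can be seen in a small simulation (a sketch with an arbitrary, deliberately skewed population; exact numbers will vary with the seed):

```r
set.seed(42)
# A decidedly non-normal population: exponential with mean 1 and SD 1
n <- 50        # sample size
reps <- 10000  # number of samples

sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)  # close to the population mean (1)
sd(sample_means)    # close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141
hist(sample_means)  # roughly bell-shaped despite the skewed population
```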
Practice tasks/questions
1. In R, type: sample(1:100,10,replace=TRUE) to get a random set of
numbers. Manually calculate the mean, median, range, variance, and
standard deviation of these 10 numbers.
• Change 1:100 and 10 to whatever values you want!
• Keep doing this until you’re comfortable with all calculations!
2. In the above examples, variance and standard deviation estimates should
use df=n-1. Why is that? In your answer, use a reference to the R code
used to create the samples.
3. Using the CLT app, assess the impact of changing sample size and
number of samples. Why do we care more about one of these two?
(hint: in reality, which of these two do we have more control of?)
Relevant Code
mean() # calculate mean
median() # calculate median
var() # calculate sample variance
sd() # calculate sample standard deviation
length() # calculate number of observations
boxplot() # make boxplot
hist() # make histogram
rnorm() # randomly sample from a normal population
sample() # sample from provided dataset
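The functions above cover everything needed for practice task 1; a worked sketch with a fixed seed so the draw is reproducible (your manual calculations should match the built-ins):

```r
set.seed(3)
x <- sample(1:100, 10, replace = TRUE)  # 10 random integers from 1 to 100

mean(x)    # compare against your manual calculation
median(x)
range(x)   # smallest and largest values
var(x)     # sample variance (divides SS by n - 1)
sd(x)      # sample standard deviation
hist(x)    # quick look at the distribution
```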
Questions/Comments
Foundations for inference
PNB 3XE3 – Fall 2024
Crump et al. (5)
Illowsky & Dean (6-8)
Lane (7, 9-11)
Objectives
• Understand Central Limit Theorem
• Explain confidence intervals
Sampling Distribution of Sample Means (SDSM)
• We collect data from a sample of n observations.
• We estimate the mean and standard deviation of the SDSM using that
sample's mean and standard deviation (divide s by √n to get the
SDSM's standard deviation).
• We estimate the population's mean (μ) using the SDSM.
Why even deal with the SDSM?
Two reasons:
1. We can create “confidence intervals”
2. We can assess the likelihood that two (or more) samples belong to
a single population*
*actually, this is technically not true. We will revisit it later.
Confidence intervals
• Recall that the sampling distribution of sample means:
• Is approximately normally distributed
• Has a standard deviation called the standard error of the mean (SEM)
• Reflects a distribution that includes the population mean
• Since the SDSM includes the population
mean, and we know its AUC, then we
can identify a range that probably
contains the population mean.
• This is the confidence interval.
Calculating Confidence Intervals
• Formula: CI = X̄ ± z × (s / √n)
• Where X̄ is the sample mean, s is the sample standard deviation, n is
the sample size, and z is the z-score for the desired probability.
• Given that s/√n is the standard error, we can also say:
  CI = X̄ ± (z × s_X̄)
• Where s_X̄ is the standard error, AKA the standard deviation of the
sampling distribution of sample means.
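The formula translates into a few lines of R; conf_int here is a hypothetical helper name (not a built-in), and the data are made up for illustration:

```r
# z-based confidence interval for a sample mean: CI = mean ± z * SEM
conf_int <- function(x, z = 1.96) {
  m   <- mean(x)
  sem <- sd(x) / sqrt(length(x))  # standard error of the mean
  c(lower = m - z * sem, upper = m + z * sem)
}

conf_int(c(4.1, 5.3, 4.8, 5.0, 4.6, 5.2))  # illustrative data
```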
Deriving the CI formula
• Recall the formula for z-score:
  z = (observation − mean) / (standard deviation) = (X − X̄) / s_X̄
• Re-arrange: z × s_X̄ = X − X̄, so X = X̄ + z × s_X̄
• Critically, we want to define an interval that centres on the mean, so let's find a z-score
whose ± will result in the desired probability:
  X₁ = X̄ − z × s_X̄    X₂ = X̄ + z × s_X̄    →    CI = X̄ ± z × s_X̄
• Typically, we like to cover 95% probability, so the z-score for that would be ±1.96
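The ±1.96 figure comes straight from the normal quantile function: for 95% coverage, 2.5% of the probability sits in each tail.

```r
# z-score leaving 2.5% in the upper tail (so ±z covers the middle 95%)
z <- qnorm(0.975)
round(z, 2)  # 1.96
```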
Example: Calculating a confidence interval
Stanley Milgram measured the level of obedience in individuals. In an effort to
replicate it, we brought students into the lab and measured their obedience using
an adapted, more ethical version of Milgram’s study. Obedience scores were
collected from 30 participants.
E.g., calculating a confidence interval
Data: 10.5 22.9 14.6 18.5 24.9 17.1 13.0 23.9 19.4 20.1 25.5 22.9 15.0 17.4 19.0
15.9 9.3 15.7 13.8 14.3 19.9 9.8 3.4 15.2 17.2 9.0 24.8 2.7 12.0 16.4
1. Calculate mean: X̄ = Σᵢ₌₁ᴺ xᵢ / n = 16.14
2. Calculate sample SD: s = √( Σᵢ₌₁ᴺ (xᵢ − X̄)² / (n − 1) ) = 5.89
3. Calculate SEM: s_X̄ = s / √n = 1.07
4. Calculate 95% CI: CI₉₅% = X̄ ± 1.96 × s_X̄ = [14.03, 18.24]
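The worked example can be reproduced in R with the obedience scores from the slide (small rounding differences aside):

```r
scores <- c(10.5, 22.9, 14.6, 18.5, 24.9, 17.1, 13.0, 23.9, 19.4, 20.1,
            25.5, 22.9, 15.0, 17.4, 19.0, 15.9, 9.3, 15.7, 13.8, 14.3,
            19.9, 9.8, 3.4, 15.2, 17.2, 9.0, 24.8, 2.7, 12.0, 16.4)

m   <- mean(scores)              # step 1: sample mean
s   <- sd(scores)                # step 2: sample SD (divides SS by n - 1)
sem <- s / sqrt(length(scores))  # step 3: standard error of the mean
ci  <- m + c(-1.96, 1.96) * sem  # step 4: 95% CI

round(m, 2)   # 16.14
round(ci, 2)  # approximately [14.03, 18.24]
```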
E.g., interpreting a confidence interval
• The 95% CI for the sample mean is from 14.03 to 18.24.
• One way to think about the calculated CI: in reality, the population
mean is NOT the exact same as this sample's mean. Imagine we
magically knew that the population mean was exactly 17, and we
repeated the experiment 100 times. The calculated CI [14.03, 18.24]
would be just one of these possibilities.