1/31/2017
Steven Tung
January 24 Lecture 1a: Data Types, Sampling Methodology, and Sampling Error
January 26 Lecture 2a: Descriptive Statistics, Probability Distribution, and Z score
January 31 Lecture 3a: Central Limit Theorem
February 2 Lecture 4a: Hypothesis Testing
February 7 Lecture 5a: Student’s t-test
February 9 Lecture 6a: Analysis of Variance
February 14 Lecture 7a: Non-Parametric Statistics and Chi-Square
February 16 Lecture 8a: Binomial and Poisson
February 21 Lecture 9a: Simple Linear Regression and Multiple Regression
February 23 Reserved
February 28 Reserved
March 2 Reserved – Possible Final Exam
Parametric statistics are used to make inferences about
population parameters.
In order to apply a parametric statistical test, certain
assumptions must hold true:
o Samples come from a normally distributed population
o Dependent variable is continuous (interval or ratio)
o Samples are random (the selection of each sample member is
independent of the others)
o Homogeneity of variance (variances are equal across the groups being compared)
Should these assumptions fail to hold true, and “fixes” can’t be
made, you must use the less powerful non-parametric
statistical tests.
The Central Limit Theorem is the foundation of all
parametric statistics.
For both normally and non-normally distributed data, the
sampling distribution of the sample mean will
approximate the normal distribution as the sample size
gets large, regardless of what the original distribution
looks like (provided it has a finite variance).
If we were to plot the frequency of sample means (x̄), it would appear to look like
a normal distribution.
[Figure: frequency plot with axis ticks 0 to 10 and 1 to 6]
Sample size = 4
S1 = [1,4,5,6] x̄1 = 4
S2 = [3,4,4,6] x̄2 = 4.25
S3 = [1,5,6,6] x̄3 = 4.5
.
.
Sn = [?,?,?,?] x̄n = ??
The frequency table can be referred to as the “Sampling Distribution of the Sample
Mean”.
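The sample means listed above can be checked with a few lines of Python. This is only an illustrative sketch: the names `samples`, `sample_mean`, and `sample_means` are made up here and are not part of the lecture material.

```python
# The three samples shown on the slide (sample size n = 4).
samples = {
    "S1": [1, 4, 5, 6],
    "S2": [3, 4, 4, 6],
    "S3": [1, 5, 6, 6],
}


def sample_mean(values):
    """Arithmetic mean of one sample (hypothetical helper)."""
    return sum(values) / len(values)


# One mean per sample; collecting many of these gives the
# "Sampling Distribution of the Sample Mean".
sample_means = {name: sample_mean(values) for name, values in samples.items()}

for name, xbar in sample_means.items():
    print(name, xbar)
```

Repeating this for many more samples and tallying the resulting means is what produces the frequency table described above.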
Gmacro
Population
#generate 10000 random numbers from the
#normal distribution with mean of 3 and
#stdev of 1 and place them into column 1
Random 10000 c1;
Normal 3 1.
#generate 10000 random numbers from the
#uniform distribution with a of 0 and b of 6
#and place them into column 3
Random 10000 c3;
Uniform 0 6.
#plot the histograms of both my "population"
Histogram c1 c3;
Bar.
Endmacro

Gmacro
SamplingDistribution
#k1 is sample size
Let k1 = 4
#k2 is number of samples
Let k2 = 100
#Calculate mean of sample
Do k3 = 1:k2
  #c5 stores index location based on
  #sample size requested
  Random k1 c5;
  Integer 1 10000.
  #c6 stores the actual value based on
  #indexed location stored in c5
  Do k4 = 1:k1
    #k5 is needed as a temporary
    #index holder
    Let k5 = c5(k4)
    Let c6(k4) = c1(k5)
    Let c7(k4) = c3(k5)
  Enddo
  #store sample mean from c6
  Let c9(k3) = mean(c6)
  Let c10(k3) = mean(c7)
Enddo
#plot the histograms of both my "sample distribution of
#the sample mean"
Histogram c9 c10;
Bar.
Endmacro
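For readers without Minitab, the same experiment can be approximated in Python. This is a rough translation, not the lecture's own code: it uses only the standard library, prints summary statistics instead of drawing histograms, and the fixed seed and all variable names are assumptions added for illustration.

```python
import random
import statistics

random.seed(0)  # assumption: fixed seed so the run is repeatable

POPULATION_SIZE = 10_000
SAMPLE_SIZE = 4      # k1 in the macro
NUM_SAMPLES = 100    # k2 in the macro

# "Population" step: Normal(mean 3, stdev 1) and Uniform(0, 6).
normal_pop = [random.gauss(3, 1) for _ in range(POPULATION_SIZE)]
uniform_pop = [random.uniform(0, 6) for _ in range(POPULATION_SIZE)]

normal_means = []   # plays the role of column c9
uniform_means = []  # plays the role of column c10
for _ in range(NUM_SAMPLES):
    # Pick row indices with replacement, then read those rows
    # from both populations, as the macro does via c5.
    rows = [random.randrange(POPULATION_SIZE) for _ in range(SAMPLE_SIZE)]
    normal_means.append(statistics.mean([normal_pop[i] for i in rows]))
    uniform_means.append(statistics.mean([uniform_pop[i] for i in rows]))

print("normal:  mean of sample means =", round(statistics.mean(normal_means), 3))
print("uniform: mean of sample means =", round(statistics.mean(uniform_means), 3))
print("normal:  stdev of sample means =", round(statistics.stdev(normal_means), 3))
print("uniform: stdev of sample means =", round(statistics.stdev(uniform_means), 3))
```

Both sets of sample means should centre near 3, with a spread noticeably smaller than that of the populations they were drawn from.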
Mean
o “Sampling Distribution of the Sample Mean” will have the same mean as the
original distribution: $\mu_{\bar{x}} = \mu$
Variance and Standard Deviation
o “Sampling Distribution of the Sample Mean” variance will follow the relationship:
$\sigma_{\bar{x}}^2 = \sigma^2 / n$
o “Sampling Distribution of the Sample Mean” standard deviation will follow the
relationship: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
o Here $\mu$ and $\sigma$ are the population mean and standard deviation, and $n$ is the
sample size
• The standard deviation of the sampling distribution of the sample mean is
sometimes called the standard deviation of the mean or standard error of the mean
(SEM)
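As a worked example, the standard error of the mean can be computed for the two populations used in the macros (normal with standard deviation 1, and uniform on 0 to 6) at a sample size of 4. The sketch below is illustrative only: the helper name `standard_error`, the seed, and the number of simulated samples are assumptions. It relies on the standard result that a Uniform(a, b) distribution has standard deviation (b - a)/sqrt(12).

```python
import math
import random
import statistics


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n) (hypothetical helper)."""
    return sigma / math.sqrt(n)


n = 4
sigma_normal = 1.0
sigma_uniform = (6 - 0) / math.sqrt(12)  # stdev of Uniform(0, 6)

sem_normal = standard_error(sigma_normal, n)    # 1 / 2 = 0.5
sem_uniform = standard_error(sigma_uniform, n)  # sqrt(3) / 2, about 0.866

# Simulation check: draw many samples of size n from Uniform(0, 6)
# and measure the spread of their means directly.
random.seed(1)  # assumption: fixed seed for repeatability
sim_means = [
    statistics.mean([random.uniform(0, 6) for _ in range(n)])
    for _ in range(20_000)
]
sim_mean = statistics.mean(sim_means)
sim_sem = statistics.pstdev(sim_means)

print("theoretical SEM (normal): ", sem_normal)
print("theoretical SEM (uniform):", round(sem_uniform, 3))
print("simulated mean (uniform): ", round(sim_mean, 3))
print("simulated SEM (uniform):  ", round(sim_sem, 3))
```

The simulated values should land close to the theoretical ones, which is a quick way to sanity-check the relationships above.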
If a characteristic is distributed normally in the population, you
can automatically use parametric statistical tests to compare
the means (this is because the sampling distribution of the
sample mean drawn from normally distributed data always
has a normal distribution).
If a characteristic is NOT distributed normally in the
population, you CAN’T automatically use parametric statistical
tests to compare sample means (this is because the sampling
distribution of the sample mean drawn from non-normally
distributed data doesn’t always have a normal distribution).
o In this case, the sample size must be sufficiently large for the
sampling distribution of the sample mean to approach a normal
distribution
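One way to see what "sufficiently large" means is to measure how lopsided the sampling distribution is at different sample sizes. The sketch below is an illustration under assumptions not in the lecture: it uses an exponential population (strongly right-skewed) as the non-normal example, and the helpers `skewness` and `sample_mean_skewness`, the seed, and the sample sizes are all made up here. A skewness near 0 is consistent with a symmetric, normal-like shape.

```python
import random
import statistics


def skewness(values):
    """Population skewness: mean of cubed z-scores (hypothetical helper)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_mean_skewness(n, num_samples=5000, seed=2):
    """Skewness of the sample means for samples of size n drawn
    from an exponential population with rate 1 (an assumed example)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return skewness(means)


skew_by_n = {n: sample_mean_skewness(n) for n in (2, 10, 50)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {skew:.2f}")
```

The skewness should shrink toward 0 as n grows, which is the sense in which the sampling distribution "approaches" a normal distribution. How large n must be depends on how far the population is from normal.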