P Inservice-Statistics
Wanted to Know about Statistics but Were Afraid to Ask
Andrew L. Luna
Director
Institutional Research, Planning, and Assessment
The University of North Alabama
alluna@[Link]
Phone: 256.765.4221
COURSE OUTLINE
The Crimean War (1853-1856) was a bloody conflict between Russia and the
British-led alliance (Great Britain, France, the Ottoman Empire, and the Kingdom
of Sardinia) that saw heavy casualties on both sides.
Data
Knowledge Information
New
Knowledge/Decisions
THE SCIENTIFIC METHOD
Scientific Method
The way researchers go about using knowledge and evidence to
reach objective conclusions about the real world.
The analysis and interpretation of empirical evidence (facts from
observation or experimentation) to confirm or disprove prior
conceptions
CHARACTERISTICS OF THE SCIENTIFIC
METHOD
Scientific Research is Public – Advances in science require
freely available information (replication/peer scrutiny)
Science is Objective – Science tries to rule out eccentricities
of judgment by researchers and institutions. Wilhelm von
Humboldt (1767-1835), founder of the University of Berlin (teaching,
learning, research): “Lehrfreiheit” (freedom to teach),
“Lernfreiheit” (freedom to learn), and “Freiheit der
Wissenschaft” (freedom of science)
Science is Empirical – Researchers are concerned with a
world that is knowable and potentially measurable.
Researchers must be able to perceive and classify what they
study and reject metaphysical and nonsensical explanations
of events.
CHARACTERISTICS OF THE SCIENTIFIC
METHOD, CONT.
Science is Systematic and Cumulative – No single research
study stands alone, nor does it rise or fall by itself. Research
also follows a specific method.
Theory – A set of related propositions that presents a
systematic view of phenomena by specifying relationships
among concepts
Law – is a statement of fact meant to explain, in concise terms,
an action or set of actions that is generally accepted to be true
and universal
Science is Predictive – Science is concerned with relating the
present to the future (making predictions)
Science is Self-Correcting – Changes in thoughts, theories, or
laws are appropriate when errors in previous research are
uncovered
FLOW CHART OF THE SCIENTIFIC
METHOD
Note: Diamond-shaped boxes indicate stages in the research process in which a
choice of one or more techniques must be made. The dotted line indicates an
alternative path that skips exploratory research.
TWO BASIC TYPES OF RESEARCH
Variable Types:
Independent – those that are systematically varied by the researcher
Dependent – those that are observed. Their values are presumed to
depend on the effects of the independent variables
Variable Forms:
Discrete – includes only a finite set of values (yes/no;
Republican/Democrat; satisfied ... not satisfied, etc.)
Continuous – takes on any value on a continuous scale (height,
weight, length, time, etc.)
SCALES: CONCEPT
Example: Satisfaction
SCALES: OPERATIONAL DEFINITION
I believe national network news is fair in its portrayal of national news stories:
[Slide: an assortment of unlabeled numbers, e.g. 555-867-5309, 9001, 3.5, .05, 97.5, 4,832, .998, 1,248,965]
NOMINAL SCALE
Indicates a difference
ORDINAL SCALE
Indicates a difference
Indicates the direction of the difference (e.g. more than or less than)
INTERVAL SCALE
Indicates a difference
Indicates the direction of the difference (e.g. more than or less than)
Indicates the amount of the difference (in equal intervals)
Example: 32 °F = 0 °C (the zero point is arbitrary)
RATIO SCALE
Indicates a difference
Indicates the direction of the difference (e.g. more than or less than)
Indicates the amount of the difference (in equal intervals)
Indicates an absolute zero
DISCUSSION/TEST: IDENTIFY THE SCALE
Arbitron Rating
Sammy Sosa # 21
Salary
Prices on the Stock Market
Satisfaction on a 1-7 Likert Scale
Gender: Male = 1 or Female = 2
How many times respondents return to a website
Professorial rank: Asst. = 1, Assoc. = 2, Full = 3
Decibel level of a speaker
Number of Newspapers sold each day
Weight of paper
Amount of time a subject watches a television program
THINGS ARE NOT ALWAYS WHAT THEY
SEEM TO BE…
Radio Stations
Does it show a difference?
Does it show the direction of difference?
Is the difference measured in equal intervals?
Does the measure have an absolute zero?
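The four questions above can be read as a decision ladder: each "yes" moves the measure one scale higher. The sketch below is one possible way to encode that ladder; the function name `identify_scale` and the example classifications are illustrative assumptions, not part of the original slides.

```python
def identify_scale(difference, direction, equal_intervals, absolute_zero):
    """Return the measurement scale implied by four yes/no answers.

    Hypothetical helper: each argument is True/False for one of the
    four questions on the slide, asked in order.
    """
    if not difference:
        return "not a measurement"
    if not direction:
        return "nominal"
    if not equal_intervals:
        return "ordinal"
    if not absolute_zero:
        return "interval"
    return "ratio"


# Illustrative classifications (assumed examples):
print(identify_scale(True, False, False, False))  # jersey number
print(identify_scale(True, True, False, False))   # professorial rank
print(identify_scale(True, True, True, False))    # temperature in Fahrenheit
print(identify_scale(True, True, True, True))     # weight
```

Some items from the discussion list, such as Likert-type satisfaction ratings, are debatable, which is why the function takes the four answers as inputs instead of guessing them.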
OPERATIONAL DEFINITIONS: CLASSROOM
PROJECT
Provide operational definitions for the following:
Artistic quality
Objectionable song lyrics
Writing quality
Sexual content
Critical Thinking
Break
Return at 10:00 a.m.
TWO SETS OF SCORES…
Group 1 Group 2
100, 100 91, 85
99, 98 81, 79
88, 77 78, 77
72, 68 73, 75
67, 52 72, 70
43, 42 65, 60
40-59 60-79 80-100
Frequency Distribution with Columns for Percentage, Cumulative Frequency, and Cumulative Percentage
Scores  Frequency  Percentage  Cumulative Frequency  Cumulative Percentage
100     2          8.33%       2                     8.33%
99      1          4.17%       3                     12.50%
98      1          4.17%       4                     16.67%
91      1          4.17%       5                     20.83%
88      1          4.17%       6                     25.00%
85      1          4.17%       7                     29.17%
81      1          4.17%       8                     33.33%
79      1          4.17%       9                     37.50%
78      1          4.17%       10                    41.67%
77      2          8.33%       12                    50.00%
75      1          4.17%       13                    54.17%
73      1          4.17%       14                    58.33%
72      2          8.33%       16                    66.67%
70      1          4.17%       17                    70.83%
68      1          4.17%       18                    75.00%
67      1          4.17%       19                    79.17%
65      1          4.17%       20                    83.33%
60      1          4.17%       21                    87.50%
52      1          4.17%       22                    91.67%
43      1          4.17%       23                    95.83%
42      1          4.17%       24                    100.00%
N=      24         100.00%
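The frequency distribution above can be rebuilt from the two sets of scores. This is a minimal sketch using only the Python standard library; the function name `frequency_distribution` is an illustrative choice.

```python
from collections import Counter

# Scores from the "Two Sets of Scores" slide.
group_1 = [100, 100, 99, 98, 88, 77, 72, 68, 67, 52, 43, 42]
group_2 = [91, 85, 81, 79, 78, 77, 73, 75, 72, 70, 65, 60]
scores = group_1 + group_2


def frequency_distribution(values):
    """Return rows of (score, freq, pct, cumulative freq, cumulative pct)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    # Highest score first, matching the layout of the slide.
    for score in sorted(counts, reverse=True):
        freq = counts[score]
        cumulative += freq
        rows.append((score, freq, 100 * freq / n,
                     cumulative, 100 * cumulative / n))
    return rows


table = frequency_distribution(scores)
for score, freq, pct, cum, cum_pct in table:
    print(f"{score:>5} {freq:>3} {pct:>7.2f}% {cum:>3} {cum_pct:>7.2f}%")
```

The final cumulative frequency should always equal N (24 here), which is a quick check that no score was dropped or double-counted.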
CREATING A HISTOGRAM (BAR CHART)
[Figure: Histogram (n=100); x-axis: Scores, y-axis: Frequency]
CREATING A FREQUENCY POLYGON
[Figure: Frequency Polygon; x-axis: Scores, y-axis: Frequency]
NORMAL DISTRIBUTION
[Figure: normal curve with bands marked 68%, 95%, and 99%]
THE BELL CURVE
[Figure: bell curve centered at Mean=70; the .01 region in each tail is marked Significant]
CENTRAL LIMIT THEOREM
\[ \bar{a} = \frac{a_1 + a_2 + a_3 + a_4 + \dots + a_n}{n} \]
ARITHMETIC MEAN EXAMPLE
98 + 88 + 81 + 74 + 72 + 72 + 70 + 69 + 65 + 52 = 741
741 / 10 = 74.1
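The same arithmetic can be checked in a few lines. This is a small sketch; the variable names are illustrative.

```python
from statistics import mean

# The ten scores from the arithmetic mean example.
scores = [98, 88, 81, 74, 72, 72, 70, 69, 65, 52]

total = sum(scores)             # sum of all scores
average = total / len(scores)   # sum divided by the number of scores

print(total, average)
print(mean(scores))             # library equivalent of the same calculation
```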
FREQUENCY POLYGON OF TEST SCORE
DATA
[Figure: Frequency Polygon; x-axis: Scores, y-axis: Frequency]
SKEWNESS
[Figure: Left-Skewed Distribution; x-axis: Scores, y-axis: Frequency]
SKEWNESS
[Figure: Frequency Polygon; x-axis: Scores, y-axis: Frequency]
[Figure: frequency distribution with Median = 56K; x-axis values range from 25 to 175, y-axis: Frequency]
Group 1 Group 2
100, 100 91, 85
99, 98 81, 79
88, 77 78, 77
72, 68 73, 75
67, 52 72, 70
43, 42 65, 60
Range G1: 100 – 42 = 58
Range G2: 91 – 60 = 31
POPULATION VARIANCE
\[ \sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N} \]
SAMPLE VARIANCE
\[ s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} \]
VARIANCE
A method of describing variation in a set of scores
The higher the variance, the greater the variability and/or
spread of scores
VARIANCE EXAMPLE
X      Mean      X - Mean     (X - Mean)^2
98  -  74.1  =    23.90   =    571.21
88  -  74.1  =    13.90   =    193.21
81  -  74.1  =     6.90   =     47.61
74  -  74.1  =    -0.10   =      0.01
72  -  74.1  =    -2.10   =      4.41
72  -  74.1  =    -2.10   =      4.41
70  -  74.1  =    -4.10   =     16.81
69  -  74.1  =    -5.10   =     26.01
65  -  74.1  =    -9.10   =     82.81
52  -  74.1  =   -22.10   =    488.41
Mean = 74.1      Sum of squares = 1,434.90
Population Variance (N): 1,434.90 / 10 = 143.49
Sample Variance (n-1): 1,434.90 / 9 = 159.43
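The table above follows three steps: subtract the mean from each score, square each deviation, and divide the sum of squares by N (population) or n - 1 (sample). A minimal sketch of those steps, with illustrative variable names:

```python
# The ten scores from the variance example.
scores = [98, 88, 81, 74, 72, 72, 70, 69, 65, 52]

mean_score = sum(scores) / len(scores)

# Step 1 and 2: deviation from the mean, then squared.
squared_deviations = [(x - mean_score) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)

# Step 3: divide by N for a population, by n - 1 for a sample.
population_variance = sum_of_squares / len(scores)
sample_variance = sum_of_squares / (len(scores) - 1)

print(round(sum_of_squares, 2))
print(round(population_variance, 2), round(sample_variance, 2))
```

Dividing by n - 1 gives a slightly larger value, which compensates for a sample tending to understate the spread of the population it came from.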
USES OF THE VARIANCE
\[ s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \]
STANDARD DEVIATION
EXAMPLE
X      Mean      X - Mean     (X - Mean)^2
98  -  74.1  =    23.90   =    571.21
88  -  74.1  =    13.90   =    193.21
81  -  74.1  =     6.90   =     47.61
74  -  74.1  =    -0.10   =      0.01
72  -  74.1  =    -2.10   =      4.41
72  -  74.1  =    -2.10   =      4.41
70  -  74.1  =    -4.10   =     16.81
69  -  74.1  =    -5.10   =     26.01
65  -  74.1  =    -9.10   =     82.81
52  -  74.1  =   -22.10   =    488.41
Mean = 74.1      Sum of squares = 1,434.90
Population STD: 1,434.90 / 10 = 143.49; (SQRT) 143.49 = 11.98
Sample STD: 1,434.90 / 9 = 159.43; (SQRT) 159.43 = 12.63
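Since the standard deviation is just the square root of the variance, the two results above can be reproduced with the standard library. This sketch assumes Python's `statistics` module, where `pstdev` divides by N and `stdev` divides by n - 1.

```python
from statistics import pstdev, stdev

# The ten scores from the standard deviation example.
scores = [98, 88, 81, 74, 72, 72, 70, 69, 65, 52]

population_std = pstdev(scores)  # square root of the population variance
sample_std = stdev(scores)       # square root of the sample variance

print(round(population_std, 2), round(sample_std, 2))
```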
CLASS ASSIGNMENT
A survey was given to UNA students to find out how many hours per week
they would listen to a student-run radio station. The sample responses
were separated by gender. Determine the mean, range, variance, and
standard deviation of each group.
15 30
25 15
12 21
7 12
3 26
32 20
17 5
16 24
9 18
24 10
GROUP ONE (FEMALES)
[Figure: normal curve with Mean=70; scores 58, 62, 66, 70, 74, 78, 82 correspond to -3, -2, -1, 0, 1, 2, 3 standard deviations; both tails are marked Significant]
How Variability and Standard Deviation Work…
Class A          Class B
Mean = 75.5      Mean = 75.5
STD = 21.93      STD = 8.42
HOW DO WE USE THIS STUFF?
The type of data determines what kind of measures you can
use
Higher order data can be used with higher order statistics
WHEN SCORES DON’T COMPARE
\[ z = \frac{X - \bar{X}}{S} \]
Z-Scores with positive numbers are above the mean while Z-Scores
with negative numbers are below the mean.
Z-SCORES, CONT.
\[ z = \frac{X - \mu}{\sigma} = \frac{80 - 100}{16} = \frac{-20}{16} = -1.25 \]
COMPARING Z-SCORES
Mathematics: \[ z = \frac{X - \mu}{\sigma} = \frac{78 - 75}{6} = \frac{3}{6} = 0.5 \]
Natural Science: \[ z = \frac{115 - 103}{14} = \frac{12}{14} = 0.86 \]
English: \[ z = \frac{57 - 52}{4} = \frac{5}{4} = 1.25 \]
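The comparison above can be sketched in code: each raw score is converted to a z-score so that results from tests with different means and standard deviations share one scale. The function and dictionary names are illustrative.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / std


# (raw score, mean, standard deviation) for each subject on the slide.
subjects = {
    "Mathematics": (78, 75, 6),
    "Natural Science": (115, 103, 14),
    "English": (57, 52, 4),
}

z_scores = {name: z_score(*values) for name, values in subjects.items()}
best_subject = max(z_scores, key=z_scores.get)

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
print("Strongest relative performance:", best_subject)
```

Although the English raw score is the lowest of the three, its z-score is the highest, which is the point of standardizing before comparing.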
AREA UNDER THE NORMAL CURVE
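[Figure: normal curve with 50% of the area on each side of the mean; successive standard-deviation bands of 34.1%, 13.5%, and 2.2% on each side; 68.2%, 95.2%, and 99.6% of the area within 1, 2, and 3 standard deviations]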
AREA UNDER THE NORMAL CURVE
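[Figure: normal curve with bands of 34.1%, 13.5%, and 2.2% on each side of the mean; the region between z = 0 and z = 1 is marked]
\[ z = \frac{2 - 2}{.5} = 0 \qquad z = \frac{2.5 - 2}{.5} = 1 \]
Answer = 34%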
AREA UNDER THE NORMAL CURVE
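[Figure: normal curve with 50% on each side of the mean and bands of 34.1%, 13.5%, and 2.2%; the region beyond z = 2 is marked]
\[ z = \frac{3 - 2}{.5} = 2 \]
Answer = 2.2%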
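The two chart look-ups above can be cross-checked numerically. This sketch assumes, from the z-score calculations on the slides, a normal distribution with a mean of 2 and a standard deviation of .5; the original problem statements are not shown, so the variable names and the wording of the two questions are assumptions.

```python
from statistics import NormalDist

# Assumed from the slides' z calculations: mean = 2, standard deviation = .5
distribution = NormalDist(mu=2, sigma=0.5)

# Area between 2 and 2.5 (z = 0 to z = 1).
p_between = distribution.cdf(2.5) - distribution.cdf(2)

# Area above 3 (z greater than 2).
p_above = 1 - distribution.cdf(3)

print(f"Between 2 and 2.5: {p_between:.1%}")
print(f"Above 3: {p_above:.1%}")
```

The exact upper-tail area comes out slightly above the 2.2% read from the chart, because the chart value is a rounded band.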
AREA UNDER THE NORMAL CURVE
[Figure: normal curve with 50% on each side of the mean and bands of 34.1%, 13.5%, and 2.2%]
INTERPRETATION
Interpretation
The process of drawing inferences from the analysis results.
Inferences drawn from interpretations lead to managerial
implications and decisions.
From a management perspective, the qualitative meaning of the data
and their managerial implications are an important aspect of the
interpretation.
INFERENTIAL STATISTICS PROVIDE TWO
ENVIRONMENTS:
Tests for Difference – To test whether a significant
difference exists between groups
Tests for Relationship – To test whether a significant
relationship exists between a dependent (Y) variable and
one or more independent (X) variables
Relationship may also be predictive
HYPOTHESIS TESTING USING BASIC
STATISTICS
Univariate Statistical Analysis
Tests of hypotheses involving only one variable.
Bivariate Statistical Analysis
Tests of hypotheses involving two variables.
Multivariate Statistical Analysis
Statistical analysis involving three or more variables or sets of
variables.
HYPOTHESIS TESTING PROCEDURE
Process
The specifically stated hypothesis is derived from the research
objectives.
A sample is obtained and the relevant variable is measured.
The measured sample value is compared to the value either stated
explicitly or implied in the hypothesis.
If the value is consistent with the hypothesis, the hypothesis is supported.
If the value is not consistent with the hypothesis, the hypothesis is not
supported.
HYPOTHESIS TESTING PROCEDURE,
CONT.
H0 – Null Hypothesis
“There is no significant difference/relationship between groups”
Ha – Alternative Hypothesis
“There is a significant difference/relationship between groups”
Always state your Hypothesis/es in the Null form
The object of the research is to either reject or fail to reject
(loosely, “accept”) the Null Hypothesis/es
SIGNIFICANCE LEVELS AND P-VALUES
Significance Level
A critical probability associated with a statistical hypothesis test that
indicates how likely an inference supporting a difference between an
observed value and some statistical expectation is true.
The acceptable level of Type I error.
p-value
Probability value, or the observed or computed significance level.
p-values are compared to significance levels to test hypotheses.
Lunch
Return at 1:00 p.m.
EXPERIMENTAL RESEARCH: WHAT
HAPPENS?
Something Will Not Happen / Something Will Happen
It Does Not Happen / It Happens
Something Will Happen / Something Will Happen
It Does Not Happen / It Happens
TYPE I AND TYPE II ERRORS
Type I Error
An error caused by rejecting the null hypothesis when it should be
accepted (false positive).
Has a probability of alpha (α).
Practically, a Type I error occurs when the researcher concludes that
a relationship or difference exists in the population when in reality it
does not exist.
“There really are no monsters under the bed.”
TYPE I AND TYPE II ERRORS (CONT’D)
Type II Error
An error caused by failing to reject the null hypothesis when the
hypothesis should be rejected (false negative).
Has a probability of beta (β).
Practically, a Type II error occurs when a researcher concludes that
no relationship or difference exists when in fact one does exist.
“There really are monsters under the bed.”
TYPE I AND II ERRORS AND FIRE
ALARMS?
FIRE: H0 is False
NO FIRE: H0 is True
TYPE I: the alarm sounds when there is no fire
TYPE II: the alarm stays silent when there is a fire
NORMAL DISTRIBUTION
[Figure: normal curve with bands marked 68%, 95%, and 99%; tail regions of .05 and .01 marked on each side]
RECAPITULATION OF THE RESEARCH
PROCESS
Collect Data
Run Descriptive Statistics
Develop Null Hypothesis/es
Determine the Type of Data
Determine the Type of Test/s (based on type of data)
If the test produces a significant p-value, REJECT the Null
Hypothesis. If the test does not produce a significant
p-value, FAIL TO REJECT (loosely, ACCEPT) the Null Hypothesis.
Remember that, due to error, statistical tests only
support hypotheses and can NOT prove a phenomenon
DATA TYPE V. STATISTICS USED
X Y
1 4
3 6
5 10
5 12
1 13
2 3
4 3
6 8
PEARSON R CORRELATION COEFFICIENT
\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \]
ALTERNATIVE FORMULA
\[ r = \frac{\sum xy - \frac{\sum x \sum y}{N}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{N}} \, \sqrt{\sum y^2 - \frac{(\sum y)^2}{N}}} \]
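Both versions of the formula should give the same r. The sketch below checks that on the eight X, Y pairs listed on the earlier data slide; treating those pairs as the correlation example is an assumption, and the function names are illustrative. In the first formula, x and y are deviations from their means; in the alternative formula, they are raw scores.

```python
from math import sqrt

# X, Y pairs from the data slide (assumed to be the correlation example).
x = [1, 3, 5, 5, 1, 2, 4, 6]
y = [4, 6, 10, 12, 13, 3, 3, 8]


def pearson_r_deviation(xs, ys):
    """r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def pearson_r_raw(xs, ys):
    """r from raw scores, following the alternative formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt(sum_x2 - sum_x ** 2 / n) * sqrt(sum_y2 - sum_y ** 2 / n)
    return numerator / denominator


r_deviation = pearson_r_deviation(x, y)
r_raw = pearson_r_raw(x, y)
print(round(r_deviation, 3), round(r_raw, 3))
```

The raw-score version avoids computing each deviation, which is why it is the more convenient form for hand calculation.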
HOW CAN R’S BE USED?
[Figure: three scatterplots of Y against X]
R’s of 1.00 or -1.00 are perfect correlations
[Figure: chart with points labeled Class A, Class D, Class F, Class G, Class I, Class L, Class M]
Variable 1
Variable 2 Data Type 1 Data Type 2 Totals
Category 1 a b a+b
Category 2 c d c+d
Total a+c b+d a+b+c+d
\[ \chi^2 = \frac{(ad - bc)^2 (a + b + c + d)}{(a + b)(c + d)(b + d)(a + c)} \]
CHI SQUARE STEPS
Actual Data
          Male   Female   Total
Like      36     14       50     (row total)
Dislike   30     25       55     (row total)
Total     66     39       105    (column totals; 105 is the grand total)
To find the expected frequencies, assume independence of the
rows and columns. Multiply the row total by the column total
and divide by the grand total:
\[ ef = \frac{rt \times ct}{gt} \quad \text{e.g.} \quad \frac{50 \times 66}{105} = 31.43 \]
CHI SQUARE
Expected Frequencies
          Male    Female   Total
Like      31.43   18.57    50
Dislike   34.57   20.43    55
Total     66      39       105
O     E       O-E     (O-E)2/E
36    31.43    4.57    .66
14    18.57   -4.57   1.13
30    34.57   -4.57    .60
25    20.43    4.57   1.02
Chi Square = sum of (O-E)2/E = 3.42 (from unrounded values)
Check Critical Value Table for Chi Square Distribution on Page 448 of text
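The steps above (expected frequencies, then the sum of (O-E)2/E) can be sketched as follows, together with the 2x2 shortcut formula as a cross-check. Function names are illustrative. The critical value 3.841 used in the last lines is the standard table value for 1 degree of freedom at the .05 level; it is an assumption here that this is the level the course text uses.

```python
# Observed counts from the Actual Data table: rows Like/Dislike,
# columns Male/Female.
observed = [[36, 14],
            [30, 25]]


def expected_frequencies(table):
    """Row total times column total, divided by the grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def chi_square(table):
    """Sum of (O - E)^2 / E over every cell."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))


def chi_square_2x2_shortcut(a, b, c, d):
    """Shortcut formula for a 2x2 table with cells a, b, c, d."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (b + d) * (a + c))


expected = expected_frequencies(observed)
chi2 = chi_square(observed)
chi2_shortcut = chi_square_2x2_shortcut(36, 14, 30, 25)

# Assumed critical value: df = 1, alpha = .05 (standard chi-square table).
CRITICAL_VALUE = 3.841
print(round(chi2, 2), round(chi2_shortcut, 2))
print("Reject H0" if chi2 > CRITICAL_VALUE else "Fail to reject H0")
```

Carrying the expected frequencies unrounded avoids the small drift in row and column totals that appears when each cell is rounded to two decimals first.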
RESULTS OF CHI SQUARE TEST
\[ ef = \frac{rt \times ct}{gt} \quad \text{e.g.} \quad \frac{21 \times 30}{75} = 8.4 \]
CHI SQUARE
Expected Frequencies
O     E      O-E    (O-E)2/E
11    8.4    2.6     .805
6     5.6     .4     .029
4     7     -3.0    1.286
12    10.4   1.6     .246
7     6.9     .1     .001
7     8.7   -1.7     .332
7     11.2  -4.2    1.575
7     7.5    -.5     .033
14    9.3    4.7    2.375
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} \]
GOSSET, BEER, AND STATISTICS…
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} \]
\bar{x}_1 = Mean for group 1
            Group 1    Group 2
Mean        16.5       12.2
S           2.1        2.6
n           21         14
Null Hypothesis
\[ H_0: \mu_1 = \mu_2 \]
STEP 1: POOLED ESTIMATE OF THE
STANDARD ERROR
\[ S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left( \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]
S_2^2 = Variance of group 2
n_1 = Sample size of group 1
            Group 1    Group 2
Mean        16.5       12.2
S           2.1        2.6
n           21         14
\[ S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left( \frac{(20)(2.1)^2 + (13)(2.6)^2}{33} \right) \left( \frac{1}{21} + \frac{1}{14} \right)} = 0.797 \]
STEP 2: CALCULATE THE T-STATISTIC
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} \]
            Group 1    Group 2
Mean        16.5       12.2
S           2.1        2.6
n           21         14
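Steps 1 and 2 can be combined into a short calculation. This is a sketch of the pooled-variance, independent-samples t-test using the summary statistics on the slide; the function names are illustrative.

```python
from math import sqrt


def pooled_standard_error(s1, n1, s2, n2):
    """Step 1: pooled estimate of the standard error of the difference."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return sqrt(pooled_variance * (1 / n1 + 1 / n2))


def t_statistic(mean1, mean2, standard_error):
    """Step 2: difference between the means over its standard error."""
    return (mean1 - mean2) / standard_error


# Summary statistics from the slide.
mean1, s1, n1 = 16.5, 2.1, 21
mean2, s2, n2 = 12.2, 2.6, 14

standard_error = pooled_standard_error(s1, n1, s2, n2)
t = t_statistic(mean1, mean2, standard_error)
degrees_of_freedom = n1 + n2 - 2

print(round(standard_error, 3), round(t, 2), degrees_of_freedom)
```

The resulting t is then compared with the critical value from a t table at the degrees of freedom printed above.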
\[ H_0: \mu_1 = \mu_2 = \mu_3 \]
ANOVA ASSUMPTIONS
SS   df   MS   F
\[ SS_T = \sum x_T^2 - \frac{(\sum x_T)^2}{N_T} \]
\[ SS_T = 107 - \frac{(33)^2}{15} = 107 - \frac{1089}{15} = 107 - 72.6 = 34.4 \]
CALCULATING SUM OF SQUARES WITHIN
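\[ SS_W = \left( \sum x_1^2 - \frac{(\sum x_1)^2}{n_1} \right) + \left( \sum x_2^2 - \frac{(\sum x_2)^2}{n_2} \right) + \left( \sum x_3^2 - \frac{(\sum x_3)^2}{n_3} \right) \]
\[ SS_W = \left( 74 - \frac{324}{5} \right) + \left( 26 - \frac{100}{5} \right) + \left( 7 - \frac{25}{5} \right) \]
\[ SS_W = (74 - 64.8) + (26 - 20) + (7 - 5) \]
\[ SS_W = 9.2 + 6 + 2 = 17.2 \]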
CALCULATING SUM OF SQUARES
BETWEEN
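\[ SS_B = \frac{(\sum x_1)^2}{n_1} + \frac{(\sum x_2)^2}{n_2} + \frac{(\sum x_3)^2}{n_3} - \frac{(\sum x_T)^2}{N_T} \]
\[ SS_B = \frac{(18)^2}{5} + \frac{(10)^2}{5} + \frac{(5)^2}{5} - \frac{(33)^2}{15} \]
\[ SS_B = \frac{324}{5} + \frac{100}{5} + \frac{25}{5} - \frac{1089}{15} \]
\[ SS_B = 64.8 + 20 + 5 - 72.6 = 17.2 \]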
COMPLETE THE ANOVA TABLE
SS   df   MS   F
If the F statistic is higher than the critical value in the F
probability table, reject the null hypothesis
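The remaining columns of the ANOVA table follow from the sums of squares. The sketch below rebuilds the table from the per-group sums that appear in the slide calculations (the raw scores are not shown on the slides); the dictionary layout and variable names are illustrative assumptions.

```python
# Per-group summaries taken from the sum-of-squares slides:
# n = group size, total = sum of scores, total_sq = sum of squared scores.
groups = [
    {"n": 5, "total": 18, "total_sq": 74},
    {"n": 5, "total": 10, "total_sq": 26},
    {"n": 5, "total": 5, "total_sq": 7},
]

n_total = sum(g["n"] for g in groups)
grand_total = sum(g["total"] for g in groups)
grand_total_sq = sum(g["total_sq"] for g in groups)
correction = grand_total ** 2 / n_total  # (sum of all scores)^2 / N

ss_total = grand_total_sq - correction
ss_within = sum(g["total_sq"] - g["total"] ** 2 / g["n"] for g in groups)
ss_between = sum(g["total"] ** 2 / g["n"] for g in groups) - correction

df_between = len(groups) - 1        # number of groups minus 1
df_within = n_total - len(groups)   # total N minus number of groups

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_statistic = ms_between / ms_within

print(f"Between: SS={ss_between:.1f} df={df_between} MS={ms_between:.2f}")
print(f"Within:  SS={ss_within:.1f} df={df_within} MS={ms_within:.2f}")
print(f"Total:   SS={ss_total:.1f}  F={f_statistic:.2f}")
```

A useful check is that the between and within sums of squares add up to the total.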
YOU ARE NOT DONE YET!!!
G1 compared to G2
G1 compared to G3
G2 compared to G3
RUNNING THE TUKEY TEST
\[ t_s = \frac{M_i - M_j}{\sqrt{\frac{MSE}{n_h}}} \]
Where Mi – Mj is the difference between the ith and jth means, MSE
is the Mean Square Error, and nh is the harmonic mean
of the sample sizes of groups i and j.
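The three pairwise comparisons can be sketched with the formula above. The group means and MSE below are assumptions carried over from the ANOVA sums on the earlier slides (group totals of 18, 10, and 5 with five scores each, and a within-groups sum of squares of 17.2 on 12 degrees of freedom); the function name is illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import harmonic_mean

# Assumed from the ANOVA slides: group totals / group sizes.
means = {"G1": 18 / 5, "G2": 10 / 5, "G3": 5 / 5}
sizes = {"G1": 5, "G2": 5, "G3": 5}
mse = 17.2 / 12  # within-groups sum of squares / within-groups df


def tukey_statistic(mean_i, mean_j, n_i, n_j, mean_square_error):
    """(Mi - Mj) / sqrt(MSE / nh), with nh the harmonic mean of the sizes."""
    n_h = harmonic_mean([n_i, n_j])
    return (mean_i - mean_j) / sqrt(mean_square_error / n_h)


results = {
    (a, b): tukey_statistic(means[a], means[b], sizes[a], sizes[b], mse)
    for a, b in combinations(means, 2)
}

for (a, b), statistic in results.items():
    print(f"{a} compared to {b}: {statistic:.2f}")
```

Each statistic is then compared with the critical value from a studentized range table for the number of groups and the within-groups degrees of freedom.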
RESULTS OF THE ANOVA AND FOLLOW-
UP TESTS
If the F-statistic is significant, then the ANOVA
indicates a significant difference
The follow-up test will indicate where the differences
are
You may now state that you reject the null
hypothesis and indicate which groups were
significantly different from each other
REGRESSION ANALYSIS
PREDICTIVE VERSUS EXPLANATORY
REGRESSION ANALYSIS
Prediction – to develop a model to predict future values of a
response variable (Y) based on its relationships with predictor
variables (X’s)
Explanatory Analysis – to develop an understanding of the
relationships between response variable and predictor
variables
PROBLEM STATEMENT
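\[ \hat{y} = a + bx \]
\[ \text{Slope } (b) = \frac{N \sum XY - (\sum X)(\sum Y)}{N \sum X^2 - (\sum X)^2} \]
\[ \text{Intercept } (a) = \frac{\sum Y - b (\sum X)}{N} \]
Where:
\[ r_i = Y_i - \hat{Y}_i \]
[Figure: regression line plotted against x, labeled Slope (b), Intercept (a), and Actual Values]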
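The slope and intercept formulas above translate directly into code. The data in this sketch are hypothetical values chosen only to illustrate the arithmetic; they do not come from the slides, and the function name is illustrative.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) using the raw-score formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)

# Predicted values and residuals (actual minus predicted).
predicted = [a + b * value for value in x]
residuals = [actual - fitted for actual, fitted in zip(y, predicted)]

print(f"y-hat = {a:.2f} + {b:.2f}x")
print([round(r, 2) for r in residuals])
```

For a least-squares line the residuals sum to zero (up to rounding), which is a simple check on the fitted coefficients.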
SIMPLE VS. MULTIPLE REGRESSION
Simple: Y = a + bx
X1
X2