
P Inservice-Statistics

The document provides an outline for a course on statistics. The course is divided into 4 sections that cover topics such as scientific method, descriptive and inferential statistics, hypothesis testing, correlation, ANOVA, and regression. The first section introduces concepts like sampling and types of data. The second section covers descriptive statistics, normal distribution, hypothesis testing, and measures of central tendency. The third section discusses type I and II errors, correlation, and t-tests. The final section focuses on ANOVA and regression analysis.

Everything You Wanted to Know about Statistics but Were Afraid to Ask

Andrew L. Luna, Ph.D


CONTACT INFORMATION

Andrew L. Luna
Director
Institutional Research, Planning, and Assessment
The University of North Alabama
alluna@[Link]
Phone: 256.765.4221
COURSE OUTLINE

 Section 1: Scientific Method, descriptive/inferential statistics,
sampling, validity, and types of data.
 Break
 Section 2: Descriptive statistics, normal distribution, Central
Limit Theorem, measures of central tendency, z scores,
hypothesis testing.
 Lunch
 Section 3: Type I and Type II Error, Pearson R, Degrees of
Freedom, Chi Square, t-test.
 Break
 Section 4: ANOVA and Regression
CONNECTION?
THE CRIMEAN WAR

The Crimean War (1853-1856) was a bloody conflict between the Russians and the
British Alliance (Great Britain, France, Ottoman Empire, Kingdom of Sardinia) that
saw great casualties on both sides.

“Half a league, half a league,
Half a league onward,
All in the valley of Death
Rode the six hundred.
“Forward, the Light Brigade!
Charge for the guns!” he said:
Into the valley of Death
Rode the six hundred…”
Alfred, Lord Tennyson, “The Charge of the Light Brigade.” Written to memorialize events at
the Battle of Balaclava, Oct. 25, 1854.
FLORENCE NIGHTINGALE “LADY WITH
THE LAMP”
Florence Nightingale observed the horrific
conditions of the wounded and was
instrumental in convincing the British
government to make sweeping changes in the
sanitary conditions of the make-shift “hospitals.”
Her work to make conditions more sanitary
caused the mortality rate to decline from 44
percent to 2 percent within 6 months.

Nightingale wanted to create a visual representation of her argument on sanitary
conditions in her reports to the British government. She saw that by creating a circle
denoting 100 percent of an event, and dividing that circle into segments, she could
produce a simple graph that contained a lot of information. Her polar area diagram
popularized this kind of chart, a close relative of the PIE CHART (the pie chart itself
is usually credited to William Playfair, 1801).

Florence Nightingale (1820-1910)

Lo! in that hour of misery
A lady with a lamp I see
Pass through the glimmering gloom,
And flit from room to room.
Henry Wadsworth Longfellow’s 1857 poem “Santa Filomena”
Knowledge, Data, Information, and
Decisions…

Data

Knowledge Information

New
Knowledge/Decisions
THE SCIENTIFIC METHOD

 Scientific Method
 The way researchers go about using knowledge and evidence to
reach objective conclusions about the real world.
 The analysis and interpretation of empirical evidence (facts from
observation or experimentation) to confirm or disprove prior
conceptions
CHARACTERISTICS OF THE SCIENTIFIC
METHOD
 Scientific Research is Public – Advances in science require
freely available information (replication/peer scrutiny)
 Science is Objective – Science tries to rule out eccentricities
of judgment by researchers and institutions. Wilhelm von
Humboldt (1767-1835), founder of the University of Berlin, tied
teaching, learning, and research to three freedoms:
“Lehrfreiheit” (freedom to teach), “Lernfreiheit” (freedom to
learn), and “Freiheit der Wissenschaft” (freedom of science).
 Science is Empirical – Researchers are concerned with a
world that is knowable and potentially measurable.
Researchers must be able to perceive and classify what they
study and reject metaphysical and nonsensical explanations
of events.
CHARACTERISTICS OF THE SCIENTIFIC
METHOD, CONT.
 Science is Systematic and Cumulative – No single research
study stands alone, nor does it rise or fall by itself. Research
also follows a specific method.
 Theory – A set of related propositions that presents a
systematic view of phenomena by specifying relationships
among concepts
 Law – is a statement of fact meant to explain, in concise terms,
an action or set of actions that is generally accepted to be true
and universal
 Science is Predictive – Science is concerned with relating the
present to the future (making predictions)
 Science is Self-Correcting – Changes in thoughts, theories, or
laws are appropriate when errors in previous research are
uncovered
FLOW CHART OF THE SCIENTIFIC
METHOD

Note: Diamond-shaped boxes indicate stages in the research process in
which a choice of one or more techniques must be made. The dotted line
indicates an alternative path that skips exploratory research.
TWO BASIC TYPES OF RESEARCH

 Qualitative Research (words) - is by definition exploratory, and
it is used when we don’t know what to expect, to define the
problem or develop an approach to the problem. It’s also used
to go deeper into issues of interest and explore nuances
related to the problem at hand. Common data collection
methods used in qualitative research are focus groups, in-
depth interviews, uninterrupted observation, bulletin boards,
and ethnographic participation/observation.
 Quantitative Research (numbers) - is conclusive in its purpose
as it tries to quantify the problem and understand how
prevalent it is by looking for projectable results to a larger
population. Here we collect data through surveys (online,
phone, paper), audits, points of purchase (purchase
transactions), and other trend data.
STATING A HYPOTHESIS OR RESEARCH
QUESTION
 Research Question – A formally stated question intended to
provide indications about something; it is not limited to
investigating relationships between variables. Used when the
researcher is unsure about the nature of the problem under
investigation.
 Hypothesis – A formal statement regarding the relationship
between variables that is tested directly. The predicted
relationship between the variables is either true or false.
 Independent Variable (X_i) – the variable that is systematically varied
by the researcher
 Dependent Variable (Y_i) – the variable that is observed and whose
value is presumed to depend on independent variables
HYPOTHESIS VS. RESEARCH QUESTION

 Research Question: “Does television content enrich a child’s
imaginative capacities by offering materials and ideas for
make-believe play?”
 Hypothesis: The amount of time a child spends in make-
believe play is directly related to the amount of time spent
viewing make-believe play on television.
 Null Hypothesis: the denial or negation of a research
hypothesis; the hypothesis of no difference
 H0: “There is no significant difference between the amount of time
children engage in make-believe play and the amount of time
children watch make-believe play on television.”
DATA ANALYSIS AND INTERPRETATION

 Every research study must be carefully planned and performed
according to specific guidelines.
 When the analysis is completed, the researcher must step
back and consider what has been discovered.
 The researcher must ask two questions:
 Are the results internally and externally valid?
 Are the results reliable?

[Figure: four targets labeled Neither Valid nor Reliable; Valid but not Reliable; Not Valid but Reliable; Both Valid and Reliable]
INTERNAL VALIDITY

If y = f(x), control over the research conditions


is necessary to eliminate the possibility of
finding that y = f(b), where b is an extraneous
variable.
Artifact – Any variable that creates a possible
but incorrect explanation of results. Also
referred to as a confounding variable.
The presence of an artifact indicates issues of
internal validity; that is, the study has failed
to investigate its hypothesis
WHAT AFFECTS INTERNAL VALIDITY

 History – various events that occur during a study may affect


the subject’s attitudes, opinions, and behavior.
 Maturation – Subjects’ biological and psychological
characteristics change during the course of a study (mainly
longitudinal).
 Testing – The act of testing may cause artifacts depending on
the environment, giving similar pre-tests/post-tests, and/or
timing.
 Instrumentation – A situation where equipment malfunctions,
observers become tired/casual, and/or interviewers may make
mistakes.
 Statistical regression – Subjects who achieve either very high or
very low scores on a test tend to regress to (move toward) the
sample or population mean.
WHAT AFFECTS INTERNAL VALIDITY,
CONT.
 Experimental Mortality – All research studies face the possibility
that subjects will drop out for one reason or another.
 Sample Selection – When groups are not selected randomly or
when they are not homogeneous
 Demand Characteristics – Subjects’ reactions to experimental
situations. Subjects who recognize the purpose of a study may
produce only “good” data for researchers ( Hawthorne Effect).
 Experimenter Bias – Researcher becomes swayed by a client’s (or
personal) wishes for a project’s results ( Blind vs. Double Blind ).
 Evaluation Apprehension – Subjects are afraid of being measured
or tested.
 Causal Time Order – An experiment’s results are due not to the
stimulus (independent) variable but rather to the effect of the
dependent variable.
WHAT AFFECTS INTERNAL VALIDITY,
CONT.
 Diffusion or Imitation of Treatments – Where respondents
may have the opportunity to discuss the experiment/study
with another respondent who hasn’t yet participated.
 Compensation – The researcher treats the control group
differently because of the belief that the group has been
“deprived.”
 Compensatory Rivalry – Subjects who know they are in the
control group may work harder to perform differently or
outperform the experimental group.
 Demoralization – Control group may feel demoralized or angry
that they are not in the experimental group.
EXTERNAL VALIDITY

 How well the results of a study can be generalized across the
population.
 Use random samples.
 Use heterogeneous (diverse) samples and replicate the study
several times.
 Select a sample that is representative of the group to which
the results will be generalized.
Sample Population
PROBABILITY VERSUS NONPROBABILITY
SAMPLING
Probability Sampling
 A sampling technique in which every member of the
population has a known, nonzero probability of
selection.
Nonprobability Sampling
 A sampling technique in which units of the sample
are selected on the basis of personal judgment or
convenience.
 The probability of any particular member of the
population being chosen is unknown.
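
The contrast between the two sampling approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1,000 student ID numbers, the sample size of 50, and the "first 50 who walk by" convenience rule are all hypothetical and do not come from the slides.

```python
import random

# Hypothetical population: 1,000 student ID numbers (not from the slides).
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Probability sampling: a simple random sample, so every member has the
# same known, nonzero chance of selection (50 / 1000 = 0.05).
probability_sample = rng.sample(population, k=50)
selection_probability = 50 / len(population)

# Nonprobability sampling: a convenience sample, here simply the first 50
# students encountered. Nobody's chance of selection is actually known.
convenience_sample = population[:50]

print(sorted(probability_sample)[:5], selection_probability)
print(convenience_sample[:5])
```

Only the first sample supports a statement about how likely any given student was to be chosen, which is what makes generalizing to the population defensible.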
REPLICATION

 Replication - the independent verification of a study,
designed to eliminate:
 Design-specific results
 Sample-specific results
 Method-specific results
 Literal Replication – Involves the exact duplication of a
previous study
 Operational Replication – attempts to duplicate only the
sampling and experimental procedures of a previous study
 Instrumental replications – Attempts to duplicate the
dependent measures used in a previous study.
 Constructive Replication – Attempts to test the validity of a
previous study by not imitating the previous study.
CONCEPTS

 Building Blocks of Theory


 Abstract
 Represents broad, general ideas
 Not directly observable
 Examples:
 Reality
 Ideology
 Commercialism
 Value
 Aesthetics
THEORY

 Systematic; abstract explanation of some aspect of reality


 Primary goal is to provide a framework that links research
and practice and contributes to making findings meaningful
and generalizable
 Structure for interpretation of findings
 Means for summarizing and explaining observations for an
isolated study
 Source to generate hypothesis
 Framework for guiding research
 Guide for selecting appropriate method
 Basis to describe, explain or predict factors influencing
outcomes
CONSTRUCTS

 Concepts that are specified in such a way that they are observable
in the real world
 Invented
 Examples
 (Reality) Opinion, Choice
 (Ideology) Conservatism, Liberalism, Libertarianism, Socialism
 (Commercialism) Profit, Ratings
 (Value) amount of information , newsworthiness, time spent
 (Aesthetics) Color, Layout, Sound, Composition
VARIABLES

 Concepts that are observable and measurable


 Have a dimension that can vary
 Narrow in meaning
 Examples:
 Color classification
 Loudness
 Level of satisfaction/agreement
 Amount of time spent
 Media choice
TYPES AND FORMS OF VARIABLES

 Variable Types:
 Independent – those that are systematically varied by the researcher
 Dependent – those that are observed. Their values are presumed to
depend on the effects of the independent variables

 Variable Forms:
 Discrete – only includes a finite set of values (yes/no;
republican/democrat; satisfied….not satisfied, etc.)
 Continuous – takes on any value on a continuous scale (height,
weight, length, time, etc.)
SCALES: CONCEPT

 A generalized idea about a class of objects, attributes,


occurrences, or processes

Example: Satisfaction
SCALES: OPERATIONAL DEFINITION

 Specifies what the researcher must do to measure the


concept under investigation

Example: A 1-7 scale measuring


the level of satisfaction; A measure
of number of hours watching TV.
MEDIA SKEPTICISM: CONCEPTUAL
DEFINITION
 Media skepticism - the degree to which individuals are
skeptical toward the reality presented in the mass media.
Media skepticism varies across individuals, from those who
are mildly skeptical and accept most of what they see and
hear in the media to those who completely discount and
disbelieve the facts, values, and portrayal of reality in the
media.
MEDIA SKEPTICISM: OPERATIONAL
DEFINITION
Please tell me how true each statement is about the news
story. Is it very true, not very true, or not at all true?
1. The program was not accurate in its portrayal of the problem.
2. Most of the story was staged for entertainment purposes.
3. The presentation was slanted and unfair.

I believe national network news is fair in its portrayal of national news stories:

Strongly Disagree Disagree Neutral Agree Strongly Agree


NUMBERS, NUMBERS EVERYWHERE

555-867-5309
9001
9
3.5 .05
97.5 502
4,832 834,722
77
999
.998 65.87 362
4001
.56732 51

1,248,965 2,387
9 21 672
145

999-99-9999 324 409


35.5
SCALES
 Represents a composite measure of a variable
 Series of items arranged according to value for the purpose of
quantification
 Provides a range of values that correspond to different
characteristics or amounts of a characteristic exhibited in
observing a concept.
 Scales come in four different levels: Nominal, Ordinal, Interval,
and Ratio
NOMINAL SCALE

Indicates a
difference
ORDINAL SCALE

Indicates a
difference
Indicates the
direction of the
distance (e.g. more
than or less than)
INTERVAL SCALE

Indicates a
difference
Indicates the
direction of the
distance (e.g. more
than or less than)
32°F = 0°C
Indicates the amount
of the difference (in
equal intervals)
RATIO SCALE

Indicates a
difference
Indicates the
direction of the
distance (e.g. more
than or less than)
Indicates the amount
of the difference (in
equal intervals)
Indicates an absolute
zero
DISCUSSION/TEST: IDENTIFY THE SCALE

Arbitron Rating
Sammy Sosa # 21
Salary
Prices on the Stock Market
Satisfaction on a 1-7 Likert Scale
Gender: Male = 1 or Female = 2
How many times respondents return to a website
Professorial rank: Asst. = 1, Assoc. = 2, Full = 3
Decibel level of a speaker
Number of Newspapers sold each day
Weight of paper
Amount of time a subject watches a television program
THINGS ARE NOT ALWAYS WHAT THEY
SEEM TO BE…
Radio Stations
 Does it show a difference?
 Does it show the direction of difference?
 Is the difference measured in equal intervals?
 Does the measure have an absolute zero?
OPERATIONAL DEFINITIONS: CLASSROOM
PROJECT
 Provide operational definitions for the following:
 Artistic quality
 Objectionable song lyrics
 Writing quality
 Sexual content
 Critical Thinking
Break
Return at 10:00 a.m.
TWO SETS OF SCORES…

Group 1 Group 2
100, 100 91, 85
99, 98 81, 79
88, 77 78, 77
72, 68 73, 75
67, 52 72, 70
43, 42 65, 60

How can we analyze these numbers?


CHOOSING ONE OF THE GROUPS…
DESCRIPTIVE STATISTICS
Distribution of Responses (Group 1)
100, 100, 99, 98, 88, 77, 72, 68, 67, 52, 43, 42

Frequency Distribution (N = 12)
Scores   Frequency
100      2
99       1
98       1
88       1
77       1
72       1
68       1
67       1
52       1
43       1
42       1

Frequency Distribution Grouped in Intervals (N = 12)
Scores     Frequency
40 - 59    3
60 - 79    4
80 - 100   5

[Figure: Pie Chart of the grouped intervals 40-59, 60-79, and 80-100]
Frequency Distribution with Columns for Percentage, Cumulative Frequency, and Cumulative Percentage (both groups)

Scores   Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
100      2           8.33%        2                      8.33%
99       1           4.17%        3                      12.50%
98       1           4.17%        4                      16.67%
91       1           4.17%        5                      20.83%
88       1           4.17%        6                      25.00%
85       1           4.17%        7                      29.17%
81       1           4.17%        8                      33.33%
79       1           4.17%        9                      37.50%
78       1           4.17%        10                     41.67%
77       2           8.33%        12                     50.00%
75       1           4.17%        13                     54.17%
73       1           4.17%        14                     58.33%
72       2           8.33%        16                     66.67%
70       1           4.17%        17                     70.83%
68       1           4.17%        18                     75.00%
67       1           4.17%        19                     79.17%
65       1           4.17%        20                     83.33%
60       1           4.17%        21                     87.50%
52       1           4.17%        22                     91.67%
43       1           4.17%        23                     95.83%
42       1           4.17%        24                     100.00%
N =      24          100.00%
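
A table like this can be rebuilt directly from the raw scores. The sketch below is one way to do it in Python using only the standard library; the variable names are my own, and the scores are the two groups listed earlier. Each row carries the score, its frequency, its percentage of N, the running (cumulative) frequency, and the cumulative percentage.

```python
from collections import Counter

group_1 = [100, 100, 99, 98, 88, 77, 72, 68, 67, 52, 43, 42]
group_2 = [91, 85, 81, 79, 78, 77, 73, 75, 72, 70, 65, 60]
scores = group_1 + group_2

counts = Counter(scores)  # score -> how many times it occurs
n = len(scores)

rows = []
running = 0
for score in sorted(counts, reverse=True):  # highest score first
    freq = counts[score]
    running += freq  # cumulative frequency so far
    rows.append((score, freq, 100 * freq / n, running, 100 * running / n))

for score, freq, pct, cum, cum_pct in rows:
    print(f"{score:>5} {freq:>3} {pct:6.2f}% {cum:>3} {cum_pct:7.2f}%")
```

Because the cumulative column is a running total, it can never decrease from one row to the next, which is a quick sanity check on any hand-built table.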
CREATING A HISTOGRAM (BAR CHART)

[Figure: Histogram (n=100); x-axis: Scores (42 to 100), y-axis: Frequency (0 to 14)]
CREATING A FREQUENCY POLYGON

[Figure: Frequency Polygon; x-axis: Scores (42 to 100), y-axis: Frequency (0 to 14)]
NORMAL DISTRIBUTION

[Figure: normal curve; about 68%, 95%, and 99% of scores fall within 1, 2, and 3 standard deviations of the mean]
THE BELL CURVE

[Figure: bell curve centered at Mean = 70; the .01 region in each tail is marked “Significant”]
CENTRAL LIMIT THEOREM

 In probability theory, the central limit theorem says that,


under certain conditions, the sum of many independent
identically-distributed random variables, when scaled
appropriately, converges in distribution to a standard normal
distribution.
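
A small simulation can make the theorem concrete. The sketch below is an illustration, not part of the original course: it assumes draws from a uniform distribution on [0, 1] (mean 0.5, variance 1/12), a sample size of 30, and 5,000 repetitions, all chosen arbitrarily. Each sample mean is standardized, and if the theorem applies, the standardized means should behave like a standard normal variable, with roughly 68% of them within one unit of zero.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def standardized_sample_mean(n):
    """Draw n uniform(0, 1) values and standardize their mean."""
    draws = [rng.random() for _ in range(n)]
    mean = sum(draws) / n
    standard_error = (1 / 12) ** 0.5 / n ** 0.5  # sigma / sqrt(n)
    return (mean - 0.5) / standard_error

z_values = [standardized_sample_mean(30) for _ in range(5000)]
within_one_sd = sum(abs(z) <= 1 for z in z_values) / len(z_values)

print(round(statistics.mean(z_values), 3))
print(round(statistics.pstdev(z_values), 3))
print(round(within_one_sd, 3))
```

The underlying uniform distribution is flat, not bell-shaped, yet the distribution of its sample means comes out approximately normal.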
CENTRAL TENDENCY

 These statistics answer the question: What is a typical score?


 The statistics provide information about the grouping of the
numbers in a distribution by giving a single number that
characterizes the entire distribution.
 Exactly what constitutes a “typical” score depends on the
level of measurement and how the data will be used.
 For every distribution, three characteristic numbers can be
identified:
 Mode
 Median
 Mean
MEASURES OF CENTRAL
TENDENCY

•Mean - arithmetic average
– µ for a population; x̄ for a sample
•Median - midpoint of the distribution
•Mode - the value that occurs most often
MODE EXAMPLE
Find the score that occurs most frequently
98
88
81
74
72 Mode = 72
72
70
69
65
52
MEDIAN EXAMPLE
Arrange in descending order and find the midpoint

Odd Number (N = 9): 98, 88, 81, 74, 72, 70, 69, 65, 52
Midpoint = 72

Even Number (N = 10): 98, 88, 81, 74, 72, 71, 70, 69, 65, 52
Midpoint = (72 + 71)/2 = 71.5
DIFFERENT MEANS
Arithmetic Mean - the sum of all of the items in the list
divided by the number of items in the list

$$\bar{a} = \frac{a_1 + a_2 + a_3 + a_4 + \dots + a_n}{n}$$
ARITHMETIC MEAN EXAMPLE

Scores: 98, 88, 81, 74, 72, 72, 70, 69, 65, 52
Sum = 741
Mean = 741 / 10 = 74.1
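
The three worked examples (mode, median, mean) can be checked with Python's standard `statistics` module. This is a minimal sketch using the score lists from the slides; the variable names are my own.

```python
import statistics

# Scores from the mode and arithmetic mean examples
scores = [98, 88, 81, 74, 72, 72, 70, 69, 65, 52]

# Lists from the median example (odd and even N)
odd_scores = [98, 88, 81, 74, 72, 70, 69, 65, 52]
even_scores = [98, 88, 81, 74, 72, 71, 70, 69, 65, 52]

mode_value = statistics.mode(scores)            # most frequent score
mean_value = statistics.mean(scores)            # sum divided by count
median_odd = statistics.median(odd_scores)      # middle value
median_even = statistics.median(even_scores)    # average of the two middle values

print(mode_value, mean_value, median_odd, median_even)
```

`statistics.median` sorts the data itself, so the list does not have to be arranged in descending order first.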
NORMAL DISTRIBUTION

[Figure: normal curve; about 68%, 95%, and 99% of scores fall within 1, 2, and 3 standard deviations of the mean]
FREQUENCY POLYGON OF TEST SCORE
DATA

[Figure: Frequency Polygon; x-axis: Scores (42 to 100), y-axis: Frequency (0 to 14)]
SKEWNESS

 Refers to the concentration of scores around a particular


point on the x-axis.
 If this concentration lies toward the low end of the scale, with
the tail of the curve trailing off to the right, the curve is called
a right skew.
 If the tail of the curve trails off to the left, it is a left skew.

[Figure: Left-Skewed Distribution; x-axis: Scores (42 to 100), y-axis: Frequency (0 to 12)]
SKEWNESS

 Skewness can occur when the frequency of just one score is


clustered away from the mean.

[Figure: Frequency Polygon; x-axis: Scores (42 to 100), y-axis: Frequency (0 to 14)]
NORMAL DISTRIBUTION

[Figure: normal curve; about 68%, 95%, and 99% of scores fall within 1, 2, and 3 standard deviations of the mean]

Mode = Median = Mean


WHEN THE DISTRIBUTION MAY NOT BE
NORMAL

[Figure: Salary Sample Data; x-axis: Annual Salary in Thousands of Dollars (25 to 175), y-axis: Frequency (0 to 9); Mode = 45K, Median = 56K, Average = 62K]


MEASURES OF DISPERSION
OR SPREAD
Range
Variance
Standard deviation
THE RANGE
AS A MEASURE OF SPREAD
 The range is the distance between the smallest and the
largest value in the set.

 Range = largest value – smallest value

Group 1 Group 2
100, 100 91, 85
99, 98 81, 79
88, 77 78, 77
72, 68 73, 75
67, 52 72, 70
43, 42 65, 60
Range G1: 100 – 42 = 58 Range G2: 91 – 60 = 31
POPULATION VARIANCE

$$S^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$
SAMPLE VARIANCE

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
VARIANCE
 A method of describing variation in a set of scores
 The higher the variance, the greater the variability and/or
spread of scores
VARIANCE EXAMPLE

X     Mean    X - Mean   (X - Mean)²
98    74.1     23.90     571.21
88    74.1     13.90     193.21
81    74.1      6.90      47.61
74    74.1     -0.10       0.01
72    74.1     -2.10       4.41
72    74.1     -2.10       4.41
70    74.1     -4.10      16.81
69    74.1     -5.10      26.01
65    74.1     -9.10      82.81
52    74.1    -22.10     488.41
Mean = 74.1; Sum of squared deviations = 1,434.90

Population Variance (N): 1,434.90 / 10 = 143.49
Sample Variance (n-1): 1,434.90 / 9 = 159.43
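
The same arithmetic can be sketched in Python. The block below first computes the sum of squared deviations by hand, mirroring the table, and then compares the result with `statistics.pvariance` (divides by N) and `statistics.variance` (divides by n - 1). The variable names are my own.

```python
import statistics

scores = [98, 88, 81, 74, 72, 72, 70, 69, 65, 52]

mean = sum(scores) / len(scores)

# Sum of squared deviations from the mean, as in the worked table
sum_sq = sum((x - mean) ** 2 for x in scores)

population_variance = sum_sq / len(scores)        # divide by N
sample_variance = sum_sq / (len(scores) - 1)      # divide by n - 1

print(round(sum_sq, 2), round(population_variance, 2), round(sample_variance, 2))

# The standard library gives the same two quantities
print(round(statistics.pvariance(scores), 2), round(statistics.variance(scores), 2))
```

The only difference between the two variances is the divisor, so the sample variance is always the larger of the two for the same data.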
USES OF THE VARIANCE

 The variance is used in many higher-order calculations


including:
 T-test
 Analysis of Variance (ANOVA)
 Regression
 A variance value of zero indicates that all values within a set
of numbers are identical
 All variances that are non-zero will be positive numbers. A
large variance indicates that numbers in the set are far from
the mean and each other, while a small variance indicates the
opposite.
STANDARD DEVIATION
 Another method of describing variation in a set of scores
 The higher the standard deviation, the greater the variability
and/or spread of scores
SAMPLE STANDARD DEVIATION

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$
STANDARD DEVIATION
EXAMPLE
X     Mean    X - Mean   (X - Mean)²
98    74.1     23.90     571.21
88    74.1     13.90     193.21
81    74.1      6.90      47.61
74    74.1     -0.10       0.01
72    74.1     -2.10       4.41
72    74.1     -2.10       4.41
70    74.1     -4.10      16.81
69    74.1     -5.10      26.01
65    74.1     -9.10      82.81
52    74.1    -22.10     488.41
Mean = 74.1; Sum of squared deviations = 1,434.90

Population STD
1,434.90 / 10 = 143.49
SQRT(143.49) = 11.98

Sample STD
1,434.90 / 9 = 159.43
SQRT(159.43) = 12.63
CLASS ASSIGNMENT

 A survey was given to UNA students to find out how many hours per week
they would listen to a student-run radio station. The sample responses
were separated by gender. Determine the mean, range, variance, and
standard deviation of each group.

Group A (Female) Group B (Male)

15 30
25 15
12 21
7 12
3 26
32 20
17 5
16 24
9 18
24 10
GROUP ONE (FEMALES)

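X     Mean   X - Mean   (X - Mean)²
15    16      -1          1
25    16       9         81
12    16      -4         16
7     16      -9         81
3     16     -13        169
32    16      16        256
17    16       1          1
16    16       0          0
9     16      -7         49
24    16       8         64
Mean = 16; Sum of squared deviations = 718

Range = 32 - 3 = 29
Sample variance = 718 / 9 = 79.78
Sample standard deviation = SQRT(79.78) = 8.93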
GROUP TWO (MALES)

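X     Mean    X - Mean   (X - Mean)²
30    18.1     11.9      141.61
15    18.1     -3.1        9.61
21    18.1      2.9        8.41
12    18.1     -6.1       37.21
26    18.1      7.9       62.41
20    18.1      1.9        3.61
5     18.1    -13.1      171.61
24    18.1      5.9       34.81
18    18.1     -0.1        0.01
10    18.1     -8.1       65.61
Mean = 181 / 10 = 18.1; Sum of squared deviations = 534.90

Range = 30 - 5 = 25
Sample variance = 534.90 / 9 = 59.43
Sample standard deviation = SQRT(59.43) = 7.71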
RESULTS

Radio Listening Results

Group     Average   Range   Variance   S
Females   16        29      79.78      8.93
Males     18.1      25      59.43      7.71
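
The class assignment can be checked with a short Python sketch. The helper `summarize` is my own name, not something from the course; it treats each group as a sample, so the variance divides by n - 1. Computed directly from the listed responses, the male group has a mean of 18.1 and a range of 30 - 5 = 25.

```python
import statistics

females = [15, 25, 12, 7, 3, 32, 17, 16, 9, 24]
males = [30, 15, 21, 12, 26, 20, 5, 24, 18, 10]

def summarize(hours):
    """Return mean, range, sample variance, and sample standard deviation."""
    return {
        "mean": statistics.mean(hours),
        "range": max(hours) - min(hours),
        "variance": statistics.variance(hours),  # divides by n - 1
        "std_dev": statistics.stdev(hours),      # square root of the variance
    }

results = {"Females": summarize(females), "Males": summarize(males)}

for label, s in results.items():
    print(label, s["mean"], s["range"], round(s["variance"], 2), round(s["std_dev"], 2))
```

Females listen slightly less on average but vary more from student to student, which is exactly what the larger variance and standard deviation express.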
STANDARD DEVIATION ON BELL
CURVE
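What if S = 4?

[Figure: bell curve with Mean = 70 and S = 4; the scores 58, 62, 66, 70, 74, 78, and 82 sit at -3, -2, -1, 0, 1, 2, and 3 standard deviations; the .01 region in each tail is marked “Significant”]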
How Variability and Standard Deviation Work…

Class A Class B

100, 100 91, 85


99, 98 81, 79
88, 77 78, 77
72, 68 73, 75
67, 52 72, 70
43, 42 65, 60

Mean
Mean = 75.5 Mean = 75.5
STD = 21.93 STD = 8.42
HOW DO WE USE THIS STUFF?
 The type of data determines what kind of measures you can
use
 Higher order data can be used with higher order statistics
WHEN SCORES DON’T COMPARE

 A student takes the ACT test (1-36) and scores a 22…
 The same student takes the SAT (400-1,600) and scores a
750…
 The same student takes the TOEFL (0-120) and scores a 92…
 How can we tell if the student did better/worse on one score
in relation to the other scores?
 ANSWER: Standardize or Normalize the scores
 HOW: Z-Scores!
Z-SCORES

 In statistics, the standard score is the (signed) number of


standard deviations an observation or datum is above or
below the mean.
 A positive standard score represents a datum above the
mean, while a negative standard score represents a datum
below the mean.
 It is a dimensionless quantity obtained by subtracting the
population mean from an individual raw score and then
dividing the difference by the population standard deviation.
This conversion process is called standardizing or normalizing.
 Standard scores are also called z-values, z-scores, normal
scores, and standardized variables.
Z-SCORE FORMULA

$$z = \frac{X - \bar{X}}{S}$$

Z-Scores with positive numbers are above the mean while Z-Scores
with negative numbers are below the mean.
Z-SCORES, CONT.

 It is a little awkward in discussing a score or observation to


have to say that it is “2 standard deviations above the mean”
or “1.5 standard deviations below the mean.”
 To make it a little easier to pinpoint the location of a score in
any distribution, the z-score was developed.
 The z-score is simply a way of telling how far a score is from
the mean in standard deviation units.
CALCULATING THE Z-SCORE

 If the observed value (individual score) = 9; the mean = 6; and


the standard deviation = 2.68:
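
The slide stops before the arithmetic. Substituting the three values into the z-score formula given earlier (score minus mean, divided by the standard deviation) gives the following; the rounding to two decimals is my own choice.

```latex
z = \frac{X - \bar{X}}{S} = \frac{9 - 6}{2.68} = \frac{3}{2.68} \approx 1.12
```

The score of 9 therefore lies a little more than one standard deviation above the mean.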
Z-SCORES, CONT.

 A z-score may also be used to find the location of a score that


is a normally distributed variable.
 Using an example of a population of IQ test scores where the
individual score = 80; population mean = 100; and the
population standard deviation = 16…

$$z = \frac{X - \mu}{\sigma} = \frac{80 - 100}{16} = \frac{-20}{16} = -1.25$$
COMPARING Z-SCORES

 Z-scores allow the researcher to make comparisons between


different distributions.

Subject           µ      σ     X
Mathematics       75     6     78
Natural Science   103    14    115
English           52     4     57

Mathematics: $$z = \frac{X - \mu}{\sigma} = \frac{78 - 75}{6} = \frac{3}{6} = 0.5$$
Natural Science: $$z = \frac{115 - 103}{14} = \frac{12}{14} = 0.86$$
English: $$z = \frac{57 - 52}{4} = \frac{5}{4} = 1.25$$
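
The comparison can be sketched in Python. The dictionary layout and the names `subjects`, `z_score`, and `best` are my own; the means, standard deviations, and raw scores are the ones on the slide.

```python
subjects = {
    "Mathematics": {"mu": 75, "sigma": 6, "x": 78},
    "Natural Science": {"mu": 103, "sigma": 14, "x": 115},
    "English": {"mu": 52, "sigma": 4, "x": 57},
}

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

z_scores = {
    name: z_score(v["x"], v["mu"], v["sigma"]) for name, v in subjects.items()
}

# The subject with the largest z-score is the strongest relative performance
best = max(z_scores, key=z_scores.get)

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
print("Best relative performance:", best)
```

Although the raw score in Natural Science (115) is the highest number, the English score is furthest above its own mean once the spread of each test is taken into account.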
AREA UNDER THE NORMAL CURVE

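[Figure: normal curve. Each half holds 50% of the area. Moving out from the mean, the bands on each side hold about 34.1%, 13.5%, and 2.2%, so roughly 68.2% of scores lie within 1 standard deviation, 95.2% within 2, and 99.6% within 3.]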
AREA UNDER THE NORMAL CURVE

 TV viewing is normally distributed with a mean of 2 hours per
day and standard deviation of .5. What proportion of the
population watches between 2 and 2.5 hours of TV?

$$z = \frac{2 - 2}{.5} = 0 \qquad z = \frac{2.5 - 2}{.5} = 1$$

[Figure: normal curve with the 34.1% band between z = 0 and z = 1 highlighted]
Answer = 34%
AREA UNDER THE NORMAL CURVE

 What proportion watches more than 3 hours per day?

$$z = \frac{3 - 2}{.5} = 2$$

[Figure: normal curve with the 2.2% tail beyond z = 2 highlighted]
Answer = 2.2%
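
Both TV-viewing answers can be checked with `statistics.NormalDist` from the Python standard library. This sketch assumes a standard deviation of 0.5 hours, which is the value used in the z-score arithmetic; the variable names are my own.

```python
from statistics import NormalDist

# Mean of 2 hours, standard deviation assumed to be 0.5 hours
tv_viewing = NormalDist(mu=2, sigma=0.5)

# Proportion watching between 2 and 2.5 hours (z = 0 to z = 1)
between_2_and_2_5 = tv_viewing.cdf(2.5) - tv_viewing.cdf(2.0)

# Proportion watching more than 3 hours (beyond z = 2)
more_than_3 = 1 - tv_viewing.cdf(3.0)

print(round(between_2_and_2_5, 4))
print(round(more_than_3, 4))
```

The exact tail area beyond z = 2 is a little under 2.3%, so the band percentages on the curve are best read as rounded approximations.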
AREA UNDER THE NORMAL CURVE

 Go to z-score table on-line


 Assume the z-score of a normally distributed variable is 1.79
 First find the row with 1.7, then go to the column of .09
(second decimal place in z).
 At the intersection of the 1.7 row and the .09 column is the
number .4633.
 Therefore, the area between the mean of the curve (midpoint)
and a z-score of 1.79, is .4633 or approximately 46%
FINAL EXAMPLE

 What is the area between the midpoint of a curve and the
z-score of -1.32?
 Find the row 1.3
 Then find the column .02
 At the intersection of the row 1.3 and the column of .02
is .4066.
 The area between the midpoint of a curve and the z-score of
-1.32 is .4066, or 40.66%
 No matter if the z-score is negative or positive, the area is
always positive.
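
A z-table lookup of this kind can be reproduced in code. The helper below, `area_from_mean`, is a hypothetical name of my own; it returns the area between the mean and a given z-score on the standard normal curve, which is what the table described above reports.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_from_mean(z):
    """Area under the standard normal curve between the mean and z."""
    # cdf(z) is the area to the left of z; half the curve lies left of the mean
    return abs(standard_normal.cdf(z) - 0.5)

print(round(area_from_mean(1.79), 4))
print(round(area_from_mean(-1.32), 4))
```

Taking the absolute value is what makes the result positive for negative z-scores, matching the rule that the area is always positive.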
THE NORMAL CURVE

[Figure: normal curve. Each half holds 50% of the area; the bands on each side of the mean hold about 34.1%, 13.5%, and 2.2%.]
INTERPRETATION

 Interpretation
 The process of drawing inferences from the analysis results.
 Inferences drawn from interpretations lead to managerial
implications and decisions.
 From a management perspective, the qualitative meaning of the data
and their managerial implications are an important aspect of the
interpretation.
INFERENTIAL STATISTICS PROVIDE TWO
ENVIRONMENTS:
 Tests for difference – To test whether a significant
difference exists between groups
 Tests for relationship – To test whether a significant
relationship exists between a dependent (Y) variable and
one or more independent (X) variables
 Relationship may also be predictive
HYPOTHESIS TESTING USING BASIC
STATISTICS
 Univariate Statistical Analysis
 Tests of hypotheses involving only one variable.
 Bivariate Statistical Analysis
 Tests of hypotheses involving two variables.
 Multivariate Statistical Analysis
 Statistical analysis involving three or more variables or sets of
variables.
HYPOTHESIS TESTING PROCEDURE

 Process
 The specifically stated hypothesis is derived from the research
objectives.
 A sample is obtained and the relevant variable is measured.
 The measured sample value is compared to the value either stated
explicitly or implied in the hypothesis.
 If the value is consistent with the hypothesis, the hypothesis is supported.
 If the value is not consistent with the hypothesis, the hypothesis is not
supported.
HYPOTHESIS TESTING PROCEDURE,
CONT.
 H0 – Null Hypothesis
 “There is no significant difference/relationship between groups”
 Ha – Alternative Hypothesis
 “There is a significant difference/relationship between groups”
 Always state your Hypothesis/es in the Null form
 The object of the research is to either reject or fail to reject
(often loosely called “accept”) the Null Hypothesis/es
SIGNIFICANCE LEVELS AND P-VALUES

 Significance Level
 A critical probability associated with a statistical hypothesis test that
indicates how likely an inference supporting a difference between an
observed value and some statistical expectation is true.
 The acceptable level of Type I error.
 p-value
 Probability value, or the observed or computed significance level.
 p-values are compared to significance levels to test hypotheses.
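
The comparison of a p-value with a significance level can be sketched as follows. This is an illustration under stated assumptions: it uses a two-sided z-test, takes the test statistic to be the z-score of 1.79 from the z-table example, and sets the significance level at .05. None of these choices is prescribed by the slides.

```python
from statistics import NormalDist

ALPHA = 0.05   # significance level: the acceptable level of Type I error
z = 1.79       # hypothetical test statistic, borrowed from the z-table example

# Two-sided p-value: probability of a z at least this extreme in either tail
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < ALPHA

print(round(p_value, 4), reject_null)
```

With these numbers the p-value is larger than .05, so the null hypothesis is not rejected at that level even though the z-score looks fairly large.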
Lunch
Return at 1:00 p.m.
EXPERIMENTAL RESEARCH: WHAT
HAPPENS?

A hypothesis (educated guess) is stated and then tested. Possible outcomes:

Prediction: Something Will Not Happen. Outcome: It Does Not Happen
Prediction: Something Will Not Happen. Outcome: It Happens
Prediction: Something Will Happen. Outcome: It Does Not Happen
Prediction: Something Will Happen. Outcome: It Happens
TYPE I AND TYPE II ERRORS

 Type I Error
 An error caused by rejecting the null hypothesis when it should be
accepted (false positive).
 Has a probability of alpha (α).
 Practically, a Type I error occurs when the researcher concludes that
a relationship or difference exists in the population when in reality it
does not exist.
 “There really are no monsters under the bed.”
TYPE I AND TYPE II ERRORS (CONT’D)

 Type II Error
 An error caused by failing to reject the null hypothesis when the
hypothesis should be rejected (false negative).
 Has a probability of beta (β).
 Practically, a Type II error occurs when a researcher concludes that
no relationship or difference exists when in fact one does exist.
 “There really are monsters under the bed.”
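
A simulation can show what "a probability of alpha" means for Type I error. The sketch below is my own illustration: it assumes a population with mean 100 and standard deviation 16 (the IQ example), a hypothetical sample size of 30, a two-sided z-test, and 2,000 repeated studies. The null hypothesis is true in every simulated study, so every rejection is a Type I error.

```python
import random
from statistics import NormalDist

rng = random.Random(1)          # fixed seed so the run is repeatable
ALPHA = 0.05
MU, SIGMA, N = 100, 16, 30      # N is a hypothetical sample size
standard_normal = NormalDist()

def study_rejects_null():
    """Simulate one study in which the null hypothesis is actually true."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    z = (sample_mean - MU) / (SIGMA / N ** 0.5)
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    return p_value < ALPHA      # True means a false positive

trials = 2000
type_i_rate = sum(study_rejects_null() for _ in range(trials)) / trials

print(type_i_rate)
```

The share of false positives should land close to the chosen significance level, which is why alpha is described as the acceptable level of Type I error.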
TYPE I AND II ERRORS AND FIRE
ALARMS?

(Null hypothesis: there is no fire.)

            FIRE        NO FIRE
NO ALARM    TYPE II     NO ERROR
ALARM       NO ERROR    TYPE I

            H0 is False   H0 is True
ACCEPT H0   TYPE II       NO ERROR
REJECT H0   NO ERROR      TYPE I

TYPE I AND TYPE II ERRORS -
SENSITIVITY

Not Sensitive Sensitive

TYPE I TYPE II
NORMAL DISTRIBUTION

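[Figure: normal curve; about 68%, 95%, and 99% of scores fall within 1, 2, and 3 standard deviations of the mean; the .05 and .01 significance regions are marked in each tail]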
RECAPITULATION OF THE RESEARCH
PROCESS
Collect Data
Run Descriptive Statistics
Develop Null Hypothesis/es
Determine the Type of Data
Determine the Type of Test/s (based on type of data)
If test produces a significant p-value, REJECT the Null
Hypothesis. If the test does not produce a significant
p-value, ACCEPT the Null Hypothesis.
Remember that, due to error, statistical tests only
support hypotheses and can NOT prove a phenomenon
DATA TYPE V. STATISTICS USED

Data Type   Statistics Used

Nominal     Frequency, percentages, modes
Ordinal     Frequency, percentages, modes, median, range, percentile, ranking
Interval    Frequency, percentages, modes, median, range, percentile, ranking,
            average, variance, SD, t-tests, ANOVAs, Pearson Rs, regression
Ratio       Frequency, percentages, modes, median, range, percentile, ranking,
            average, variance, SD, t-tests, ratios, ANOVAs, Pearson Rs, regression
PEARSON R CORRELATION COEFFICIENT

X Y

1 4
3 6
5 10
5 12
6 13
2 3
4 3
6 8
PEARSON R CORRELATION COEFFICIENT

A measure of how well a linear equation describes the relation between two variables X and Y measured on the same object

         X     Y     x     y    xy    x²    y²
         1     4    -3    -5    15     9    25
         3     6    -1    -3     3     1     9
         5    10     1     1     1     1     1
         5    12     1     3     3     1     9
         6    13     2     4     8     4    16
Total   20    45     0     0    30    16    60
Mean     4     9     0     0     6

x and y are deviation scores: x = X minus the mean of X, y = Y minus the mean of Y.
CALCULATION OF PEARSON R

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \]
ALTERNATIVE FORMULA

\[ r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\sum X^2 - \frac{(\sum X)^2}{N}} \; \sqrt{\sum Y^2 - \frac{(\sum Y)^2}{N}}} \]
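Both formulas should give the same r. The sketch below checks this on the five-pair example, under the assumption that the fifth X score is 6 (the value implied by its deviation of 2 and the column total of 20). The function names are illustrative.

```python
from math import sqrt

# Five (X, Y) pairs from the worked table; fifth X assumed to be 6
X = [1, 3, 5, 5, 6]
Y = [4, 6, 10, 12, 13]

def pearson_r_deviation(xs, ys):
    """r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)

def pearson_r_raw(xs, ys):
    """r from raw scores (the alternative formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    numerator = sum(a * b for a, b in zip(xs, ys)) - sum_x * sum_y / n
    spread_x = sqrt(sum(a * a for a in xs) - sum_x ** 2 / n)
    spread_y = sqrt(sum(b * b for b in ys) - sum_y ** 2 / n)
    return numerator / (spread_x * spread_y)

r_dev = pearson_r_deviation(X, Y)
r_raw = pearson_r_raw(X, Y)
print(round(r_dev, 3), round(r_raw, 3), round(r_dev ** 2, 3))
```

With these scores the deviation sums are 30, 16, and 60, so r = 30 / sqrt(16 × 60), about .97.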
HOW CAN R’S BE USED?

[Figure: four scatterplots of Y against X, with R = 1.00, R = .18, R = .85, and R = -.92]

R’s of 1.00 or -1.00 are perfect correlations
The closer R comes to 1 or -1, the more related the X and Y scores are to each other
R-Squared is an important statistic that indicates the proportion of the variance of Y that is attributable to the variance of X (.04, .73)
CONCEPT OF DEGREES OF FREEDOM

Choosing Classes for Academic Program

[Figure: sixteen classes, Class A through Class P, scattered across the slide]
16 Classes to Graduate
DEGREES OF FREEDOM

The number of values in a study that are free to vary.


A data set contains a number of observations, say, n. They
constitute n individual pieces of information. These pieces of
information can be used either to estimate parameters or
variability. In general, each item being estimated costs one
degree of freedom. The remaining degrees of freedom are
used to estimate variability. All we have to do is count
properly.
 A single sample: There are n observations. There's one parameter (the
mean) that needs to be estimated. That leaves n-1 degrees of freedom
for estimating variability.
 Two samples: There are n1 + n2 observations. There are two means to be
estimated. That leaves n1 + n2 - 2 degrees of freedom for estimating
variability.
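The single-sample case can be sketched in a few lines. The sample values below are made up for illustration. Once the mean is fixed, the deviations must sum to zero, so only n - 1 of them are free to vary, which is why the variance divides by n - 1.

```python
from statistics import mean, variance

# Illustrative sample (hypothetical values, not from the course data)
sample = [5, 2, 5, 4, 2]
n = len(sample)

sample_mean = mean(sample)
deviations = [v - sample_mean for v in sample]

# The deviations sum to zero, so the last one is determined by the others
last_from_rest = -sum(deviations[:-1])

# Estimating the mean costs one degree of freedom: divide by n - 1
sum_of_squares = sum(d * d for d in deviations)
var_n_minus_1 = sum_of_squares / (n - 1)

print(round(last_from_rest, 2), round(deviations[-1], 2))
print(round(var_n_minus_1, 2), round(variance(sample), 2))
```

`statistics.variance` uses the same n - 1 divisor, so the two variance figures printed on the last line agree.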
TESTING FOR SIGNIFICANT DIFFERENCE

Testing for significant difference is a type of inferential statistic
One may test difference based on any type of data
Determining what type of test to use is based on what type of data are to be tested.
TESTING DIFFERENCE

Testing difference of Testing difference of


gender to favorite form gender to answers on a
of media Likert scale
Gender: M or F Gender: M or F
Media: Newspaper, Likert Scale: 1, 2, 3, 4,
Radio, TV, Internet 5
Data: Nominal Data: Interval
Test: Chi Square Test: t-test
WHAT IS A NULL HYPOTHESIS?

A type of hypothesis used in statistics that proposes that no statistical significance exists in a set of given observations.
The null hypothesis attempts to show that no variation exists between variables, or that a single variable is no different than zero.
It is presumed to be true until statistical evidence nullifies it for an alternative hypothesis.
EXAMPLES

Example 1: Three unrelated groups of people choose what they believe to be the best color scheme for a given website.
The null hypothesis is: There is no difference between color scheme choice and type of group
Example 2: Males and Females rate their level of satisfaction with a magazine using a 1-5 scale
The null hypothesis is: There is no difference between satisfaction level and gender
CHI SQUARE

A chi square (χ²) statistic is used to investigate whether distributions of categorical (i.e., nominal/ordinal) variables differ from one another.
GENERAL NOTATION FOR A CHI SQUARE
2X2 CONTINGENCY TABLE

                      Variable 1
Variable 2    Data Type 1    Data Type 2    Totals
Category 1    a              b              a+b
Category 2    c              d              c+d
Total         a+c            b+d            a+b+c+d

\[ \chi^2 = \frac{(ad - bc)^2 \, (a+b+c+d)}{(a+b)(c+d)(b+d)(a+c)} \]
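The 2x2 shortcut formula translates directly into code. A minimal sketch follows; the function name `chi_square_2x2` is illustrative, and the sample counts are the like/dislike-by-gender figures used in the worked example later in the deck.

```python
def chi_square_2x2(a, b, c, d):
    """Chi square for a 2x2 table laid out as
           a  b
           c  d
    using the shortcut formula (no expected frequencies needed)."""
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + b) * (c + d) * (b + d) * (a + c)
    return numerator / denominator

# Counts from the gender-by-preference example: 36, 14 / 30, 25
stat = chi_square_2x2(36, 14, 30, 25)
print(round(stat, 2))

# A table with identical rows shows no association at all
no_association = chi_square_2x2(10, 10, 10, 10)
print(no_association)
```

Because the shortcut does no intermediate rounding, it gives roughly 3.42 for these counts, which can differ in the second decimal from a hand calculation built on rounded expected frequencies.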
CHI SQUARE STEPS

Collect observed frequency data
Calculate expected frequency data
Determine Degrees of Freedom
Calculate the chi square
If the chi square statistic exceeds the critical (table) value for the chosen significance level and the degrees of freedom, the null hypothesis should be rejected.
TWO QUESTIONS FROM A
QUESTIONNAIRE…
 Do you like the television program? (Yes or No)
 What is your gender? (Male or Female)
GENDER AND CHOICE PREFERENCE

H0: There is no difference between gender and choice

Actual Data
           Male   Female   Total
Like        36      14       50     (row total)
Dislike     30      25       55     (row total)
Total       66      39      105     (column totals; 105 is the grand total)

To find the expected frequencies, assume independence of the rows and columns. Multiply the row total by the column total and divide by the grand total

\[ ef = \frac{rt \times ct}{gt} = \frac{50 \times 66}{105} = 31.43 \]
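The row-total-times-column-total rule works for a table of any size. A minimal sketch, assuming the observed counts are stored as a list of rows; `expected_frequencies` is an illustrative name.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence:
    (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals]
            for rt in row_totals]

# Rows: Like, Dislike; columns: Male, Female
observed = [[36, 14],
            [30, 25]]

expected = expected_frequencies(observed)
for row in expected:
    print([round(cell, 2) for cell in row])
```

The expected counts keep the same row and column totals as the observed data, which is a quick way to check a hand calculation.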
CHI SQUARE

Expected Frequencies
           Male    Female   Total
Like       31.43   18.57      50
Dislike    34.57   20.43      55
Total      66      39        105

The number of degrees of freedom is calculated for an x-by-y table as (x-1) (y-1), so in this case (2-1) (2-1) = 1*1 = 1. The degrees of freedom is 1.
CHI SQUARE CALCULATIONS

O     E       O-E     (O-E)²/E
36    31.43    4.57     .66
14    18.57   -4.57    1.13
30    34.57   -4.57     .60
25    20.43    4.57    1.02

Chi square observed statistic = 3.42 (computed from unrounded expected frequencies)


CHI SQUARE

Probability Level (alpha)

Df   0.50    0.10    0.05     0.02     0.01     0.001

1    0.455   2.706   3.841    5.412    6.635    10.827
2    1.386   4.605   5.991    7.824    9.210    13.815
3    2.366   6.251   7.815    9.837    11.345   16.268
4    3.357   7.779   9.488    11.668   13.277   18.465
5    4.351   9.236   11.070   13.388   15.086   20.51

Chi Square (Observed statistic) = 3.42
Probability Level (df=1 and .05) = 3.841 (Table Value)
So, Chi Square statistic < Probability Level (Table Value)
Accept Null Hypothesis

Check Critical Value Table for Chi Square Distribution on Page 448 of text
RESULTS OF CHI SQUARE TEST

There is no significant difference between gender and liking the television program.
CHI SQUARE TEST FOR INDEPENDENCE

Involves contingency tables larger than 2x2
Same process as the 2x2 chi square test
Indicates independence or dependence of two variables with three or more categories…but that is all it tells
TWO QUESTIONS…

 What is your favorite color scheme for the website? (Blue, Red, or Green)
 There are three groups (Rock music, Country music, Jazz music)
CHI SQUARE

H0: Group is independent of color choice

Actual Data
           Blue   Red   Green   Total
Rock        11     6      4      21     (row total)
Jazz        12     7      7      26     (row total)
Country      7     7     14      28     (row total)
Total       30    20     25      75     (column totals; 75 is the grand total)

To find the expected frequencies, assume independence of the rows and columns. Multiply the row total by the column total and divide by the grand total

\[ ef = \frac{rt \times ct}{gt} = \frac{21 \times 30}{75} = 8.4 \]
CHI SQUARE

Expected Frequencies

           Blue    Red    Green   Total
Rock        8.4    5.6     7.0     21
Jazz       10.4    6.9     8.7     26
Country    11.2    7.5     9.3     28
Total      30      20      25      75

The number of degrees of freedom is calculated for an x-by-y table as (x-1) (y-1), so in this case (3-1) (3-1) = 2*2 = 4. The degrees of freedom is 4.
CHI SQUARE CALCULATIONS

O     E      O-E    (O-E)²/E
11    8.4    2.6     .805
6     5.6     .4     .029
4     7.0   -3.0    1.286
12   10.4    1.6     .246
7     6.9     .1     .001
7     8.7   -1.7     .332
7    11.2   -4.2    1.575
7     7.5    -.5     .033
14    9.3    4.7    2.375

Chi Square observed statistic = 6.682


CHI SQUARE CALCULATIONS, CONT.

Probability Level (alpha)

Df 0.5 0.10 0.05 0.02 0.01 0.001


1 0.455 2.706 3.841 5.412 6.635 10.827
2 1.386 4.605 5.991 7.824 9.210 13.815
3 2.366 6.251 7.815 9.837 11.345 16.268
4 3.357 7.779 9.488 11.668 13.277 18.465
5 4.351 9.236 11.070 13.388 15.086 20.51

Chi Square (Observed statistic) = 6.682


Probability Level (df=4 and .05) = 9.488 (Table Value)
So, Chi Square observed statistic < Probability level (table value)
Accept Null Hypothesis
Check Critical Value Table for Chi Square Distribution on page 448
of text
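The whole test for independence (expected frequencies, degrees of freedom, statistic, and comparison to the table value) can be sketched as one function. The name `chi_square_statistic` is illustrative, and the critical value 9.488 is simply copied from the table above rather than computed.

```python
def chi_square_statistic(observed):
    """Return (chi square, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            stat += (o - e) ** 2 / e  # (O - E)^2 / E for this cell

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Rows: Rock, Jazz, Country; columns: Blue, Red, Green
observed = [[11, 6, 4],
            [12, 7, 7],
            [7, 7, 14]]

stat, df = chi_square_statistic(observed)
critical_05 = 9.488  # table value for df = 4 at the .05 level
reject_null = stat > critical_05

print(round(stat, 2), df, reject_null)
```

Because this sketch does not round the expected frequencies to one decimal, it lands near 6.62 rather than the hand-calculated 6.682. Either figure is below 9.488, so the conclusion is the same.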
CHI SQUARE TEST RESULTS

There is no significant difference between group and choice,


therefore, group and choice are independent of each other.
WHAT’S THE CONNECTION?

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} \]
GOSSET, BEER, AND STATISTICS…

William S. Gosset (1876-1937) was a famous statistician who worked for Guinness. He was a friend and colleague of Karl Pearson and the two wrote many statistical papers together. Statistics during that time involved very large samples, and Gosset needed something to test difference between smaller samples.

Gosset discovered a new statistic and wanted to write about it. However, Guinness had a bad experience with publishing when another academic article caused the beer company to lose some trade secrets.

Because Gosset knew this statistic would be helpful to all, he published it under the pseudonym of “Student.”

[Photo: William Gosset]
THE T TEST

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} \]

\(\bar{x}_1\) = Mean for group 1

\(\bar{x}_2\) = Mean for group 2

\(S_{\bar{x}_1 - \bar{x}_2}\) = Pooled, or combined, standard error of difference between means

The pooled estimate of the standard error is a better estimate of the standard error than one based on independent samples.
USES OF THE T TEST

Assesses whether the mean of a group of scores is statistically different from the population mean (One sample t test)
Assesses whether the means of two groups of scores are statistically different from each other (Two sample t test)
Cannot be used with more than two samples (use ANOVA)
SAMPLE DATA

Group 1                    Group 2
\(\bar{x}_1\) = 16.5       \(\bar{x}_2\) = 12.2
\(S_1\) = 2.1              \(S_2\) = 2.6
\(n_1\) = 21               \(n_2\) = 14

Null Hypothesis

\[ H_0: \mu_1 = \mu_2 \]

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} \]
STEP 1: POOLED ESTIMATE OF THE
STANDARD ERROR

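\[ S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left( \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]

\(S_1^2\) = Variance of group 1
\(S_2^2\) = Variance of group 2
\(n_1\) = Sample size of group 1
\(n_2\) = Sample size of group 2

Group 1                    Group 2
\(\bar{x}_1\) = 16.5       \(\bar{x}_2\) = 12.2
\(S_1\) = 2.1              \(S_2\) = 2.6
\(n_1\) = 21               \(n_2\) = 14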

STEP 1: CALCULATING THE POOLED
ESTIMATE OF THE STANDARD ERROR

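\[ S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left( \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]

\[ S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\left( \frac{(20)(2.1)^2 + (13)(2.6)^2}{33} \right) \left( \frac{1}{21} + \frac{1}{14} \right)} = 0.797 \]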
STEP 2: CALCULATE THE T-STATISTIC

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} \]

\[ t = \frac{16.5 - 12.2}{0.797} = \frac{4.3}{0.797} = 5.395 \]
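Steps 1 and 2 can be reproduced from the summary statistics alone. The following is a minimal sketch of the pooled two-sample t test as laid out on these slides; the function names are illustrative.

```python
from math import sqrt

def pooled_standard_error(s1, n1, s2, n2):
    """Pooled standard error of the difference between two means."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return sqrt(pooled_variance * (1 / n1 + 1 / n2))

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t statistic, degrees of freedom) for a pooled t test."""
    se = pooled_standard_error(s1, n1, s2, n2)
    return (mean1 - mean2) / se, n1 + n2 - 2

# Summary statistics from the sample data slide
se = pooled_standard_error(2.1, 21, 2.6, 14)
t_stat, df = two_sample_t(16.5, 2.1, 21, 12.2, 2.6, 14)

print(round(se, 3), round(t_stat, 3), df)
```

The standard deviations are squared inside the function, so the inputs are S1 and S2 as given, not the variances.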
STEP 3: CALCULATE DEGREES OF
FREEDOM

In a test of two means, the degrees of freedom are calculated: d.f. = n - k
n = total for both groups 1 and 2 (35)
k = number of groups
Therefore, d.f. = 33 (21+14-2)
Go to the tabled values of the t-distribution on website. See if the observed statistic of 5.395 surpasses the table value on the chart given 33 d.f. and a .05 significance level
STEP 4: COMPARE CRITICAL VALUE TO
OBSERVED VALUE

Observed statistic = 5.395

Df    0.10    0.05    0.02    0.01

30    1.697   2.042   2.457   2.750
31    1.696   2.040   2.453   2.744
32    1.694   2.037   2.449   2.738
33    1.692   2.035   2.445   2.733
34    1.691   2.032   2.441   2.728

If Observed statistic exceeds Table Value: Reject H0
SO WHAT DOES REJECTING THE NULL
TELL US?

Group 1                    Group 2
\(\bar{x}_1\) = 16.5       \(\bar{x}_2\) = 12.2
\(S_1\) = 2.1              \(S_2\) = 2.6
\(n_1\) = 21               \(n_2\) = 14

Based on the .05 level of statistical significance, Group 1 scored significantly higher than Group 2
Break
Return at 2:30 p.m
ANOVA DEFINITION

 In statistics, analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation.
 In its simplest form ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups.
 Doing multiple two-sample t-tests would result in an increased chance of committing a Type I error. For this reason, ANOVAs are useful in comparing two, three or more means.
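The inflation of Type I error from repeated t-tests can be illustrated with a short calculation. This sketch assumes the tests are independent, which pairwise comparisons on the same groups are not exactly, so the figures are an approximation; the function names are illustrative.

```python
from math import comb

def pairwise_comparisons(num_groups):
    """Number of two-sample t-tests needed to compare every pair of groups."""
    return comb(num_groups, 2)

def familywise_error_rate(alpha, num_tests):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** num_tests

alpha = 0.05
for groups in (2, 3, 4, 5):
    tests = pairwise_comparisons(groups)
    rate = familywise_error_rate(alpha, tests)
    print(groups, tests, round(rate, 3))
```

With three groups there are three pairwise tests, and the chance of at least one false positive rises from .05 to roughly .14. A single ANOVA keeps the overall test at the chosen significance level.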
VARIABILITY IS THE KEY TO ANOVA

Between group variability and within group variability are both components of the total variability in the combined distributions
When we compute between and within group variability we partition the total variability into the two components.
Therefore: Between variability + Within variability = Total variability
VISUAL OF BETWEEN AND WITHIN GROUP
VARIABILITY
                     Between Group

                Group A    Group B    Group C

                a1         b1         c1
                a2         b2         c2
Within Group    a3         b3         c3
                a4         b4         c4
                .          .          .
                .          .          .
                ax         bx         cx
ANOVA HYPOTHESIS TESTING

 Tests hypotheses that involve comparisons of two or more populations
 The overall ANOVA test will indicate if a difference exists between any of the groups
 However, the test will not specify which groups are different
 Therefore, the null hypothesis will state that there is no significant difference between any of the groups

\[ H_0: \mu_1 = \mu_2 = \mu_3 \]
ANOVA ASSUMPTIONS

 Random sampling of the source population (cannot test)
 Independent measures within each sample, yielding uncorrelated response residuals (cannot test)
 Homogeneous variance across all the sampled populations (can test)
 Ratio of the largest to smallest variance (F-ratio)
 Compare F-ratio to the F-Max table
 If F-ratio exceeds table value, variances are not equal
 Response residuals do not deviate from a normal distribution (can test)
 Run a normality test of data by group
ANOVA COMPUTATIONS TABLE

Source             SS             df     MS              F

Between (Model)    SS(B)          k-1    SS(B)/(k-1)     MS(B)/MS(W)
Within (Error)     SS(W)          N-k    SS(W)/(N-k)
Total              SS(W)+SS(B)    N-1
ANOVA DATA

Group 1      Group 2      Group 3

5            3            1
2            3            0
5            0            1
4            2            2
2            2            1
Σx1 = 18     Σx2 = 10     Σx3 = 5
Σx1² = 74    Σx2² = 26    Σx3² = 7
CALCULATING TOTAL SUM OF SQUARES

\[ SS_T = \sum x_T^2 - \frac{\left( \sum x_T \right)^2}{N_T} \]

\[ SS_T = 107 - \frac{(33)^2}{15} \]

\[ SS_T = 107 - \frac{1089}{15} = 107 - 72.6 = \mathbf{34.4} \]
CALCULATING SUM OF SQUARES WITHIN

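\[ SS_W = \left( \sum x_1^2 - \frac{(\sum x_1)^2}{n_1} \right) + \left( \sum x_2^2 - \frac{(\sum x_2)^2}{n_2} \right) + \left( \sum x_3^2 - \frac{(\sum x_3)^2}{n_3} \right) \]

\[ SS_W = \left( 74 - \frac{324}{5} \right) + \left( 26 - \frac{100}{5} \right) + \left( 7 - \frac{25}{5} \right) \]

\[ SS_W = (74 - 64.8) + (26 - 20) + (7 - 5) \]

\[ SS_W = 9.2 + 6 + 2 = \mathbf{17.2} \]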
CALCULATING SUM OF SQUARES
BETWEEN

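\[ SS_B = \frac{(\sum x_1)^2}{n_1} + \frac{(\sum x_2)^2}{n_2} + \frac{(\sum x_3)^2}{n_3} - \frac{(\sum x_T)^2}{N_T} \]

\[ SS_B = \frac{(18)^2}{5} + \frac{(10)^2}{5} + \frac{(5)^2}{5} - \frac{(33)^2}{15} \]

\[ SS_B = \frac{324}{5} + \frac{100}{5} + \frac{25}{5} - \frac{1089}{15} \]

\[ SS_B = 64.8 + 20 + 5 - 72.6 = \mathbf{17.2} \]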
COMPLETE THE ANOVA TABLE

Source             SS                    df          MS                    F

Between (Model)    SS(B) = 17.2          k-1 = 2     SS(B)/(k-1) = 8.6     MS(B)/MS(W) = 6
Within (Error)     SS(W) = 17.2          N-k = 12    SS(W)/(N-k) = 1.43
Total              SS(W)+SS(B) = 34.4    N-1 = 14

If the F statistic is higher than the critical value in the F probability table, reject the null
hypothesis
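The sums of squares and the F statistic can be recomputed from the raw scores. A minimal sketch using the same computational formulas as the slides; `one_way_anova` and the dictionary keys are illustrative names.

```python
def one_way_anova(groups):
    """One-way ANOVA table values from a list of groups of scores."""
    all_scores = [v for g in groups for v in g]
    n_total = len(all_scores)
    k = len(groups)

    # Correction term: (sum of all scores)^2 / N
    correction = sum(all_scores) ** 2 / n_total

    ss_total = sum(v * v for v in all_scores) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = sum(sum(v * v for v in g) - sum(g) ** 2 / len(g)
                    for g in groups)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "ss_total": ss_total, "ss_between": ss_between,
        "ss_within": ss_within, "df_between": df_between,
        "df_within": df_within, "ms_between": ms_between,
        "ms_within": ms_within, "f": ms_between / ms_within,
    }

groups = [[5, 2, 5, 4, 2],   # Group 1
          [3, 3, 0, 2, 2],   # Group 2
          [1, 0, 1, 2, 1]]   # Group 3

table = one_way_anova(groups)
for name, value in table.items():
    print(name, round(value, 2))
```

The between and within sums of squares add up to the total, which is a useful check on any hand-built ANOVA table.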
YOU ARE NOT DONE YET!!!

 If the ANOVA test determines a difference exists, it


will not indicate where the difference is located
 You must run a follow-up test to determine where the
differences may be

G1 compared to G2
G1 compared to G3
G2 compared to G3
RUNNING THE TUKEY TEST

 The "Honestly Significant Difference" (HSD) test proposed by the statistician John Tukey is based on what is called the "studentized range distribution."
 To test all pairwise comparisons among means using the Tukey HSD, compute t for each pair of means using the formula:

\[ t_s = \frac{M_i - M_j}{\sqrt{\frac{MSE}{n_h}}} \]

Where Mi – Mj is the difference between the ith and jth means, MSE is the Mean Square Error, and nh is the harmonic mean of the sample sizes of groups i and j.
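The formula can be applied to the three groups from the ANOVA example. In this sketch the group means come from the group sums (18, 10, and 5 over five scores each) and MSE is the within-group mean square, 17.2 / 12. The function names are illustrative, and the resulting values still have to be compared against a studentized range table, which is not reproduced here.

```python
from itertools import combinations
from math import sqrt

def harmonic_mean_of_two(n_i, n_j):
    """Harmonic mean of two sample sizes."""
    return 2 / (1 / n_i + 1 / n_j)

def tukey_t(mean_i, mean_j, mse, n_i, n_j):
    """Tukey HSD statistic for one pair of group means."""
    n_h = harmonic_mean_of_two(n_i, n_j)
    return (mean_i - mean_j) / sqrt(mse / n_h)

means = {"G1": 18 / 5, "G2": 10 / 5, "G3": 5 / 5}
sizes = {"G1": 5, "G2": 5, "G3": 5}
mse = 17.2 / 12  # MS(W) from the completed ANOVA table

results = {
    (a, b): tukey_t(means[a], means[b], mse, sizes[a], sizes[b])
    for a, b in combinations(means, 2)
}

for pair, value in results.items():
    print(pair, round(value, 3))
```

With equal group sizes the harmonic mean is simply the common n, so the denominator is the same for every pair and the largest mean difference gives the largest statistic.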
RESULTS OF THE ANOVA AND FOLLOW-
UP TESTS
If the F-statistic is significant, then the ANOVA
indicates a significant difference
The follow-up test will indicate where the differences
are
You may now state that you reject the null
hypothesis and indicate which groups were
significantly different from each other
REGRESSION ANALYSIS

 The description of the nature of the relationship between two


or more variables
 It is concerned with the problem of describing or estimating
the value of the dependent variable on the basis of one or
more independent variables.
REGRESSION ANALYSIS

In the late 1800s, geneticist Francis Galton discovered a phenomenon called Regression Toward The Mean. Seeking laws of inheritance, he found that sons’ heights tended to regress toward the mean height of the population, compared to their fathers’ heights. Tall fathers tended to have somewhat shorter sons, and vice versa.
PREDICTIVE VERSUS EXPLANATORY
REGRESSION ANALYSIS
 Prediction – to develop a model to predict future values of a
response variable (Y) based on its relationships with predictor
variables (X’s)
 Explanatory Analysis – to develop an understanding of the
relationships between response variable and predictor
variables
PROBLEM STATEMENT

A regression model will be used to try to explain the relationship between departmental budget allocations and those variables that could contribute to the variance in these allocations.

\[ \text{Bud. Alloc.} = f(x_1, x_2, x_3, \ldots, x_i) \]


SIMPLE REGRESSION MODEL

\[ y = a + bx \]

\[ \text{Slope } (b) = \frac{N \sum XY - (\sum X)(\sum Y)}{N \sum X^2 - (\sum X)^2} \]

\[ \text{Intercept } (a) = \frac{\sum Y - b \, (\sum X)}{N} \]

Where:

y = Dependent Variable
x = Independent Variable
b = Slope of Regression Line
a = Intercept point of line
N = Number of values
X = First Score
Y = Second Score
ΣXY = Sum of the product of 1st & 2nd scores
ΣX = Sum of First Scores
ΣY = Sum of Second Scores
ΣX² = Sum of squared First Scores
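The slope and intercept formulas can be sketched as follows. The scores are borrowed from the Pearson R example purely for illustration, assuming the fifth X score is 6; `simple_regression` is an illustrative name.

```python
def simple_regression(xs, ys):
    """Least-squares slope (b) and intercept (a) for y = a + bx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

# Illustrative scores (first score X, second score Y)
xs = [1, 3, 5, 5, 6]
ys = [4, 6, 10, 12, 13]

slope, intercept = simple_regression(xs, ys)

# Predicted values and residuals (actual minus predicted)
predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(slope, intercept)
print([round(r, 3) for r in residuals])
```

For a least-squares line the residuals sum to zero, so positive and negative misses balance out around the fitted line.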
SIMPLE REGRESSION MODEL

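[Figure: scatterplot of actual values around the regression line of predicted values, with the slope (b) and intercept (a) marked on the x axis plot]

Residuals: \( r_i = Y_i - \hat{Y}_i \)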
SIMPLE VS. MULTIPLE REGRESSION

Simple: Y = a + bX

Multiple: Y = a + b1X1 + b2X2 + b3X3 … + biXi


MULTIPLE REGRESSION MODEL

X1

X2
