
Matched-Pair Hypothesis Testing

The document discusses matched-pair hypothesis testing, which involves dependent samples where each observation from one sample is related to an observation from the other sample. It provides an example of a restaurant implementing new workplace policies and evaluating their impact on employee satisfaction scores before and after using a matched-pair test. The test statistic is calculated to determine if the new policies increased satisfaction by at least 2 points, and a 95% confidence interval is constructed around the mean difference in satisfaction scores.

Uploaded by

AmarnathMaiti
Copyright
© All Rights Reserved

Matched-pair hypothesis testing

We’ve recently been looking at hypothesis testing for the difference of
means when we take two independent samples from one or two
populations. Technically, we say that we have independent samples when
there’s no relationship between the observations we find for each sample.

But sometimes we’ll want to run a hypothesis test on the difference of
means between dependent samples, which are samples in which each
observation from one sample is related to a specific observation from the
other sample.

Matched-pair tests
When we do hypothesis testing with dependent samples, we often call it a
matched-pair test, because each subject in the second sample matches
with a particular subject in the first sample.

It’s common to run a matched-pair test that compares some new
technique or method to an old one, or looks at a before-and-after change.

For instance, a weight-loss study could define Population 1 as the set of
starting weights for each participant, and Population 2 as the set of ending
weights for each participant. Each participant’s starting and ending weights
(from Populations 1 and 2, respectively) form a matched pair for that
individual.

In this example, there’s an advantage to using a matched-pair test instead
of a difference of means test with independent samples. If we took the
independent samples approach, sample 1 could be taken from the
population before the weight-loss study begins, and sample 2 could be
taken from the population after the weight-loss study ends. This approach
introduces extra variability unnecessarily, because the two samples will
generally contain different people.

But if we take the matched-pair approach, we keep the people the same
across both samples, creating a matched-pair of each person’s starting
and ending weights.

In general, hypothesis testing with dependent samples follows nearly the
same process as the one we’ve used for the difference of means with
independent samples, except that we’ll create one variable as the
difference between the two samples, and we’ll perform the hypothesis
test with just this one variable, instead of with two variables.
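The reduction from two dependent samples to one variable can be sketched in a few lines of Python. This is only an illustration: the function name `paired_differences` and the three pairs of values are hypothetical and do not come from the lesson.

```python
def paired_differences(first, second):
    """Reduce two dependent samples to one sample of differences (second - first)."""
    if len(first) != len(second):
        # Every subject must contribute exactly one value to each sample.
        raise ValueError("matched-pair samples must have the same length")
    return [b - a for a, b in zip(first, second)]


# Hypothetical starting and ending values for three subjects (illustration only).
start = [10, 12, 9]
end = [8, 12, 7]

diffs = paired_differences(start, end)
print(diffs)  # [-2, 0, -2]
```

Once the differences are in hand, the rest of the test treats them as a single sample.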

Let’s work through an example so that we can see how to use dependent
samples in a matched-pair hypothesis test.

Example

A fast food restaurant is implementing new workplace policies with the
goal of increasing employee satisfaction by 2 points on a scale of 1 to 10.
The restaurant surveys 10 employees, asking them both before and after
the policies are enacted to rate their workplace satisfaction on the 1-to-10
scale, and records the results in the table below.

Employee                     1   2   3   4   5   6   7   8   9  10
Before, x1                   3   3   5   7   1   0   2   6   6   5
After, x2                    3   6   9   7   3   5   5   5   9   9
Difference, d = x2 − x1      0   3   4   0   2   5   3  -1   3   4
d^2                          0   9  16   0   4  25   9   1   9  16

Can the restaurant say at 5% significance that the policies increased
employee satisfaction by more than 2 points?

The restaurant will define the “before” responses as Population 1, and the
“after” responses as Population 2. The samples are dependent because
each “before” response is paired with the “after” response from the same
employee.

Then their null and alternative hypotheses will be

$$H_0: \mu_2 - \mu_1 \le 2$$

$$H_a: \mu_2 - \mu_1 > 2$$

where $\mu_1$ is the mean employee satisfaction before the new workplace
policies are implemented, and $\mu_2$ is the mean employee satisfaction after
the new workplace policies are implemented. And because $\mu_2 - \mu_1$ is the
difference in employee ratings, the hypothesis statements could also be
written as

$$H_0: \mu_d \le 2$$

$$H_a: \mu_d > 2$$

where $\mu_d$ is the mean difference between the two populations.

To find the mean difference, we’ll sum the differences and divide by the
number of matched-pairs in our sample, n = 10.
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} = \frac{0 + 3 + 4 + 0 + 2 + 5 + 3 + (-1) + 3 + 4}{10} = \frac{23}{10} = 2.3$$
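As a quick check of this arithmetic, the differences and their mean can be recomputed from the table. The sketch below uses only the survey values from the example; the variable names are our own.

```python
# Survey responses from the table, in employee order 1..10.
before = [3, 3, 5, 7, 1, 0, 2, 6, 6, 5]
after = [3, 6, 9, 7, 3, 5, 5, 5, 9, 9]

# One difference per employee: after minus before.
d = [x2 - x1 for x1, x2 in zip(before, after)]
n = len(d)
d_bar = sum(d) / n

print(d)      # [0, 3, 4, 0, 2, 5, 3, -1, 3, 4]
print(d_bar)  # 2.3
```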

So in this sample, employee satisfaction increased by 2.3 points on
average on the 1-to-10 scale. The sample standard deviation of the
differences is

$$s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}}$$

To calculate this, we’ll first find

$$\begin{aligned}
\sum_{i=1}^{n} (d_i - \bar{d})^2 &= (0 - 2.3)^2 + (3 - 2.3)^2 + (4 - 2.3)^2 + (0 - 2.3)^2 + (2 - 2.3)^2 \\
&\quad + (5 - 2.3)^2 + (3 - 2.3)^2 + (-1 - 2.3)^2 + (3 - 2.3)^2 + (4 - 2.3)^2 \\
&= (-2.3)^2 + 0.7^2 + 1.7^2 + (-2.3)^2 + (-0.3)^2 + 2.7^2 + 0.7^2 + (-3.3)^2 + 0.7^2 + 1.7^2 \\
&= 5.29 + 0.49 + 2.89 + 5.29 + 0.09 + 7.29 + 0.49 + 10.89 + 0.49 + 2.89 \\
&= 36.1
\end{aligned}$$

Then the sample standard deviation is

$$s_d = \sqrt{\frac{36.1}{9}}$$

$$s_d \approx \sqrt{4.011}$$

$$s_d \approx 2.003$$
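The same standard deviation can be reproduced step by step in Python. This is a minimal sketch using the differences from the table; it also compares the hand calculation with `statistics.stdev`, which uses the same $n - 1$ denominator.

```python
import math
import statistics

d = [0, 3, 4, 0, 2, 5, 3, -1, 3, 4]  # differences from the table
n = len(d)
d_bar = statistics.mean(d)

# Sum of squared deviations from the mean difference.
sum_sq = sum((di - d_bar) ** 2 for di in d)

# Sample standard deviation with the n - 1 denominator.
s_d = math.sqrt(sum_sq / (n - 1))

print(round(sum_sq, 1))  # 36.1
print(round(s_d, 3))     # 2.003
print(math.isclose(s_d, statistics.stdev(d)))  # True
```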

Because the population standard deviation of the differences is unknown,
and/or because the number of matched pairs is small, n < 30, the test
statistic will be

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$

$$t \approx \frac{2.3 - 2}{2.003 / \sqrt{10}}$$

$$t \approx 0.3 \cdot \frac{\sqrt{10}}{2.003}$$

$$t \approx 0.474$$

and the degrees of freedom are

df = n − 1 = 10 − 1 = 9
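The test statistic and degrees of freedom can be checked with a short script. The sketch below recomputes everything from the table's differences; `mu_0` is our own name for the hypothesized mean difference of 2.

```python
import math
import statistics

d = [0, 3, 4, 0, 2, 5, 3, -1, 3, 4]  # differences from the table
mu_0 = 2                              # hypothesized mean difference under H0

n = len(d)
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)             # sample standard deviation (n - 1)

standard_error = s_d / math.sqrt(n)
t_stat = (d_bar - mu_0) / standard_error
df = n - 1

print(round(t_stat, 3), df)  # 0.474 9
```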

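At a significance level of 5% for an upper-tail test with df = 9, the
t-table gives a critical value of 1.833, found in the column for upper-tail
probability p = 0.05. (The value 2.262 in the p = 0.025 column is the
two-tailed critical value, which is the one a 95% confidence interval uses.)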

Upper-tail probability p

df    0.25    0.20    0.15    0.10    0.05    0.025   0.01    0.005   0.001   0.0005
 8    0.706   0.889   1.108   1.397   1.860   2.306   2.896   3.355   4.501   5.041
 9    0.703   0.883   1.100   1.383   1.833   2.262   2.821   3.250   4.297   4.781
10    0.700   0.879   1.093   1.372   1.812   2.228   2.764   3.169   4.144   4.587
C     50%     60%     70%     80%     90%     95%     98%     99%     99.8%   99.9%

Confidence level C

The restaurant’s test statistic t ≈ 0.474 doesn’t exceed the critical value
t = 1.833 (df = 9, upper-tail probability 0.05 in the table above), so the
critical value approach tells them that they fail to reject the null
hypothesis, and therefore can’t conclude that the new workplace
policies increased employee satisfaction by more than 2 points.
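The critical value comparison can be expressed as a small lookup. This sketch hard-codes only the table entries for df = 8, 9, 10 at upper-tail probabilities 0.05 and 0.025; the names `T_TABLE` and `reject_upper_tail` are hypothetical helpers, not part of any library.

```python
# Excerpt of the t-table: T_TABLE[df][upper_tail_probability] -> critical value.
T_TABLE = {
    8: {0.05: 1.860, 0.025: 2.306},
    9: {0.05: 1.833, 0.025: 2.262},
    10: {0.05: 1.812, 0.025: 2.228},
}


def reject_upper_tail(t_stat, df, alpha):
    """Return (reject_h0, critical_value) for an upper-tail t-test."""
    critical = T_TABLE[df][alpha]
    return t_stat > critical, critical


reject, critical = reject_upper_tail(0.474, df=9, alpha=0.05)
print(reject, critical)  # False 1.833
```

The decision would be the same against the larger value 2.262, since 0.474 falls well below both thresholds.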

Confidence intervals for matched-pair tests


If the restaurant from the previous example had known the population
standard deviation $\sigma_d$, they could have calculated a confidence interval
around the mean difference $\bar{d}$ using

$$(a, b) = \bar{d} \pm z_{\alpha/2} \, \sigma_{\bar{d}}$$

$$(a, b) = \bar{d} \pm z_{\alpha/2} \cdot \frac{\sigma_d}{\sqrt{n}}$$
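To see how the known-$\sigma_d$ formula behaves, here is a minimal sketch using the standard library's `NormalDist` to get $z_{\alpha/2}$. The function name `z_interval` is our own, and the value `sigma_d = 2.0` is a hypothetical population standard deviation chosen purely for illustration; the example never states one.

```python
import math
from statistics import NormalDist


def z_interval(d_bar, sigma_d, n, confidence=0.95):
    """Confidence interval for the mean difference when sigma_d is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# sigma_d = 2.0 is a made-up value, not taken from the restaurant example.
low, high = z_interval(d_bar=2.3, sigma_d=2.0, n=10)
print(round(low, 2), round(high, 2))  # 1.06 3.54
```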

If, instead, the population standard deviation $\sigma_d$ is unknown and/or
the sample is small, n < 30, then to find a confidence interval around the
mean difference $\bar{d}$ they would use

$$(a, b) = \bar{d} \pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}} \quad \text{with } df = n - 1$$

Let’s continue with the previous example in order to calculate the
confidence interval.

Example (cont’d)

Find a 95% confidence interval around $\bar{d}$ using the information in the
previous example.

From the previous example, we see that the population standard deviation
$\sigma_d$ is unknown, and we have a small sample n = 10 < 30, so we’ll
calculate the confidence interval as
$$(a, b) = \bar{d} \pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}$$

$$(a, b) \approx 2.3 \pm 2.262 \cdot \frac{2.003}{\sqrt{10}}$$

$$(a, b) \approx 2.3 \pm 1.433$$

So the margin of error is 1.433 and the confidence interval is

$$(a, b) \approx (2.3 - 1.433, \ 2.3 + 1.433)$$

$$(a, b) \approx (0.867, \ 3.733)$$

Therefore, we can be 95% confident that the mean change in employee
satisfaction is between 0.867 points and 3.733 points.
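The interval arithmetic can be verified with a few lines of Python. This sketch plugs in the rounded values used in the example, with the two-tailed critical value 2.262 read from the t-table (df = 9, 95% confidence); the variable names are our own.

```python
import math

d_bar = 2.3     # sample mean difference
s_d = 2.003     # sample standard deviation of the differences (rounded)
n = 10          # number of matched pairs
t_crit = 2.262  # t-table value for df = 9 at 95% confidence

margin = t_crit * s_d / math.sqrt(n)
low, high = d_bar - margin, d_bar + margin

print(round(margin, 3))               # 1.433
print(round(low, 3), round(high, 3))  # 0.867 3.733
```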
