U4-1
Experimental Design
We have seen in several examples that an observational
study does not allow us to establish the cause of an
observed difference in the value of some response
variable for different groups.
Recall that an observational study simply measures
variables on an individual, while an experiment
deliberately imposes some treatment on an individual,
in order to observe their response to the treatment.
U4-2
Example
Recall the coffee and sleep example:
Does the caffeine in coffee really help keep you
awake? Researchers interviewed 300 adults and
asked them how many cups of coffee they drink on
an average day, as well as how many hours of sleep
they get at night.
U4-3
Example
We reasoned that, even if those who drink more
coffee get less sleep, we cannot say that coffee is the
cause. Maybe a subject’s high stress level is what
causes him or her to sleep less. We don’t know!
Stress level is said to be confounded with the amount
of coffee a person drinks.
Two variables (explanatory or lurking) are said to be
confounded when their effects on the response cannot
be separated.
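The idea can be illustrated with a small simulation. This is a minimal sketch, not data from the study: the coefficients, the 0 to 10 stress scale, and the function names are all made-up assumptions. Stress drives both coffee intake and sleep, while coffee has no effect on sleep at all, yet the two still appear related.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_study(n=300, seed=1):
    """Simulate n adults; every coefficient here is a hypothetical choice."""
    rng = random.Random(seed)
    coffee, sleep = [], []
    for _ in range(n):
        stress = rng.uniform(0, 10)  # lurking variable
        # Stress raises coffee intake (arbitrary units).
        coffee.append(0.5 * stress + rng.gauss(0, 1))
        # Stress lowers sleep; coffee does not appear in this line at all.
        sleep.append(8 - 0.3 * stress + rng.gauss(0, 0.5))
    return coffee, sleep

coffee, sleep = simulate_study()
r = pearson(coffee, sleep)
print(f"correlation between coffee and sleep: {r:.2f}")
```

The correlation comes out negative even though coffee plays no causal role in the simulated sleep values. An observational study would see only this correlation and could not separate the effect of coffee from the effect of stress.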
U4-4
Example
Confounding arises most often in observational
studies.
If we really want to know whether drinking more
coffee causes a person to sleep less, we have to
perform a proper experiment.
U4-5
Experimental Design
We have to pay careful attention to the way an
experiment is designed in order for it to serve its
intended purpose.
We want to see how nature responds to a change, and
by doing an experiment, we are actually imposing
that change, in a way that ensures the results we
observe are a response to that change, and not to
some lurking variables.
U4-6
Experimental Design
In experimental design, we have some additional
vocabulary.
The individuals on which the experiment is performed
are called the experimental units. If the individuals
are people, they are called subjects.
A specific set of experimental conditions applied to the
units is called a treatment.
U4-7
Experimental Design
The purpose of an experiment is to observe the
response of one or more variables to changes in other
variables, and so the distinction between explanatory
and response variables is necessary.
The explanatory variables in an experiment are called
factors, and the different values of the factors are
called factor levels. The specific combination of
levels (one for each factor) applied to a unit is called
the treatment.
U4-8
Example
The effectiveness of three laundry detergents is being
compared. Nine white sheets are each stained with
grape juice, motor oil and mustard. Three sheets will
each be washed with Tide detergent, three will be
washed with Cheer and the remaining three will be
washed with Sunlight. The sheets will be randomly
assigned which detergent they will be washed with,
and all of them will be washed in the same washing
machine. After they are washed, the amount of stains
removed will be compared for the three detergents.
U4-9
Example
This is an experiment.
The response variable is stains removed (cleanliness).
The experimental units are the sheets.
There is only one factor – brand of detergent.
There are three levels for the factor – Tide, Cheer and
Sunlight.
Since there is only one factor in this experiment, the
treatments are the same as the factor levels.
U4-10
Example
Nutritionists would like to simultaneously study the
effect of diet and exercise on the weight loss of
overweight adults. They would like to examine the
effect of two different diets (Atkins and Weight
Watchers) and three exercise programs (running,
cycling and swimming). A total of 120 overweight
adults volunteer to participate in the experiment.
U4-11
Example
The response variable in this experiment is weight
loss.
The experimental units are the volunteers.
There are two factors – diet and exercise program.
Diet has two factor levels – Atkins and Weight
Watchers.
Exercise program has three factor levels – running,
swimming and cycling.
U4-12
Example
As such, there are 2 × 3 = 6 treatments, as shown
below:

                                Exercise Program
Diet              Running           Swimming          Cycling
Atkins            Treatment 1       Treatment 2       Treatment 3
                  Atkins/Running    Atkins/Swimming   Atkins/Cycling
                  20 adults         20 adults         20 adults
Weight Watchers   Treatment 4       Treatment 5       Treatment 6
                  WW/Running        WW/Swimming       WW/Cycling
                  20 adults         20 adults         20 adults
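The six treatments are simply every combination of one diet with one exercise program. As a minimal sketch (the variable names are illustrative), they can be enumerated like this:

```python
from itertools import product

diets = ["Atkins", "Weight Watchers"]
exercises = ["Running", "Swimming", "Cycling"]

# Each treatment is one level of diet combined with one level of exercise.
treatments = list(product(diets, exercises))

for number, (diet, exercise) in enumerate(treatments, start=1):
    print(f"Treatment {number}: {diet} / {exercise}")
```

With two levels of one factor and three of the other, the list has 2 × 3 = 6 entries. Adding a third factor would multiply the number of treatments again by its number of levels.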
U4-13
Experimental Design
These examples illustrate the advantage of conducting
an experiment rather than an observational study. In
the first example, since the sheets are all equally dirty
to start out with, and since they are washed in the same
washing machine, any difference in the amount of
stains removed must be due to the detergent used.
U4-14
Experimental Design
In the second example, since subjects are randomly
assigned to the treatments, the treatment groups
should be relatively homogeneous with respect to one
another prior to the experiment. Each treatment
group will contain individuals with a variety of
characteristics – some with good diets, some with bad
diets, some who smoke, some who don’t, some male,
some female, etc. Then if there is a significant
difference in weight loss among the groups, we can
conclude it was because of the treatment those
individuals received.
U4-15
Interaction
Another advantage of experiments is that we can
simultaneously examine the effect of several
variables on the response variable. This will enable
us to examine any interaction that might be present
among the factors.
U4-16
Interaction
Suppose an individual suffers from both anxiety and
high blood pressure. If we conducted separate
experiments, we might find which medication is the
most effective at treating anxiety and which is most
effective at treating high blood pressure.
However, when taken together, the medications might
interact and cause an adverse reaction. This is why it
is important to examine the effects of both factors
simultaneously in one experiment.
U4-17
Experimental Design
Note that it is usually not possible to select the
experimental units randomly. This is especially true
in the case of human subjects. They must volunteer
to participate in the experiment.
Even though they are not selected randomly, what is
important is that we can view them as representative
of the population they come from. In other words,
the fact that they are volunteers shouldn’t make them
any more or less likely to respond to a treatment.
U4-18
Ethics
There are many ethical issues surrounding the design
and implementation of experiments.
For example, a subject must be aware that he or she is
taking part in an experiment (informed consent), and
individual results must be confidential.
U4-19
Experimental Design
The detergent example is the simplest case of an
experiment. We get the units, apply the treatments,
and observe the results.
We always try to control the environment to eliminate
the effect of any potential lurking variables.
The comparison of treatments is a leading principle
in experimental design. For a comparison to be
legitimate, the groups receiving each treatment must
be similar with respect to all other variables.
U4-20
Experimental Design
How can we ensure that treatment groups are similar
with respect to one another?
We assign the individuals to the treatments randomly!
Groups formed by randomization don’t depend on
any characteristics of the units. Our goal is for the
only distinction between treatment groups to be the
treatment they are receiving.
U4-21
Completely Randomized Design
A special type of experiment is a design in which all
units are randomly assigned to receive the various
treatments. We call such an experiment a completely
randomized design (CRD). Both examples we have
seen so far have been CRDs.
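As a minimal sketch of the random allocation in a CRD, the nine sheets from the detergent example can be shuffled and dealt into three equal groups. The function name, the sheet labels, and the seed are assumptions made for illustration.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Randomly split units into equal-sized groups, one per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # the random order decides group membership
    size = len(shuffled) // len(treatments)
    return {
        treatment: shuffled[i * size:(i + 1) * size]
        for i, treatment in enumerate(treatments)
    }

sheets = [f"sheet {i}" for i in range(1, 10)]  # hypothetical labels
allocation = completely_randomized(sheets, ["Tide", "Cheer", "Sunlight"], seed=42)
for detergent, group in allocation.items():
    print(detergent, group)
```

Because group membership depends only on the shuffle, it cannot depend on any characteristic of the sheets. The same function would allocate the 120 adults in the weight loss example to the six diet and exercise treatments.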
U4-22
Experimental Design
We can draw a diagram to illustrate the design of an
experiment:
9 sheets -> Random Allocation:
    Group 1 (3 sheets) -> Treatment 1: Tide
    Group 2 (3 sheets) -> Treatment 2: Cheer
    Group 3 (3 sheets) -> Treatment 3: Sunlight
-> Compare Stains Removed
U4-23
Experimental Design
120 overweight adults -> Random Allocation:
    Group 1 (20 adults) -> Treatment 1: Atkins/Running
    Group 2 (20 adults) -> Treatment 2: Atkins/Swimming
    Group 3 (20 adults) -> Treatment 3: Atkins/Cycling
    Group 4 (20 adults) -> Treatment 4: WW/Running
    Group 5 (20 adults) -> Treatment 5: WW/Swimming
    Group 6 (20 adults) -> Treatment 6: WW/Cycling
-> Compare Weight Loss
U4-24
Example
Patients who have had back pain for over a year go to
see a chiropractor for spine manipulation to ease their
pain. After one week of treatment, 86% of patients
report that their pain is either gone or in decline.
Does this mean that the spine manipulation has cured
their pain?
No – we don’t know!
U4-25
Placebo Effect
We can’t tell whether the pain has eased because
something helpful is being done or because the
patients perceive that something helpful is being done.
A response to the mere belief that one is being
treated is called the placebo effect.
A placebo is a dummy treatment that is known to
have no physical effect. It may, however, have
beneficial psychological effects.
U4-26
Control Group
A proper way to reach a conclusion in this case would
be to randomly divide the patients into two groups.
One group would receive proper spine manipulation
techniques, while the others may simply receive some
sort of back manipulation that is known to be
ineffective.
The second group is known as a control group: one
that receives no treatment (or a “fake” treatment).
U4-27
Control Group
We use control groups in our experiments to eliminate
the possibility that an observed change is due to
something other than the factors being studied.
In this example, it may turn out that a high percentage
of patients receiving the ineffective back manipulation
report a decrease in pain. If, however, the percentage
is much higher for the treatment group than the
control group, we can conclude that it was in fact the
spine manipulation that caused a decrease in pain.
U4-28
Example
A group of patients suffering from an anxiety disorder
is prescribed a new pill which they take for one year.
At the end of the year, most of the patients report a
decrease in anxiety.
This may be because the pill is working, or it may be
because other things in their life have become less
stressful. It may also be because they think the pill
should be helping, and so they perceive it to be doing
so.
U4-29
Example
We may want to randomly divide our original sample
into two and give one of the groups the actual
medication and another group a placebo (a “sugar
pill”). We compare anxiety levels for the two groups
after one year, and if the pill group’s anxiety has
decreased significantly more than the placebo
group’s, we can say the pill is working. Because the
groups were formed randomly, other potential lurking
variables should be balanced between them, and
the only systematic difference between the groups is
the type of medication.
U4-30
Example
Suppose you are taking the “Pepsi Challenge”. A
Pepsi employee places two cups on a table, one
containing Coke and the other Pepsi. The labels for
each cup are behind the cups, but are hidden from
view so as not to bias your response. You taste both
colas and select which you prefer.
U4-31
Example
The design of this experiment is fine from the taster’s
point of view. She has no idea which cola is which.
However, the worker at the booth (who works for
Pepsi) does know which cup contains which cola.
As such, his words and actions may in some way
persuade a taster to select Pepsi over Coke.
U4-32
Double-Blind Experiment
One way to avoid this problem is to perform a
double-blind experiment, one in which neither the
subject nor the person administering the treatment
knows which treatment is the one being applied at
any time.
As such, this experiment is made more legitimate in
that the bias introduced by the worker knowing which
cup contains the Pepsi is eliminated. An experiment
is said to be biased when the methodology used
systematically favours certain outcomes.
U4-33
Replication
As in any area of statistics, sample size is an
important consideration in experimental design. If
an experiment is performed on more units, chance
variation among units has less influence, and our
results will be more trustworthy.
Replication is the administration of each treatment to
more than one unit.
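A small simulation shows why replication helps. This is only a sketch under made-up assumptions: responses are drawn from a normal distribution with mean 50 and standard deviation 10, and the function name is illustrative. It measures how much a treatment group's average response varies from one run of the experiment to the next.

```python
import random
import statistics

def spread_of_group_means(units_per_group, repetitions=2000, seed=0):
    """Standard deviation of a group's mean response over many repeated experiments."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        # Hypothetical responses of the units receiving one treatment.
        responses = [rng.gauss(50, 10) for _ in range(units_per_group)]
        means.append(statistics.mean(responses))
    return statistics.stdev(means)

spread_small = spread_of_group_means(units_per_group=3)
spread_large = spread_of_group_means(units_per_group=30)
print(f"3 units per treatment:  spread of group mean = {spread_small:.2f}")
print(f"30 units per treatment: spread of group mean = {spread_large:.2f}")
```

The group mean varies much less when each treatment is applied to 30 units than when it is applied to 3, so a real difference between treatments is easier to distinguish from chance.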
U4-34
Experimental Design
We have now seen the three important principles of
experimental design:
• Randomization of the experimental units to
the various treatments
• Control of the effects of lurking variables on
the response by comparing several treatments,
one of which may be a control treatment
• Replication – the assignment of each treatment
to several individuals to reduce variation in the
results
U4-35
Blocking
A completely randomized design, as we mentioned,
is the simplest type of design. Each treatment is
assigned to a random sample of the available subjects.
We may attempt to further ensure the exclusion of the
effect of lurking variables by using the principle of
blocking.
U4-36
Randomized Block Design
A block is a group of experimental units or subjects
that are similar in ways that are expected to affect the
response to the treatments.
In a randomized block design (RBD), the random
assignment of treatments to units is carried out
separately within each block.
U4-37
Example
A researcher would like to compare the effectiveness of two
popular antacid tablets – Tums and Rolaids. 300 people
suffering from acid reflux disease volunteer to participate in
the study. The researcher believes that responses to the two
medications may differ depending on the severity of the
disease, and so he decides to conduct the experiment
separately for the 180 volunteers with moderate acid reflux
disease, and for the 120 volunteers with severe acid reflux
disease. This is another form of control. It ensures that
some patients of each type receive each tablet.
A control group is also included, giving one third of the
volunteers a placebo, i.e. a similar looking tablet known to
have no medical effect.
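The allocation for this randomized block design can be sketched as follows. The patient IDs, the function name, and the seed are assumptions made for illustration; the block sizes (180 moderate, 120 severe) and the three treatments come from the example.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    design = {}
    for block_name, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block_name!r} does not divide evenly")
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomization happens inside the block only
        size = len(shuffled) // len(treatments)
        design[block_name] = {
            treatment: shuffled[i * size:(i + 1) * size]
            for i, treatment in enumerate(treatments)
        }
    return design

blocks = {
    "moderate": [f"M{i}" for i in range(1, 181)],  # hypothetical patient IDs
    "severe": [f"S{i}" for i in range(1, 121)],
}
design = randomized_block(blocks, ["Tums", "Rolaids", "Placebo"], seed=7)
for block_name, groups in design.items():
    for tablet, patients in groups.items():
        print(block_name, tablet, len(patients))
```

Each tablet is given to 60 moderate and 40 severe patients. A patient is never moved between blocks; only the assignment of tablets within a block is random.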
U4-38
Example
It is again useful to observe a diagram of the design:
300 Volunteers

Moderate Acid Reflux -> Random Assignment:
    Group 1 (60 moderate patients) -> Treatment 1: Tums
    Group 2 (60 moderate patients) -> Treatment 2: Rolaids
    Group 3 (60 moderate patients) -> Treatment 3: Placebo
-> Compare Effectiveness

Severe Acid Reflux -> Random Assignment:
    Group 1 (40 severe patients) -> Treatment 1: Tums
    Group 2 (40 severe patients) -> Treatment 2: Rolaids
    Group 3 (40 severe patients) -> Treatment 3: Placebo
-> Compare Effectiveness
U4-39
Example
The blocking variable in this case is severity of acid
reflux disease.
Notice that the blocks are not formed randomly. We
purposely divided the volunteers into moderate and severe
groups prior to the experiment, because we anticipated
they would respond differently to the different tablets.
Also notice that a randomized block design is actually
composed of two (or more) completely randomized
designs, carried out separately within each block.
U4-40
Example
300 Volunteers (RBD)

Block 1 (CRD): Moderate Acid Reflux -> Random Assignment:
    Group 1 (60 moderate patients) -> Treatment 1: Tums
    Group 2 (60 moderate patients) -> Treatment 2: Rolaids
    Group 3 (60 moderate patients) -> Treatment 3: Placebo
-> Compare Effectiveness

Block 2 (CRD): Severe Acid Reflux -> Random Assignment:
    Group 1 (40 severe patients) -> Treatment 1: Tums
    Group 2 (40 severe patients) -> Treatment 2: Rolaids
    Group 3 (40 severe patients) -> Treatment 3: Placebo
-> Compare Effectiveness
U4-41
Example
Note that in the above experiment, the researcher
wanted to compare the two antacid tablets, and to
determine if either of them was better than a placebo.
As such, a control group was included.
However, if he simply wanted to know which of the
tablets was better, it would not be necessary to
include a control group. The control in such an
experiment would come from the comparison made
between the two treatment groups.
U4-42
Randomized Block Design
Suppose in the weight loss example that the
nutritionists believe males and females might respond
differently to different diets and exercise programs.
They may wish to instead use a randomized block
design, where gender is the blocking variable. They
can then conduct the experiment separately for men
and women, which will ensure that several men and
several women receive each treatment.
U4-43
Matched Pairs Design
The simplest and most common type of block design is
a matched pairs design. Such designs compare just
two treatments. Each block consists of just two units,
matched as closely as possible.
Suppose we have ten motorcycles and we wish to
compare the performance of two brands of tires.
Instead of putting two of one type of tire on five
motorcycles and two of the other type on the other
five motorcycles, we would like to perform a
matched pairs experiment.
U4-44
Matched Pairs Design
We place one of each brand of tire on each
motorcycle (with the location of each type of tire
being randomly selected). We then compare their
performance for each motorcycle.
We have thus eliminated the potential bias introduced
by using different types of motorcycles (and possibly
different riders) to test different types of tires.
U4-45
Matched Pairs Design
Suppose a professor wanted to compare how well
students do on multiple-choice vs. long-answer
exams.
She could randomly divide the class into two groups
and give one group a multiple-choice exam and the
other group a long-answer exam (both testing the
same material and of equal difficulty). She could
then compare the average grades for the two groups.
U4-46
Matched Pairs Design
But what if she finds that the long-answer group did
significantly better, then discovers that most of the
top students in the class happened to be in that group?
(Randomization makes this unlikely, but it is still
possible.)
Now we don’t know whether the marks were higher
because long-answer exams are better, or because the
better students were in this group. The two variables
are confounded.
U4-47
Matched Pairs Design
One solution to prevent this potential problem is to
conduct a matched pairs experiment.
The professor will order the students in her class
according to GPA. For the top two students, one will
be randomly assigned to do a multiple-choice exam
and the other will do a long-answer exam. The same
will be done for the next two students on the list, and
the next two, and so on…
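The pairing procedure can be sketched in a few lines. The roster and GPAs below are hypothetical, as are the function and variable names. Students are ranked by GPA, taken two at a time, and the exam type is assigned at random within each pair.

```python
import random

def matched_pairs(students, seed=None):
    """Pair students of adjacent GPA rank and randomize exam type within each pair.

    students is a list of (name, gpa) tuples. With an odd class size,
    the lowest-ranked student is left unpaired.
    """
    rng = random.Random(seed)
    ranked = sorted(students, key=lambda s: s[1], reverse=True)
    assignments = []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within the pair
        assignments.append(
            {"multiple-choice": pair[0][0], "long-answer": pair[1][0]}
        )
    return assignments

# Hypothetical class roster.
roster = [("Ana", 4.0), ("Ben", 3.9), ("Cam", 3.5),
          ("Dee", 3.4), ("Eli", 2.8), ("Fay", 2.7)]
assignments = matched_pairs(roster, seed=3)
for pair in assignments:
    print(pair)
```

Each pair is a block of two students with nearly equal GPA, so every comparison of exam types is made between students of similar ability.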
U4-48
Matched Pairs Design
We will now make a separate comparison of marks
for each pair of students.
Now if we see students doing the long-answer exam
scoring consistently higher than students doing the
multiple-choice exam, we can say it must be because
of the type of exam. The reason for this is that all of
our comparisons are now between students of
approximately equal ability.
U4-49
Experimental Design
A summary of a good experimental design:
• Allocate units to treatments randomly.
• Control the effect of possible lurking variables by
comparing several treatments or by including
control groups.
• Use the principle of replication to apply each
treatment to several individuals.
U4-50
Experimental Design
A summary of a good experimental design (cont’d):
• Draw a diagram of the design to better explain it.
• Eliminate bias by conducting a double-blind
experiment if possible.
• Remember the placebo effect when inferring
causation.
U4-51
Experimental Design
A summary of a good experimental design (cont’d):
• Use a randomized block design if groups of units
are similar in ways expected to affect the response.
• Matched pairs designs are simple and effective
block designs when comparing just two
treatments.
• Remember that the purpose of an experiment is
to avoid confounding and to establish causation.