Unit 1 Tutorials Key Principles of Statistical Methods
Statistical Methods
INSIDE UNIT 1
Statistics Fundamentals
Statistics Overview
Data
Qualitative and Quantitative Data
Discrete vs. Continuous Data
Sampling
Sampling
Random & Probability Sampling
Simple Random and Systematic Random Sampling
Stratified Random and Cluster Sampling
Multi-Stage Sampling
Experiments
Data
Variables
Question Types
© 2024 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 1
Accuracy and Precision in Measurements
Absolute Change and Relative Change
Using Percentages in Statistics
Index Number and Reference Value
Evaluating Studies
Bias
Nonresponse and Response Bias
Selection and Deliberate Bias
Convenience & Self-Selected Samples
Random and Systematic Errors
Margin of Error
Statistics Overview
by Sophia
WHAT'S COVERED
This lesson will provide you with an overview of what statistics really is by exploring the difference
between descriptive and inferential statistics. Specifically, this lesson covers:
1. Statistics
2. Types of Statistics
1. Statistics
You might be wondering, what is statistics? Is it some complicated formula? Is it some goofy graph that you
really don't know that much about?
When people refer to statistics, they're usually referring to information called data that's been collected and
synthesized within a statistical study and sometimes presented in a graphical form, like this:
While the image may be small and difficult to read, you get the idea that a LOT of information can be presented
in the form of a graph.
It can also be presented numerically, such as “The median household income in the United States is $46,326.”
STEP BY STEP
4. Present. Present it in a way that anyone can understand.
Statistics is a neat way to describe a messy world. It's not pretty all the time, but statistics gives us a way to
simplify things.
TERMS TO KNOW
Statistics
The study of collecting, analyzing, interpreting, and presenting information.
Statistical Study
A way to collect information from individuals.
2. Types of Statistics
When you use descriptive statistics, you are going to analyze what's going on at a particular point and use
statistics to describe the information that you've obtained.
On the other hand, when you use inferential statistics, you are going to use statistics that you've obtained and
make a generalization about the population at large.
IN CONTEXT
Let's say that you read the newspaper this morning and discovered that the average household
income in the United States was reported to be $46,700.
This information didn't come from sampling every household in the United States. It wouldn't be
realistic or feasible to knock on all the doors and speak to all those people. But someone arrived at
this number. So, how did they get it?
Well, a sample was taken, and a generalization was made about the entire United States based on
that sample.
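A small sketch may make the distinction concrete. The incomes below are invented for illustration (they are not real survey data), and the names sample_incomes and sample_mean are hypothetical. Computing the mean of the sample is the descriptive step; using that mean as an estimate for every household is the inferential step.

```python
from statistics import mean

# Hypothetical incomes (in dollars) from a small sample of households.
sample_incomes = [38_500, 52_000, 46_700, 61_250, 35_050]

# Descriptive: summarize only the households we actually asked.
sample_mean = mean(sample_incomes)
print(f"Mean income of the {len(sample_incomes)} sampled households: ${sample_mean:,.0f}")

# Inferential: treat the sample mean as an estimate for the whole population.
print(f"Estimated mean income of all households: about ${sample_mean:,.0f}")
```

The arithmetic is identical in both steps; what changes is the group the number is claimed to describe.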
TERMS TO KNOW
Descriptive Statistics
Using only the information at hand to describe the selected group of individuals.
Inferential Statistics
Using the information at hand to make a larger, more general statement about the entire population of
individuals.
SUMMARY
Statistics allows us to synthesize the information we get from the world around us. There are two types
of statistics. Descriptive statistics describe information gathered at a particular point. Inferential
statistics use that information to make a generalization or prediction about the population.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Descriptive Statistics
Using only the information at hand to describe the selected group of individuals.
Inferential Statistics
Using the information at hand to make a larger, more general statement about the entire population of
individuals.
Statistical Study
A way to collect information from individuals.
Statistics
The study of collecting, analyzing, interpreting, and presenting information.
Data
by Sophia
WHAT'S COVERED
This lesson will introduce the collection and evaluation of data, including:
1. Defining Data
2. Evaluating Types of Data
3. Gathering Data
1. Defining Data
Data is the pieces of information that we use in order to answer some statistical question. It could be a number
or an attribute.
But ultimately, it's the pieces of information that we use to get a more accurate picture of a scenario. Every
piece of data helps us to get a more accurate description, which raises the question: how do you obtain data?
Where does it come from? Do you just make it up? Where can data be found?
TERM TO KNOW
Data
Information used in a study to answer a statistical question.
Now, who collects data? Well, a lot of places collect data, such as:
Government organizations
Polling organizations
News sources
Private entities
The vast majority of sources are trustworthy. However, when using available data, it's important to think critically
about what the information is trying to convey. It’s essential to break apart the information and ask yourself
these questions:
Who collected it?
Are they reputable?
Are they trustworthy?
When was it collected?
How was it collected?
Why did they collect it?
But what about less obvious characteristics such as whether or not a source has an agenda? This is a key point.
Having an agenda, whether intentional or not, can introduce what's called bias.
Typically, polling organizations, news organizations, and government entities do their best to gather relevant,
unbiased information. However, it is important to note the goal or agenda of an organization because this can
be a source of bias. Typically, bias occurs unintentionally and reflects subtle aspects of how the data is
collected.
What can you do if no data exists to answer your question? In this case, you can collect data yourself. Gathering
information yourself generates raw data. Raw data requires additional organization and processing before it
can be used.
TERMS TO KNOW
Available Data
Data collected by some other entity—a government organization or private company.
Raw Data
Data that is unorganized, unprocessed, and not summarized. Typically, this is data that is not already
available.
Bias
The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a
study.
3. Gathering Data
If you choose to collect your own data, you must think critically and ask yourself these questions:
Collecting data is important because it's the source of statistics. Think about data as the raw material for creating
something useful. If you collect your data well, the statistics are going to be accurate. If you collect your data
poorly, then your statistics will be poor. There's no rescuing that.
BIG IDEA
You can't make useful statistics out of poor data. Thinking critically will help you determine which type of
data should be used for your purposes.
SUMMARY
This tutorial defined data as “information used in a study to answer a statistical question.” We
discussed how to evaluate types of data, available or raw, and the importance of asking yourself
questions focusing on the who, what, why, and how of data in order to help identify bias. When
gathering your own data, it’s important to understand your audience and consider how they will gain
access to all your hard work.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Available Data
Data collected by some other entity—a government organization or private company.
Bias
The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a
study.
Data
Information used in a study to answer a statistical question.
Raw Data
Data that is unorganized, unprocessed, and not summarized. Typically, this is data that is not already
available.
Qualitative and Quantitative Data
by Sophia
WHAT'S COVERED
In this tutorial, you're going to learn about the difference between qualitative data and quantitative
data, by examining:
1. Qualitative Data
1a. Nominal Measurements
1b. Ordinal Measurements
2. Quantitative Data
3. Qualitative and Quantitative Data in Practice
1. Qualitative Data
Qualitative data is also often called “categorical data.” It is not numerical in the sense that we can do numerical
operations with it, like adding numbers together or finding an average; rather, each value fits into a category.
EXAMPLE Gender: male and female. That's a qualitative variable with two categories.
Letter grades are qualitative, and so are zip codes: zip codes are written with digits, but you wouldn’t do arithmetic with
them. You wouldn’t find an average zip code, for instance. The purpose of zip codes is to divide areas into
categories. Hair color is another example of qualitative data because you can group those with black hair
together and put those with blonde hair in another group.
It's important to know that qualitative data can be divided further into two categories:
Nominal measurements
Ordinal measurements
TERM TO KNOW
Qualitative/Categorical Data
Data whose values are the names of categories. These can be numbers, but not the kinds of numbers
with which it makes sense to do any numerical operations.
1a. Nominal Measurements
EXAMPLE Favorite color. The order of the listed categories makes no difference. It doesn't matter if you
put the colors below in the order of the color spectrum or not.
With nominal data, it only makes sense to reference which category has the largest frequency. In this case, let’s
say most people say that green is their favorite color. That is what you would report, and it doesn’t matter that
green is the fourth box from the left.
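As a rough sketch of reporting the category with the largest frequency, the snippet below counts some made-up survey answers. The list favorite_colors and its values are hypothetical and are only there to show the idea.

```python
from collections import Counter

# Hypothetical survey responses: each value is the name of a category.
favorite_colors = ["green", "blue", "green", "red", "green", "blue", "purple"]

# Count how often each category appears; the order of the categories is irrelevant.
counts = Counter(favorite_colors)
most_common_color, frequency = counts.most_common(1)[0]
print(most_common_color, frequency)
```

Notice that nothing here averages or adds the values. Counting is the only operation that makes sense for nominal data.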
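TERM TO KNOW
Nominal Data
Categorical data with qualities that cannot be ordered or ranked.
1b. Ordinal Measurements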
EXAMPLE Rating scale. The order of the listed categories is very important because the order is
associated with a type of value. It’s very important that you don’t mix up the order here because the circle
on the farthest left indicates you are feeling no pain.
Pain Scale
No Pain ❍ ❍ ❍ ❍ ❍ ❍ ❍ Worst Pain
(the middle circle is labeled Moderate Pain)
With ordinal data, it’s important to keep the categories in order so that they express a spectrum ranging
from lowest to highest, or worst to best, as ratings do.
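One way to see why order matters is to sort responses by their position on the scale. This is only a sketch: the five labels in levels and the responses list are assumptions made for illustration (only the no pain, moderate pain, and worst pain labels come from the scale above).

```python
# Ordered categories, lowest to highest. The "mild pain" and "severe pain"
# labels are made up for this sketch.
levels = ["no pain", "mild pain", "moderate pain", "severe pain", "worst pain"]
rank = {label: position for position, label in enumerate(levels)}

# Hypothetical responses from five patients.
responses = ["moderate pain", "no pain", "worst pain", "mild pain", "moderate pain"]

# Sorting by rank respects the order of the scale (alphabetical order would not).
ordered = sorted(responses, key=rank.get)
middle_response = ordered[len(ordered) // 2]
print(ordered)
print(middle_response)
```

Because the categories have a meaningful order, it is reasonable to talk about a middle response, something that would make no sense for favorite colors.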
TERM TO KNOW
Ordinal Data
Categorical data with qualities that can be ordered or ranked.
2. Quantitative Data
On the other hand, you have quantitative data. Quantitative data are expressed numerically, and it makes sense to
do numerical operations with them, like finding averages or adding them together. Examples include:
Weight
Commute time to work
Outdoor temperature
All of these are measured in numbers. It makes sense to find, for instance, averages of these; therefore, you can
do numerical operations with them.
It's important to note that qualitative data is displayed differently than quantitative data. The statistical
operations we can use also depend on the type of data that we have.
TERM TO KNOW
Quantitative Data
Data whose values are numbers with which it makes sense to do numerical operations.
SUMMARY
Data used in statistics falls under one of two broad classifications: categorical, which is called
“qualitative data,” or numerical, which is called “quantitative data.”
Qualitative data branches out even further into nominal measurements, which means that the
names are important, and ordinal measurements, which means the order is important.
For quantitative data, it must make sense to do numerical operations with the values. The two types are treated differently
when organizing graphical displays and applying statistics to them. You explored several different
situations involving qualitative and quantitative data in practice to determine which situation reflected
which type of data.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Nominal Data
Categorical data with qualities that cannot be ordered or ranked.
Ordinal Data
Categorical data with qualities that can be ordered or ranked.
Discrete vs. Continuous Data
by Sophia
WHAT'S COVERED
This tutorial will discuss different types of data by contrasting the following types of data:
1. Discrete Data
2. Continuous Data
3. Discrete and Continuous Data in Practice
1. Discrete Data
Usain Bolt is widely considered the greatest sprinter of all time. Let's consider some data on Mr. Bolt:
Gold medals: 8
World records: 3
100-meter time: 9.58 seconds
Weight: 207 lbs
All of these are numerical or quantitative data, but discrete data can only take on certain values within a range.
Examples of discrete data would be the number of gold medals and world records. Those can only take whole
number values. You can't have half of a gold medal. His race times and weight could be any value that we have
the precision to measure.
The number of rail cars in a train and shoe sizes are also examples of discrete data. You can have half-size shoe
sizes, but that's all you can have. You can't have quarter-size shoe sizes, or eighth-size shoe sizes, or 0.01 shoe
sizes. You can't say that you're a size 9 and one eighth. So, there are only certain values that a shoe size can
take—that makes it discrete.
TERM TO KNOW
Discrete Data
Data that can only take so many different values.
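The shoe size rule can be written as a quick check. This is a minimal sketch that assumes, as the text describes, that only whole and half sizes exist; the function name is_valid_shoe_size is hypothetical.

```python
def is_valid_shoe_size(size: float) -> bool:
    """Return True when size is a whole or half size (a multiple of 0.5)."""
    return float(size * 2).is_integer()

# A few candidate sizes: 9 and 9.5 are allowed, 9.125 (nine and one eighth) is not.
for candidate in (9, 9.5, 9.125):
    print(candidate, is_valid_shoe_size(candidate))
```

A continuous measurement such as weight would have no comparable check, because any value within the range is possible.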
2. Continuous Data
Now, the difference between discrete data and continuous data is that continuous data can take any value
within a range. Some examples of data that are continuous are temperature, commute time, and weight. With all
of these examples, they can take any value within a range. For instance, suppose you're talking about daytime
temperature.
The daytime temperature could be something between 50 and 80 degrees on a summer's day, and it takes on
any value between those values. The same is true with commute time. One day it might take you 30 minutes
and 5 seconds to get to work. The next day it might take you 32 minutes and 17 seconds.
And with weight, one person might weigh 150.75 pounds, and one person might weigh 102.62 pounds. Their
weight can take any value within a spectrum, as opposed to discrete values, which can only take certain values
within a spectrum.
TERM TO KNOW
Continuous Data
Data that can take any value within an interval.
TRY IT
You should have said that barometric pressure is continuous because it can take any value within a
certain range, usually somewhere around 30 inches of mercury.
Discrete. You can't have half a pair (although I suppose you can have half a pair of shoes if you've lost
one), so the number of pairs of shoes can't be just any value within a range. Typically, it takes only
whole number values.
That's continuous. It could take any length of time from zero seconds all the way to a couple of years.
Question: Number of green M&Ms in a bag?
Answer: Discrete. Typically, again, we're dealing only with whole number values.
SUMMARY
Quantitative data can be broken down into two subcategories: continuous data, which
can take on any number in a range of values, and discrete data, which can only take
certain values. Every quantitative data measurement that we get is going to be either continuous or
discrete. This tutorial also put discrete and continuous data in practice to allow for some application!
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Continuous Data
Data that can take any value within an interval.
Discrete Data
Data that can only take so many different values.
Sampling
by Sophia
WHAT'S COVERED
In this tutorial, you're going to learn all about sampling, focusing on:
1. Population and Census
2. Sample
1. Population and Census
A population is the complete set of individuals that we wish to generalize our findings to. Typically, this is
something like the population of the United States, the population of the world, or the population of a state.
Examining all members of a population is called a census. For a population that large, a census may
not be feasible, so we hope that a smaller group of people can represent the population.
Since the population of the United States is too big to use as an example, a smaller example using billiard
balls will be demonstrated. As you see in the image below, the complete set of things in this particular example
is the 15 billiard balls on a pool table.
With a group so small, it's possible to take all of them and define some attribute of them like color, or weight, or
what have you—whether they're striped or solid, there are lots of different ways that you could describe each
billiard ball. And it's easy enough just to take the entire population and examine all of them.
TERMS TO KNOW
Census
Using the entire population to obtain data.
Population
The entire set of individuals from which to sample.
2. Sample
When you think about the United States example, you can see that a census is not always feasible. Suppose your
population is a large group of people, much larger than 15 people. With a group that big, it might be hard
to get answers from everybody.
What you might choose to do is take a small subset of those individuals and make a sample. In this case,
perhaps seven of these many individuals in the population were chosen. A sample is a subset of the population;
you would obtain data from that subset and leave everyone else out.
From that sample, you would obtain your data and calculate your statistics. The idea is that hopefully, you would
like the sample to be a small version of the population—a microcosm of the population, such that when you
calculate your statistics from the data obtained from the sample, it's about the same as what you would have
gotten if you had measured the population directly. That's what we mean when we say that we want the sample
to be a representative sample of the population.
There are certain ways to make it likely that a sample will be representative. One way is to take the
entire population and put them in a hat.
Now again, this is a lot easier with billiard balls than it is with people. But imagine putting all the billiard balls into
the hat.
Let’s say you shake up the hat, and take out a sample of five.
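Drawing from the hat can be imitated with Python's standard random module. This is only a sketch of the idea; the variable names are hypothetical, and the balls are simply labeled 1 through 15.

```python
import random

# The population: billiard balls numbered 1 through 15.
population = list(range(1, 16))

# Shake the hat and pull out five balls without putting any back.
sample = random.sample(population, k=5)
print(sorted(sample))
```

Running it again gives a different sample each time, just like shaking the hat again.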
There are also ways to guarantee that you won't get a representative sample. Suppose you specifically
cherry-picked only solid-colored billiard balls. Well, that wouldn't be very representative of the population of 15.
THINK ABOUT IT
Is it possible that when you take that hat and pull out five billiard balls, all five of them are solid? Sure, that's
possible; it's just not all that likely. If you cherry-pick, that's not a good idea because you're getting
something that's specifically not representative of the entire population.
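To put a number on "not all that likely," you can count samples. The sketch below assumes a standard set in which balls 1 through 8 are solid (counting the black 8-ball) and balls 9 through 15 are striped; that split is an assumption, since the tutorial does not list the balls.

```python
import math

solid_balls = 8      # assumption: balls 1-8 (including the black 8-ball) are solid
total_balls = 15
sample_size = 5

# Count the samples made up only of solids, and all possible samples of five.
all_solid_samples = math.comb(solid_balls, sample_size)
possible_samples = math.comb(total_balls, sample_size)
probability_all_solid = all_solid_samples / possible_samples

print(f"{all_solid_samples} of {possible_samples} samples are all solid "
      f"({probability_all_solid:.1%})")
```

Under these assumptions, an all-solid sample turns up only about 2% of the time by chance, whereas cherry-picking produces it every time.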
TERMS TO KNOW
Sample/Sampling
A subset of the population. There are many ways to select a sample.
Representative Sample
A sample that accurately reflects the population.
SUMMARY
A census is a way of collecting data that uses everybody; a sample only uses some. To generalize the
findings from the sample to the population at large, the sample has to be representative of your
population at large. Once again, the terms/concepts that we've described in this tutorial are population,
census, sample (noun), and sampling (verb), and the idea that a sample should be representative.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Census
Using the entire population to obtain data.
Population
The entire set of individuals from which to sample.
Representative Sample
A sample that accurately reflects the population.
Sample/Sampling
A subset of the population. There are many ways to select a sample.
Random & Probability Sampling
by Sophia
WHAT'S COVERED
This tutorial covers random and probability sampling methods, focusing on:
1. Random Sample
1. Random Sample
The term “random” is used a lot in everyday speech, but what does it mean when it comes to statistics? In
statistics, random refers to something that is unpredictable and does not have a recognizable pattern.
With a random sample, every member of the population has the same chance of getting selected. This is the
best way to get a representative sample. Recall that a representative sample is when the population and the
sample have the same set of relevant characteristics.
If you want a random sample, you will need to select participants in such a way that every member of that
population has an equal chance of being selected for the sample. This is also known as random selection.
You need to come up with a method to achieve a random sample, and you can do that with a probability
sampling plan. This plan must be made first before a random sample can be taken. You can also “weight”
certain people so that they might be more likely to be selected for the sample.
IN CONTEXT
What does a random sample look like in context? Suppose there are 15 billiard balls from a pool table:
You place them all in a hat, and you shake the hat, and voila, here's a sample of five.
Shake #1
Suppose you place the billiard balls back in the hat and shake the hat for a second time.
Shake #2
This is another sample of five, and it is not that different from the previous sample. If you conducted
the same hat trick over and over again, each billiard ball would have an equal chance of being pulled.
Shake #3
What happened here was we got balls 9, 11, 12, 13, and 14—all of which happened to be striped billiard
balls. No solids. If you only had access to this information, you might be led to believe that all the balls
in the hat were striped, which wouldn't be the case.
This may seem odd, but it can certainly happen even though you selected these randomly using a
probability sampling plan. The reason is that this sample of five is just as likely as any other
sample of five to be chosen.
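The claim that every sample of five is equally likely can be explored by listing all of them. This sketch assumes balls 9 through 15 are the striped ones, which matches the sample described above but is otherwise an assumption about the set.

```python
from itertools import combinations

balls = range(1, 16)
striped = set(range(9, 16))  # assumption: balls 9-15 are the striped ones

# Every possible sample of five balls; each one is equally likely to be drawn.
all_samples = list(combinations(balls, 5))

# The samples that contain only striped balls.
all_striped = [s for s in all_samples if set(s) <= striped]

print(len(all_samples))
print(len(all_striped))
print(len(all_striped) / len(all_samples))
```

The specific sample 9, 11, 12, 13, 14 is one entry in the full list, no more or less likely than any other entry. All-striped samples are rare only because so few of the entries are all striped.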
Might you get something that's unrepresentative? Yes. But the vast majority of the time, it will be representative.
TERMS TO KNOW
Random Sample
A sample that has been selected in a manner where every member of the population has some
predetermined chance of being selected for the sample.
Random Selection
The method of obtaining a random sample.
SUMMARY
The best method for selecting a sample that's representative is a random sample and a probability
sampling plan. Now, this won't always get you a representative sample; however, often, you will get one
when you do random samples.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Random Sample
A sample that has been selected in a manner where every member of the population has some
predetermined chance of being selected for the sample.
Random Selection
The method of obtaining a random sample.
Simple Random and Systematic Random Sampling
by Sophia
WHAT'S COVERED
This lesson will explain how to ensure everyone in the population has an equal chance of participating
in a sample, specifically focusing on:
1. Simple Random Sample
1a. Random Number Generator
1b. Random Number Table
2. Systematic Random Sampling
1. Simple Random Sample
If you’ve ever taken part in a raffle, you’ve experienced a simple random sample. What generally
happens at these events is that someone collects the raffle tickets and puts them into a bucket.
The tickets are mixed up in the bucket, and one ticket is pulled out. The owner of that ticket usually wins some
kind of fantastic prize. Now, being in a simple random sample is pretty much the same thing. The only
difference is that instead of winning the prize, you get to be part of the sample—and that's your prize.
IN CONTEXT
Suppose you take billiard balls from a pool table and put those all into a hat.
Next, shake it up, and take out five billiard balls. Do this for two shakes.
Shake #1 Shake #2
You may have noticed that the solid yellow “1” ball was in both of these first two examples. However, it
doesn't mean it's any more likely to be selected than any of the other balls. It's the same likelihood.
Any sample of five, the first or second sample of five, was an equally likely sample of five.
Shake #3
Now, notice that all five of these were striped billiard balls, with not one solid ball in the bunch. Is that
unusual? Sure, it's kind of unusual to happen.
Unusual samples have an equal likelihood to happen too. Just because they're strange and don't
happen very often doesn't mean they can't happen. In fact, they have the same likelihood as any other
selection of five.
Therefore, knowing how to take a simple random sample, abbreviated SRS, is important because most
inferences that we make about a population assume that the data were collected in this way. So, names in a hat are
fine. In our case, raffle tickets in a bucket, or billiard balls in a hat...that's all fine.
TERM TO KNOW
1a. Random Number Generator
EXAMPLE Suppose that we want to take a sample of 100 individuals from a population of 2,000 people.
Below you will see some of those individuals lined up, and you can imagine that individuals 10 through
1,995 are somewhere in the middle. Each is assigned a unique number so no one can have the same
number as anybody else.
To pick the numbers, you can use technology. Searching “random number generator” on the internet will bring up
websites that do this, or you can use a calculator. This particular model of calculator is a Texas Instruments
calculator:
“RandInt” indicates a random integer (an integer is a whole number), here from 0 to 1. Therefore, it picks either 0 or 1.
When you put in the third number, it's asking how many of them you want. In this case, you entered five. Now,
you don't want numbers between 0 and 1 in this case, and you don't want five of them. You want numbers
between 1 and 2,000, and you want 100 of them. Now, why was 150 written when you only want 100 numbers?
You can’t select one person twice, so repeats must be ignored. It's very likely that if you had just written
100 instead of 150, there would have been at least one repeat in the bunch.
Finally, you're going to select the individuals that correspond to those first 100 different numbers that were
picked.
So, person number 8, and the person that corresponds to 1,119, and the person who corresponds to 1,996 are a
few that are chosen.
Now, notice that the person corresponding to 8 was chosen again—you can see that it’s listed twice in the list.
You're not going to select that person twice because they've already been selected once, so they are crossed
out. This is the reason 150 numbers were created—so that you have room to cross repeats out.
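The same procedure can be sketched in Python instead of on a calculator. This is an illustration under the tutorial's setup (people numbered 1 to 2,000, 150 draws, first 100 distinct numbers kept); the variable names are hypothetical, and Python's random.randint stands in for the calculator's random integer command.

```python
import math
import random

population_size = 2000
sample_size = 100

# Draw 150 random integers from 1 to 2,000; repeats are possible.
draws = [random.randint(1, population_size) for _ in range(150)]

# Keep the first 100 distinct numbers, crossing out any repeats.
selected = []
seen = set()
for number in draws:
    if number not in seen:
        seen.add(number)
        selected.append(number)
    if len(selected) == sample_size:
        break

print(len(selected), selected[:5])

# Chance that exactly 100 draws would contain at least one repeat.
p_no_repeat = math.prod((population_size - i) / population_size for i in range(100))
p_repeat = 1 - p_no_repeat
print(round(p_repeat, 2))
```

The last calculation suggests why the extra draws are worthwhile: with only 100 draws, a repeat shows up roughly 92% of the time. In Python specifically, random.sample(range(1, 2001), 100) would avoid repeats in a single step.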
TERM TO KNOW
1b. Random Number Table
Using a random number table is more time-consuming than using a random number generator. A random number generator typically goes faster.
Each individual is assigned a unique number, just as with the random number generator; however, each member's
number must have the same number of digits.
The numbers 1 through 2,000 cannot be used as they are because the number 2,000 has four digits,
and the number 1 only has one digit. All of these must have the same number of digits, so instead of 1, it's 0001.
Instead of 2, it's 0002, and so forth, all the way up to 2000. A table of random digits can be found in a textbook
or online. Four digits will be read at a time because each individual's number has four digits.
EXAMPLE Suppose the first four digits found were 1-9-2-2. That corresponds to someone in the list.
There is someone who is 1,922, so that individual will be selected for the sample. It’s circled in green below
since a person corresponds to that number. The next four digits found are 3-9-5-0. No one on the list
corresponds to the number 3,950, so it is ignored. The next group, 3-4-0-5, does not correspond to an
individual either, so that is ignored as well.
You'll notice that all numbers circled in red are numbers that are unassigned in our list. This is going to make
this a very cumbersome process. It will go on for a while until 100 individuals are obtained. Will this work? It will
work, but it might take a very long time.
One of the numbers circled in green is 0001. This is the very first person on the list, and it just happens that
person 0001 will be among the sample. This individual will be selected along with everyone else whose four-
digit number was selected.
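Reading a random digit table four digits at a time can be sketched as follows. The string digit_table is a short, made-up run of digits (its first groups mirror the 1922, 3950, and 3405 from the example), not a real published table.

```python
# A short, made-up run of digits standing in for a row of a random digit table.
digit_table = "192239503405000107511922"

population_size = 2000
selected = []

# Read the digits in groups of four, because every label has four digits.
for start in range(0, len(digit_table) - 3, 4):
    label = digit_table[start:start + 4]
    number = int(label)
    # Skip labels that belong to no one (above 2000 or 0000) and skip repeats.
    if 1 <= number <= population_size and label not in selected:
        selected.append(label)

print(selected)
```

Here two of the six groups are thrown away because no one has that label, and one more is thrown away as a repeat, which is the reason this method is slow for a sample of 100.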
TERM TO KNOW
2. Systematic Random Sampling
There is one thing to know about systematic sampling right off the bat: It is not inherently random. You have to
be very careful about this. A systematic random sample involves choosing a value, k, and then
selecting every kth individual in the population, similar to elementary school when you counted
off by 3’s to create teams.
The value of k can be anything. You could choose every second individual, in which case, all the green people
are in, and all these black stick figures are out. Or you could do every third person, where one person is in and
then skip two; then the fourth person is in and skip two. Or you could go every fourth person.
Often, people prefer systematic samples to simple random samples because systematic samples are so much
easier to take. It's easier than getting a whole list of people and assigning everyone a number or putting all the
people's names in a hat. It's easier to take every fifth person or whatever you decide k should be.
HINT
The nice thing about a systematic sample is that it can be tailored to fit your sample size. If you wanted a
sample of 25 from 500 individuals, you could sample every 20th person since 500 divided by 25 equals
20. So, you would obtain your sample of 25 by sampling every 20th person.
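The arithmetic in this hint can be sketched in code. The function name systematic_sample is hypothetical, and the random starting position is an added assumption (a common way to introduce randomness) rather than something the tutorial prescribes.

```python
import random

def systematic_sample(population, sample_size):
    """Pick every kth individual, starting at a random position within the first k."""
    k = len(population) // sample_size   # 500 // 25 gives k = 20
    start = random.randrange(k)          # assumption: random start, index 0 .. k-1
    return population[start::k][:sample_size]

individuals = list(range(1, 501))        # 500 individuals labeled 1 to 500
sample = systematic_sample(individuals, 25)
print(sample)
```

Every pair of neighbors in the result is exactly 20 apart, so once the first person is chosen the rest of the sample is fixed.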
IN CONTEXT
Suppose that you have 20 students in a class, and they're in rows, assigned to their desks randomly. If
that were the case, you could count off every fourth student and have five students go up to the
chalkboard to do a homework problem on the chalkboard.
1 2 3 ✘ 5
6 7 ✘ 9 10
11 ✘ 13 14 15
✘ 17 18 19 ✘
So, persons one, two, and three don't have to do it. Person number four heads up to the chalkboard to
work on a problem. Five, six, and seven don't have to do it, but number eight does. You can see the
✘ marks that indicate the pattern and who needs to go up to the chalkboard.
Abbott      Acosta      Adams       Adamson ✘   Adler
Anderson    Bueller     Frye ✘      Grey        Jones
McClurg     Morris ✘    Peterson    Pickett     Rooney
Ruck ✘      Sara        Sheen       Stein       Ward ✘
By selecting, say, Adamson, you automatically know who all the rest of the people are going to be.
Since Adler is right next to Adamson, you know that Adler won't get chosen. Nor will Anderson or
Bueller, but Frye will.
If these students were randomly assigned to the seats, picking Adamson would not predetermine which
other students were going to be selected for the sample, but having them alphabetized interferes with
the random selection process.
SUMMARY
A simple random sample is the ideal sampling method if your goal is to obtain a representative sample.
Sometimes, with big populations, it's not feasible to assign everyone a number or put everything into a
hat, so other sampling methods may be used. A random number generator, typically on a calculator, is
a fast way to produce random integers without needing to give every individual a label with the same
number of digits. The random number table is a more time-consuming method and is generally
used when technology is not available. A systematic random sample can be similarly valid, and it is
much easier to perform. It involves taking every kth individual; however, the population must be
randomly sorted before the systematic selection. Otherwise, it won't be considered random.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
Stratified Random and Cluster Sampling
by Sophia
WHAT'S COVERED
This tutorial will cover the topic of stratified random sampling, which is a random sampling procedure
that subdivides the population into groups. In addition, we will introduce cluster samples. This lesson
will focus on:
1. Stratified Random Samples
2. Cluster Samples
3. Real-World Comparison
For a simple random sample of 42 students, think of ways that 42 students could be chosen, each having an
equal chance of being selected. First, assign each student a unique number 1 to 420 (total number of students).
Once this is done, you could:
Use a random number generator to select 42 numbers, ignoring repeats. The students who corresponded
to those numbers will be surveyed about the school's new, healthy options.
Put the 420 student names in a hat and draw out 42.
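As a minimal sketch, the random number generator option could look like this in Python. The seed and variable names are assumptions made for illustration; the lesson does not prescribe a particular tool.

```python
import random

rng = random.Random(0)                 # fixed seed only so the sketch is repeatable
student_numbers = range(1, 421)        # each student has a unique number from 1 to 420

# sample() draws without replacement, so repeats never occur
chosen = rng.sample(student_numbers, 42)
print(sorted(chosen))
```

Because the draw is made without replacement, there is no need to ignore repeats by hand.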
Now, is there a way to improve the study and guarantee an accurate cross section of students across
the grades? After all, freshmen might feel differently about the healthy options than seniors, so it will be
important to have individuals from each grade weigh in on the lunch options.
This can be done with a stratified random sample. Stratified random sampling is a method where the
population is subdivided into groups called strata. Strata are groups whose members share a characteristic,
and they are defined by the characteristic that we think might affect the responses. This keeps the sample
from having too many or too few individuals with that characteristic.
In the above example, it would look something like this: Since 42 is 10% of the school's population, your sample
should include 10% of each grade.
10% of the freshman class of 100 is 10, so you would want to randomly select 10 individuals from the
freshman class to participate.
10% of the sophomore class of 110 is 11, so you would want to randomly select 11 individuals from the
sophomore class to participate.
10% of the junior class of 120 is 12, so you would want to randomly select 12 individuals from the junior class
to participate.
10% of the senior class of 90 is 9, so you would want to randomly select 9 individuals from the senior class
to participate.
Once the groups are in place, a simple random sample is carried out within each stratum, like putting names in
a hat or assigning everyone a unique number and randomly selecting numbers. You can have as many strata as
you please, but they must be roughly homogeneous.
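The grade-by-grade allocation above can be sketched as follows. The class sizes come from the example; the student labels and the seed are hypothetical.

```python
import random

rng = random.Random(0)
class_sizes = {"freshman": 100, "sophomore": 110, "junior": 120, "senior": 90}
total_students = sum(class_sizes.values())   # 420
total_sample = 42                            # 10% of the school

sample = {}
for grade, size in class_sizes.items():
    roster = [f"{grade}-{i}" for i in range(1, size + 1)]   # hypothetical student labels
    n = size * total_sample // total_students               # this grade's share of the 42
    sample[grade] = rng.sample(roster, n)                   # simple random sample within the stratum

counts = {grade: len(chosen) for grade, chosen in sample.items()}
print(counts)   # {'freshman': 10, 'sophomore': 11, 'junior': 12, 'senior': 9}
```

Each stratum contributes in proportion to its size, so the four counts add up to the full sample of 42.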
TERMS TO KNOW
Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have something
in common, and we would like to see how that affects the outcome of the sample.
2. Cluster Samples
When using a cluster sample, the population is divided into groups. These groups are called clusters. It’s
important to note that these groups are natural groupings. They don't necessarily have anything in common
other than, typically, geography. We then take a random sample of clusters instead of a
random sample of individuals.
Each individual in the cluster is going to be part of the sample if we select that cluster. So, unlike the groups in a
stratified random sample, the groups in a cluster sample aren't based on a characteristic or variable. The
individuals in the cluster just happen to be near each other.
IN CONTEXT
Suppose you work at a potato chip company and it’s your job to implement some quality control in the
manufacturing department. Maybe you stand at the start of the assembly line and take a simple
random sample of individual chips. That would work just fine.
However, it might be easier for you to sample some bags of chips. The bags of chips are clusters. You
would then take a bag of chips off the assembly line and sample every chip in that bag for quality
control. That’s cluster sampling.
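Here is a rough sketch of the bag example in Python. The number of bags, the number of chips per bag, and the labels are made-up values for illustration only.

```python
import random

rng = random.Random(0)

# Hypothetical production run: 50 bags (the clusters), 30 chips in each bag
bags = {bag: [f"bag{bag}-chip{c}" for c in range(1, 31)] for bag in range(1, 51)}

chosen_bags = rng.sample(sorted(bags), 3)                      # random selection of clusters
sample = [chip for bag in chosen_bags for chip in bags[bag]]   # every chip in each chosen bag
print(chosen_bags, len(sample))
```

Notice that the randomness applies to the bags, not to the individual chips: once a bag is chosen, all of its chips are inspected.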
Similar to every sampling method, cluster sampling has pros and cons.
Advantages
It is easier than a simple random sample, and often it doesn't cost as much.
Typically, it gives similar results because the clusters are fairly heterogeneous.
Disadvantages
Risk that clusters are NOT heterogeneous; perhaps they do have some characteristic other than just being
geographically different from each other that might affect the sample's findings.
TERMS TO KNOW
Cluster Sample
A sampling method where the population is separated into groups, typically geographically, and a
random selection of clusters is made. Each individual in the cluster becomes part of the sample.
Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being together in
one place, making the individuals easier to sample together.
3. Real-World Comparison
Suppose a landlord of an apartment complex wants to know whether a new carpet he's considering is
appropriate for all the apartments in the building. Each of the four floors has eight apartments.
THINK ABOUT IT
What would a simple random sample look like? How might a cluster sample be different from a stratified
random sample?
Simple Random Sample: He could randomly select eight apartments from the building.
Stratified Random Sample: He could randomly select two apartments per floor.
Cluster Sample: He could take a spinner like the one shown below and spin it.
Suppose it landed on 3. That means that every apartment on the third floor would receive carpeting. He
doesn't have to have the carpet installers going to all these different rooms on all these different floors. He
can simply instruct everyone to go up to the third floor and install carpet in every room on that floor, which
would be far easier for him and just as cost effective.
But what if all the floors were NOT heterogeneous? What if apartments on the third floor allowed pets? The
carpet might not hold up as well. That’s one of the disadvantages of cluster sampling in action. But typically,
the clusters are fairly representative, and the results are very similar to those of a simple random sample.
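To see the three designs side by side, here is a small sketch for the landlord's building of four floors with eight apartments each. The apartment labels (such as "3C") and the seed are hypothetical, and random.choice stands in for the spinner.

```python
import random

rng = random.Random(0)

# Hypothetical labels: floor number plus a letter, e.g. "1A" through "4H"
floors = {f: [f"{f}{chr(ord('A') + i)}" for i in range(8)] for f in range(1, 5)}
all_apartments = [apt for apts in floors.values() for apt in apts]

# Simple random sample: any eight apartments in the building
srs = rng.sample(all_apartments, 8)

# Stratified random sample: two apartments from every floor
stratified = [apt for apts in floors.values() for apt in rng.sample(apts, 2)]

# Cluster sample: spin for one floor and take every apartment on it
floor = rng.choice(list(floors))
cluster = floors[floor]

print(srs, stratified, cluster, sep="\n")
```

All three samples contain eight apartments, but only the stratified sample guarantees every floor is represented, and only the cluster sample keeps the installers on a single floor.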
SUMMARY
In a stratified random sample, the population is broken down into homogeneous groups called “strata.”
The reason for this is to keep a characteristic that may affect the results from being over- or
underrepresented in the sample. The idea is to divide the population into these groups and then take a
simple random sample within each of the strata. Cluster sampling, on the other hand, is done by taking
naturally occurring groups, typically defined by geography, and taking a simple random sample of the clusters.
Then, each member of a selected cluster becomes part of the sample. A couple of advantages of cluster
samples are that they are more cost effective and usually achieve the same results as a simple random
sample. The disadvantage is that sometimes the clusters may not be heterogeneous, as seen in the
real-world comparison involving a landlord and his apartment complex, where one floor might allow pets.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Cluster Sample
A sampling method where the population is separated into groups, typically geographically, and a random
selection of clusters is made. Each individual in the cluster becomes part of the sample.
Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being together in one
place, making the individuals easier to sample together.
Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have something in
common, and we would like to see how that affects the outcome of the sample.
Multi-Stage Sampling
by Sophia
WHAT'S COVERED
You'd have to somehow account for every person in the United States, and maybe assign them a number, and
pull numbers out of a hat, or use some kind of random sampling procedure. Assigning a number to everyone
would be too difficult.
Strata, in this case, are still too big. You might take a few people from Maine, and a few people from Minnesota,
and a few people from North Dakota, etc., and it would still be too large. Plus, it really wouldn't be cost effective,
commuting to all these different places.
If you identified states as clusters, you would randomly select some of the clusters and then sample everyone
within that cluster. You'd be sampling entire states. For example, everyone in North Carolina would be in the
sample if you select that state as a cluster, which simply isn't feasible.
Therefore, none of those really make any sense. The way out of the box here is a multi-stage design.
2. Multi-Stage Sampling
Multi-stage sampling is a common sampling procedure utilized when the population is very, very large. With
multi-stage sampling, you continue zooming in from larger areas to smaller and smaller areas until you can find
a small enough sample of the people you need.
To perform multi-stage sampling, first select clusters, then take a simple random sample from each cluster.
STEP BY STEP
Step 1: States
When sampling the United States as a whole, states make the most sense as clusters because of
geographic simplicity. It’s not realistic or feasible to sample everyone within a state, so randomly select just
five states: California, Tennessee, Minnesota, Massachusetts, and Oklahoma. Pick one state and start the
process.
Step 2: Counties
It is equally unrealistic to sample everyone in Minnesota, so you can narrow your sample by randomly selecting
counties. Perhaps you select Carver County, Marshall County, and maybe a few other counties. If that's a
small enough basis for you to get everyone within the county, then you can stop.
Step 3: Towns
If you need yet a smaller sample size, you can choose just one county, like Carver County, and sample
towns within that county. Perhaps you randomly select three of those towns: Chanhassen, Waconia, and
Chaska. If those are small enough units, then you can stop.
Step 4: Neighborhoods
However, if the sample size is still too large, you can continue to narrow it down. Within Chaska, for
example, you can sample some neighborhoods. Typically, by the time you get to neighborhoods within a
town, it's easy enough to walk around the neighborhood and get almost everybody within that
neighborhood.
Now you can move on to the next cluster, where you would repeat this process with the remaining four
states.
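The zooming-in process in the steps above can be sketched in Python. Everything here is hypothetical: the unit labels, the number of sub-units at each level, and how many are drawn at each stage are placeholders, not real geography.

```python
import random

rng = random.Random(0)

def children(unit, n):
    """Hypothetical sub-units of a unit, e.g. 'S3' -> 'S3.1', 'S3.2', ..."""
    return [f"{unit}.{i}" for i in range(1, n + 1)]

states = [f"S{i}" for i in range(1, 51)]

stage1 = rng.sample(states, 5)                                         # Step 1: five states
stage2 = [c for s in stage1 for c in rng.sample(children(s, 20), 3)]   # Step 2: counties in each state
stage3 = [t for c in stage2 for t in rng.sample(children(c, 10), 3)]   # Step 3: towns in each county
stage4 = [n for t in stage3 for n in rng.sample(children(t, 8), 2)]    # Step 4: neighborhoods in each town

print(len(stage1), len(stage2), len(stage3), len(stage4))   # 5 15 45 90
```

At every stage, the random draw is made only inside the units chosen at the previous stage, which is what keeps the final fieldwork manageable.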
TERM TO KNOW
Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random sampling, and
simple random sampling. It "zooms in" on smaller areas to sample so that sampling becomes more
feasible.
SUMMARY
You began this tutorial by comparing sampling methods when presented with the challenge of
sampling from the entire United States. Multi-stage sampling is used when the population is
so big, and the groups (strata or clusters) so large, that it makes more sense to zoom in and take small
groups. You begin with certain clusters, and then you sample within those clusters instead of taking the
full cluster. Therefore, multi-stage sampling combines elements of cluster sampling, stratified designs,
and simple random designs. These were contrasted within this tutorial, though, as you may recall, none of
them was feasible on its own when attempting to sample the United States.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random sampling, and simple
random sampling. It "zooms in" on smaller areas to sample so that sampling becomes more feasible.
Observational Studies and Experiments
by Sophia
WHAT'S COVERED
This tutorial will explore observational studies and how they are conducted. We will also cover
experiments, which are a little different than observational studies, through the exploration of:
1. Observational Studies
2. Types of Observational Studies
3. Experiments
4. Experiments vs. Observational Studies
1. Observational Studies
An observational study is a type of study where the researcher can observe but does not administer any
treatment. Therefore, whatever would normally happen, the researcher has to allow it to happen.
Researchers can't change anything about the people or subjects they are studying. The researcher can record
the variables of interest, but again, can't affect the study. People have to be allowed to do whatever it is they
were going to do without interruption.
TERM TO KNOW
Observational Study
A type of study where researchers can observe the participants but not affect the behavior or outcomes
in any way.
Retrospective Study: Researchers look to the past to see what has already happened; also known as a
case-control study.
EXAMPLE Consider observing people who are sick—those are called the cases—versus people that
aren't sick, which are the controls. Then, you look back to see what similarities the cases have in common
and what similarities the controls have in common.
Prospective Study: Researchers select individuals to participate and record what happens as it happens;
also known as a longitudinal study.
EXAMPLE Individuals are engaging in activities like smoking or jogging. You record what happens as it
happens, as opposed to trying to look back and figure it out.
IN CONTEXT
The year is 1929 and a cancer doctor has a suspicion that smoking may cause cancer. His cancer
patients become his subjects, or participants, in his study. He asks his subjects, “Did you happen to
smoke before you got cancer?” What he found was an overwhelming majority of his cancer patients
did, in fact, smoke. Therefore, this doctor was the very first person to suggest a link between smoking
and cancer.
That inspired some new studies, one of which began in 1934. It dealt with several thousand doctors,
so it was a physician’s smoking study. The reason doctors were chosen is that doctors are usually very
diligent about following protocols, meaning that those who smoked would likely continue to smoke,
and those who didn't smoke would likely continue not smoking. Also, doctors typically wouldn't drop
out of a study. Notice in the image below, how some of these physicians smoked, and some of them
did not.
They did the study, and some of the doctors got cancer. Now, not every doctor who smoked ended up
getting cancer, and not every person who got cancer was a smoker. However, what they found was
that the vast majority of the time, it was the doctors who smoked that got cancer.
This study was conducted over a long period of time—a 20-year study. At its conclusion, this was the
most convincing evidence that smoking had an effect on cancer. This was an example of a
prospective study because it started with the doctors and followed them through to 1954.
It is important to note, however, that neither of these types of studies, prospective or retrospective, can actually
prove a cause-and-effect relationship. The only thing that can prove a cause-and-effect relationship between
two variables is an experiment.
TERMS TO KNOW
Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.
Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they
go into the future.
Subjects/Participants
The people or things being examined in an observational study.
3. Experiments
An experiment is a different type of study than an observational study. The differences will be covered in detail
shortly, but essentially, the researchers are allowed to impose treatments on the participants. Treatments are
administered, and response to those treatments is measured. Because the researchers are the ones
implementing the treatments and measuring the response, a cause-and-effect relationship between variables
can be determined.
When discussing experiments, there is some very common terminology that you should be aware of. For
example, as mentioned in the section above, subjects and participants are used interchangeably and describe
people involved in an experiment.
If animals or things are used in an experiment, they are referred to as experimental units. While it may seem a
bit impersonal, it is universal terminology in the field of experiments.
TERMS TO KNOW
Experiment
A type of study where researchers impose treatments on the participants or experimental units.
Experimental Unit
An animal or thing involved in an experiment.
An experiment, on the other hand, is far more active on the part of the researcher. The researcher is creating
the differences between the two groups, then determining whether or not there is a cause-and-effect
relationship.
If you have a study that you'd like to do, but you can't perform it due to ethical or practical concerns, or it takes
too much time or money, you can avoid those concerns or circumvent them by doing an observational study.
THINK ABOUT IT
When trying to determine if cigarette smoking causes cancer, several observational studies have been
conducted, but never a true experiment. Why would that be?
Well, it would be unethical to break people into groups and administer cigarettes to a group of people when
trying to determine if it causes terminal illness. The same applies to alcohol consumption.
BIG IDEA
There are certain instances in which an observational study will be preferred over an experiment due to
factors like time, money, and privacy (for example, when people are unlikely to divulge certain information).
SUMMARY
An observational study is a type of study where the researcher can observe but not influence the
behavior of the participants, or subjects. There are two types of observational studies: A retrospective
study involves looking back at behavior, while a prospective study involves gathering your participants
and following them along as they live their lives. An observational study, though, cannot prove a cause-
and-effect relationship.
Conversely, in an experiment, a researcher can directly influence the subjects by applying treatments.
Because the researchers are the ones implementing the treatments and measuring the response, a
cause-and-effect relationship between variables can be determined. Terms such as subjects and
participants are important to know since they identify individuals directly involved in the experiment.
Animals may be directly involved in an experiment, but they are referred to as experimental units rather
than subjects or participants.
When comparing experiments vs. observational studies, sometimes an experiment may be unethical,
expensive, or too lengthy. In those cases, observational studies may be used, which allow a researcher
to study occurrences in a natural setting without administering treatment of any kind.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Experiment
A type of study where researchers impose treatments on the participants or experimental units.
Experimental Unit
An animal or thing involved in an experiment.
Observational Study
A type of study where researchers can observe the participants but not affect the behavior or outcomes in
any way.
Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they go
into the future.
Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.
Subjects/Participants
The people or things being examined in an observational study.
Prospective and Retrospective Studies
by Sophia
WHAT'S COVERED
This tutorial will explore the two types of observational studies, a retrospective study and a prospective
study, through the exploration of:
1. Observational Studies
2. Types of Observational Studies
2a. Retrospective Study
2b. Prospective Study
1. Observational Studies
An observational study is a type of study where the researcher can observe but does not administer any
treatment. Therefore, whatever would normally happen, the researcher has to allow it to happen.
Researchers can't change anything about the people or subjects they are studying. The researcher can record
the variables of interest, but again, can't affect the study. People have to be allowed to do whatever it is they
were going to do without interruption.
TERM TO KNOW
Observational Study
A type of study where researchers can observe the participants but not affect the behavior or outcomes
in any way.
It can be similar to a matched-pair design in an experiment, but in this case, the researchers are not giving a
treatment or doing anything to affect the people.
EXAMPLE In a study, suppose you take a pair of participants who are similar across most variables
except for one major difference—one participant has a disease, “the case,” and one participant does not
have a disease, “the control.” Because the participants are so similar, you are focusing on just that disease
and seeing how it affects the participants or what causes the disease.
This is considered retrospective because it looks in the past. You ask the participants to recall past events
or use information about their past to determine what risk factors there are for the disease.
TERM TO KNOW
Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.
EXAMPLE The Framingham Heart Study started in 1948 and is still going on today. In total, 5,209 healthy
adults from Framingham enrolled in this study. Researchers collected a variety of information about the
subjects, including social networks, eating habits, exercise habits, and several markers for heart health.
Over a thousand different research papers have been written using this information. Some of these papers
have shown that obesity and smoking are associated with an increased risk of heart failure. Other papers
look at how social networks tie to obesity risks.
TERMS TO KNOW
Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they
go into the future.
SUMMARY
An observational study is a type of study where the researcher can observe but not influence the
behavior of the participants, or subjects. There are two types of observational studies: A retrospective
study involves looking back at behavior, while a prospective study involves gathering your participants
and following them along as they live their lives. An observational study, though, cannot prove a cause-
and-effect relationship.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Observational Study
A type of study where researchers can observe the participants but not affect the behavior or outcomes in
any way.
Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they go
into the future.
Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.
Experimental Design
by Sophia
WHAT'S COVERED
In this tutorial, you're going to learn about the principles of experimental design. Specifically, this lesson
will cover:
1. Components of Experimental Design
1a. Control
1b. Randomization
1c. Replication
1. Control
2. Randomization
3. Replication
TERMS TO KNOW
Experimental Design
The way in which an experiment is carried out. A good design has key elements of randomization,
replication, and control.
Treatment
Something the researchers administer to the subjects or experimental units.
1a. Control
Control means holding everything else besides what you're trying to measure constant. The purpose is to
determine whether or not your treatment is effective. In other words, if there is an observable difference
between groups, is it due to the treatments or due to a confounding variable? It is important to control all other
variables to help limit confounding.
One common way to control an experiment is with a control group. A control group is a set of subjects or
experimental units that do not receive the treatment under consideration. For instance, if you were studying a new cancer treatment, a
control group might get the standard cancer treatment care, while the treatment group receives the new drug
or treatment being evaluated. In this case, the control group allows researchers to measure the effectiveness of
the treatment against a group that is otherwise similar. Generally, the participants won't know if they are in the
control or treatment group, as this knowledge can affect the results.
IN CONTEXT
Suppose you are a farmer, and you want to try a new fertilizer in your field. One thing you could do is
choose 10 fields with similar soil nutrients, sunlight, and water—all variables that could affect the crop
growth.
You could then apply the old fertilizer to five fields and the new fertilizer to the other five. By keeping
all the other variables—soil nutrients, sunlight, water—consistent, the differences between the fields
can be isolated and attributed to the old fertilizer or the new fertilizer.
Does the new fertilizer work? Is it effective? This is the idea behind controlling for all of these other
variables.
TERMS TO KNOW
Control
The principle of experimental design that requires that other variables which may confound the
experiment be held constant between the treatment groups so that any differences in the groups can
be attributed to the different treatments.
Control Group
A group included in an experiment that does not receive the treatment under consideration and against
which other experimental results can be compared and validated.
1b. Randomization
The second big idea of experimental design is randomization. The treatments must be assigned to the subject
using a random process, otherwise known as “randomization.” The purpose of random assignment is to try and
filter out all the other sources of variation that you couldn't anticipate controlling for.
EXAMPLE Referring to the farmer example, even though you made the fields as similar as possible with
respect to water, sunlight, and soil, it's possible that there is a variable that you didn't think to control for.
Perhaps some fields had moles under the ground, and that would affect how the crops grow. How would
you know to control for moles?
By randomly assigning treatments to the fields, you can hopefully get some fields with moles in the groups
of fields with both the new and old fertilizer. Randomization smooths out those effects that other variables
might bring into the equation.
HINT
Randomizing also helps avoid bias because you can’t be tempted to assign treatments to the experimental
units you think might give favorable outcomes.
Randomization in an experiment does not really achieve the same purpose as a random selection in a sample.
When you do a simple random sample, the idea is to get a sample that's representative of the population. In an
experiment, the purpose of randomly assigning individuals to groups is to filter out unknown sources of
variation. The assignment in an experiment, however, is fairly similar to the way you would randomly select in a
sample.
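For the farmer's 10 fields, random assignment could be sketched like this. The field labels and the seed are hypothetical; shuffling the list plays the role of drawing names from a hat.

```python
import random

rng = random.Random(0)
fields = [f"field-{i}" for i in range(1, 11)]   # hypothetical labels for the 10 fields

shuffled = fields[:]          # copy so the original list keeps its order
rng.shuffle(shuffled)         # put the fields in a random order

new_fertilizer = sorted(shuffled[:5])   # first five after shuffling get the new fertilizer
old_fertilizer = sorted(shuffled[5:])   # remaining five get the old fertilizer

print("New:", new_fertilizer)
print("Old:", old_fertilizer)
```

Because chance alone decides which fields land in each group, the farmer cannot steer the new fertilizer toward the fields expected to do well.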
TERM TO KNOW
Randomization
The principle of experimental design that requires that the subjects/experimental units be assigned to
groups using some random process. This ensures that the two groups are roughly equal prior to
assigning treatments.
1c. Replication
Replication is the last key idea in experimental design, which basically states that a bigger sample is better.
Repeating the experiment on many subjects or experimental units is a better idea than doing it on just a few.
Why is that?
A larger size of the experiment means it's more likely that you can find trends that perhaps you wouldn't have
found in a smaller experiment. The more you replicate, and the more experimental units you can get into your
experiment, the more likely it is that you're going to find the true trends that arise, rather than some freak
anomaly.
THINK ABOUT IT
What if the farmer had found just two fields that were similar to each other, instead of 10 fields, and
randomly assigned one to get the new fertilizer and one to get the old? Isn't it possible in that case that
the field with the old fertilizer does very well just by random chance?
This would make it seem like the new fertilizer is not effective when perhaps it is. Or the opposite could
happen, where it seems like the fertilizer is effective when it's not. It would be better to randomly assign five
fields to each fertilizer, as opposed to just one, because the farmer is more likely to find trends across those
fields that are valid.
TERM TO KNOW
Replication
Repeating the experiment on multiple subjects/experimental units. This principle of experimental design
states that a larger experiment with more subjects/experimental units will allow us to more clearly see
differences between the treatments.
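The effect of replication can be illustrated with a small simulation. This is only a sketch under made-up assumptions: yields are drawn from a normal distribution with a baseline of 100, field-to-field noise of 10, and a true fertilizer effect of 5; none of these numbers come from the lesson.

```python
import random
import statistics

def simulated_difference(fields_per_group, true_effect, rng):
    """One simulated experiment: mean yield (new) minus mean yield (old)."""
    # Each field's yield is a baseline of 100 plus random field-to-field noise.
    old = [100 + rng.gauss(0, 10) for _ in range(fields_per_group)]
    new = [100 + true_effect + rng.gauss(0, 10) for _ in range(fields_per_group)]
    return statistics.mean(new) - statistics.mean(old)

def spread_of_results(fields_per_group, trials=2000, seed=0):
    """How much the estimated difference varies across repeated experiments."""
    rng = random.Random(seed)
    diffs = [simulated_difference(fields_per_group, 5, rng) for _ in range(trials)]
    return statistics.stdev(diffs)

small = spread_of_results(1)   # two fields total: one per fertilizer
large = spread_of_results(5)   # ten fields total: five per fertilizer
print(round(small, 1), round(large, 1))
```

The spread for the two-field experiment comes out noticeably larger than for the ten-field experiment, which is the "freak anomaly" risk that replication reduces.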
SUMMARY
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Control
The principle of experimental design that requires that other variables which may confound the
experiment be held constant between the treatment groups, so that any differences in the groups can be
attributed to the different treatments.
Experimental Design
The way in which an experiment is carried out. A good design has key elements of randomization,
replication, and control.
Randomization
The principle of experimental design that requires that the subjects/experimental units be assigned to
groups using some random process. This helps ensure that the groups are roughly equivalent prior to assigning
treatments.
Replication
Repeating the experiment on multiple subjects/experimental units. This principle of experimental design
states that a larger experiment with more subjects/experimental units will allow us to more clearly see
differences between the treatments.
Treatment
Something the researchers administer to the subjects or experimental units.
Randomized Block Design
by Sophia
WHAT'S COVERED
This tutorial is going to teach you about a randomized block design, which is a little bit different than
other types of designs that we've studied so far. Specifically, this lesson will cover:
1. Randomized Block Design
2. Block Design vs. Randomized Design
Once participants are in their similar group, they are randomly assigned to treatment or control within that
group.
An advantage is that it controls for variables that would otherwise be confounding. If we think that a subject's
job has an effect, we can make sure that a proportional number of people who have the same job are assigned
to a treatment and control group.
IN CONTEXT
Suppose you are a researcher, and you want to identify whether a new acid reflux drug is more
effective than the one that's currently available. You gather 500 volunteers with acid reflux, put the
number 1 on 250 cards, and the number 2 on another 250, and place all the cards in a hat. You mix
them up and have people pull out numbers.
People who received a “1” receive a new drug, and those who selected “2” receive the old drug. The
image below would be your original plan, starting with all these volunteers, men and women, and then
you randomly assign them to groups.
The problem is, what if men and women respond differently to the drug?
The better design is using a randomized block design, so you try something different. First, take your
large group and break it into smaller subgroups of just men and just women.
The image above has nine men and 14 women; you had a lot more in the old design, but now you’re
going to run the experiments essentially in parallel: one experiment for men and one experiment for
women. Now you’re going to take the men and randomly assign half of them to the treatment and half
to the control. Next, you’re going to take the women and assign half of them to the treatment and half
to the control, which looks like this:
Men and women receiving the treatment are in purple, and the men and women receiving the control
are in green. You might notice there are five men receiving treatment and only four receiving control.
It’s not necessary to have exactly equally sized groups.
EXAMPLE Suppose the drug was more effective for women than for men. Because the men and the women are
compared separately, you would see that in this experiment: the drug would show a clear effect among the
women and a weaker effect, or none at all, among the men.
One minor disadvantage to running a block design is that you do lose some of the replication that you would
have if you had run it in a large group. Sometimes you need to make your sample size a little bit bigger to
overcome that. It might be a little bit harder to draw legitimate conclusions with small groups.
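A randomized block assignment like the one described above might be sketched as follows. The function name `block_randomize`, the volunteer tuples, and the seed are hypothetical; the counts of 9 men and 14 women follow the figure described in the text.

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_of, seed=None):
    """Randomly assign treatment/control separately within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)          # group similar subjects together
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)                   # randomize inside the block only
        half = (len(members) + 1) // 2         # odd blocks: treatment gets the extra subject
        for s in members[:half]:
            assignment[s] = "treatment"
        for s in members[half:]:
            assignment[s] = "control"
    return assignment

# Hypothetical volunteers matching the counts in the figure: 9 men and 14 women.
volunteers = [("M", i) for i in range(9)] + [("F", i) for i in range(14)]
assignment = block_randomize(volunteers, block_of=lambda v: v[0], seed=42)
```

With nine men, this sketch puts five in the treatment group and four in the control group, which mirrors the slightly unequal split noted in the example.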
SUMMARY
In a randomized design, you saw how an experiment might miss an extra level of depth, such as men
and women reacting differently to a drug. The subjects or experimental units are grouped by some
similar characteristic that you think might affect the outcome. In this example, we used gender. When
evaluating block design vs. randomized design, you saw that with a randomized block design,
experiments run in parallel, resulting in two or more separate experiments. Then, you can compare the
treatments within each of those groups.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
Completely Randomized Design
by Sophia
WHAT'S COVERED
This tutorial will discuss a completely randomized design of an experiment, through an exploration of:
1. Completely Randomized Design
An advantage of this design is that it is very quick and easy to implement. You could take your group of
experimental units, randomly assign each one a number, and put the odds in the treatment group and the evens in
the control group. Alternatively, you could roll a die for each subject, putting ones and twos in the control group,
threes and fours in the first treatment group, and fives and sixes in the second treatment group.
However, a disadvantage of this design is that treatment and control groups could have disproportionate
representations of the population.
IN CONTEXT
Let’s say you developed a new drug to combat the symptoms of acid reflux. You want to see if it’s
more effective than what is currently available. You get 500 volunteers and write “1” on 250 slips of
paper and “2” on the other 250 slips of paper. You put all 500 slips of paper into a hat, mix them
up, and the volunteers retrieve one slip of paper each.
Those who selected “1” will receive the new drug, and those who selected “2” receive the drug that's
currently available. This is the simplest way to assign subjects to treatments. However, it's not
necessarily ideal for every scenario.
Let’s say that the acid reflux drug is more effective for men than it is for women. It’s not really a
problem if you divide the treatment and control groups like this:
In this particular case, you can see there is roughly the same number of females and males in the
treatment group and the control group. Since there is a relatively equal assignment on each side, it will
be easy to see if the new drug is more effective for males than for females. Problems occur when the
random assignment doesn't match the proportions of the population equally.
Both groups are roughly the same size. Will you be able to determine if the treatment is more effective
for men? Why not?
If the drug were more effective for men than women, you actually wouldn't notice because there aren't
that many men in the treatment group. The proportions are way out of whack. This sometimes
happens with random assignment.
You can see that in a completely randomized design, subjects are assigned using random processes such as
numbers in a random number generator, random number table, numbers in a hat, or names in a hat. The
problem is that it's not always the best way to assign treatments.
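The slips-in-a-hat procedure from the acid reflux example can be sketched in a few lines. The volunteer IDs, the function name `draw_slips`, and the seed are assumptions made for illustration.

```python
import random

def draw_slips(volunteers, seed=None):
    """Completely randomized design: every volunteer draws one slip from the hat."""
    rng = random.Random(seed)
    n = len(volunteers)
    hat = [1] * (n // 2) + [2] * (n - n // 2)   # slips marked 1 (new drug) or 2 (current drug)
    rng.shuffle(hat)                             # mix the slips
    return dict(zip(volunteers, hat))

volunteers = [f"volunteer_{i}" for i in range(1, 501)]   # hypothetical IDs
slips = draw_slips(volunteers, seed=7)
new_drug = [v for v, slip in slips.items() if slip == 1]
current_drug = [v for v, slip in slips.items() if slip == 2]
print(len(new_drug), len(current_drug))   # 250 250
```

Nothing in this procedure looks at who the volunteers are, which is why the two groups can occasionally end up with very different proportions of men and women.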
TRY IT
A tire company wants to launch a new type of rubber for its bicycle tires. It has 300 bikes to use for the
study, and a completely randomized design is desired. What would be the first step to achieving a
completely randomized design?
They could place numbers 1–300 in a hat and have each rider pull out one number. Numbers 1–150 receive
the old rubber tires, and 151–300 receive the new rubber tires. The cyclists won’t know which type of tire
they are receiving.
There is an issue with this design. Can you think what this might be?
What if bike commuters are all in the same group? They might wear their tires out faster regardless of the
new or old tires. Can you think of other aspects that may impact this experiment?
BIG IDEA
While there are better ways to gather information for an experiment, a completely randomized design is the
easiest.
SUMMARY
In a completely randomized design, which is the simplest way of assigning individuals, the subjects are
assigned using a random process like numbers in a random number generator, random number table,
numbers in a hat, or names in a hat. The problem is that it's not always the best way to assign
treatments.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
Matched-Pair Design
by Sophia
WHAT'S COVERED
This tutorial will explain matched-pair design experiments by examining the characteristics and
examples of:
1. Matched-Pair Design
1a. With Subjects in Pairs
1b. With Subjects as Individuals
1. Matched-Pair Design
In a matched-pair design experiment, you form experimental units by pairing subjects that are as similar as
possible. One subject goes to the treatment group, and the other subject goes to the control group. Having
very similar pairs helps control for the other variables we haven't considered.
EXAMPLE Choosing a pair of women who are the same age, have the same exercise habits, and live in
the same area allows us to look at only the variable we are studying, while avoiding the effects of age,
exercise, and location on the outcomes of the experiment.
In matched-pair design, subjects can be assigned to the treatment and control groups in two different ways:
Subjects who are similar with respect to variables that could affect the outcome of the experiment are
paired together, and then one of them is assigned to the treatment group and one is assigned to the
control group.
Each subject is assigned to both groups, where each subject acts as their own matched pair.
HINT
This type of design is also similar to a case-control study, but here, researchers are giving a treatment
instead of just observing the participants.
TERM TO KNOW
Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that could affect
the outcome of the experiment are paired together, then one of them is assigned to one treatment and
one is assigned to the control. This can also be done by assigning each subject to both groups, where
each subject acts as their own matched pair.
IN CONTEXT
There are 20 participants for an experiment for a flu vaccine. Gender and age may play a role in how
well this treatment works. Groups of two are created; each group is as similar as possible with respect
to any variable that may affect the outcome.
Participant 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20
Gender      M   F   M   M   F   F   F   M   F   M   M   M   F   F   F   M   F   M   F   M
Age         24  21  42  39  35  37  22  25  31  32  51  31  61  26  38  55  26  56  52  48
There are 10 men and 10 women of all different ages. Participants will be listed by gender. So,
participants 1, 3, 4, 8, 10, 11, 12, 16, 18, and 20 are the males. The rest are females.
Males
Participant 1   3   4   8   10  11  12  16  18  20
Age         24  42  39  25  32  51  31  55  56  48
Females
Participant 2   5   6   7   9   13  14  15  17  19
Age         21  35  37  22  31  61  26  38  26  52
Age is suspected to also play a role in effectiveness, so within the male category, the two ages that
are closest together, 24 and 25, are chosen. Therefore, participants 1 and 8 will form a matched pair.
Participants 10 & 12, 4 & 3, 20 & 11, and 16 & 18 are also matched pairs due to being similarly aged
males. The same criterion is applied for similarly aged females.
Males
Participant 1   8   12  10  4   3   20  11  16  18
Age         24  25  31  32  39  42  48  51  55  56
Females
Participant 2   7   14  17  9   5   6   15  19  13
Age         21  22  26  26  31  35  37  38  52  61
Now, to continue the experiment, one of the two in the pair is randomly assigned to receive the flu
vaccine, and the other one will be assigned to the control group.
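The pairing and assignment steps above can be sketched in code using the participant table from this example. Pairing neighbors after sorting by age is one simple matching rule that happens to reproduce the pairs listed in the text; it is an assumption of this sketch (as are the function names and the seed), not the only way to form matched pairs.

```python
import random

# (participant, gender, age) copied from the table in this example
participants = [
    (1, "M", 24), (2, "F", 21), (3, "M", 42), (4, "M", 39), (5, "F", 35),
    (6, "F", 37), (7, "F", 22), (8, "M", 25), (9, "F", 31), (10, "M", 32),
    (11, "M", 51), (12, "M", 31), (13, "F", 61), (14, "F", 26), (15, "F", 38),
    (16, "M", 55), (17, "F", 26), (18, "M", 56), (19, "F", 52), (20, "M", 48),
]

def matched_pairs(people):
    """Pair people of the same gender who are adjacent when sorted by age.

    Assumes each gender has an even number of participants, as in this example.
    """
    pairs = []
    for gender in ("M", "F"):
        group = sorted((p for p in people if p[1] == gender), key=lambda p: p[2])
        pairs += [(group[i][0], group[i + 1][0]) for i in range(0, len(group), 2)]
    return pairs

def assign_within_pairs(pairs, seed=None):
    """Flip a coin in each pair: one member gets the vaccine, the other is the control."""
    rng = random.Random(seed)
    assignment = {}
    for a, b in pairs:
        vaccinated, control = (a, b) if rng.random() < 0.5 else (b, a)
        assignment[vaccinated] = "vaccine"
        assignment[control] = "control"
    return assignment

pairs = matched_pairs(participants)
assignment = assign_within_pairs(pairs, seed=3)
print(pairs)
```

The random step happens only inside each pair, so every pair contributes exactly one vaccinated participant and one control.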
IN CONTEXT
Suppose that you have a tire company that's considering rolling out a new type of rubber for its
bicycle tires. There are 300 bicycles available. In a completely randomized design, you would place
the numbers 1–300 in a hat. Bikers that pull numbers 1–150 would receive old rubber tires, and the
numbers 151–300 would receive the new rubber tires. They won’t necessarily know who's getting
which tires.
But what if the 300 riders don't all ride the same way or equally often? What do you do then? How
do you create two groups that are roughly the same, with the exception of the bicycle tires?
One way to do it is with a matched-pair design. You could still put the numbers 1–300 in a hat. The
only difference is that the people who pull out 1–150 would get both the old and the new. They would
put the old tire in the front and the new rubber tire in the back.
Then, the people who pulled out 151–300 would get the new rubber tire in the front and the old one in
the back.
So, there's still some randomization going on. The only difference is that every biker will get one old
tire and one new tire. This will allow you to compare the tread wear for each bike because the front
and rear tire get worn somewhat equally. It won't matter how much the biker rides or where.
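A short sketch of the bicycle version, where each bike serves as its own matched pair and only the front/rear placement is randomized (the function name `assign_tire_positions`, the bike numbering, and the seed are illustrative assumptions):

```python
import random

def assign_tire_positions(bike_ids, seed=None):
    """Every bike gets one old and one new tire; only the front/rear position is randomized."""
    rng = random.Random(seed)
    order = list(bike_ids)
    rng.shuffle(order)                      # stands in for drawing numbers from the hat
    half = len(order) // 2
    layout = {}
    for bike in order[:half]:
        layout[bike] = {"front": "old", "rear": "new"}
    for bike in order[half:]:
        layout[bike] = {"front": "new", "rear": "old"}
    return layout

layout = assign_tire_positions(range(1, 301), seed=11)
```

Comparing the two tires on the same bike removes rider-to-rider differences from the comparison, while randomizing the position guards against front and rear tires wearing differently.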
SUMMARY
In a matched-pair design, two subjects whose characteristics are very similar are paired, then each one
is sent to a different group. Matched-pair design can be implemented by matching subjects into pairs
that are as similar as possible with respect to any variable that may affect the outcome. It can also be
implemented with subjects as individuals, where each subject can be assigned to both groups instead
of one, as was the case with the bicycle tires situation. Each participant then counts as his or her own
matched pair; this design essentially compares someone to themselves.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that could affect the
outcome of the experiment are paired together, then one of them is assigned to one treatment and one is
assigned to the control. This can also be done by assigning each subject to both treatments, where each
subject acts as their own matched pair.
Surveys
by Sophia
WHAT'S COVERED
This tutorial will briefly introduce you to surveys, demonstrating the following concepts:
1. Introduction to Surveys
2. Survey Design
1. Introduction to Surveys
A survey, or sample survey, is a data gathering technique. It's an information collection tool, and a lot of
organizations use these. Surveys allow organizations a way to gather data so that they can target the specific
information that they want.
A store might use a survey to figure out something about its customers.
Politicians might use a survey to gather information about their constituents.
Someone hiring for a position in a company might use a survey to learn more about their labor market, who
they can hire, and who is not available in that area, etc.
In all of these examples, the survey is a tool being used to increase the amount of specific information someone
has. For each survey, the researcher has selected the variables of interest, or the variables that he or she is
interested in gathering data on.
TERMS TO KNOW
Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.
Variables of Interest
The variables the survey wishes to measure about those taking the survey.
2. Survey Design
A survey must be carefully designed to elicit the intended information. The survey design is an important
element of surveys. If you are designing a survey, you want to get a representative sample of your population.
As with every sampling technique, designing a survey is all about the process and being able to get accurate
data from a representative sample.
BIG IDEA
Just like with any sample, it's important to define what you're interested in before you begin surveying.
BRAINSTORM
You might ask yourself: What are the variables that you want to measure? What information do you want
people to provide in your survey? Answering these questions is going to be important because those
answers will help you understand the purpose of the information you generate with your survey.
So, for example, if it's a survey about employment, you're going to want to ask about employment, former
employment, current employment, and things like that.
IN CONTEXT
Suppose a teacher uses the following survey at the end of the year for her students:
Course Survey
Response scale: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree
This teacher wants to know whether or not she did a good job outlining course objectives. The survey
also asks about evaluating student work and academic challenge. You'll notice that she's provided answer
choices from strongly agree to strongly disagree.
The teacher thought about all of the different things she wanted to learn from her students, including
her teaching, and listed them all in her survey. The information she gathers from this survey will help
her answer the question of how clearly she outlined her course objectives for her students, as well as
informing her about other important aspects of her teaching.
TERM TO KNOW
Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.
SUMMARY
To recap this introduction to surveys, surveys—also called sample surveys—are used to obtain data or
information from the population. It's important that you determine what you want to understand, and
why and for whom the information is being collected, which may impact survey design. We also talked
about variables of interest, which are the things that you want to measure because you're interested in
knowing them.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.
Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.
Variables of Interest
The variables the survey wishes to measure about those taking the survey.
Blinding
by Sophia
WHAT'S COVERED
This tutorial is going to teach you about a principle of experimental design called blinding. Specifically,
this lesson will cover:
1. Blinding
2. Double-Blind and Single-Blind Experiments
1. Blinding
Blinding is one of those principles of experimental design whereby the subjects don't know what treatments
they're going to receive.
When you randomize an experiment, it is done to reduce bias. However, it's still possible to give subjects subtle
clues about which treatment they're receiving, so it’s important that they don’t know what they're receiving.
Why is this? Because it might be an incentive for them to either stay on the treatment if it's a drug or go off the
treatment if they think they're not getting the real drug.
Also, it may be true that people with an agenda might want to bend the results in their favor. They might want to
make the results of an experiment seem more positive than they really are. This idea of the experimenter
wanting to bend the results in their favor is called the “experimenter effect.”
To counteract both of those ideas, we implement a strategy called blinding. Only people who are behind the
scenes will know who is getting what. No one, either directly involved in the experiment or taking any of the
treatments, knows what treatments the subjects are receiving.
IN CONTEXT
If subjects know which treatment group they are assigned to, it may influence behavior. So, the
treatment group will receive a pill, and the control group will receive a pill. The only difference is that
one pill has the active treatment in it and will be only given to those in the treatment group.
Ideally, when you open the pills up, they would look the same on the inside, too. The idea is that no
one knows which pill is fake and which one has the tested drug.
The fake drug is usually some kind of a sugar or something that makes the person in the control group
feel like they're actually taking something when they’re really not.
TERM TO KNOW
Blinding
The practice of making sure that certain individuals do not know which subjects are receiving which
treatment.
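One way to picture blinding is as a coded label plus a key that only someone behind the scenes can see. The following is a rough sketch under that assumption; the "KIT-" code format, the function name `blind_assignments`, and the seed are invented for illustration.

```python
import random

def blind_assignments(subject_ids, seed=None):
    """Return (labels, key): coded pill labels for staff, and a key kept behind the scenes."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    treatments = ["active"] * (len(ids) // 2) + ["placebo"] * (len(ids) - len(ids) // 2)
    rng.shuffle(treatments)                              # random assignment
    codes = rng.sample(range(1000, 10000), len(ids))     # unique, meaningless kit codes
    labels = {s: f"KIT-{c}" for s, c in zip(ids, codes)}        # what subjects and staff see
    key = {f"KIT-{c}": t for c, t in zip(codes, treatments)}    # held only by a third party
    return labels, key

labels, key = blind_assignments(range(1, 21), seed=5)
```

The `labels` mapping reveals nothing about who is on the active treatment; only the separately stored `key` does.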
In single-blind experiments, on the other hand, only one side is blinded: either the subjects are blinded but the
researchers are not, or the researchers are blinded but the subjects are not.
IN CONTEXT
A double-blind study is ideal, but sometimes it is just not feasible. Suppose there is an exercise study
—studying whether or not exercise is effective for weight loss. People are going to know if they're
exercising or not. It's impossible to assign people to exercise—the treatment, in this case—and have
them not know they're receiving the treatment.
However, the experimenters don't need to know who was assigned not to exercise. This is single-blind
because the experimenters don't know. The experimenters were blinded, but the subjects were not.
BRAINSTORM
Can you think of a single-blind experiment that would be set up to have the researchers know group
assignments, but the participants do not?
TERMS TO KNOW
Double-Blind Experiment
An experiment where neither the subjects nor anyone in contact with them has any knowledge of which
subjects are receiving which treatment.
Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are receiving which
treatment or people in contact with the subjects have no knowledge of which subjects are receiving
which treatment, but not both.
SUMMARY
Blinding is a powerful tool for preventing different types of biases, such as the experimenter effect.
Different studies allow for different levels of blinding. Ideally, double-blind experiments are best since
both participants and the people with direct contact with the participants are not aware of group
assignment. As you saw in the exercise example, sometimes double-blind just is not realistic.
Participants will know if they are exercising or not. In that case, single-blind experiments are the next
best thing, which means that either the subjects or the researchers are aware of group assignments—
but not both.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Blinding
The practice of making sure that certain individuals do not know which subjects are receiving which
treatment.
Double-Blind Experiment
An experiment where neither the subjects, nor anyone in contact with them, has any knowledge of which
subjects are receiving which treatment.
Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are receiving which
treatment, or people in contact with the subjects have no knowledge of which subjects are receiving
which treatment, but not both.
Placebo
by Sophia
WHAT'S COVERED
1. Placebo
In basic terms, a placebo is a fake treatment. That doesn’t mean that people don’t respond to it; instead, they
think or expect that the treatment will result in a change. A placebo doesn't do anything. It has no active
treatment, yet people feel better anyway, as if they have willed themselves to feel better. This is called the
placebo effect.
While the treatment group gets the actual drug, the control group receives a placebo as their treatment. They
get the fake drug, usually a sugar pill, which doesn't do anything and has no active ingredient.
Sometimes, the treatment containing the actual drug doesn't work any better than the placebo. This can
happen. It’s evidence against the treatment working.
IN CONTEXT
Suppose that you developed a treatment that relieved pain, and you conducted a study on pain. You
had a control group receiving a sugar pill and a treatment group receiving the actual drug that you
created. Here are your results.
Would you say that your treatment is effective? Why or why not?
The answer here is that your treatment is not very effective. The numbers, 42% and 36%, are not far apart.
These results would be weak evidence for the effectiveness of the drug.
Now suppose instead that 82% of patients in the treatment group reported relief of pain. Notice that you still
have 36% of patients in the placebo group reporting relief of pain. However, the
difference between 36% and 82% is significant. This would be considered evidence for the
effectiveness of the drug.
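The comparison in this example comes down to the gap between the treatment and placebo response rates. A tiny sketch of that arithmetic, using the percentages quoted in the text (the function name `gap_in_points` is made up for illustration):

```python
def gap_in_points(treatment_pct, placebo_pct):
    """Difference between treatment and placebo response rates, in percentage points."""
    return treatment_pct - placebo_pct

print(gap_in_points(42, 36))   # 6
print(gap_in_points(82, 36))   # 46
```

A 6-point gap is small next to a 36% placebo response, while a 46-point gap is hard to attribute to the placebo effect alone. How convincing a given gap is also depends on how many patients were in each group, which the example does not state.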
TERMS TO KNOW
Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.
Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response even when
taking a placebo which contains no active ingredient.
SUMMARY
Placebos are a form of control. They're a fake drug. People can respond to the fake drug, thinking they
are receiving treatment, which is called the placebo effect. Experimenters will assess the effectiveness
of the treatment against the effectiveness of the placebo. If the gap between the two is significant, it is
considered evidence that treatment has a considerable effect.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.
Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response even when taking
a placebo which contains no active ingredient.
Variables
by Sophia
WHAT'S COVERED
This tutorial will discuss variables within the field of statistics and introduce the concept of confounding
variables. The following elements will be the main focus of this tutorial:
1. Variables
1a. Variables of Interest
1b. Explanatory and Response Variables
2. Confounding Variables
1. Variables
In statistics, a variable is any attribute that can be measured about the individuals in a study. It is very
important to carefully define the variables to be measured when creating a study.
Age
Weight
Gender
Ethnicity
Favorite food
Number of pets
Smoker or nonsmoker
ZIP code
Number of siblings
Political affiliation
Favorite sport
All sorts of these things are variables. You might want to know only one of these things or some of these things.
TERM TO KNOW
Variable
Any attribute or number that can be measured about individuals in a study.
However, if you were conducting a weight loss study, political affiliation would likely not be a variable worth
measuring, but favorite food might seem important.
TERM TO KNOW
Variable of Interest
Any variable which we need to know about in the context of a study.
In those cases, we define the one that causes the other as the explanatory variable. In a study, you can have
more than one explanatory variable.
Then, variables that are the result are called response variables.
Explanatory: Average monthly temperature
Response: Ice cream sales
You might assume that as the temperatures get warmer, ice cream sales would go up in kind.
Something that's a little bit less obvious is whether or not gender, which is a categorical variable, plays a role in
which political party people will choose. Are males more likely to be Republican? Or are women more likely to
be independent voters? We don't know. But that would be an interesting question to investigate.
TERMS TO KNOW
Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will correspond to
an increase or decrease in some other variable.
Response Variable
A variable that is affected by the explanatory variable.
2. Confounding Variables
The word confounding refers to when two variables get mixed up with one another and you can't tell the effect
of one variable from the effect of the other variable. The confounding variable is the one not accounted for in a
study. It is an unseen variable that has a significant effect on the response variable and is also related to the
explanatory variable.
IN CONTEXT
Suppose that a researcher wants to know whether a high protein diet will help lab rats gain more
weight than a low protein diet. The researcher has 26 lab rats, and she selects the 13 smallest rats
to receive the low protein diet and the 13 largest to receive the high protein diet. At the end of the
study, she weighs the rats to determine their weight gain and finds that the rats on the high protein
diet gained more weight.
Can you think of anything that she did wrong in this study?
The answer involves the occurrence of confounding. Remember, confounding is when two variables
get mixed up and you can't tell the effect of one variable from the effect of the other variable.
In this case, the effect of the diets—whether or not the high protein diet caused the rats to gain more
weight—was confounded by the fact that the heaviest rats were put on the high protein diet. It’s not
clear if the high protein diets were effective at weight gain. Something else may have caused the
weight gain since they were heavy already.
There were two variables of interest in the study. The high protein diet was supposed to
be the explanatory variable, and the weight gain was supposed to be the response variable. The
researcher was trying to establish a link between the two.
However, because of the way she assigned the rats, only a limited conclusion could be drawn. She
wasn't able to draw the direct conclusion that she was hoping for—and that is confounding.
Confounding should be limited in experiments when possible.
TRY IT
A high school math teacher, hoping to have his students do well on the final, offers an optional review
session. He states, “No one who's ever attended the review session has ever scored less than a B.”
What is the teacher trying to imply? Why isn’t his implication correct?
You may have come up with the notion that he's trying to imply that the review sessions will cause the
students to do better. That may be true; however, there may be a few confounding variables. Maybe only
his best and brightest students attend the optional review, and these are students that may have done well
on the final exam anyway. The effects, if any, are confounded by the intrinsic motivation of students to show
up to the session.
TERMS TO KNOW
Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential effects of
some other variable which was unaccounted for.
Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the study can draw.
SUMMARY
Variables are what we choose to measure in a study. The variables of interest will depend on the
questions that you're trying to answer. Not every variable must be measured—just the ones that are of
interest. By looking at variables in context, you learned that if a cause-and-effect relationship is thought
to exist, you can break the variables down even further into explanatory and response variables.
Confounding occurs when a variable is chosen as an explanatory variable in an
experiment, but because another variable got in the way, its effect on the response cannot be isolated.
You explored confounding variables in action to demonstrate how they can limit the conclusions that
can be drawn from the supposed explanatory variable. In effect, the confounding variable inhibits a
cause-and-effect conclusion. Often, it's one that you didn't think to measure, which is problematic.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential effects of some
other variable which was unaccounted for.
Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the study can draw.
Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will correspond to
an increase or decrease in some other variable.
Response Variable
A variable that is affected by the explanatory variable.
Variable
Any attribute or number that can be measured about individuals in a study.
Variable of Interest
Any variable which we need to know about in the context of a study.
Question Types
by Sophia
WHAT'S COVERED
This tutorial will cover the topic of question types. We will cover binomial questions as well as discuss
the difference between open-ended and closed questions, through the exploration of:
1. Binomial Questions
2. Closed Questions
3. Open-Ended Questions
1. Binomial Questions
Recall that there are two types of data: qualitative and quantitative. A binomial question is a question
with only two answer choices.
Do you think that this is a qualitative type of question or a quantitative type of question?
A binomial question collects qualitative data because there are two possible responses. It's a question with two
categories.
EXAMPLE The simplest version of a binomial question is one with the answer choices of yes or no. You
might remember this type of question from elementary or middle school:
Yes No
Other examples of binomial questions include:
In that last question, some people feel like they fall somewhere in between the two options. They may currently
be a smoker, but they are trying to quit. Sometimes questions have some shades of gray.
What about this one?
This is a binomial question that would address people who don't currently smoke but used to.
Sometimes things don't neatly fit into two boxes. Nor do they work when the questions have more than two
answers or are open-ended questions such as, “How do you feel about the construction of the new baseball
diamond located on the north end of town?” It doesn't really work to place something like that into two
categories.
TERM TO KNOW
Binomial Question
A question with only two answer choices.
2. Closed Questions
Many surveys have a combination of open and closed questions. Closed questions have short, definite, usually
multiple-choice type answers.
Rating choices (left to right): Poor, Fair, Satisfactory, Good, Excellent
The Teacher ❍ ❍ ❍ ❍ ❍
Class Content ❍ ❍ ❍ ❍ ❍
In the above example, the answer columns offer multiple choices (poor, fair, satisfactory,
good, and excellent), and those are your only choices.
HINT
When there are only certain answers to select, such as yes/no or multiple choice, that is the signal that you
are dealing with a closed question. Therefore, a binomial question is also an example of a closed question.
TERM TO KNOW
Closed Question
A question type with only so many different answer choices.
3. Open-Ended Questions
Open questions, also called open-ended questions, are subjective. These are areas where someone can click
into the field and start to type their comments and/or opinions. These comments are open to the interpretation
of the person being surveyed.
The comments are also open to the interpretation of the person conducting the survey when they do the
analysis. Usually, they need to be analyzed by a person in order to really get the full effect from it. Oftentimes, in
the desire for simplicity, someone will give a question in closed form that really should be an open-ended
question.
The Teacher ❍ ❍ ❍ ❍ ❍
Class Content ❍ ❍ ❍ ❍ ❍
THINK ABOUT IT
Suppose you are in a court of law and the lawyer asks, “Were you at the crime scene?”
“Yes, but I didn’t see anything other than people running and police arriving. It was chaos.”
“Just yes or no, please.”
The lawyer asked a closed question and wants only a yes/no answer. By attempting to explain your
circumstance, you were treating it as an open-ended question. The lawyer then returns to the
closed question by asking you to select either “yes” or “no.”
TERM TO KNOW
Open Question
A question type with no answer choices; the respondent can choose what he or she wants to say to
answer the question.
SUMMARY
Binomial questions produce categorical data. These are questions with two possible responses, or two
categories. It's important to consider whether or not there really are just two categories before you ask
something as a binomial question. Open questions, also referred to as open-ended questions, allow for
more explanation, and they're sometimes difficult to interpret because they're not very cut-and-dried
like closed questions. Sometimes, open-ended questions are called “essay” questions. Closed
questions are easier to interpret, but they're not always appropriate for the situation. Closed questions
are sometimes called multiple-choice type questions.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Closed Question
A question type with only so many different answer choices.
Open Question
A question type with no answer choices; the respondent can choose what he or she wants to say to
answer the question.
Accuracy and Precision in Measurements
by Sophia
WHAT'S COVERED
This tutorial will discuss accuracy in measurement versus precision through the following exploration:
1. Contrasting Accuracy and Precision
1a. Scale Example
1b. Dartboard Example
Precision, on the other hand, is concerned with how consistent the measurements are to each other. In other
words, how close are the measurements to a single value, regardless of whether or not that single value is the
right answer.
TERMS TO KNOW
Accuracy
The extent to which the values, when considered all together, center around the correct value for a
variable.
Precision
The extent to which the values are very close to each other, even if they are not near the correct value.
You take someone who weighs 161.8 pounds and place them on the four different scales, five times each.
Take a look at Scale #1 and determine if this scale is accurate, precise, both, or neither.
Scale 1
Accuracy ✔ Precision ✘
160.4 158.8 161.4 164.2 162.0
Scale #1 is accurate because the numbers average out to about 161.4, which is close to the right answer of
161.8. Although it reported a fairly low number such as 158.8 and a high number of 164.2, by and large, the
numbers center on the right answer.
However, Scale #1 is not precise because the numbers are not close to a single value every time.
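The Scale #1 judgment can be checked with a little arithmetic. The sketch below is only an illustration: it uses the five readings and the true weight from the example, takes the average as a rough gauge of accuracy, and takes the sample standard deviation as a rough gauge of precision. The function name summarize and the choice of standard deviation as the spread measure are assumptions made here, not something the tutorial prescribes.

```python
from statistics import mean, stdev

TRUE_WEIGHT = 161.8  # pounds, the person's actual weight in the example
scale_1 = [160.4, 158.8, 161.4, 164.2, 162.0]  # the five Scale #1 readings


def summarize(readings, true_value):
    """Return (average, average minus true value, sample standard deviation)."""
    avg = mean(readings)
    return avg, avg - true_value, stdev(readings)


avg, offset, spread = summarize(scale_1, TRUE_WEIGHT)
print(f"average reading: {avg:.2f} lb")            # accuracy: how close is this to 161.8?
print(f"offset from true weight: {offset:+.2f} lb")
print(f"spread of readings: {spread:.2f} lb")      # precision: smaller means more consistent
```

The average lands less than half a pound from the true weight, while the individual readings range from 158.8 to 164.2, which matches the verdict of accurate but not precise.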
Take a look at Scale #2 and determine if this scale is accurate, precise, both, or neither.
Scale 2
Accuracy ✘ Precision ✔
But take a look at the average. The average of Scale #2 is about 168, which is overestimating by about 6
pounds, so this scale is not accurate.
Take a look at Scale #3 and determine if this scale is accurate, precise, both, or neither.
Scale 3
Accuracy ✔ Precision ✔
Take a look at Scale #4 and determine if this scale is accurate, precise, both, or neither.
Scale 4
Accuracy ✘ Precision ✘
BRAINSTORM
If you worked for a consumer report company and you were evaluating the above scales, which scale
would you choose and why?
A dartboard is a very popular example of precision and accuracy, assuming the bullseye is the desired
outcome, or “value.”
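[Figure: four dartboards in a two-by-two grid. Top row: accurate. Bottom row: not accurate. Left column: precise. Right column: not precise.]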
Precise and Accurate: In the top left corner, the darts are clumped together AND around the bullseye.
Not Precise, but Accurate: In the top right corner, the darts are not clumped together, but they loosely
surround the bullseye.
Precise, but Not Accurate: In the bottom left corner, the darts are clumped together, but not around the
correct “value,” or in this case, the bullseye.
Not Precise nor Accurate: In the bottom right corner, the darts are spread out and are not surrounding the
bullseye.
SUMMARY
By contrasting accuracy and precision, you now know that accuracy is how close the measurements
are to the right answer, though they may not necessarily land exactly on the correct answer. Precision is
how consistent measurements are with each other, even if they are not near the correct value.
Generally, you will see them clumped together. In a given measurement scenario, as you saw in the
scale example and dartboard example, high accuracy and high precision are ideal.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Accuracy
The extent to which the values, when considered all together, center around the correct value for a
variable.
Precision
The extent to which the values are very close to each other, even if they are not near the correct value.
Absolute Change and Relative Change
by Sophia
WHAT'S COVERED
In this tutorial, you're going to learn about the difference between absolute change, which is an
increase or decrease represented as a raw number, and relative change, which relates that change
differential back to the original value. Specifically, this lesson will cover:
1. Absolute Change and Relative Change
2. Calculating Absolute Change
3. Calculating Relative Change
4. Examples of Absolute Change and Relative Change
EXAMPLE Suppose a political candidate's approval rating went up from 44% to 48%. That absolute
change is four percentage points.
Relative change is the percent difference from the previous value, and it's always expressed as a percent.
IN CONTEXT
An infant weighed 6.5 pounds at birth, and 1 year later, weighed 14.5 pounds. Decide if each of the
following statements is true.
Well, that's a true statement; 14.5 minus 6.5 is 8 pounds. It increased by 8 pounds.
This one's a little bit less obvious, but it's also true. The 8-pound increase was larger than
the birth weight itself, so it was an increase of over 100%. In fact, when you do the calculation, 8 divided
by 6.5 is about 1.23, or 123%.
TERMS TO KNOW
Absolute Change
The raw increase or decrease in the value of a variable
Relative Change
The percent increase or decrease in the value of a variable.
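FORMULA TO KNOW
Absolute Change
Absolute Change = New Value - Original Value
FORMULA TO KNOW
Relative Change
Relative Change = Absolute Change / Original Value, expressed as a percent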
In the example above, the absolute change was 8 pounds, and the original value was 6.5 pounds. Dividing 8
by 6.5 gives about 1.23.
When expressed as a percent, 1.23 is 123%. That means that there was a 123% increase over the birth weight.
That was the relative change.
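The infant example can be worked through in a few lines. This is a minimal sketch using the birth weight and one-year weight from the text; the helper names absolute_change and relative_change are chosen here for illustration only.

```python
def absolute_change(original, new):
    """Raw increase or decrease, in the same units as the inputs."""
    return new - original


def relative_change(original, new):
    """Change as a fraction of the original value (multiply by 100 for a percent)."""
    return (new - original) / original


birth_weight = 6.5      # pounds at birth
one_year_weight = 14.5  # pounds one year later

abs_change = absolute_change(birth_weight, one_year_weight)
rel_change = relative_change(birth_weight, one_year_weight)

print(f"absolute change: {abs_change} pounds")
print(f"relative change: {rel_change:.0%}")
```

Note that the relative change divides by the original value, not the new one, which is why an 8-pound gain on a 6.5-pound starting weight comes out above 100%.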
IN CONTEXT
Let's look at another example. The following table shows the results of the 1990 census and the 2000
census, along with the absolute change and relative change.
Absolute Change: To calculate the absolute change, simply subtract the 1990 value from the 2000
value. For example, Florida's absolute change can be found by subtracting 12,937,926 from 15,982,378
to get 3,044,452.
All of the states in the list had increases in the population. Some were not very much, like Hawaii,
which only had about a 100,000-person increase. Some were a lot, like Georgia and Florida, whose
population increased by over a million people. The highest absolute change was 3,044,452 people, in
Florida.
Relative Change: The question of which state had the largest relative change between that time is a
little bit different. Looking at Florida again, you need to figure out if the absolute change of around 3
million was a large change percentagewise from the old population of about 13 million. It was a large
increase, but was it the largest percent increase in the list?
To find the relative change, take each absolute change and divide by the old population from 1990.
Florida's relative change was positive 24%—approximately 3 million divided by 13 million gives you
about 24%. Georgia's increase was about 26%, a little bit larger of a percent increase than Florida. The
highest of the list was a 29% increase in the state of Idaho. Notice it didn't have a very large absolute
change. But its population wasn't very big to begin with, so even a small absolute change can be a
large relative change.
SUMMARY
Absolute change is the absolute difference in raw numbers. It's the change in units. Relative change
examines how the new number compares to the previous number in terms of a percent. Did it go up by
10%? Did it go down by 7%? What happened percentagewise from then to now? You learned how to
calculate absolute change by simply calculating the difference between the new and the old numbers.
You also learned how to calculate relative change by taking the absolute difference and dividing it by
its originating value. Lastly, you explored both concepts in practice by exploring examples of absolute
change and relative change.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Absolute Change
The raw increase or decrease in the value of a variable
Relative Change
The percent increase or decrease in the value of a variable.
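FORMULAS TO KNOW
Absolute Change
Absolute Change = New Value - Original Value
Relative Change
Relative Change = Absolute Change / Original Value, expressed as a percent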
Using Percentages in Statistics
by Sophia
WHAT'S COVERED
This tutorial will discuss how to use percentages wisely in statistics by focusing on:
1. Percentage Point vs. Percent
2. Examples
2a. Retaking a Test
2b. A Politician's Approval Rating
Percent change describes the relative change (increase or decrease) in a percent value. Percentage points are
used to measure absolute change.
TERMS TO KNOW
Percentage Points
An absolute increase or decrease in a percent value.
Percent Change
A relative increase or decrease in a percent value.
2. Examples
2a. Retaking a Test
Suppose a teacher gives a particularly difficult exam, and these six students all failed it. The teacher graciously
offered a retake to the students, and they all passed.
The table below shows their original score and their retake score. On the retake, Jonathan scored an 88, Ryan
scored a 78, Katherine scored an 84, etc.
Student | Original Score | Retake Score
Student | Original Score | Retake Score | Change in Percentage Points
Now, who had the highest percent increase? To find out, compare each student's raw increase with
their old score.
Begin with Jonathan's scores. We need to determine how much of an increase 36 percentage points is over
that original score of 52.
Student | Original Score | Retake Score | Change in Percentage Points | Percent Increase
Isaiah 44% 89% 45% 102%
But it was Ryan who had the highest percent increase. He started with a 38 and finished with a 78, a 40-
percentage-point increase. A 40-percentage-point increase over a score of 38 is over 100%, meaning he more
than doubled his old score.
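The difference between the two rankings can be seen by computing both measures side by side. The sketch below is only illustrative and covers just the three students whose original and retake scores both appear in the text (Jonathan, Ryan, and Isaiah); the other students are left out because their scores are not given here. The function and variable names are assumptions.

```python
# (original score, retake score), in percent, as stated in the text
scores = {
    "Jonathan": (52, 88),
    "Ryan": (38, 78),
    "Isaiah": (44, 89),
}


def point_change(original, retake):
    """Absolute change, measured in percentage points."""
    return retake - original


def percent_change(original, retake):
    """Relative change, measured as a percent of the original score."""
    return (retake - original) / original * 100


results = {
    name: (point_change(o, r), percent_change(o, r))
    for name, (o, r) in scores.items()
}

for name, (points, percent) in results.items():
    print(f"{name}: {points:+d} percentage points, {percent:+.1f}% relative change")

largest_points = max(results, key=lambda name: results[name][0])
largest_percent = max(results, key=lambda name: results[name][1])
print("largest change in percentage points:", largest_points)
print("largest percent increase:", largest_percent)
```

Among these three, the student with the biggest gain in percentage points is not the one with the biggest percent increase, because a lower starting score makes the same gain relatively larger.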
2b. A Politician's Approval Rating
Suppose Patrick has found his way to class president at Memorial High School. But his approval rating has just
hit the skids, dropping from 56% to 42%.
First, let’s determine the absolute change in his approval rating. Take 42 and subtract 56 from it.
This gives you negative 14. So, Patrick's approval rating dropped 14 percentage points. It’s a drop, but looking at
it that way, Patrick isn’t too concerned.
However, how does that drop look when you calculate it in terms of relative change? Take the change of
-14 percentage points and divide it by the original approval rating, 56.
That will give you -0.25, or a 25% drop. Viewed in this context, Patrick sees the drop is a significant one, which
he might not have expected.
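For a decrease, the sign matters: the change has to stay negative when it is divided by the original value. A short sketch with Patrick's numbers, using variable names chosen here for illustration:

```python
old_rating = 56  # approval rating before, in percent
new_rating = 42  # approval rating after, in percent

# Absolute change: new minus old, in percentage points (negative means a drop).
points = new_rating - old_rating

# Relative change: the signed change divided by the original rating.
relative = points / old_rating

print(f"absolute change: {points} percentage points")
print(f"relative change: {relative:.0%}")
```

Subtracting in the order new minus old keeps both results negative, so the same two lines work for increases and decreases alike.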
SUMMARY
When percentages are used in statistics, it's important to know whether the focus is absolute change or
relative change. Absolute change is the difference in percentage points and relative change is a
percent increase or percent decrease. You explored several examples illustrating the different context
provided by calculating absolute change vs. relative change, involving retaking a test and a politician's
approval rating.
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Percent Change
A relative increase or decrease in a percent value
Percentage Points
An absolute increase or decrease in a percent value.
Index Number and Reference Value
by Sophia
WHAT'S COVERED
This tutorial is going to teach you about index numbers and reference values, through the definition
and discussion of:
1. Index Numbers and Reference Values
2. Consumer Price Index and Inflation
To calculate the index value for other points in time, you would take the current price, divide by the reference
value, and then convert that value to a percent.
FORMULA TO KNOW
Index Number
Index Number = (Value / Reference Value) x 100
How do we work with index numbers and reference values most of the time? Consider the following example:
In 1983, a gallon of milk cost $2.24, so you assign this reference value of $2.24 an index value of 100.
Essentially, this means that it cost 100% of what it cost in 1983—a fairly obvious statement.
To calculate the index value for other points in time, like in 1988, when a gallon of milk cost $2.30, or 1993,
when it cost $2.86, you would take the current price, divide by the reference value of $2.24, and then convert
that value to a percent.
The index value in 1988, then, is $2.30 divided by the reference value of $2.24. That gives you 1.027, which as a
percent is 102.7%. Note that index values are expressed without the percent symbol, so the index value in 1988
was 102.7. You can complete the table with the remaining values.
What this indicates is that by the time you get to 2003, a gallon of milk cost 142.4% as much as it did in 1983, or
a 42% increase over 1983.
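The index calculation for the milk example can be sketched as follows. Only the three prices quoted in the text (1983, 1988, and 1993) are used; the 2003 price is not stated, so it is left out rather than guessed. The names milk_prices and index_number are assumptions made for this illustration.

```python
REFERENCE_YEAR = 1983

# Price of a gallon of milk in dollars, as quoted in the text
milk_prices = {1983: 2.24, 1988: 2.30, 1993: 2.86}


def index_number(value, reference_value):
    """Value as a percent of the reference value, written without the % sign."""
    return value / reference_value * 100


reference_price = milk_prices[REFERENCE_YEAR]
index = {
    year: round(index_number(price, reference_price), 1)
    for year, price in milk_prices.items()
}

for year, value in index.items():
    print(year, value)
```

The reference year always comes out at exactly 100, since its price is divided by itself, and subtracting 100 from any other year's index gives the percent change since the reference year.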
TERMS TO KNOW
Index Number
A way to measure the relative change in a value, usually the price of a good or service, over time. If the
index number is over 100, that means the price has increased. If the price has decreased, then the
index number will be less than 100.
Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.
The CPI is a general measure of inflation. Inflation means that the index is going up. It's a decline in purchasing
power, which means that it costs more now to buy these goods and services than it did in 1983. Put another way,
inflation means that with the same income, you have less purchasing power: it may cost you much more now to do
what cost $100 in 1983.
Here's a graph of the CPI over time. Notice the index value is 100 in 1983, between 1980 and 1990. Goods and
services costing $100 in 1983 will cost you around $200 if you look at around 2007. Therefore, the index value
was 200 in 2007.
TERMS TO KNOW
Inflation
A relative increase in the price of a good or service over time. A person will need to pay more to
receive the same good or service than they did at a previous point in time.
SUMMARY
Index numbers allow us to check changes, typically in prices, from one point in time to another. We
begin with a reference value, which is the price at some arbitrary point in time. The index numbers
express each later price as a percent of that reference value. If the price goes up, the index number will
be over 100. If the price goes down, the index number will be under 100. The most commonly referred
index would be the Consumer Price Index, or CPI. The CPI shows percent increase or decrease in the
prices of many goods and services, which helps determine the amount of inflation.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Consumer Price Index (CPI)
An index published by the U.S. Bureau of Labor Statistics that shows the change in the price of many
different goods or services in the United States. It provides a measure of purchasing power.
Index Number
A way to measure the relative change in a value, usually the price of a good or service, over time. If the
index number is over 100, that means the price has increased. If the price has decreased, then the index
number will be less than 100.
Inflation
A relative increase in the price of a good or service over time. A person will need to pay more to receive
the same good or service than they did at a previous point in time.
Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.
FORMULAS TO KNOW
Index Number
Index Number = (Value / Reference Value) x 100
Bias
by Sophia
WHAT'S COVERED
This tutorial will cover the topic of bias, specifically focusing on:
1. Bias
2. Hawthorne Effect
1. Bias
Most often, research is done accurately and with integrity. People want to get the job done right. They want to
get the answer correct. But sometimes there's something that happens systematically in the experiment or the
study that limits the accurate representation of the population being studied.
Bias, in the statistics world, is systematically misrepresenting the population. It refers to the favoring of certain
outcomes in a sample that limits our ability to draw conclusions about the population. The key word is
systematically; it's not necessarily intentional. It could be intentional, but it doesn't have to be.
A way of selecting the sample for your study such that the sample doesn't accurately reflect the population is
called selection bias. It's not good, but sometimes it can't be avoided. On the other hand, sometimes it can be
avoided, but isn't.
Publication bias occurs when researchers only want to publish the most sensational findings, or rather, only the
positive ones. Only the results that people will want to read make it to people's eyeballs, while findings deemed
boring do not.
TERMS TO KNOW
Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased data can
often favor a specific group of those studied.
Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically excluded.
Publication Bias
The desire of researchers (and research publications) to only print the most sensational or interesting
articles.
2. Hawthorne Effect
Often, people will behave differently if they know that they're under observation. They become a bit self-
conscious when they are observed and want to do it “right,” so they act differently.
This idea that people might change what they would typically do based on the fact they're under observation is
a type of bias called the Hawthorne Effect.
IN CONTEXT
Suppose you are in charge of a weight loss study. One group is told to take a pill every day. The other
group is also told to take a pill every day, but it doesn't have any active ingredient in it.
You instruct them not to change their behavior. You don’t want them changing the results by eating
differently or exercising more. However, these people might change their behavior based on the fact
that they know they're going to be weighed later.
Another thing to consider is when a study is based on participants volunteering their time to be a part of this
study. What may happen is that only people with a passion specific to the study may sign up, which is known as
participation bias.
Furthermore, another issue may be that the participants tell you what they think you want to hear, which is
response bias.
TERMS TO KNOW
Hawthorne Effect
People have the tendency to change their behavior when they know they are being monitored.
Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies due to the
sensitive nature of the question.
SUMMARY
Bias has a problematic influence on many experiments and samples. Unfortunately, when bias exists,
the results received cannot be generalized to the population because they are not reliable. It’s
important to know that bias is not always intentional. It can be a systematic flaw in the sample or the
experiment, but it's not always on purpose. Selection bias happens when the sample is not truly
representative of the population to which you want to generalize the information. Publication bias is
when researchers publish only the information that they think people want to see. The Hawthorne
Effect is a type of bias that happens when people act differently, just knowing they are being observed.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased data can often
favor a specific group of those studied.
Hawthorne Effect
People have the tendency to change their behavior when they know they are being monitored.
Publication Bias
The desire of researchers (and research publications) to only print the most sensational or interesting
articles.
Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies due to the
sensitive nature of the question.
Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically excluded.
Nonresponse and Response Bias
by Sophia
WHAT'S COVERED
This tutorial will cover the topics of nonresponse bias and response bias by focusing on:
1. Nonresponse Bias
2. Participation Bias
3. Response Bias
1. Nonresponse Bias
A nice way to think of sampling is to use a “pot of soup” analogy. You want a representative sample, right? Well,
you don't need to drink the entire pot of soup in order to figure out what's in it. You just need the right taste.
It would be like selecting all of the ingredients from the soup in a single tasting, but certain things can go wrong
with the taste test that can affect what you think is in the soup. Just like you don't really know what the
population looks like, you really don’t have a clear idea of all the ingredients in the soup. All you get is the taste,
and if you don't get the right taste, you're going to leave something out and not know exactly what's in the soup
(or population).
In terms of sampling, nonresponse means that someone selected for the sample either can't be contacted or is
unwilling to participate.
Now, nonresponse happens. It's an inevitability that you will get uncooperative people, people that don't want
to take your survey, or people who refuse to be part of your experiment. It may be that you just won't be able to
contact certain people.
The issue of nonresponse is not a problem until the people that weren't able to be contacted or refused to
participate differ substantially from the people that were in the sample. Now the sample is not representative of
the population. That is called nonresponse bias because you're not getting an accurate cross-section of
opinions. The opinions of people that you wanted to get are left out.
IN CONTEXT
A workplace wishes to survey 200 of its 1,000 employees about their workload and their stress level,
so they put 200 surveys in the workers' mailboxes. It’s likely that the people who have the biggest
workloads might get left out of the sample because they don't check their mailboxes as often as other
people. Or if they do get around to checking their mailbox, they may not complete the survey, or don't
return it, because they're so busy.
What effect might that have? The respondents who completed the survey may have reported that the
workload level is not that high. The problem is that the people with the lower workloads are the ones
who turned the surveys in, because they had the time to complete them, while the people with the
higher workloads didn't. As a result, the company might conclude that the workload level is lower
than it really is.
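To see how this can play out, here is a minimal simulation sketch of the workplace survey. The workload range (30 to 60 hours a week) and the rule that busier employees respond less often are assumptions made up for illustration; they are not part of the example above.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 employees with weekly workloads of 30 to 60 hours.
population = [rng.uniform(30, 60) for _ in range(1000)]

# The 200 employees who find a survey in their mailbox.
selected = rng.sample(population, 200)

def response_probability(hours: float) -> float:
    """Assumed rule: busier employees are less likely to return the survey."""
    return 0.9 - 0.8 * (hours - 30) / 30  # 0.9 at 30 hours, 0.1 at 60 hours

# Each selected employee returns the survey with the assumed probability.
respondents = [h for h in selected if rng.random() < response_probability(h)]

def mean(values):
    return sum(values) / len(values)

print(f"Mean workload, all employees: {mean(population):.1f} hours")
print(f"Mean workload, selected 200:  {mean(selected):.1f} hours")
print(f"Mean workload, respondents:   {mean(respondents):.1f} hours")
```

Under these assumptions, the respondents' average workload comes out lower than the average for everyone who was selected, which is the nonresponse bias described above.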
The nonresponse rate is easy to calculate. Subtract the number that you got back from the number that
you mailed out, then divide the result by the number that you mailed out. That's your nonresponse rate.
EXAMPLE Say you mailed out 100, and you only got 80 back. Well, that's 20 out of 100, or a 20%
nonresponse rate.
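The same calculation can be written as a short function. This is just a sketch; the function and parameter names are made up for illustration.

```python
def nonresponse_rate(sent_out: int, returned: int) -> float:
    """Fraction of the selected people who did not respond."""
    if sent_out <= 0:
        raise ValueError("sent_out must be positive")
    if not 0 <= returned <= sent_out:
        raise ValueError("returned must be between 0 and sent_out")
    return (sent_out - returned) / sent_out

rate = nonresponse_rate(sent_out=100, returned=80)
print(f"Nonresponse rate: {rate:.0%}")  # prints "Nonresponse rate: 20%"
```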
THINK ABOUT IT
Consider the different ways of conducting a survey, a poll, or a sample. Which of the following methods do
you think has the highest nonresponse rate?
Mail
Telephone
Face-to-face
The answer is the mail. People will either throw it away, forget to fill it out, or maybe they'll fill it out and then
forget to mail it back. This is problematic because when the United States takes its census of everyone in
the country, it does so by mail. Sometimes they have to do follow-ups.
In samples with high rates of nonresponse, follow-ups typically are needed. Suppose you started with a mailing.
You might need to follow up by calling them at home. If you can't reach them by calling them at home, you
might need to follow up by coming directly to their house.
Sometimes, even when they are contacted, someone will refuse to participate. Follow-ups like this might be
more necessary in some areas of the country than others because different areas of the country have different
rates of nonresponse.
TERMS TO KNOW
Nonresponse
Nonresponse is a lack of response from people you've selected. It affects the ability to draw
conclusions from your sample.
Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate in a
sample have substantially different opinions than the people who were included in the sample, resulting
in a misrepresentation of the population.
2. Participation Bias
On the other end of the spectrum are people who are passionate about a topic and eager to participate. The
people who raise their hand to participate are volunteering their time because they have a strong opinion about
the topic at hand. Participation bias happens when people participate because they have strong opinions about
the topic, or when they are indifferent to the topic and participate only because they are getting paid.
EXAMPLE Suppose you need to gather information on an upcoming election, and you ask people to
participate in a focus group. In your group, you find that you have a group in strong support of the
Democratic party, and you have a group in strong support of the Republican party, and no one in the
middle.
To correct this, you decide you’re going to pay participants $20 for their time. Now your group is filled with
people who will simply tell you what they think you want to hear, which also invites participation bias.
TERM TO KNOW
Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be the only
participants.
3. Response Bias
Response bias is when people's answers are influenced. Remember the pot of soup analogy? When you get a
representative sample, that's like getting a little taste of everything in the soup. However, things can go wrong,
and you don't get the right taste of the soup.
Response bias can occur if the wording of the question is unclear to the respondent, if a respondent is
uncomfortable due to the sensitive or personal nature of the questions, or if the respondent feels like the
questioner is implying that the question has a “correct” response. Answering in the way that seems expected
or acceptable is also called social desirability bias.
IN CONTEXT
On April 20, 1993, the New York Times published an article on a survey conducted by the Roper
Organization on behalf of the American Jewish Committee about the soon-to-be-opened Holocaust
Museum in Washington, DC.
The newspaper reported that 22%, an astounding number of adults surveyed, expressed some doubt
as to whether the Holocaust had actually occurred. The actual question that was presented to people
was:
“Does it seem possible, or does it seem impossible to you, that the Nazi extermination of the Jews
never happened?”
This seems to be a fairly straightforward question, but there was a big problem with it, and it caused
response bias. The problem is that the question contained a double negative, which is confusing.
Saying it is impossible that it never happened is the same as someone saying they are certain that it
did happen, but the question doesn't clearly read that way.
The good thing is that 1 year later, the question was revised, and it became clearer. The new question
stated:
“Does it seem possible to you that the Nazi extermination of the Jews never happened, or do you
feel certain that it happened?”
With this new, clearer question, the question clearly distinguishes between what the two options are
—“Does it seem possible?” or “Do you feel certain?” With the two options clearly defined, less than 2%
of individuals were unsure as to whether it was real or not. This provided a more accurate
interpretation of what the American public felt.
Therefore, unclear questions can lead to an inaccurate representation due to response bias. The other scenario
in which this can occur is when people answer a question untruthfully, either because they are ashamed or
because they think that there's a “right” answer that someone is fishing for.
There are certain topics that are particularly sensitive and might make a person want to lie.
Drugs
This may result in many people saying they've never used drugs, whether they actually have or not. Even if
there's no consequence and the survey is anonymous, they'll still say they've never used drugs when, in
fact, they have.
Criminal history
Participants might say they don't have one, even if they do.
Sexual behavior
This might cover topics of a highly sensitive and personal nature.
Racial prejudice
There's an implied right answer; people don't want to say that they're racially prejudiced.
Income
People will report it as being higher than it actually is if they're of low-income status, or possibly more
surprisingly, people will report it as lower than it really is if they're of very high-income status. A lot of
people don't want to be showy about their wealth, and so they'll try and come up with a more reasonable
number, in their eyes.
How does this affect what we think about the population? How does this affect the “soup”?
It's like taking a sample of the soup and only tasting the things that you want to taste. Maybe you don't like
beans, and so you just sort of ignore the fact that they're in there. You don't get the overall flavor of the
soup. It's the same thing with response bias. It doesn't give you the right overall picture of what the
population is like.
TERM TO KNOW
Response Bias
Bias that occurs when either (1) the question is poorly worded so that certain responses are over-
represented, or (2) the respondent is confused by the question or feels like they should lie due to the
sensitive nature of the question.
SUMMARY
Nonresponse occurs when people who are selected for the sample don't participate, either
because you can't find them, or because they're actively refusing. Nonresponse bias arises when those
people differ substantially from the people who did respond. The biggest problem is that if you
have high rates of nonresponse, it might give you an inaccurate representation of what's going on with
your population. You may not be able to use your sample to draw an inference about your population.
Participation bias occurs when participation in a study is voluntary, and therefore, people who feel
strongly about a given topic may be the only participants. Response bias occurs one of two ways:
Either a respondent doesn't understand the question and so gives an answer that he wasn't intending,
or the respondent wants to give a supposedly correct answer to the questioner. Both of these can be
inaccurate representations of what actually is the truth about the population. Response bias is a tough
thing to get rid of, especially when it is unintentional and surrounds the wording of the questions.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Nonresponse
Nonresponse is a lack of response from people you've selected. It affects the ability to draw conclusions
from your sample.
Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate in a sample
have substantially different opinions than the people who were included in the sample, resulting in a
misrepresentation of the population.
Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be the only
participants.
Response Bias
Bias that occurs when either (1) the question is poorly worded so that certain responses are over-
represented, or (2) the respondent is confused by the question or feels like they should lie due to the
sensitive nature of the question.
Selection and Deliberate Bias
by Sophia
WHAT'S COVERED
This tutorial will cover the topics of selection, deliberate, and unintentional bias. These may all impact
the selection of the right group of people for your sample, so it’s very important to be aware of them
when attempting to generalize findings. Our discussion breaks down as follows:
1. Selection Bias
2. Random Digit Dialing
3. Deliberate Bias
4. Unintentional Bias
1. Selection Bias
You may recall that sampling is like a pot of soup. Selecting a little bit of each ingredient for the soup is like
obtaining a representative sample for an experiment. However, things can go wrong with the taste test, which
may limit the ability to draw conclusions about the pot of soup as a whole.
Selection bias is also called undercoverage bias. It occurs when a significant subset of the population is left out
of the sample. This is not necessarily intentional but rather occurs when they were systematically ignored by
whoever was taking the sample.
IN CONTEXT
In 2008, almost every poll taken in the run-up to the New Hampshire presidential primary showed
Barack Obama leading by at least five percentage points. All of these were based on random digit dialers
calling a random sample of New Hampshire households. They were well-done surveys by all accounts.
However, what happened was that Clinton gained some support in the last few days. Mainly, a lot of
college students ended up coming out in support of Hillary Clinton in the last days when people were
expecting all college students to come out in support of Obama.
Because a lot of the college students were from out of state, they weren't actually New Hampshire
residents. For that reason, they were not counted and, as a result, the polls got the prediction
wrong, and Clinton ended up winning.
TERM TO KNOW
Selection Bias
A bias that results from systematically excluding certain subsets of the population from the sample. It is
not necessarily intentional.
2. Random Digit Dialing
The biggest advantage of using random digit dialers is that they can reach mobile phones and unlisted
numbers that you wouldn't be able to obtain using a phone book. So, it evens the playing field a bit since
anyone can be selected for that sample as long as the phone number is within that particular area code.
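The idea can be sketched in a few lines of code. This is a simplified illustration, not how a polling firm actually dials: it treats every seven-digit number within one area code as equally likely and does not screen out numbers that are unassigned or invalid. The function name is made up, and 603 is used because it is New Hampshire's area code.

```python
import random

def random_digit_dial(area_code: str, count: int, rng: random.Random) -> list[str]:
    """Generate `count` distinct random seven-digit numbers within one area code."""
    numbers = set()
    while len(numbers) < count:
        # Draw all seven local digits at random, so listed, unlisted, and
        # mobile numbers are all equally likely to come up.
        local = rng.randrange(10_000_000)
        numbers.add(f"({area_code}) {local // 10_000:03d}-{local % 10_000:04d}")
    return sorted(numbers)

rng = random.Random(0)  # fixed seed so the run is repeatable
for number in random_digit_dial("603", 5, rng):
    print(number)
```

Notice the limitation built into the sketch: anyone whose phone carries a different area code can never be drawn, no matter how many numbers are generated.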
THINK ABOUT IT
How does selection bias affect what we think is in the soup? Imagine that certain ingredients were located
only in certain locations in the pot. Maybe noodles sank to the bottom. If you tasted only from the top, it
doesn't matter how big that taste is. If you missed the noodles, you wouldn't even know they were there.
That's the same as dealing with selection bias. Because you didn't select a representative group of
ingredients from the population, you don't get the right idea of what's going on. It limits your ability to
generalize your findings to the general population.
3. Deliberate Bias
Deliberate bias is exactly what it sounds like: It's a bias that's done on purpose. While deliberate bias doesn’t
happen very often, it can occur when there's a conflict of interest between the people performing research and
the people funding the research—who are usually the ones benefiting from that research.
Typically, deliberate bias is motivated by an interest unrelated to the integrity of whatever you’re researching.
Most research is done with integrity, but when personal prestige, the advancement of some ideology, or money
get in the way, it’s harder to prove that intentions are pure.
Politics can be an industry ripe for deliberate bias. Perhaps people call with a poll, but the survey includes a
leading question that causes the person to respond in a certain way. When this is done, it's called “push
polling,” and it’s highly suspect.
IN CONTEXT
Deliberate bias can happen in other areas too—even the medical field. Suppose there are two drugs:
Drug A and Drug B. The company for Drug B posed the following leading question:
Based on how this question was posed, Drug B would be more likely to be chosen.
But there’s more. They've put a thought into the participant’s head that Drug A is linked to cancer. Did
they ever explicitly say that? No, they said if it was linked to cancer. However, now they've placed the
association in the participant's mind. Subconsciously, they're beginning to steer consumers away from
Drug A and towards Drug B.
If a drug company funds a study to determine if its latest drug is effective, the researchers stand to gain a lot of
money and prestige for having tested the drug, if proven effective. For this reason, they might not be the best
choice to test the drug.
IN CONTEXT
An environmental research group is hired by a real estate developer to investigate the effects of a
new building. If the results are favorable, they might get another contract with that real estate
developer. If the environmental research group doesn’t come through with a favorable interpretation,
another group will, and that group will get the next contract.
The environmental research group wants to be hired by the developer on another project, so there is
a conflict of interest.
TERM TO KNOW
Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.
4. Unintentional Bias
Unintentional bias occurs when there is simply an error in the design of the study. Two types of unintentional
bias include:
Response bias, which involves the wording of questions or refers to people feeling like they have to lie.
Selection bias, which involves how the sample was selected, such as when people are not included in the
selection process, even though they make up a portion of the population.
Both are simply errors with no hidden agenda. They're not intentional and are not meant to purposely steer the
direction of the respondents.
TERM TO KNOW
Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.
SUMMARY
Selection bias occurs when some subset of the population is left out. It might be intentional or
unintentional. Since some section of the population is left out, the coverage is lacking, which is why
selection bias is also known as “undercoverage.” Random digit dialing is a great tool to use since it
helps extend coverage to mobile phones and unlisted numbers. Deliberate bias, a bias that is done on
purpose, is not typically a cause for concern. Sometimes, however, people with
personal interests, like the advancement of an ideology or financial gain, steer results towards
outcomes that are favorable to them. Most of the time, research is done with integrity. When bias does
occur, it is usually accidental, which is called unintentional bias.
Good luck!
TERMS TO KNOW
Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.
Selection Bias
A bias that results from systematically excluding certain subsets of the population from the sample. It is
not necessarily intentional.
Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.
Convenience & Self-Selected Samples
by Sophia
WHAT'S COVERED
This lesson will explain two types of samples: convenience and self-selected samples. Our discussion
breaks down as follows:
1. Representative Samples
2. Nonrepresentative Samples
2a. Convenience Samples
2b. Self-Selected Samples
1. Representative Samples
One of the things that we know about sampling is that it's important for a sample to be representative of the
population. Such a sample is known as a representative sample. What we mean by that is when we take our
sample, which is a subset of a larger population, we want this sample to behave just like the population would
if we sampled them all.
BIG IDEA
The sample should represent the group/population at large, so it’s important individuals are selected
carefully for the sample. That way, accurate information will be gained and can be used to describe the
group/population at large.
The goal is to generalize what is found in the sample and apply it to the whole population, including the
people who were not sampled.
TERM TO KNOW
Representative Sample
A sample that accurately reflects the population.
2. Nonrepresentative Samples
The two methods analyzed in this section have major flaws—these two designs do not result in representative
samples. They are conducted often, so it’s important for you to recognize them.
2a. Convenience Samples
IN CONTEXT
Suppose there is a crowd of people at a mall and there is one guy with a clipboard, and he wants
some data. He might take the people nearest to him, and say, “Hey, would you like to take my survey,
please?”
The people he asks might be representative of the population, but they might not. They all simply
happen to be at the same place at the same time. This means they might have some similarities that
could make them not representative of the larger population. The risk of them not representing the
group/population at large is too high.
EXAMPLE If you ask people about their spending habits, and they all happen to be shopping in the
headphones section, that probably means they have similar ideas about how they should spend their
money.
TERM TO KNOW
Convenience Sample
A sample that is easily obtained. It is often not representative of the population.
2b. Self-Selected Samples
EXAMPLE If your focus group is about politics, you might get only the very, very liberal people or the
very, very conservative people. You might get the most extreme viewpoints but none of the viewpoints in
the middle. There are also a lot of people who are ambivalent about politics. They don't really care, but they
want to get paid if this is a sample that offers compensation or another type of reward like free lunch.
SUMMARY
Representative samples are important if we want to accurately generalize our findings to the
population. Flawed sampling methods, however, can result in nonrepresentative samples.
Convenience samples draw on people who are simply in the vicinity and happen to be at the same place
at the same time. Self-selected samples are also called “voluntary response” samples and tend to elicit
either strong opinions or no opinion at all.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Convenience Sample
A sample that is easily obtained. It is often not representative of the population.
Representative Sample
A sample that accurately reflects the population.
Random and Systematic Errors
by Sophia
WHAT'S COVERED
This tutorial will compare random errors vs. systematic errors. Our discussion breaks down as follows:
1. Random Errors
2. Systematic Errors
1. Random Errors
Random errors are exactly that: random. They can simply occur through no fault of the person taking the
sample. When a sample is taken from a larger population, the results are unknown, meaning that it’s unclear if
the results will accurately represent exactly what the population looks like.
IN CONTEXT
Suppose there were 100 individuals, which we will consider the population. Twenty of them were
college students. You select five people out of the overall 100 for a sample. What would you expect to
happen?
Since 20% of the population are college students, which is one out of every five people, you would
probably expect one individual within your sample of five people to be a college student.
However, that doesn't always happen. You might not get any college students, or all five of them may
be college students. Just because you expect to get one doesn't mean that will actually happen. Why
not?
Let’s say that the individuals with numbers 1–20 are the college students. Numbers 21–100 are
individuals not in college. Using a random number generator, you might get a simple random sample
that looks like this:
[Table not recovered: Sample | Percentage]
One out of five of those is a college student, which is 20%.
However, you might get a simple random sample that looks like this:
[Table not recovered: Sample | Percentage]
Here, the second person, number 5, and the fifth person, number 20, are college students, so two of
the five individuals in the sample are college students. That’s 40%, even though only 20% of the
population are college students. What went wrong? Nothing went wrong. Random errors simply happen
sometimes.
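If you are curious how often each outcome happens, the chances can be worked out exactly by counting the possible samples. The sketch below does this for the population above (100 individuals, 20 of them college students, a sample of five). The counting approach and the names in the code are additions for illustration.

```python
from math import comb

POPULATION = 100  # individuals in the population
STUDENTS = 20     # college students among them
SAMPLE = 5        # size of the simple random sample

def prob_students_in_sample(k: int) -> float:
    """Probability that exactly k of the SAMPLE people drawn are college students."""
    ways_to_pick_students = comb(STUDENTS, k)
    ways_to_pick_others = comb(POPULATION - STUDENTS, SAMPLE - k)
    return ways_to_pick_students * ways_to_pick_others / comb(POPULATION, SAMPLE)

for k in range(SAMPLE + 1):
    share = k / SAMPLE
    print(f"{k} students ({share:.0%} of sample): probability {prob_students_in_sample(k):.3f}")
```

The expected result, exactly one college student, turns up only about 42% of the time. Getting none or getting two are both fairly common, which is what random error looks like in practice.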
Random error occurs when the sample, just by chance, doesn't match up perfectly with the population. Random
error is not a mistake that is correctable; it is simply something that happens when sampling randomly. While it
can’t be corrected or avoided completely, the impact can be minimized by increasing the sample size or by
taking multiple samples of equal size. The larger the group, the better the chances are that a representative
group will be obtained.
EXAMPLE Recall the example from above. Suppose that 10 individuals from the group of 100 were
chosen instead of five. Two college students would be expected to make it into the sample. If the
sample were off by one student, the sample percentage would miss by 10 percentage points (one out of 10)
rather than 20 percentage points (one out of five), so the same miss has a smaller impact.
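A quick simulation can make the effect of sample size visible. The sketch below draws many simple random samples from the same population of 100 (20 college students) at a few sample sizes and measures how widely the sample percentages spread. The sample sizes, the number of repeats, and the use of the standard deviation as the measure of spread are choices made for illustration.

```python
import random
from statistics import pstdev

rng = random.Random(7)  # fixed seed so the run is repeatable
population = [1] * 20 + [0] * 80  # 1 = college student, 0 = not in college

def sample_percentages(sample_size: int, repeats: int = 2000) -> list[float]:
    """Percentage of college students in each of `repeats` simple random samples."""
    return [100 * sum(rng.sample(population, sample_size)) / sample_size
            for _ in range(repeats)]

for n in (5, 10, 40):
    results = sample_percentages(n)
    print(f"n={n:2d}: sample percentages spread about {pstdev(results):.1f} points")
```

The spread shrinks as the sample size grows, but it never reaches zero unless you sample everyone.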
TERM TO KNOW
Random Error
When the resulting value obtained from the sample does not match the value from the population
simply by chance. This is not a mistake but is inherent in the variability in sampling.
2. Systematic Errors
Now, by contrast, systematic errors are mistakes. Systematic errors are due to flaws in the design.
IN CONTEXT
Suppose a school board wants to estimate how many students are eligible for free or reduced lunch.
With an undercoverage bias, or selection bias, your sample may leave out families from a poorer
neighborhood who didn't respond to a questionnaire that was sent out. Perhaps the parents were
working nights and didn’t have time to complete the survey.
Therefore, the board may underestimate the true number of students requiring free and reduced
lunch. This type of error cannot be remedied by increasing the sample size.
EXAMPLE A child has a growth chart in his room, and his parents mistakenly put it up above the
baseboard—an extra 2 inches from the floor. This is going to result in the child thinking he’s 2 inches shorter
than he actually is, an example of measurement bias, which is systematically wrong.
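Here is a small sketch of why taking more measurements does not help with a systematic error. The true height of 48 inches and the small random wobble in each reading are assumptions made up for illustration; only the 2-inch offset comes from the example above.

```python
import random
from statistics import mean

rng = random.Random(3)  # fixed seed so the run is repeatable

TRUE_HEIGHT = 48.0   # hypothetical true height, in inches
CHART_OFFSET = 2.0   # chart mounted 2 inches too high, so readings come out 2 inches low

def measure(n: int) -> list[float]:
    """n chart readings: a small random wobble plus the constant 2-inch offset."""
    return [TRUE_HEIGHT - CHART_OFFSET + rng.gauss(0, 0.25) for _ in range(n)]

for n in (5, 50, 5000):
    print(f"{n:4d} readings: average {mean(measure(n)):.2f} in (true height {TRUE_HEIGHT} in)")
```

The random wobble averages out as the number of readings grows, but the average settles near 46 inches, not 48. The 2-inch miss stays until the chart itself is fixed.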
TERMS TO KNOW
Systematic Error
When the resulting value obtained from the sample does not match the value from the population as a
result of an incorrect measurement or bias. This is a mistake made by the researcher.
Selection Bias
A bias that occurs when certain groups are systematically left out of the sample. This is a systematic
error.
Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.
SUMMARY
Random errors occur when the sample selected doesn't match up with the population simply by chance.
They cannot be controlled, but using a larger sample will lessen the effect. Conversely, systematic
errors result in
wrong answers or wrong values in your sample, due to some kind of bias or error with your
measurement. Increasing the sample size will not fix the issue. When a systematic error occurs, you
might as well just start over because there's no rescuing poorly collected data!
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.
Random Error
When the resulting value obtained from the sample does not match the value from the population simply
by chance. This is not a mistake, but is inherent in the variability in sampling.
Selection Bias
A bias that occurs when certain groups are systematically left out of the sample. This is a systematic error.
Systematic Error
When the resulting value obtained from the sample does not match the value from the population as a
result of an incorrect measurement or bias. This is a mistake made by the researcher.
Margin of Error
by Sophia
WHAT'S COVERED
1. Margin of Error
You may have seen something in your local newspaper stating that, for example, a political candidate leads the
field by 5%, and that there is a 3% margin of error in the poll. What does this mean?
When surveys are done, collecting enough data is important for getting a trustworthy answer.
Sample results are often reported with something called a margin of error, meaning that the results may be
off by a little bit, and the size of that miss can be estimated. It tells the reader that the reported answer
is not 100% accurate, but it is a close estimate.
IN CONTEXT
Suppose you are an administrator of a school, and you need to determine the overall percentage of
left-handed students. Maybe 10% of students in the school are left-handed, but when you take a
sample, even though you were diligent about the way data was collected, you get 8%. The estimate
did not match the true value. What happened?
It's possible that the sample was not exactly the same as the population. Maybe only 8% of the
people in the sample were left-handed, even though 10% of the population actually are left-handed.
You didn't do anything wrong, but samples might be inherently off the mark due to the random
selection process.
TERMS TO KNOW
Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the population.
Estimate
The mean value obtained from the sample. If the sample was well collected, the estimate should be
reasonably close to the true value.
2. Confidence Interval
The confidence interval uses both the estimate and the margin of error. When we combine these two parts, we
get a range of plausible values for the true population value.
The confidence level tells us how sure we are that this interval contains the actual population value.
IN CONTEXT
Suppose a newspaper polled 500 voters, and 48% responded that they were going to vote for
Candidate X in the upcoming election. The newspaper might print a margin of error along with that
48% mark; perhaps they use four percentage points as their margin of error. It's not particularly
important how these four points were calculated, but it is important to note that a margin of error was
reported along with the percent value.
What does this margin of error of four percentage points mean? It means the researchers are pretty
confident that the true percentage of people who will vote for Candidate X is within four percentage
points of 48%, which means that it could be as low as 44%, or as high as 52%, or anywhere in between.
This range, with some wiggle room on either side of 48%, is the confidence interval.
Suppose on election day, 46% of the people voted for Candidate X. Since this falls into the range of
44% to 52%, the poll's estimate was close enough to the actual result.
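The arithmetic behind the interval can be sketched in code. The first function simply adds and subtracts the margin of error, as described above. The second uses a common textbook approximation for a 95% margin of error on a sample proportion; that formula is an assumption added here and is not necessarily how the newspaper arrived at its figure.

```python
from math import sqrt

def confidence_interval(estimate: float, margin: float) -> tuple[float, float]:
    """Interval from estimate - margin to estimate + margin (in percentage points)."""
    return estimate - margin, estimate + margin

low, high = confidence_interval(estimate=48.0, margin=4.0)
print(f"Interval: {low:.0f}% to {high:.0f}%")       # prints "Interval: 44% to 52%"
print("46% inside interval:", low <= 46.0 <= high)  # prints "46% inside interval: True"

def approx_margin_95(p: float, n: int) -> float:
    """Assumed formula: approximate 95% margin, 1.96 * sqrt(p * (1 - p) / n), in points."""
    return 100 * 1.96 * sqrt(p * (1 - p) / n)

print(f"Approximate 95% margin for 48% of 500 voters: {approx_margin_95(0.48, 500):.1f} points")
```

Under that assumed formula, a poll of 500 voters gives a margin a little over four percentage points, which is in line with the figure in the example.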
THINK ABOUT IT
What happens to the margin of error as the sample size increases? Will the margin of error go up, down, or
stay about the same?
As the sample size goes up, the margin of error goes down because a larger sample size gives a more accurate
portrait of the population. What’s happening is that you cast a wider net to include people that may be closer to
representing the actual population.
If you had a sample size of four people and you want to generalize the findings to a population of 200 people,
it’s unlikely that just those four people have enough of the characteristics to represent the population.
However, when the sample size is increased, you get closer to achieving a representative sample, which means
the confidence interval can be narrower; in other words, the larger the sample size, the less wiggle room is
needed on each side of the measurement.
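To put rough numbers on this, the sketch below uses a common textbook approximation for a 95% margin of error on a sample proportion. The formula, the 50% proportion, and the sample sizes are assumptions chosen for illustration, and the approximation is only a loose guide for very small samples such as four people.

```python
from math import sqrt

def approx_margin_95(p: float, n: int) -> float:
    """Assumed formula: approximate 95% margin of error, in percentage points."""
    return 100 * 1.96 * sqrt(p * (1 - p) / n)

for n in (4, 100, 500, 2000):
    print(f"n={n:4d}: margin of error about {approx_margin_95(0.5, n):.1f} points")
```

Under this formula, the margin shrinks with the square root of the sample size, so you need four times as many people to cut the margin of error in half.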
TERM TO KNOW
Confidence Interval
A range of potential values that the true value could be. It is obtained by adding and subtracting the
margin of error from the value in the sample.
SUMMARY
Most statistical results are reported alongside a margin of error, which is an amount by which the
sample's mean may deviate from the true mean of the population. If the data is well collected, then it's
likely that the true population value is within the confidence interval created by the reported value, plus
or minus the margin of error. It's a bad idea to rank two values that fall within the same confidence
interval, since either one could plausibly be the true value. That would be a statistical dead heat.
Good luck!
Source: THIS TUTORIAL WAS AUTHORED BY JONATHAN OSTERS FOR SOPHIA LEARNING. PLEASE SEE OUR
TERMS OF USE.
TERMS TO KNOW
Confidence Interval
A range of potential values that the true value could be. It is obtained by adding and subtracting the
margin of error from the sample mean.
Estimate
The mean value obtained from the sample. If the sample was well collected, the estimate should be
reasonably close to the true value.
Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the population.
Terms to Know
Absolute Change
The raw increase or decrease in the value of a variable.
Accuracy
The extent to which the values, when considered all together, center around the correct
value for a variable.
Available Data
Data collected by some other entity, such as a government organization or private company.
Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased
data can often favor a specific group of those studied.
Blinding
The practice of making sure that certain individuals do not know which subjects are receiving
which treatment.
Census
Using the entire population to obtain data.
Closed Question
A question type with a fixed, limited set of answer choices.
Cluster Sample
A sampling method where the population is separated into groups, typically geographically,
and a random selection of clusters is made. Each individual in the cluster becomes part of the
sample.
Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being
together in one place, making the individuals easier to sample together.
Confidence Interval
A range of values within which the true value could plausibly lie. It is obtained by adding
the margin of error to, and subtracting it from, the sample mean.
Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential
effects of some other variable which was unaccounted for.
Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the study
can draw.
Continuous Data
Data that can take any value within an interval.
Control
The principle of experimental design that requires that other variables which may confound
the experiment be held constant between the treatment groups, so that any differences in
the groups can be attributed to the different treatments.
Convenience Sample
A sample that is easily obtained. It is often not representative of the population.
Data
Information used in a study to answer a statistical question.
Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.
Descriptive Statistics
Using only the information at hand to describe the selected group of individuals.
Discrete Data
Data that can take only a limited or countable set of distinct values.
Double-Blind Experiment
An experiment where neither the subjects, nor anyone in contact with them, has any
knowledge of which subjects are receiving which treatment.
Estimate
The mean value obtained from the sample. If the sample was well collected, the estimate
should be reasonably close to the true value.
Experiment
A type of study where researchers impose treatments on the participants or experimental
units.
Experimental Design
The way in which an experiment is carried out. A good design has key elements of
randomization, replication, and control.
Experimental Unit
An animal or thing involved in an experiment.
Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will
correspond to an increase or decrease in some other variable.
Hawthorne Effect
People have the tendency to change their behavior when they know they are being
monitored.
Index Number
A way to measure the relative change in a value, usually the price of a good or service, over
time. If the index number is over 100, that means the price has increased. If the price has
decreased, then the index number will be less than 100.
Inferential Statistics
Using the information at hand to make a larger, more general statement about the entire
population of individuals.
Inflation
A relative increase in the price of a good or service over time. A person will need to pay more
to receive the same good or service than they did at a previous point in time.
Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the
population.
Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that
could affect the outcome of the experiment are paired together, then one of them is assigned
to one treatment and one is assigned to the control. This can also be done by assigning each
subject to both treatments, where each subject acts as their own matched pair.
Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.
Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random sampling,
and simple random sampling. It "zooms in" on smaller areas to sample so that sampling
becomes more feasible.
Nominal Data
Categorical data with qualities that cannot be ordered or ranked.
Nonresponse
Nonresponse is a lack of response from people you've selected. It affects the ability to draw
conclusions from your sample.
Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate
in a sample have substantially different opinions than the people who were included in the
sample, resulting in a misrepresentation of the population.
Observational Study
A type of study where researchers can observe the participants but not affect the behavior or
outcomes in any way.
Open Question
A question type with no answer choices; the respondent can choose what he or she wants to
say to answer the question.
Ordinal Data
Categorical data with qualities that can be ordered or ranked.
Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be
the only participants.
Percent Change
A relative increase or decrease in a percent value.
Percentage Points
An absolute increase or decrease in a percent value.
Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.
Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response even
when taking a placebo which contains no active ingredient.
Population
The entire set of individuals from which to sample.
Precision
The extent to which the values are very close to each other, even if they are not near the
correct value.
Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the
subjects as they go into the future.
Publication Bias
The desire of researchers (and research publications) to only print the most sensational or
interesting articles.
Random Error
When the resulting value obtained from the sample does not match the value from the
population simply by chance. This is not a mistake, but is inherent in the variability in
sampling.
Random Number Table
A method of selecting a sample by using random numbers that correspond to individuals in
the population. Each individual is assigned a number, and numbers are then selected from the table.
Random Sample
A sample that has been selected in a manner where every member of the population has
some predetermined chance of being selected for the sample.
Random Selection
The method of obtaining a random sample.
Randomization
The principle of experimental design that requires that the subjects/experimental units be
assigned to groups using some random process. This ensures that the two groups are
roughly equal prior to assigning treatments.
Raw Data
Data that is unorganized, unprocessed, and not summarized. Typically, this is data that is not
already available.
Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.
Relative Change
The percent increase or decrease in the value of a variable.
Replication
Repeating the experiment on multiple subjects/experimental units. This principle of
experimental design states that a larger experiment with more subjects/experimental units
will allow us to more clearly see differences between the treatments.
Representative Sample
A sample that accurately reflects the population.
Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies due
to the sensitive nature of the question.
Response Variable
A variable that is affected by the explanatory variable.
Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand
how they became the way they are in the present.
Sample/Sampling
A subset of the population. There are many ways to select a sample.
Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically
excluded.
Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are receiving
which treatment, or people in contact with the subjects have no knowledge of which subjects
are receiving which treatment, but not both.
Statistical Study
A way to collect information from individuals.
Statistics
The study of collecting, analyzing, interpreting, and presenting information.
Stratified Random Sample
A random sampling method where individuals are separated into homogeneous groups, then
simple random samples are taken within each group.
Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have
something in common, and we would like to see how that affects the outcome of the sample.
Subjects/Participants
The people or things being examined in an observational study.
Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.
Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.
Systematic Error
When the resulting value obtained from the sample does not match the value from the
population as a result of an incorrect measurement or bias. This is a mistake made by the
researcher.
Treatment
Something the researchers administer to the subjects or experimental units.
Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.
Variable
Any attribute or number that can be measured about individuals in a study.
Variable of Interest
Any variable which we need to know about in the context of a study.
Variables of Interest
The variables the survey wishes to measure about those taking the survey.
Formulas to Know
Absolute Change
Index Number
Relative Change
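As a sketch, the three formulas named above can be written in their standard forms, which are consistent with the glossary definitions of absolute change, relative change, index number, and reference value. The labels "New Value" and "Value" are assumed names, not necessarily the ones used in the original lessons.

```latex
\begin{align*}
\text{Absolute Change} &= \text{New Value} - \text{Reference Value} \\[4pt]
\text{Index Number} &= \frac{\text{Value}}{\text{Reference Value}} \times 100 \\[4pt]
\text{Relative Change} &= \frac{\text{New Value} - \text{Reference Value}}{\text{Reference Value}} \times 100\%
\end{align*}
```

For instance, with a hypothetical price that rises from a reference value of 50 to a new value of 60, the absolute change is 10, the index number is 120, and the relative change is 20%.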