Using and Handling Data
What is Data?
Data is a collection of facts, such as numbers, words, measurements, observations or even just
descriptions of things.
Qualitative vs Quantitative
Data can be qualitative or quantitative.
Qualitative data is descriptive information (it describes something)
Quantitative data is numerical information (numbers).
And Quantitative data can also be Discrete or Continuous:
Discrete data can only take certain values (like whole numbers)
Continuous data can take any value (within a range)
Put simply: Discrete data is counted, Continuous data is measured
Example: What do we know about Arrow the Dog?
Qualitative:
He is brown and black
He has long hair
He has lots of energy
Quantitative:
Discrete:
  He has 4 legs
  He has 2 brothers
Continuous:
  He weighs 25.5 kg
  He is 565 mm tall
To help you remember, think "Quantitative is about Quantity".
More Examples
Qualitative:
Your friends' favorite holiday destination
The most common given names in your town
How people describe the smell of a new perfume
Quantitative:
Height (Continuous)
Weight (Continuous)
Petals on a flower (Discrete)
Customers in a shop (Discrete)
Collecting
Data can be collected in many ways. The simplest way is direct observation.
Example: you want to find how many cars pass by a certain point on a road in a 10-minute
interval.
So: stand at that point on the road, and count the cars that pass by in that interval.
We can also collect data by doing a Survey.
Census or Sample
A Census is when we collect data for every member of the group (the whole "population").
A Sample is when we collect data just for selected members of the group.
Example: there are 120 people in your local football club.
You can ask everyone (all 120) what their age is. That is a census.
Or you could just choose the people that are there this afternoon. That is a sample.
A census is accurate, but hard to do. A sample is not as accurate, but may be good enough, and is
a lot easier.
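As a rough sketch of the difference, the Python snippet below compares the mean age from a census with the mean age from a sample. The 120 ages, the seed and the sample size of 30 are all made up for illustration, and the sample here is drawn at random rather than by choosing whoever is there this afternoon.

```python
import random
import statistics

random.seed(1)  # fixed seed so the made-up data is repeatable

# Hypothetical ages for the 120 club members (invented for illustration)
ages = [random.randint(16, 60) for _ in range(120)]

# Census: use every member
census_mean = statistics.mean(ages)

# Sample: use only 30 randomly chosen members
sample = random.sample(ages, 30)
sample_mean = statistics.mean(sample)

print(f"Census mean age: {census_mean:.1f}")
print(f"Sample mean age: {sample_mean:.1f}")
```

The two means will usually be close but not equal, which is the trade-off described above: the sample is less accurate, but far less work.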
Language
Data or Datum?
The singular form is "datum", so we say "that datum is very high".
"Data" is the plural so we say "the data are available", but it is also a collection of facts, so "the
data is available" is fine too.
Discrete and Continuous Data
Data can be Descriptive (like "high" or "fast") or Numerical (numbers).
And Numerical Data can be Discrete or Continuous:
Discrete data is counted,
Continuous data is measured
Discrete Data
Discrete Data can only take certain values.
Example: the number of students in a class (you can't have half a student).
Example: the results of rolling 2 dice:
can only have the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12
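One way to check that list is to add up every pair of faces. This is a small Python sketch, assuming two ordinary six-sided dice:

```python
# All possible totals when rolling two six-sided dice
totals = sorted({a + b for a in range(1, 7) for b in range(1, 7)})
print(totals)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

No total such as 2.5 or 13 can appear, which is what makes this data discrete.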
Continuous Data
Continuous Data can take any value (within a range)
Examples:
A person's height: could be any value (within the range of human heights), not just
certain fixed heights,
Time in a race: you could even measure it to fractions of a second,
A dog's weight,
The length of a leaf,
Lots more!
Analog and Digital
Analog: something physical with continuous change.
Digital: made up of numbers.
Arrow Barks!
Let's record him barking:
Arrow's bark is analog. It is actual pressure waves in the air, so it is physical with
continuous change.
Continuous change: changes smoothly ... no sudden breaks.
And the microphone converts that pressure into an electrical signal. It is still analog (the
electricity is physical, and has continuous change).
But when it gets to your computer or phone it gets converted to digits!
Thousands of times a second the analog signal is measured by special electronics
and is then saved as numbers.
So the "sound" is now "12, 25, 39, 52, 68, 71, 78, 82, 82, 79, 70, 59, ..." (in fact it would
be in binary, so would be something like "000011000001100100100111...")
It is now digital!
Notice the digital data has sudden jumps up and down ... it does not change
continuously.
It is Discrete Data: that means it can only be certain values (such as 1, 2, 3, etc).
Digital data is very easy for computers and phones to use. It can be saved, shared
electronically, sent all over the world quickly and more.
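The measuring step can be sketched in Python. This is only an illustration under stated assumptions: a smooth 440 Hz tone stands in for the bark, the sample rate of 8000 per second and the 256 levels are arbitrary choices, and `digitize` and `to_binary` are hypothetical helper names.

```python
import math

def digitize(signal, duration, sample_rate, levels=256):
    """Measure `signal` (a function of time returning values from -1 to 1)
    `sample_rate` times per second, rounding each measurement to one of
    `levels` whole numbers."""
    count = round(duration * sample_rate)
    samples = []
    for i in range(count):
        t = i / sample_rate
        value = signal(t)                               # analog: any value in [-1, 1]
        number = round((value + 1) / 2 * (levels - 1))  # digital: 0 .. levels - 1
        samples.append(number)
    return samples

def to_binary(numbers, bits=8):
    """Write each number as a fixed-width binary string and join them."""
    return "".join(format(n, f"0{bits}b") for n in numbers)

# A smooth tone stands in for the bark (an assumption for illustration)
def tone(t):
    return math.sin(2 * math.pi * 440 * t)

samples = digitize(tone, duration=0.001, sample_rate=8000)
print(samples)

# The first three numbers from the text, written in 8-bit binary
print(to_binary([12, 25, 39]))  # 000011000001100100100111
```

The last line reproduces the binary string quoted above: 12, 25 and 39 written as 8 binary digits each.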
How can we hear Digits?
Easy! The numbers are used to control the size of an electrical signal, which is analog.
The electricity can be sent to a speaker ... to make sound waves again!
Digital becomes Analog
It should sound very much like the original bark (but not perfectly so!)
Digital Pictures
A similar thing happens when you take a picture.
Light (which is analog) gets projected onto a grid of millions of little sensors inside the
camera:
The camera measures the light at each point and produces numbers.
The picture is now digital!
So the "picture" is now "A1DDF9, ADE3FF, B5E7FE, AFE4F8, ...", which are
hexadecimal color numbers (that are used internally in binary, so would be something
like "101000011101110111111001...")
Look really closely at a digital picture ... it is made up of millions of little squares called
"pixels":
Each "pixel" is made using a hexadecimal color number.
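To see what one of those color numbers holds, here is a short Python sketch. It assumes the common convention that the six hexadecimal digits are read as red, green and blue, two digits each; `decode_pixel` and `pixel_bits` are hypothetical helper names.

```python
def decode_pixel(hex_color):
    """Split a 6-digit hexadecimal color into red, green and blue (0-255)."""
    red = int(hex_color[0:2], 16)
    green = int(hex_color[2:4], 16)
    blue = int(hex_color[4:6], 16)
    return red, green, blue

def pixel_bits(hex_color):
    """Write the same color as 24 binary digits."""
    return format(int(hex_color, 16), "024b")

print(decode_pixel("A1DDF9"))  # (161, 221, 249)
print(pixel_bits("A1DDF9"))    # 101000011101110111111001
```

The binary output matches the string quoted above for the first pixel.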
Digital IS Numbers
So digital pictures, music, videos etc are actually stored on your computer or phone as
numbers.
Numbers rule!
Data Collection Methods
To derive conclusions from data, we need to know how the data were collected; that is, we need
to know the method(s) of data collection.
Methods of Data Collection
For this tutorial, we will cover four methods of data collection.
Census. A census is a study that obtains data from every member of a population. In most
studies, a census is not practical, because of the cost and/or time required.
Sample survey. A sample survey is a study that obtains data from a subset of a
population, in order to estimate population attributes.
Experiment. An experiment is a controlled study in which the researcher attempts to
understand cause-and-effect relationships. The study is "controlled" in the sense that the
researcher controls (1) how subjects are assigned to groups and (2) which treatments each
group receives.
In the analysis phase, the researcher compares group scores on some dependent variable.
Based on the analysis, the researcher draws a conclusion about whether the treatment
(independent variable) had a causal effect on the dependent variable.
Observational study. Like experiments, observational studies attempt to understand
cause-and-effect relationships. However, unlike experiments, the researcher is not able to
control (1) how subjects are assigned to groups and/or (2) which treatments each group
receives.
Data Collection Methods: Pros and Cons
Each method of data collection has advantages and disadvantages.
Resources. When the population is large, a sample survey has a big resource advantage
over a census. A well-designed sample survey can provide very precise estimates of
population parameters - quicker, cheaper, and with less manpower than a census.
Generalizability. Generalizability refers to the appropriateness of applying findings from
a study to a larger population. Generalizability requires random selection. If participants
in a study are randomly selected from a larger population, it is appropriate to generalize
study results to the larger population; if not, it is not appropriate to generalize.
Observational studies do not feature random selection; so generalizing from the results of
an observational study to a larger population can be a problem.
Causal inference. Cause-and-effect relationships can be teased out when subjects are
randomly assigned to groups. Therefore, experiments, which allow the researcher to
control assignment of subjects to treatment groups, are the best method for investigating
causal relationships.
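Random assignment can be sketched in a few lines of Python. This is a minimal illustration, assuming two equal-sized groups; the subject IDs are invented and `assign_groups` is a hypothetical helper name.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs (invented for illustration)
subjects = [f"S{i:02d}" for i in range(1, 21)]
treatment, control = assign_groups(subjects, seed=42)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides who lands in each group, differences between the groups at the end of the study are less likely to come from how the subjects were chosen.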
Test Your Understanding
Problem
Which of the following statements are true?
I. A sample survey is an example of an experimental study.
II. An observational study requires fewer resources than an experiment.
III. The best method for investigating causal relationships is an observational study.
(A) I only
(B) II only
(C) III only
(D) All of the above.
(E) None of the above.
Solution
The correct answer is (E). Unlike an experiment, a sample survey does not require the researcher
to assign treatments to survey respondents. Therefore, a sample survey is not an experimental
study. An observational study may or may not require fewer resources (time, money, manpower)
than an experiment. The best method for investigating causal relationships is an experiment - not
an observational study - because an experiment features randomized assignment of subjects to
treatment groups.
What Are Variables?
In statistics, a variable has two defining characteristics:
A variable is an attribute that describes a person, place, thing, or idea.
The value of the variable can "vary" from one entity to another.
For example, a person's hair color is a potential variable, which could have the value of "blond"
for one person and "brunette" for another.
Qualitative vs. Quantitative Variables
Variables can be classified as qualitative (aka, categorical) or quantitative (aka, numeric).
Qualitative. Qualitative variables take on values that are names or labels. The color of a
ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be
examples of qualitative or categorical variables.
Quantitative. Quantitative variables are numeric. They represent a measurable quantity.
For example, when we speak of the population of a city, we are talking about the number
of people in the city - a measurable attribute of the city. Therefore, population would be a
quantitative variable.
In algebraic equations, quantitative variables are represented by symbols (e.g., x, y, or z).
Discrete vs. Continuous Variables
Quantitative variables can be further classified as discrete or continuous. If a variable can take
on any value between its minimum value and its maximum value, it is called a continuous
variable; otherwise, it is called a discrete variable.
Some examples will clarify the difference between discrete and continuous variables.
Suppose the fire department mandates that all fire fighters must weigh between 150 and
250 pounds. The weight of a fire fighter would be an example of a continuous variable;
since a fire fighter's weight could take on any value between 150 and 250 pounds.
Suppose we flip a coin and count the number of heads. The number of heads could be any
integer value between 0 and plus infinity. However, it could not be any number between 0
and plus infinity. We could not, for example, get 2.3 heads. Therefore, the number of
heads must be a discrete variable.
Univariate vs. Bivariate Data
Statistical data are often classified according to the number of variables being studied.
Univariate data. When we conduct a study that looks at only one variable, we say that
we are working with univariate data. Suppose, for example, that we conducted a survey
to estimate the average weight of high school students. Since we are only working with
one variable (weight), we would be working with univariate data.
Bivariate data. When we conduct a study that examines the relationship between two
variables, we are working with bivariate data. Suppose we conducted a study to see if
there were a relationship between the height and weight of high school students. Since we
are working with two variables (height and weight), we would be working with bivariate
data.
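The contrast can be sketched in Python. The heights and weights below are invented for illustration, and the Pearson correlation coefficient is used here as just one possible way to describe a relationship between two variables.

```python
import math
import statistics

# Hypothetical measurements for six students (invented for illustration)
heights_cm = [150, 158, 163, 170, 175, 182]
weights_kg = [45, 52, 55, 63, 68, 77]

# Univariate: one variable, summarised on its own
mean_weight = statistics.mean(weights_kg)

# Bivariate: two variables, looking at how they move together
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(heights_cm, weights_kg)
print(f"Mean weight: {mean_weight} kg")
print(f"Height-weight correlation: {r:.3f}")
```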
Test Your Understanding
Problem 1
Which of the following statements are true?
I. All variables can be classified as quantitative or categorical variables.
II. Categorical variables can be continuous variables.
III. Quantitative variables can be discrete variables.
(A) I only
(B) II only
(C) III only
(D) I and II
(E) I and III
Solution
The correct answer is (E). All variables can be classified as quantitative or categorical variables.
Discrete variables are indeed a category of quantitative variables. Categorical variables, however,
are not numeric. Therefore, they cannot be classified as continuous variables.
Populations and Samples
The study of statistics revolves around the study of data sets. This lesson describes two important
types of data sets - populations and samples. Along the way, we introduce simple random
sampling, the main method used in this tutorial to select samples.
Population vs Sample
The main difference between a population and sample has to do with how observations are
assigned to the data set.
A population includes all of the elements from a set of data.
A sample consists of one or more observations from the population.
Depending on the sampling method, a sample can have fewer observations than the population,
the same number of observations, or more observations. More than one sample can be derived
from the same population.
Other differences have to do with nomenclature, notation, and computations. For example,
A measurable characteristic of a population, such as a mean or standard
deviation, is called a parameter; but a measurable characteristic of a
sample is called a statistic.
We will see in future lessons that the mean of a population is denoted by the
symbol μ; but the mean of a sample is denoted by the symbol x.
We will also learn in future lessons that the formula for the standard deviation
of a population is different from the formula for the standard deviation of a
sample.
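As a preview, Python's standard `statistics` module has separate functions for the two standard deviations: `pstdev` divides by the population size N, while `stdev` divides by n - 1. The eight values below are a made-up population chosen so the numbers come out neatly.

```python
import random
import statistics

# Hypothetical population of 8 values (invented for illustration)
population = [2, 4, 4, 4, 5, 5, 7, 9]

# Parameters: computed from the whole population
population_mean = statistics.mean(population)  # 5
population_sd = statistics.pstdev(population)  # 2.0 (divides by N)

# Statistics: computed from a sample drawn from that population
random.seed(3)
sample = random.sample(population, 4)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)           # divides by n - 1

print("Parameters:", population_mean, population_sd)
print("Statistics:", sample, sample_mean, sample_sd)
```

The parameters never change for a given population, while the statistics depend on which sample happens to be drawn.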
What is Simple Random Sampling?
A sampling method is a procedure for selecting sample elements from a population. Simple
random sampling refers to a sampling method that has the following properties.
The population consists of N objects.
The sample consists of n objects.
All possible samples of n objects are equally likely to occur.
An important benefit of simple random sampling is that it allows researchers to use statistical
methods to analyze sample results. For example, given a simple random sample, researchers can
use statistical methods to define a confidence interval around a sample mean. Statistical analysis
is not appropriate when non-random sampling methods are used.
There are many ways to obtain a simple random sample. One way would be the lottery method.
Each of the N population members is assigned a unique number. The numbers are placed in a
bowl and thoroughly mixed. Then, a blind-folded researcher selects n numbers. Population
members having the selected numbers are included in the sample.
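The lottery method can be imitated in Python with `random.sample`, which draws without repeats. This is a minimal sketch; the ten member names are invented and `lottery_sample` is a hypothetical helper name.

```python
import random

def lottery_sample(population, n, seed=None):
    """Draw a simple random sample of n members, like numbers from a bowl."""
    rng = random.Random(seed)
    numbers = list(range(1, len(population) + 1))  # one unique number each
    drawn = rng.sample(numbers, n)                 # mix the bowl, draw n
    return [population[number - 1] for number in drawn]

# Hypothetical population of N = 10 members (invented for illustration)
members = [f"member_{i}" for i in range(1, 11)]
chosen = lottery_sample(members, n=3, seed=7)
print(chosen)
```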
Random Number Generator
In practice, the lottery method described above can be cumbersome, particularly with large
sample sizes. As an alternative, use Stat Trek's Random Number Generator. With the Random
Number Generator, you can select up to 1000 random numbers quickly and easily.
Sampling With Replacement and Without Replacement
Suppose we use the lottery method described above to select a simple random sample. After we
pick a number from the bowl, we can put the number aside or we can put it back into the bowl. If
we put the number back in the bowl, it may be selected more than once; if we put it aside, it can
be selected only one time.
When a population element can be selected more than one time, we are sampling with
replacement. When a population element can be selected only one time, we are sampling
without replacement.
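Both kinds of sampling are available in Python's standard `random` module: `sample` draws without replacement and `choices` draws with replacement. The bowl of five numbers and the seed are arbitrary choices for illustration.

```python
import random

rng = random.Random(5)
bowl = [1, 2, 3, 4, 5]

# Without replacement: each number is set aside, so no repeats are possible
without_replacement = rng.sample(bowl, k=5)

# With replacement: each number goes back in the bowl, so repeats are possible
# and the sample can even be larger than the population
with_replacement = rng.choices(bowl, k=8)

print(without_replacement)
print(with_replacement)
```

Asking `sample` for more than five numbers here would raise an error, while `choices` accepts any sample size.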
Test Your Understanding
Problem 1
Which of the following statements are true?
I. The mean of a population is denoted by x.
II. Sample size is never bigger than population size.
III. The population mean is a statistic.
(A) I only.
(B) II only.
(C) III only.
(D) All of the above.
(E) None of the above.
Solution
The correct answer is (E), none of the above.
The mean of a population is denoted by μ, not x. When sampling with replacement, sample size
can be greater than population size. And the population mean is a parameter; the sample mean is
a statistic.