WELCOME TO STATISTICS -160
Section L40
Statistics is………
…a way to summarize and describe
information: not very interesting in itself
…an important tool for research in my field,
and something I look forward to learning more
about
… something that I should learn to earn my
degree
… boring
What best describes your attitude towards statistics?
Statistics is…….
How can we evaluate evidence against global
warming?
Are cell phones dangerous?
What are the chances of a tax return being
audited?
How likely are we to win the lottery?
Is there bias against women in appointing
managers?
Data
Data is information we
gather through
experiments and surveys.
1. Experiment on low carb
diet
Data: weight of subjects
before and after
2. Survey on effectiveness of
a TV ad
Data: percentage who
went to Starbucks since ad
aired
Statistics
Statistics is the art and science of
1. Designing studies,
2. Analyzing data that those studies produce.
The ultimate goal is to translate data into knowledge and
understanding.
Statistics is the art and science of learning from data.
Three aspects of a study
1. Design: Planning how
to obtain data
2. Description:
Summarizing the data
3. Inference: Making
decisions and
predictions
1st Aspect of a Study: Design
How do we conduct the
experiment or select people
for the survey to ensure
trustworthy results?
Design Examples:
1. Planning data collection
to study effects of
Vitamin E on athletic
strength
2. For a marketing survey,
selecting people to
provide proper coverage
2nd Aspect of a Study: Description
Summarize raw data and
present in useful formats
(e.g., average, charts or
graphs)
Description Examples:
A graph showing total
precipitation in
Clarksville for each
month of 2005
Average age of students
in a statistics class is 25
years
3rd Aspect of a Study: Inference
Make decisions or predictions
based on the data
Inference Examples:
Relationship between
smoking cigarettes and
getting emphysema
47% of the registered
voters in Regina will vote
in the primary
Ladder of Inference
Descriptive vs. Inferential Statistics
Descriptive statistics
summarize data –
graphs and numbers such
as averages and
percentages
Inferential statistics make
decisions or predictions
about a population
based on data obtained
from a sample of that
population.
Variable
A variable is any
characteristic that
changes or varies over
time and/or for different
individuals or objects
under consideration.
Example: Height, Weight,
IQ, Hair Color
Definitions (cont'd)
Experimental unit: the individual or object on which
a variable is measured
Measurement: results when a variable is actually
measured on an experimental unit
A set of measurements, called data, can be either a
sample or a population
Definitions (cont'd)
Population: the set of all measurements of interest
to the investigator
Sample: a subset of measurements selected from
the population of interest
Example:
Variable
Hair color
Experimental unit
Person
Typical measurements
Brown, black, blonde, etc.
How many variables have you measured?
Univariate data: one variable is measured on a
single experimental unit
Bivariate data: two variables are measured on a
single experimental unit
Multivariate data: more than two variables are
measured on a single experimental unit
Types of Data
Qualitative Variable
Measures a quality or characteristic on each
experimental unit
Examples:
◼ Hair color (black, brown, blonde…)
◼ Make of car (Dodge, Honda, Ford…)
◼ Gender (male, female)
◼ Province of birth (Alberta, Ontario…)
Quantitative Variable
A variable is called quantitative if observations take
numerical values for different magnitudes of the
variable.
Examples:
1. Age
2. Number of siblings
3. Annual Income
Quantitative Variables
Discrete: if it can assume only a finite or countable
number of values
Continuous: if it can assume the infinitely many
values corresponding to the points on a line interval
Discrete Quantitative Variable
A quantitative variable
is discrete if its possible
values form a set of
separate numbers:
0, 1, 2, 3, …
Examples:
1. Number of pets in
a household
2. Number of children
in a family
3. Number of foreign
languages spoken
by an individual
Continuous Quantitative Variable
A quantitative variable
is continuous if its
possible values form an
interval
Measurements
Examples:
1. Height/Weight
2. Age
3. Blood pressure
Graphing Qualitative Variables
Use a data distribution to describe:
What values of the variable have been
measured
How often each value has occurred:
Frequency
Relative frequency = Frequency/n
(where n = sample size)
Percent = 100 × Relative frequency
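The three quantities above can be tabulated in a few lines of Python. This is a minimal sketch: the function name frequency_table and the hair-colour sample are made up for illustration and do not come from the slides.

```python
from collections import Counter

def frequency_table(measurements):
    """Return {value: (frequency, relative frequency, percent)}."""
    n = len(measurements)  # n = sample size
    table = {}
    for value, frequency in Counter(measurements).items():
        relative = frequency / n              # Relative frequency = Frequency/n
        table[value] = (frequency, relative, 100 * relative)
    return table

# Hypothetical sample of a qualitative variable (made-up data).
hair = ["brown", "black", "brown", "blonde",
        "brown", "black", "brown", "blonde"]

for value, (freq, rel, pct) in frequency_table(hair).items():
    print(f"{value:<8} {freq:>3} {rel:>6.3f} {pct:>6.1f}%")
```

The relative frequencies always add up to 1 and the percents to 100, which is a quick check on any hand-built table.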
Graphs for Categorical Data
Example: In a survey concerning public education, 400
school administrators were asked to rate the quality of
education in Canada
GRAPH TYPES: BAR CHART
PIE CHART
Angle = Relative Frequency × 360°
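As a rough illustration of the angle formula, the sketch below converts category counts into pie-chart sector angles. The rating categories and counts are hypothetical (chosen only so that they total 400); they are not the survey results.

```python
def pie_angles(frequencies):
    """Map each category to its sector angle in degrees."""
    n = sum(frequencies.values())  # sample size
    # 360 * f / n is the same as (relative frequency) x 360 degrees.
    return {category: 360 * f / n for category, f in frequencies.items()}

# Hypothetical counts for 400 respondents (made-up data).
ratings = {"A": 40, "B": 100, "C": 160, "D": 100}

for category, angle in pie_angles(ratings).items():
    print(f"{category}: {angle:.0f} degrees")
```

The sector angles of a complete pie chart sum to 360°.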
Dot plots
The simplest graph for quantitative data, dot plots
plot the measurements as points on a horizontal
axis, stacking the points that duplicate existing
points
Example: the set 2, 3, 6, 6, 7, 9
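A text-only version of this dot plot can be sketched as follows. The function dot_plot is an illustrative helper, not part of the course material, and it assumes the measurements are single-digit non-negative integers so that the columns line up.

```python
from collections import Counter

def dot_plot(measurements):
    """Return a text dot plot; assumes single-digit non-negative integers."""
    counts = Counter(measurements)
    values = range(min(measurements), max(measurements) + 1)
    lines = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        cells = ["*" if counts[v] >= level else " " for v in values]
        lines.append(" ".join(cells).rstrip())
    lines.append(" ".join(str(v) for v in values))  # horizontal axis
    return "\n".join(lines)

print(dot_plot([2, 3, 6, 6, 7, 9]))
```

The repeated value 6 appears as a stack of two points above the axis; every other measurement appears once.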
Interpreting Graphs: Location and Spread
Where is the data centred on the horizontal axis,
and how does it spread out from the centre?
Interpreting Graphs: Shapes
Mound shaped and symmetric
(mirror images)
Skewed right: a few unusually
large measurements
Skewed left: a few unusually
small measurements
Bimodal: two local peaks
Outlier
An outlier falls far from the rest of the data
Outliers
Describing data with numerical
measures
Graphical methods may not always be sufficient
for describing data
Numerical measures can be created for both
populations and samples
Measures of Centre
Measure of centre: a measure along the horizontal
axis of the data distribution that locates the centre
of the distribution
Median
Median: the middle measurement when the
measurements are ranked from smallest to
largest
The position of the median is
.5(n + 1)
once the measurements have been ordered
Example
The set: 2, 4, 9, 8, 6, 5, 3 n = 7
Sort: 2, 3, 4, 5, 6, 8, 9
Position: .5(n + 1) = .5(7 + 1) = 4th
Median = 4th ordered measurement = 5
The set: 2, 4, 9, 8, 6, 5 n = 6
Sort: 2, 4, 5, 6, 8, 9
Position: .5(n + 1) = .5(6 + 1) = 3.5th
Median = (5 + 6)/2 = 5.5, the average of the 3rd and 4th
measurements
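The position rule .5(n + 1) translates directly into code. The sketch below is one possible implementation (the function name median is an illustrative choice); it reproduces both worked examples.

```python
def median(measurements):
    """Median via the position rule .5(n + 1) on the ordered data."""
    if not measurements:
        raise ValueError("median of an empty set is undefined")
    ordered = sorted(measurements)
    n = len(ordered)
    position = 0.5 * (n + 1)   # 1-based position in the ordered list
    lower = int(position)      # whole-number part of the position
    if position == lower:      # odd n: a single middle measurement
        return ordered[lower - 1]
    # even n: average the two measurements on either side of the position
    return (ordered[lower - 1] + ordered[lower]) / 2

print(median([2, 4, 9, 8, 6, 5, 3]))  # 5
print(median([2, 4, 9, 8, 6, 5]))     # 5.5
```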
Mode
Mode: the measurement which occurs most
frequently
In the set: 2, 4, 9, 8, 8, 5, 3
◼ The mode is 8, which occurs twice
In the set: 2, 2, 9, 8, 8, 5, 3
◼ There are two modes—8 and 2 (bimodal)
In the set: 2, 4, 9, 8, 5, 3
◼ There is no mode; each value is unique
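The three cases above can be checked with a short sketch. The function modes is an illustrative name; it follows the convention used here that a set has no mode when every value occurs exactly once.

```python
from collections import Counter

def modes(measurements):
    """Return the list of modes (empty if every value is unique)."""
    counts = Counter(measurements)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 4, 9, 8, 8, 5, 3]))  # [8]
print(modes([2, 2, 9, 8, 8, 5, 3]))  # [2, 8]
print(modes([2, 4, 9, 8, 5, 3]))     # []
```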
Extreme Values
Symmetric: Mean = Median
Skewed right: Mean > Median
Skewed left: Mean < Median
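To see how extreme values pull the mean away from the median, the sketch below compares the two on three small data sets. The data sets are made up for illustration, and compare_mean_median is a hypothetical helper name; mean and median come from Python's standard statistics module.

```python
from statistics import mean, median

def compare_mean_median(data):
    """Report which of the three relationships holds for the data."""
    m, md = mean(data), median(data)
    if m > md:
        return "Mean > Median"
    if m < md:
        return "Mean < Median"
    return "Mean = Median"

# Made-up examples, one per shape.
symmetric = [1, 2, 3, 4, 5]        # mean 3, median 3
skewed_right = [1, 2, 3, 4, 20]    # one unusually large value
skewed_left = [1, 17, 18, 19, 20]  # one unusually small value

for data in (symmetric, skewed_right, skewed_left):
    print(data, compare_mean_median(data))
```

In each skewed set a single extreme measurement moves the mean toward it, while the median stays at the middle measurement.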
Measures of Variability
Measure of variability: a measure along the
horizontal axis of the data distribution that
describes the spread of the distribution from
the centre
The Range
Range (R): the difference between the largest
and smallest measurements in a set
Example: a botanist records the number of petals on
five flowers: 5, 12, 6, 8, 14
The range is R = 14 – 5 = 9
It is quick and easy, but only uses 2 of the 5
measurements
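A short sketch of the range calculation for the petal counts follows. The function name value_range is an illustrative choice, and the second data set is made up to show that the middle measurements have no effect on R.

```python
def value_range(measurements):
    """Range R = largest measurement - smallest measurement."""
    return max(measurements) - min(measurements)

petals = [5, 12, 6, 8, 14]
print(value_range(petals))  # 9

# Made-up set with the same extremes but different middle values:
# the range is unchanged because only the largest and smallest are used.
print(value_range([5, 5, 5, 5, 14]))  # 9
```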
The Variance
Variance: a measure of variability that uses all
the measurements; it measures the average
squared deviation of the measurements about
their mean
Example: a botanist records the number of petals
on five flowers: 5, 12, 6, 8, 14
Mean: $\bar{x} = \frac{45}{5} = 9$
[Figure: the measurements plotted on a horizontal axis marked 4, 6, 8, 10, 12, 14]
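The petal example can be carried through numerically. The sketch below is an assumption-laden illustration: the slides so far give only the mean, so the code uses the standard textbook definitions, in which the population variance divides the sum of squared deviations by n and the sample variance divides it by n − 1. The function names are illustrative.

```python
def squared_deviations(measurements):
    """Squared deviation of each measurement from the mean."""
    mean = sum(measurements) / len(measurements)
    return [(x - mean) ** 2 for x in measurements]

def population_variance(measurements):
    """Assumed standard definition: divide by n."""
    return sum(squared_deviations(measurements)) / len(measurements)

def sample_variance(measurements):
    """Assumed standard definition: divide by n - 1."""
    return sum(squared_deviations(measurements)) / (len(measurements) - 1)

petals = [5, 12, 6, 8, 14]
print(squared_deviations(petals))   # [16.0, 9.0, 9.0, 1.0, 25.0]
print(population_variance(petals))  # 12.0
print(sample_variance(petals))      # 15.0
```

Unlike the range, every one of the five measurements contributes to the result.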