Stats: Data and Models
Fourth Canadian Edition
Chapter 1
Stats Starts Here
Copyright © 2019 Pearson Canada Inc. 1-1
Important Knowledge Points
• Five W’s of data, especially “who” and “what”.
• Categorical variable vs. quantitative variable.
• Install R software and load data into R
(PowerPoint slides and video).
What is Statistics?
• Statistics is a way of reasoning, along with a
collection of tools and methods, designed to
understand the information gathered (data) on the
specific problem / objective of interest.
• Statistics helps us understand and model the
variation in the gathered data so that we can
see the underlying truths and patterns.
Examples of Data
• Election polls
• Surveys
• Experiment data
• Data are useless without their context…
Data context
• To provide context we need 5 W’s and 1 H:
– Who (population and observations in the data)
– What (characteristics or variables)
– When
– Where
– Why the data were collected (purpose of the study)
– How the data were collected
• Note: the answers to “who” and “what” are
essential to have useful data / information.
Who and What
• The “Who” of data.
Individuals/units whose data were collected. We refer to
“who” as observations.
• The “What” of data.
Characteristics recorded about each observation. We
refer to “what” as variables.
• Raw data table
– Rows represent observations.
– Columns represent variables.
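The row/column layout can be sketched in a few lines of Python. This is only an illustration: the values below are hypothetical and are not taken from the course data, and the variable names simply mirror those used in the Table 9 example.

```python
# A raw data table as a list of rows; each row is one observation ("who")
# and each key is one variable ("what").
# NOTE: all values are hypothetical, made up for illustration only.
rows = [
    {"OBS": 1, "GPA": 3.1, "IQ": 105, "gender": 2},
    {"OBS": 2, "GPA": 2.8, "IQ": 98, "gender": 1},
    {"OBS": 4, "GPA": 3.6, "IQ": 112, "gender": 1},
]

# The variables are the column names.
variables = list(rows[0].keys())

# The number of observations is the number of rows.
n_observations = len(rows)

# Pulling out one column gives every observation's value for one variable.
gpa_column = [row["GPA"] for row in rows]

print("Variables:", variables)
print("Number of observations:", n_observations)
print("GPA column:", gpa_column)
```

Reading across a row describes one individual; reading down a column describes one characteristic across all individuals.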
Variables
• The characteristics recorded about each individual are
called variables.
• A categorical (or qualitative) variable names categories
and answers questions about how cases fall into those
categories.
– Categorical examples: sex, race, ethnicity
• A quantitative variable records measured numerical
values, usually with units, and answers questions about
the quantity of what is being measured.
– Quantitative examples: income ($), height (centimetres), weight
(kilograms), blood pressure (millimetres of mercury - mmHg).
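The distinction matters because the two types are summarized differently. As a minimal sketch, assuming hypothetical values for one categorical variable (sex) and one quantitative variable (height in centimetres), categories are counted while quantities can be averaged:

```python
from collections import Counter
from statistics import mean

# NOTE: hypothetical values, made up for illustration only.
sex = ["F", "M", "F", "F", "M"]                   # categorical: names a category
height_cm = [162.0, 175.5, 168.0, 181.0, 170.5]   # quantitative: measured, with units

# A categorical variable is summarized by how many cases fall in each category.
sex_counts = Counter(sex)

# A quantitative variable supports arithmetic summaries such as the mean.
mean_height = mean(height_cm)

print("Counts by sex:", dict(sex_counts))
print("Mean height (cm):", mean_height)
```

Averaging the sex column would make no sense, whereas counting every distinct height would rarely be informative.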
Example (Table 9)
Table 9 presents data on 78 seventh-grade students in a
rural Midwestern school. The researcher was interested in
the relationship between the students’ “self-concept” and
their academic performance. The data we give here include
each student’s grade point average (GPA), score on a
standard IQ test, and gender, taken from school records.
Gender is coded as 1 for female and 2 for male. The
students are identified only by an observation number
(OBS). The missing OBS numbers show that some students
dropped out of the study. The final variable is each student’s
score on the Piers-Harris Children’s Self-Concept Scale, a
psychological test administered by the researcher.
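Although gender is stored as the numbers 1 and 2, it remains a categorical variable: the codes are labels, not quantities. A minimal sketch of decoding them is shown below. The coding (1 = female, 2 = male) comes from the description above, while the list of codes is hypothetical and not taken from the actual data.

```python
from collections import Counter

# Coding stated in the example: 1 = female, 2 = male.
GENDER_LABELS = {1: "female", 2: "male"}

# NOTE: hypothetical codes, made up for illustration only.
gender_codes = [2, 1, 1, 2, 1]

# Translate each numeric code into its category label.
gender = [GENDER_LABELS[code] for code in gender_codes]

# Count cases per category; averaging the codes would be meaningless.
gender_counts = Counter(gender)

print(gender)
print(dict(gender_counts))
```

Decoding early makes it harder to treat the codes as numbers by mistake, for example by computing their mean.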
Example continued
Who: 78 seventh-grade students
What: GPA, IQ, gender, self-concept score
When: not specified
Where: a rural Midwestern school in the US
Why: study the relationship between the students’
self-concept and their academic performance
How: not specified
Identifiers
• Identifier variables are categorical variables with
exactly one individual in each category.
– Examples: Social Insurance Number, FedEx Tracking
Number
– Each data set has a unique ID for each observation,
e.g. OBS in the Table 1_9.txt data.
• Don’t be tempted to analyze identifier variables.
• Not every variable with one case per category is an
identifier: a variable such as year can still be worth
analyzing.
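The "one individual per category" property can be checked mechanically, but the check alone cannot tell an identifier from a variable like year. The sketch below uses a hypothetical helper name, `has_one_case_per_category`, and hypothetical values for all three columns.

```python
def has_one_case_per_category(values):
    """Return True if every value occurs exactly once (hypothetical helper)."""
    return len(set(values)) == len(values)

# NOTE: hypothetical values, made up for illustration only.
obs = [1, 2, 3, 5, 6]                   # ID numbers; the gap mimics a dropout
year = [2015, 2016, 2017, 2018, 2019]   # one case per year, yet not an identifier
gender = [1, 2, 1, 1, 2]                # categories shared by several cases

print("OBS:", has_one_case_per_category(obs))
print("year:", has_one_case_per_category(year))
print("gender:", has_one_case_per_category(gender))
```

Both OBS and year pass the check, so deciding which one is merely a label still depends on the context of the data.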
Chapter 1 Review
• How to organize your study (posted in Course set up
section)
• Install R on your computer.
• Work on chapter 1 R commands and output.
• Chapter 1 Practice Problems.