CHAPTER 3 CATEGORICAL DATA
Create a frequency table
Example: number of students at CB South High School in each grade:
Grade Total
9 520
10 534
11 552
12 515
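A frequency table is often paired with relative frequencies (each count divided by the grand total). The sketch below does this for the grade counts above; the variable names are illustrative and not part of the handout.

```python
# Frequency table from the example: grade -> number of students.
counts = {"9": 520, "10": 534, "11": 552, "12": 515}

# Grand total of all students in the table.
total = sum(counts.values())

# Relative frequency = count / total, expressed here as a percent.
relative = {grade: 100 * n / total for grade, n in counts.items()}

for grade, n in counts.items():
    print(f"Grade {grade}: {n} students, {relative[grade]:.1f}% of the school")
```

The relative frequencies should add up to 100%, which is a quick check on the arithmetic.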
Categorical Distributions:
1. Bar Chart
[Figure: two bar charts by grade level (9th, 10th, 11th, 12th): "Percent of Students Taking AP Exam" (vertical axis: Percent, 0.0% to 50.0%) and "Number of Students Taking AP Exam" (vertical axis: Frequency, 0 to 80)]
2. Pie Chart
[Figure: pie chart, "Percent of Students Taking AP Exam by Grade Level": 9th 6%, 10th 19%, 11th 32%, 12th 43%]
3. Contingency tables (aka 2-Way tables): Use the information from CB South High School
and the data provided in the chart to complete the Contingency table.
Frosh Soph Junior Senior Total
Male 270 281 1124
Female 257
Total
Identify:
• Row variable
• Column variable
• Values of the variable
• Total (n)
• # of Cells
• Totals
Example: Hospitals
Hospital A Hospital B Totals
Died 63 16
Survived 2037 784
Totals
• What percent of people died?
• Of those people that went to Hospital A, what percent died?
• Given that someone went to Hospital B, what is the chance that they died?
• Of those people who died, what percent went to Hospital A?
• What percent of people died and went to Hospital B?
• What percent of people survived or went to Hospital A?
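Each question above uses a different denominator: the grand total for "what percent of people", a column total for "of those who went to Hospital A/B", and a row total for "of those who died". The sketch below works through them under that reading; the helper `pct` and the variable names are made up for illustration.

```python
# Counts from the Hospitals table.
table = {
    "Died":     {"A": 63,   "B": 16},
    "Survived": {"A": 2037, "B": 784},
}

# Fill in the Totals row, Totals column, and grand total n.
row_totals = {r: sum(cols.values()) for r, cols in table.items()}
col_totals = {c: sum(table[r][c] for r in table) for c in ("A", "B")}
n = sum(row_totals.values())

def pct(part, whole):
    """Return part/whole as a percent."""
    return 100 * part / whole

died_overall = pct(row_totals["Died"], n)                 # out of everyone
died_given_a = pct(table["Died"]["A"], col_totals["A"])   # out of Hospital A only
died_given_b = pct(table["Died"]["B"], col_totals["B"])   # out of Hospital B only
a_given_died = pct(table["Died"]["A"], row_totals["Died"])  # out of deaths only
died_and_b = pct(table["Died"]["B"], n)                   # one cell out of everyone

# "Or": add both groups, then subtract the overlap that was counted twice.
survived_or_a = pct(
    row_totals["Survived"] + col_totals["A"] - table["Survived"]["A"], n
)

for name, value in [
    ("Died (overall)", died_overall),
    ("Died, given Hospital A", died_given_a),
    ("Died, given Hospital B", died_given_b),
    ("Hospital A, given died", a_given_died),
    ("Died and Hospital B", died_and_b),
    ("Survived or Hospital A", survived_or_a),
]:
    print(f"{name}: {value:.2f}%")
```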
2 types of Distributions for Categorical Variables
1) MARGINAL DISTRIBUTIONS
• Example: Hair color vs. Gender
Brown Blonde Black Red Total
MALE 26 24 10 3 63
FEMALE 20 35 12 6 73
TOTALs 46 59 22 9 136
- Find the marginal distribution for the HAIR COLOR variable
- Find the marginal distribution for the GENDER variable
• Bar Chart or Pie Graph
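A marginal distribution uses only the totals in the margins of the table, each divided by the grand total. A minimal sketch for the hair color and gender table, with illustrative variable names:

```python
hair_colors = ["Brown", "Blonde", "Black", "Red"]
counts = {
    "MALE":   [26, 24, 10, 3],
    "FEMALE": [20, 35, 12, 6],
}

# Grand total (the bottom-right cell of the table).
n = sum(sum(row) for row in counts.values())

# Marginal distribution of HAIR COLOR: column totals divided by n.
col_totals = [sum(col) for col in zip(*counts.values())]
hair_marginal = {h: 100 * t / n for h, t in zip(hair_colors, col_totals)}

# Marginal distribution of GENDER: row totals divided by n.
gender_marginal = {g: 100 * sum(row) / n for g, row in counts.items()}

for h, p in hair_marginal.items():
    print(f"{h}: {p:.1f}%")
for g, p in gender_marginal.items():
    print(f"{g}: {p:.1f}%")
```

Each marginal distribution sums to 100%, so either one can be drawn as a single bar chart or pie graph.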
2) CONDITIONAL DISTRIBUTIONS
Example: Hair Color vs. Gender
Brown Blonde Black Red Total
MALE 26 24 10 3 63
FEMALE 20 35 12 6 73
TOTALs 46 59 22 9 136
- Find the conditional Distribution for the HAIR COLOR variable
Brown Blonde Black Red Total
MALE
FEMALE
TOTALs
- Find the conditional Distribution for the GENDER variable
Brown Blonde Black Red Total
MALE
FEMALE
TOTALs
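A conditional distribution restricts attention to one row or one column and divides by that row's or column's total instead of the grand total. The sketch below assumes "conditional distribution for HAIR COLOR" means hair color within each gender (row percentages) and "conditional distribution for GENDER" means gender within each hair color (column percentages); if your class uses the opposite convention, swap the two.

```python
hair_colors = ["Brown", "Blonde", "Black", "Red"]
counts = {
    "MALE":   [26, 24, 10, 3],
    "FEMALE": [20, 35, 12, 6],
}

# Hair color conditional on gender: divide each row by its own row total.
hair_given_gender = {
    g: [100 * x / sum(row) for x in row] for g, row in counts.items()
}

# Gender conditional on hair color: divide each column by its column total.
col_totals = [sum(col) for col in zip(*counts.values())]
gender_given_hair = {
    g: [100 * x / t for x, t in zip(row, col_totals)]
    for g, row in counts.items()
}

print("Hair color within each gender (each row sums to 100%):")
for g, row in hair_given_gender.items():
    print(g, [f"{p:.1f}%" for p in row])

print("Gender within each hair color (each column sums to 100%):")
for g, row in gender_given_hair.items():
    print(g, [f"{p:.1f}%" for p in row])
```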
Represented visually: Segmented Bar Graph
o Each bar = 100%
o Values of variable on the x-axis
o Bars are segmented into parts of each value
Represented Visually: Comparative Pie Charts
o Each Circle = 100%
o Pie pieces in the same order
o Circles are segmented in parts of each value
Independence: If two categorical variables are independent, then the value of one variable does not
change the probability distribution of the other.
Independent: Dependent:
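One informal way to check independence is to compare each conditional distribution with the marginal distribution: if the variables were independent, the percentages would be roughly the same. The sketch below does this for the hair color and gender table. It is only a descriptive comparison, not a formal test, and how large a gap counts as "different" is a judgment call that the handout does not specify.

```python
hair_colors = ["Brown", "Blonde", "Black", "Red"]
counts = {
    "MALE":   [26, 24, 10, 3],
    "FEMALE": [20, 35, 12, 6],
}

n = sum(sum(row) for row in counts.values())
col_totals = [sum(col) for col in zip(*counts.values())]

# Marginal distribution of hair color (percent of all people).
marginal = [100 * t / n for t in col_totals]

# Largest absolute gap between any conditional percent and the marginal percent.
max_gap = 0.0
for gender, row in counts.items():
    conditional = [100 * x / sum(row) for x in row]
    for color, c, m in zip(hair_colors, conditional, marginal):
        gap = c - m
        max_gap = max(max_gap, abs(gap))
        print(f"{gender:6} {color:6} conditional {c:5.1f}%  marginal {m:5.1f}%  gap {gap:+.1f}")

print(f"Largest gap: {max_gap:.1f} percentage points")
```

Gaps near zero in every cell would be consistent with independence; noticeably different percentages suggest the variables are dependent (associated).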
Categorical Variable practice
A 4-year study of men more than 70 years old, reported in The New York Times, analyzed blood
cholesterol and noted how many men at each cholesterol level suffered nonfatal or fatal heart
attacks.
                         Low cholesterol   Medium cholesterol   High cholesterol
Nonfatal heart attacks         29                 17                   18
Fatal heart attacks            19                 20                    9
a. Calculate the marginal distribution for cholesterol level and make a pie graph.
b. Calculate the marginal distribution for severity of heart attack and make a bar graph.
c. Calculate three conditional distributions for the three levels of cholesterol and make a stacked bar
graph.
d. Calculate the conditional distributions for the type of heart attack and make a stacked bar graph.
e. Are the two variables independent?