Grade 11-12 Statistics Winter Program
GRADE 11 & 12
TOPIC : STATISTICS
LESSON TOPIC:
STATISTICS
DATE: 02 – 05 JULY 23
Putting Lessons Into Perspective (Grade 11)
Focus: Statistics
Sunday: 02/07/2023
Lesson Topic: Measures of central tendency and dispersion
Monday: 03/07/2023
Lesson Topic: Box and Whisker, Outliers, Standard deviation
Tuesday: 04/07/2023
Lesson Topic: Histogram and Polygon, Sketching of Ogives
Wednesday: 05/07/2023
Lesson Topic: Consolidation: Statistics
Teacher Section
EVERYDAY, BE A TEACHER TO YOUR LEARNERS
“THE DAY YOU ARE WILLING TO VEER OFF THE LESSON
PLAN, FOLLOW A LEARNER’S LEAD, AND LEARN WITH
YOUR STUDENTS IS THE DAY YOU REALLY BECOME
A TEACHER.”
CAPS DOCUMENT : GRADE 10 – 12
CAPS DOCUMENT : GRADE 11
Grade 12 Exam Guidelines
Ungrouped Data: Learning goals
PRIOR KNOWLEDGE TO BE RECAPPED
• REVISE GRADE 10 CONCEPTS:
1. UNGROUPED DATA
• REPRESENTING
• BAR (TALLY/FREQUENCY TABLE)
• PIE CHART
• BROKEN LINE
• MEASURES OF CENTRAL TENDENCY
• MEASURES OF DISPERSION
• FIVE NUMBER SUMMARY
2. GROUPED DATA
• STEM AND LEAF
• HISTOGRAM
• FREQUENCY POLYGON
• MEASURES OF CENTRAL TENDENCY
• ESTIMATED MEAN
• MODAL CLASS
• MEDIAN
• DISTRIBUTION OF DATA
Suggested Methodology/Approach for Statistics
MATHEMATICS GRADE 11
PAPER 2
LESSON 1:
Statistics:
Ungrouped Data: Measures of
Central Tendency &
Dispersion
2 July 2023
Stats!!!
Ungrouped
Data
What is Statistics
DISCRETE DATA
• Data that can only take certain values. For example, the number of
learners in a class (there can’t be half a learner)
CONTINUOUS DATA
• Data that can take on any value within a certain range. For example,
the heights of a group of learners (heights could be measured in
decimals)
1. TERMINOLOGY
Population: Collection of all potential observations that can be found in a given situation.
We will consider the following three Measures of Central Tendency:
• Mean: Average of observations
• Median: Middle value
• Mode: Most frequently occurring observation.
MEASURES OF CENTRAL TENDENCIES OF DATA IN A
FREQUENCY TABLE
How to organise
Ungrouped or Raw Data
• Not arranged in any meaningful fashion
• Ungrouped Data or Raw Data
Example: The number of SMS calls received (variable x) on a certain day by 12 students may be recorded as: 0; 3; 6; 5; 2; 5; 4; 8; 3; 5; 5 and 7.
For further analysis by hand or PC the set of raw data is usually arranged in ascending order.
Organised in ascending order:
0 2 3 3 4 5 5 5 5 6 7 8
Discuss: Mode, Mean & Median
Determining the Median for Ungrouped Data
• Median (Q2) is the middle value in the data set.
• Location: the ((n + 1)/2)th position, provided data is ordered.
n odd: 7, 13, 14, 17, 20
• Location of Q2 = (5 + 1)/2 = 3rd position
• Q2 = 14
n even: 7, 13, 14, 17, 20, 21
• Location of Q2 = (6 + 1)/2 = 3,5th position
• Calculate Q2 = (14 + 17)/2 = 15,5
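The position rule above can be sketched in a few lines of Python. This is only an illustration: the function name `median_by_position` is made up for this example, and the two data sets are the ones on the slide.

```python
def median_by_position(values):
    """Return (position, median) using the (n + 1)/2 position rule."""
    ordered = sorted(values)            # the rule only works on ordered data
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position of Q2
    lower = ordered[int(position) - 1]  # value at (or just below) the position
    if position.is_integer():           # n odd: a single middle value
        return position, lower
    upper = ordered[int(position)]      # n even: average the two middle values
    return position, (lower + upper) / 2


print(median_by_position([7, 13, 14, 17, 20]))      # (3.0, 14)
print(median_by_position([7, 13, 14, 17, 20, 21]))  # (3.5, 15.5)
```

Learners can use a check like this to confirm answers they first worked out by hand.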
Calculating the Mode for Ungrouped Data
For ungrouped data: Mo can be found by an inspection of the observations.
Consider the ordered ungrouped data 3; 5; 12; 12 and 13.
Mode: Mo = 12
There can be more than one mode.
5 6 6 6 7 9 9 9 10
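Finding the mode by inspection amounts to counting how often each value occurs. A minimal sketch follows; the helper name `modes` is an assumption for this example, and it returns a list because a data set can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 5, 12, 12, 13]))            # [12]
print(modes([5, 6, 6, 6, 7, 9, 9, 9, 10]))  # [6, 9]
```

The second data set is bimodal: 6 and 9 both occur three times.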
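SKEWNESS USING x̄, Q2 AND Mo
Symmetric: x̄ = Q2 = Mo
Skewed to the right (positively skewed): Mo < Q2 < x̄
• A few very large values
• Tail develops on the right
• x̄ and Q2 dragged to the right
Skewed to the left (negatively skewed): x̄ < Q2 < Mo
• A few very small values
• Tail develops on the left
• x̄ and Q2 dragged to the left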
Reliability of Measures of Central Tendency
Assessment Activities
SOLUTION
Conclusion : Summary of Key Points
Concluding Remarks
MATHEMATICS GRADE 11
PAPER 2
LESSON 2:
Box-and-Whisker Plot
[Diagram: box-and-whisker plot. The whiskers run from Min (lowest class limit) to Max (largest class limit); the box marks Q1, Q2 and Q3.]
Clarify the
Percentiles!!!
CLARIFY WITH LEARNERS
Median = Q2 = P50 = 21
Lower Quartile = Q1 = P25 =17
Upper Quartile = Q3 = P75 = 26
Min =10 Max = 32
5 - Number Summary : (10; 17; 21; 26; 32)
Interpret : Box - and - Whisker Plot
• Clarify the relationship between the Mean and the Mode on the distribution
of data and clarify the skewness:
✓ Note that if the mean and the median of a data set are known, then
Min = 9
Q1 = P25 = (23 + 33)/2 = 28
Q2 = P50 = 55
Q3 = P75 = (75 + 75)/2 = 75
Max = 92
Five-Number Summary
(Min, Q1, Q2, Q3, Max) = (9, 28, 55, 75, 92)
ACTIVITY
Five-Number Summary for Class A
( Min, Q1 , Q2 , Q3 , Max ) = (9, 28, 55, 75, 92)
2. Draw the box and whisker diagram that represents class A's marks.
3. Determine which class performed better in the June examination and give reasons for your conclusion.
STANDARD DEVIATION AND VARIANCE
• We are often required to find out how many items are within 1, 2 or 3 standard deviations of the mean:
• The interval within 1 standard deviation of the mean is denoted as follows: (x̄ − σ ; x̄ + σ)
• The interval within 2 standard deviations of the mean is denoted as follows: (x̄ − 2σ ; x̄ + 2σ)
• The interval within 3 standard deviations of the mean is denoted as follows: (x̄ − 3σ ; x̄ + 3σ)
Make learners aware: These intervals are not on the formula sheet.
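As an illustration of the intervals above, the sketch below counts how many values lie within k standard deviations of the mean. It reuses the SMS data from Lesson 1 and assumes the population standard deviation σ (Python's `pstdev`); the helper name `within` is made up for this example.

```python
from statistics import mean, pstdev

# SMS data from Lesson 1
data = [0, 2, 3, 3, 4, 5, 5, 5, 5, 6, 7, 8]

x_bar = mean(data)
sigma = pstdev(data)  # population standard deviation (assumed here)


def within(k):
    """Values inside the interval (x_bar - k*sigma ; x_bar + k*sigma)."""
    low, high = x_bar - k * sigma, x_bar + k * sigma
    return [x for x in data if low < x < high]


print(round(x_bar, 2), round(sigma, 2))  # 4.42 2.1
inside = within(1)
print(len(inside), "within;", len(data) - len(inside), "outside")
print(round(100 * len(inside) / len(data), 1), "% within one standard deviation")
```

The same function with k = 2 or k = 3 answers the corresponding "within 2 or 3 standard deviations" questions.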
STANDARD DEVIATION AND VARIANCE CONT…
Standard deviation around the mean: below, within, above, outside, and the related percentages.
• How many values are below, within, above or outside the interval?
• What percentage of the values are below, within, above or outside the interval?
Discuss with the learners.
STANDARD DEVIATION AND VARIANCE (WITHOUT A CALCULATOR)
SOLUTION
IDENTIFYING OUTLIERS
In a set of data, it sometimes happens that a particular number is extremely high or low in
comparison to the other numbers. Such a number is called an outlier.
In 1977, the statistician John Tukey invented box and whisker plots and defined an outlier to be any number in a data set which falls outside the interval:
(Q1 − 1,5 × IQR ; Q3 + 1,5 × IQR)
where Q1 and Q3 represent the lower and upper quartiles respectively. IQR represents the inter-quartile range, IQR = Q3 − Q1.
You will notice that the mean for the data including the outliers is different to the mean
when the outliers are excluded. The standard deviations in each case are also different.
The value of the median is only slightly affected by the exclusion of the outliers.
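A small sketch of the outlier test, using Tukey's interval Q1 − 1,5 × IQR to Q3 + 1,5 × IQR. The quartiles come from the five-number summary (10; 17; 21; 26; 32) used earlier in this lesson; the function name `tukey_fences` and the extra value 45 are hypothetical, added only to show an outlier being flagged.

```python
def tukey_fences(q1, q3):
    """Lower and upper limits outside which a value counts as an outlier."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


low, high = tukey_fences(17, 26)
print(low, high)  # 3.5 39.5

# Five-number summary values plus one hypothetical extreme value (45)
sample = [10, 17, 21, 26, 32, 45]
outliers = [x for x in sample if x < low or x > high]
print(outliers)  # [45]
```

For the original summary no value is an outlier, because Min = 10 and Max = 32 both lie inside (3,5 ; 39,5).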
IDENTIFYING OUTLIERS - ACTIVITY
Assessment Activities
ACTIVITIES
SOLUTIONS
Conclusion : Summary of Key Points
Concluding Remarks
MATHEMATICS GRADE 11
PAPER 2
LESSON 3:
Time t (minutes)    Frequency    Cumulative frequency
-10 < t ≤ 0         0            0
0 < t ≤ 10          4            0 + 4 = 4
10 < t ≤ 20         12           4 + 12 = 16
20 < t ≤ 30         28           16 + 28 = 44
30 < t ≤ 40         32           44 + 32 = 76
40 < t ≤ 50         29           76 + 29 = 105
50 < t ≤ 60         15           105 + 15 = 120
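The running totals in the table can be reproduced with a short sketch. The variable names are illustrative; the frequencies and upper class limits are those in the table.

```python
from itertools import accumulate

upper_limits = [0, 10, 20, 30, 40, 50, 60]  # upper limit of each interval
frequencies = [0, 4, 12, 28, 32, 29, 15]

# Each cumulative frequency is the previous total plus the current frequency
cumulative = list(accumulate(frequencies))
print(cumulative)  # [0, 4, 16, 44, 76, 105, 120]

# Points to plot for the ogive: (upper limit, cumulative frequency)
ogive_points = list(zip(upper_limits, cumulative))
print(ogive_points)
```

The first point (0; 0) grounds the ogive, and the last cumulative frequency equals the total number of learners.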
Example continued…
NB: Always remember that when drawing a cumulative frequency curve from a table of grouped data, the cumulative frequencies are plotted at the upper limit of each interval.
Using the ogive to get Q1; Q2 and Q3
iii) To find the approximate value of the upper quartile (Q3), find the midpoint of the
upper half of the values plotted on the cumulative frequency axis.
• There are 60 terms in the upper half of the data, so the upper quartile lies between
60 + 30 = 90th and the 91st term.
• Draw a horizontal line from just above 90 until it touches the ogive.
• From that point draw a vertical line down to the horizontal axis.
• So the upper quartile ≈ 45 minutes.
b)
i) The median tells us that 50% of the learners took 35 minutes or less to walk to school.
ii) The lower quartile tells us that 25% of the learners took 25 minutes or less to walk to
school.
iii) The upper quartile tells us that 75% of the learners took 45 minutes or less to walk to
school.
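Reading a quartile off the ogive can be imitated with linear interpolation between the plotted points. This is a sketch under the assumption that the ogive is drawn with straight segments, so results may differ slightly from readings taken off a hand-drawn smooth curve; the function name `read_ogive` is made up for this example.

```python
def read_ogive(points, target):
    """Estimate the time at which the cumulative frequency reaches target."""
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            # Move along the segment in proportion to the frequency needed
            return t0 + (target - c0) / (c1 - c0) * (t1 - t0)
    raise ValueError("target lies outside the ogive")


# (upper limit, cumulative frequency) from the walking-time example
points = [(0, 0), (10, 4), (20, 16), (30, 44), (40, 76), (50, 105), (60, 120)]
n = points[-1][1]  # total frequency: 120

for name, fraction in [("Q1", 0.25), ("Q2", 0.5), ("Q3", 0.75)]:
    print(name, round(read_ogive(points, fraction * n), 1))
```

The estimates (about 25, 35 and 45 minutes) agree with the values read from the graph in the example.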
OGIVE
Determining cumulative frequencies is an effective way of representing
grouped data. If you want to find the median of grouped data from a
frequency table, a useful way to do this is by first determining the
cumulative frequencies from the frequency table and then representing
the information on a cumulative frequency graph (or ogive curve).
EXAMPLE
The company HEALTHMANIA conducted a survey in Gauteng to find out
which age group most frequently uses their health supplements. The
company determined the ages of a representative sample of their
current client group. The ages of current clients were recorded and then
sorted. The company wanted to market a new health supplement to the
age group in Gauteng which most frequently uses their products.
OGIVE CONT…
NOTICE:
The total frequency of marks (77) is equal to the final cumulative frequency (77).
We can use the graph to determine estimates of the quartiles and percentiles for this data.
Position of median = (1/2)(n + 1) = (1/2)(77 + 1) = 39th position
Position of lower quartile = (1/4)(n + 1) = (1/4)(77 + 1) = 19,5th position
Position of upper quartile = (3/4)(n + 1) = (3/4)(77 + 1) = 58,5th position
ACTIVITY
SOLUTIONS
EXAM QUESTIONS – AS ACTIVITIES
EXAM QUESTIONS – SOLUTIONS
Conclusion : Summary of Key Points
Concluding Remarks
Following today's lesson, I want you to do the following:
• Repeat this procedure until you are confident.
• Attempt as many other similar examples as possible on your own from the textbook and the past exam papers.
MATHEMATICS GRADE 11
PAPER 2
LESSON 4:
Statistics: Consolidation
5 July 2023
Assessment Activities : Next Level
Assessment Activities
SOLUTIONS
Conclusion : Summary of Key Points
Concluding Remarks
Following today's lesson, I want you to do the following:
• Repeat this procedure until you are confident.
• Attempt as many other similar examples as possible on your own from the textbook and the past exam papers.
Thank you
Grade 12
Bivariate Data
Scatter plot & regression line
Example of a Scatter Plot
Types of Correlation (r)
Scatter Plot
Scatter Plot: Strong Positive Correlation (r)
Scatter Plot: Moderate Positive Correlation (r)
Scatter Plot: Negative Correlation (r)
Scatter Plot: Moderate Negative Correlation (r)
Scatter Plot: Relationship
Scatter Plot & Line of Best Fit
Scatter Plot & Regression line
Correlation between two variables
Predictions:
• Interpolation: Estimate values of y for values of x within the observed range.
• Extrapolation: Estimate values of y outside the range of observed values.
Examples of Bivariate Data
Consider the following three sets of collected bivariate data:
Example 1 :
Key Questions:
• Is there a relationship between x and y? Can y be expressed as a relation in x?
• Can we determine the defining equation for such a relation? Can we extend the observed data?
Observed Data:
x −4 −2 −1 1 4
y −11 −7 −5 −1 5
• Select STAT LINEAR MODE on your calculator.
• Input OBSERVED DATA from table into calculator.
• RECALL from Calculator values of A (y − intercept) and B (gradient).
• Write down defining equation in the form y = Bx + A.
Defining Equation: y = 2x − 3
• Utilize this defining equation to extend data table.
• OR utilize Calculator Capacity to extend table.
ŷ(−3) = −9; ŷ(0) = −3; ŷ(2) = 1; ŷ(3) = 3
x −4 −3 −2 −1 0 1 2 3 4
y −11 −9 −7 −5 −3 −1 1 3 5
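Extending the table with the defining equation can be sketched as follows. The function name `y_hat` is illustrative; the observed data and the equation y = 2x − 3 are taken from the example.

```python
xs = [-4, -2, -1, 1, 4]
ys = [-11, -7, -5, -1, 5]


def y_hat(x):
    """Defining equation from the example: y = 2x - 3."""
    return 2 * x - 3


# Every observed point lies exactly on the line
print(all(y_hat(x) == y for x, y in zip(xs, ys)))  # True

# Extend the table to every integer x from -4 to 4
extended = {x: y_hat(x) for x in range(-4, 5)}
print(extended)
```

Because all five observed points satisfy the equation, the filled-in values for x = −3, 0, 2 and 3 are interpolations on a perfect linear relationship.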
Strength of Linear Relationship
[Figure: two scatter plots. Left: strong, positive linear correlation. Right: weak, negative linear correlation.]
1. −1 ≤ r ≤ 1
2. r = 1 A perfect positive linear relationship
(All points exactly on line with a positive gradient)
3. r = −1 A perfect negative linear relationship.
(All points exactly on line with a negative gradient)
4. r = 0 No linear relationship.
r Interpretation
1 Perfect positive association
0,9 Strong positive association
0,5 Moderate positive association
0,2 Weak positive association
0 No association
−0,2 Weak negative association
−0,5 Moderate negative association
−0,9 Strong negative association
−1 Perfect negative association
N.B: The description of r is not provided in the formula sheet (remind learners).
Coefficient of Determination.
Important question: How well does the straight line
represent the relationship?
[Figure: two scatter plots with regression lines. Left: the regression line can be seen as a good fit for the observed data. Right: the regression line is not a good fit for the observed data.]
Coefficient of determination = r-squared = r²
r² × 100%: What percentage of variation in y is explained by variations in x?
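To make the link between r and r² concrete, the sketch below converts a few of the r values from the interpretation table into percentages. The function name `percent_explained` is made up for this example.

```python
def percent_explained(r):
    """Coefficient of determination as a percentage: r squared times 100."""
    return round(r ** 2 * 100, 1)


for r in [1, 0.9, 0.5, 0.2, 0]:
    print(r, percent_explained(r))
```

Note how quickly the percentage falls: r = 0,9 explains 81% of the variation in y, but r = 0,5 explains only 25%. The sign of r does not matter, since it is squared.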
Activity
Example : Eight randomly selected families were asked about their monthly
incomes and amounts saved in that month. The following data was
recorded:
Amount saved 500 1 900 1 100 2 500 1 500 2 300 800 2 100
Monthly Income 10 000 19 500 13 400 27 000 16 000 22 000 12 000 20 000
General Trend
• Linear relationship
• Positive gradient
• Savings increases with increase in income
Correlation Coefficient
Monthly Income: x 10 000 19 500 13 400 27 000 16 000 22 000 12 000 20 000
Amount saved: y 500 1 900 1 100 2 500 1 500 2 300 800 2 100
Procedure :
• Select STATS LINEAR MODE
• Input available data
• RECALL value of r
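For teachers who want to check the calculator result, the correlation coefficient can also be computed directly. This sketch assumes the standard Pearson formula; the function name `pearson_r` is illustrative, and the data are the income and savings figures from the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


income = [10000, 19500, 13400, 27000, 16000, 22000, 12000, 20000]
saved = [500, 1900, 1100, 2500, 1500, 2300, 800, 2100]

r = pearson_r(income, saved)
print(round(r, 3))
```

A value of r close to 1 supports the general trend noted above: savings increase as income increases.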
Activity 1
A training manager wants to know if there is a link between the hours in training (x)
spent by a particular category of employee and their productivity (units produced per day, y )
on a job. The data below was extracted from the files of 10 employees.
Employee 1 2 3 4 5 6 7 8 9 10
Hours in training ( x) 16 36 20 38 40 30 35 22 40 24
Units produced per day ( y ) 45 70 44 56 60 48 75 60 63 38
Use Calculator:
Select STAT Mode
Select line option: Y = A + BX
Input data
Recall A = 29.22 and B = 0.89
Defining equation: y = 0.89x + 29.22
Equation Least Square Line: Using Formulae
Employee 1 2 3 4 5 6 7 8 9 10
Hours in training ( x) 16 36 20 38 40 30 35 22 40 24
Units produced per day ( y ) 45 70 44 56 60 48 75 60 63 38
Use relevant formulae:
ŷ = a + bx where
a = ȳ − b x̄ and b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
Equation of least squares line:
ŷ = 0.89x + 29.22 (Same result)
Due to mark allocation the calculator option is preferred.
Sketch Least Square Line
Always one possibility: (x̄; ȳ) = (30.1; 55.9)
Determine at least 3 points on line:
ŷ(20) = 46.9, so (20; 46.9) is a second possible point
ŷ(40) = 64.7, so (40; 64.7) is a third possible point
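The formula route and the three plotting points can be checked with a short sketch. It assumes the usual least squares formulas, a = ȳ − b x̄ and b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²; the function name `least_squares` is made up for this example.

```python
def least_squares(xs, ys):
    """Return (a, b, x_bar, y_bar) for the line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b, x_bar, y_bar


hours = [16, 36, 20, 38, 40, 30, 35, 22, 40, 24]
units = [45, 70, 44, 56, 60, 48, 75, 60, 63, 38]

a, b, x_bar, y_bar = least_squares(hours, units)
print(round(a, 2), round(b, 2))  # 29.22 0.89

# Three points for sketching the line; the first is always (x_bar; y_bar)
for x in (x_bar, 20, 40):
    print(round(x, 1), round(a + b * x, 1))
```

The least squares line always passes through (x̄; ȳ), which is why that point is "always one possibility" when sketching.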
Activity 2
The relationship between age and spending money on buying clothes per month has
been studied for years. Research has shown the following results:
Activity 2 (continued)
2. Determine the equation of the regression line and draw it on the scatter plot.
3. Describe the trend of the data with reference to the correlation coefficient.
4. Estimate the expenditure of a 50-year old person from the scatter plot.
Solution
1. and 2. [Scatter plot with regression line; horizontal axis: Age]
3. r = −0,995
Strong negative correlation.
4. Approx R600
5. Interpolation.
Data value is within the given range.
Activity 1 (continued)
(1) Estimate the productivity level for a particular employee who has received only 22 hours of training.
(2) Determine the correlation between productivity and hours of training.
(3) Is the association strong? Advise the manager.
Solution
(1) Take a reading from the graph, or
determine ŷ(22) = 48.72 using the calculator, or
calculate: ŷ(22) = 0.89 × 22 + 29.22 = 48.8
(2) Using calculator: r = 0.66
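The two slightly different estimates (48.72 and 48.8) come from rounding: the calculator keeps full precision for A and B, while the hand calculation uses the rounded values 0.89 and 29.22. A sketch that reproduces both, together with r, using the training data from Activity 1 (variable names are illustrative):

```python
from math import sqrt

hours = [16, 36, 20, 38, 40, 30, 35, 22, 40, 24]
units = [45, 70, 44, 56, 60, 48, 75, 60, 63, 38]

n = len(hours)
x_bar, y_bar = sum(hours) / n, sum(units) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, units))
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in units)

b = sxy / sxx            # gradient, full precision
a = y_bar - b * x_bar    # y-intercept, full precision
r = sxy / sqrt(sxx * syy)

full_precision = a + b * 22
rounded_coefficients = 0.89 * 22 + 29.22

print(round(full_precision, 2))        # 48.72
print(round(rounded_coefficients, 2))  # 48.8
print(round(r, 2))                     # 0.66
```

Both answers round to about 49 units per day, so either method is acceptable for an estimate.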
Age ( x) 59 32 42 50 22 39 21 20 27 40 29 47
Resting heart rate ( y ) 88 74 74 93 85 71 78 82 70 75 95 75
y = 0.0954x +76.5956
…..Solution
(4) Calculate the correlation coefficient for the data.
(5) Use the correlation coefficient to comment on the relationship between age and the resting heart rate.
(6) If a learner uses the least square line to predict the resting heart rate
of a 45-year-old person, will his answer be reliable? Motivate your answer.
(1) Draw a scatter plot of the data given on a grid: Left as an exercise.
(2 ) Calculate the equation of the least square line for this data: y = 0.81x + 25.23
(3) Calculate the correlation coefficient: r = 0.898
(4 ) Comment on the correlation of the data.
Strong positive correlation
(5 ) If Joan's heart rate after jogging is 86 beats per minute,
what is her resting heart rate, in beats per minute?
xˆ(86) = 74.6 beats per minute
Activity 6
The outdoor temperature, in °C, at noon on 10 days and the number of units of electricity, in kW, used to heat a house on each of those days, are shown in the table below.
Noon temperature, T (in °C): 7 11 9 2 4 7 0 10 5 3
Units of electricity used, E (in kW): 32 20 27 37 32 28 41 23 33 36
(1) Draw a scatter graph that shows this information on a grid.
(2) Determine the equation of the least squares regression line.
(3) Determine the correlation coefficient.
(4) What can we conclude about the relationship between the noon temperature and the number of units of electricity used for heating?
(5) Estimate the number of units of electricity that was used to heat the house on a day when the outdoor temperature at noon was 8 °C.
Solution
Use Calculator to show that:
Least square regression line is defined by
y = −1.73639x +40.971088
Correlation Coefficient: r = −0.969926
Show that: yˆ(8) = 27.07993
(1) Draw a scatter graph that shows this information on a grid: Left as exercise
(2) Determine the equation of the least squares regression line: y = −1.74x + 40.97
(3) Determine the correlation coefficient: r = −0.969926
(4) What can we conclude about the relationship between the noon temperature and the number of units of electricity used for heating?
Strong negative correlation (r tends to −1).
If the noon temperature increases, the electricity usage decreases.
(5) Estimate the number of units of electricity that was used to heat the house on a day when the outdoor temperature at noon was 8 °C: ŷ(8) = 27.07993
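The "show that" values in this solution can be verified without a calculator's STAT mode. The sketch below assumes the standard least squares and Pearson formulas; variable names are illustrative and the data are those in the Activity 6 table.

```python
from math import sqrt

temps = [7, 11, 9, 2, 4, 7, 0, 10, 5, 3]           # noon temperature T
units = [32, 20, 27, 37, 32, 28, 41, 23, 33, 36]   # electricity used E

n = len(temps)
t_bar, e_bar = sum(temps) / n, sum(units) / n
sxy = sum((t - t_bar) * (e - e_bar) for t, e in zip(temps, units))
sxx = sum((t - t_bar) ** 2 for t in temps)
syy = sum((e - e_bar) ** 2 for e in units)

b = sxy / sxx                 # gradient
a = e_bar - b * t_bar         # y-intercept
r = sxy / sqrt(sxx * syy)     # correlation coefficient
estimate = a + b * 8          # predicted usage at 8 degrees

print(round(b, 5), round(a, 6))  # -1.73639 40.971088
print(round(r, 4))               # -0.9699
print(round(estimate, 5))        # 27.07993
```

Since 8 °C lies inside the observed temperature range (0 to 11), this estimate is an interpolation.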
Activity 7
The scatter plot below represents the times taken by the winners of the men's
100 m freestyle swimming event at the Olympic Games from 1972 to 2004.
The data was obtained from www.databaseOlympics.com.
1. Calculate the equation of the least square line for this data and draw it.
2. Describe the trend that is observed in these times.
3. Give ONE reason for this trend.
4. What can be said about the efforts of the winners in the years 1976 and 1988?
5. Use your line of best fit to predict the winning time for 2008.
Line of Best fit
1. Calculate the equation of the least square line for this data and draw it.
y = −0,0904x + 229
Best Fit Line
Draw a line of best fit for the data on the graph.
y = −0,0904x + 229
Calculator: Predicted Values
ŷ(1972) = 50,79
ŷ(1984) = 49,71
ŷ(1996) = 48,62
Trends
2. Describe the trend that is observed in these times.
3. Give ONE reason for the trend.
[Graph: Time taken, Men's 100 m freestyle, with the line y = −0,0904x + 229]
2. Negative gradient
Downward trend
Times decreased
Swimming faster
Improved performance
3. Better exercise methods
Controlled diets
Swimwear: Less friction
More professional approach
Interpretation
4. What can be said about the efforts of the winners in the years 1976 and 1988?
Year 1972 1976 1980 1984 1988 1992 1996 2000 2004
Time 51,2 50,0 50,4 49,8 48,6 49,0 48,7 48,3 48,1
Prediction
5. Use your line of best fit to predict the winning time for 2008.
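Questions 4 and 5 can be explored with a short sketch that fits the least squares line to the table above, compares the 1976 and 1988 times with the line, and extends the line to 2008. Variable names are illustrative, and the 2008 value is an extrapolation beyond the observed years, so it should be treated with caution.

```python
years = [1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004]
times = [51.2, 50.0, 50.4, 49.8, 48.6, 49.0, 48.7, 48.3, 48.1]

n = len(years)
x_bar, y_bar = sum(years) / n, sum(times) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(years, times))
     / sum((x - x_bar) ** 2 for x in years))
a = y_bar - b * x_bar


def predict(year):
    """Winning time predicted by the least squares line."""
    return a + b * year


# Residual = actual time minus predicted time (negative means faster than trend)
residuals = {year: t - predict(year) for year, t in zip(years, times)}

print(round(b, 4), round(a))                                 # -0.0904 229
print(round(residuals[1976], 2), round(residuals[1988], 2))  # both negative
print(round(predict(2008), 2))                               # about 47.54
```

The negative residuals for 1976 and 1988 indicate that those winners swam faster than the general trend would suggest.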
Thank you