1/31/2023
Statistics 1A
STA01A1
What is Statistics?
We are constantly exposed to collections of facts, or
data, both in our professional capacities and in
everyday activities.
➢ Employment statistics
➢ Income and expenditure
➢ Accident statistics
➢ Population statistics
➢ Birth and death
➢ Exports and imports, etc.
The discipline of statistics provides methods for
organizing and summarizing data and for drawing
conclusions based on information contained in the
data.
Definitions
❑ A population is the set of all measurements of interest.
❑ A sample is a subset of a population that is obtained
through some process (sampling), for the purposes of
investigating the properties of the underlying population.
[Diagram: a parameter describes the population; a statistic describes the sample.]
Definitions
❑ A variable is any attribute or characteristic (such as age,
weight, length, etc.) of an object being investigated that can
vary from one object to another in the population.
❑ The object or individual on which a variable is measured is
an experimental unit (case).
❑ Data is the Latin word for "those that are given", and can
be thought of as the "results/outcomes/actual values of
observations".
Descriptive vs Inferential
Statistics has two branches:
➢ Descriptive: collection, organizing, presentation and analysis of data
➢ Inferential: drawing conclusions
Probability and Statistical Inference
[Diagram: probability reasons from the population to the sample; statistical inference reasons from the sample to the population.]
Collecting Data
Statistics deals not only with the
organization and analysis of data once it
has been collected but also with the
development of techniques for collecting
the data.
If data is not properly collected, an
investigator may not be able to answer
the questions under consideration with a
reasonable degree of confidence.
Sampling
Probability Sampling Methods
Selection methods where the sample members are
selected from the target population on a purely
random (chance) basis:
➢ Simple random sampling
➢ Systematic random sampling
➢ Stratified random sampling
➢ Cluster random sampling
Simple Random Sampling
• The sampling plan or experimental
design determines the amount of
information you can extract, and often
allows you to measure the reliability of
your inference
• Simple random sampling is a method
of sampling that allows each possible
sample of size n an equal probability of
being selected
Example
• There are 29 students in a
statistics class. The lecturer wants
to choose 5 students to form a
group. How should she proceed?
1. Give each student a number from
01 to 29
2. Choose 5 pairs of random digits
from the random number table
3. If a number larger than 29 (or 00, or a
number already chosen) comes up,
simply select another number
4. The five students with those
numbers form the group
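The four steps above can be sketched in Python. This is only an illustration: the standard-library call `random.sample` stands in for the random number table, and the helper name `choose_group` and the seed value are assumptions made for the example, not part of the lecture.

```python
import random

# Step 1: number the 29 students from 1 to 29.
students = list(range(1, 30))

def choose_group(population, size, seed=None):
    """Draw a simple random sample without replacement.

    random.sample never returns the same unit twice, so the
    "select another number" rule of step 3 is handled for us.
    """
    rng = random.Random(seed)  # the seed only makes the run repeatable
    return sorted(rng.sample(population, size))

# Steps 2-4: five distinct student numbers form the group.
group = choose_group(students, 5, seed=2023)
print(group)
```

Every subset of 5 students has the same chance of being returned, which is the defining property of simple random sampling.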
Stratified Random Sampling
• Population is heterogeneous
• Divide the population into homogeneous strata
• Take a simple random sample from each
stratum, in proportion to the relative size of
that stratum
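A minimal sketch of proportional stratified sampling follows. The three strata (year groups of 60, 30 and 10 students), the function name `stratified_sample` and the seed are hypothetical and chosen only so that the allocation works out to whole numbers.

```python
import random

def stratified_sample(strata, total_size, seed=None):
    """Take a simple random sample from every stratum, sized in
    proportion to that stratum's share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation. With awkward proportions, rounding
        # can make the stratum sizes add up to slightly more or less
        # than total_size.
        k = round(total_size * len(units) / population_size)
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical population of 100 students in three strata.
strata = {
    "first_year": [f"F{i}" for i in range(60)],
    "second_year": [f"S{i}" for i in range(30)],
    "third_year": [f"T{i}" for i in range(10)],
}

stratified = stratified_sample(strata, total_size=10, seed=1)
sizes = {name: len(units) for name, units in stratified.items()}
print(sizes)  # {'first_year': 6, 'second_year': 3, 'third_year': 1}
```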
Cluster Random Sampling
• Large and geographically dispersed population with
natural clusters, similar in profile to each other
• Take simple random sample of clusters and then
random samples of the sampling unit within each
cluster
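The two stages described above might be sketched as follows. The population (8 suburbs of 50 households each), the function name `cluster_sample` and the sample sizes are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sampling: first a simple random sample of
    clusters, then a simple random sample of units in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1
    return {name: rng.sample(clusters[name], n_per_cluster)  # stage 2
            for name in chosen}

# Hypothetical population: 8 suburbs (clusters) of 50 households.
clusters = {
    f"suburb_{i}": [f"suburb_{i}_house_{j}" for j in range(50)]
    for i in range(8)
}

cluster_result = cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=7)
for name, households in cluster_result.items():
    print(name, households)
```

Only the selected clusters need to be visited, which is the practical appeal of this design for geographically dispersed populations.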
Systematic Random Sampling
• Uses a sampling frame (address list or database)
• First sampling unit is selected randomly from
sampling frame
• Subsequent sampling units selected at uniform
intervals
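Sampling interval: k = N/n = 15 000/500 = 30
[Diagram: from a population of 15 000, units 2, 32, 62, ... are selected, i.e. a random start at unit 2 and then every 30th unit.]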
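The selection rule can be sketched in a few lines, using the slide's figures N = 15 000 and n = 500. The function name `systematic_sample` is an assumption, and the start is fixed at unit 2 here only to reproduce the units shown on the slide; in practice it is drawn at random from the first interval.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the positions (1-based) of the selected units in the
    sampling frame: a random start, then every k-th unit."""
    k = population_size // sample_size  # sampling interval N/n
    if start is None:
        # The random start is one of the first k units.
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

units = systematic_sample(15000, 500, start=2)
print(units[:3])   # [2, 32, 62]
print(len(units))  # 500
```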
Types of Data
In determining the most appropriate ways to
summarize or analyse data, it is useful to
classify variables as either categorical or
quantitative:
• A categorical variable divides the cases into
groups, placing each case into exactly one of
two or more categories
• A quantitative variable measures or records a
numerical quantity for each case.
Graphs for categorical variables
Quantitative data
Some quantitative data is obtained by counting to
determine the value of a variable
• The number of traffic offences a person received
during the last year
• The number of vehicles arriving at a tollgate during a
particular period
• The number of students in this class
whereas other data is obtained by taking measurements
• Weight of an individual
• Reaction time to a particular stimulus
• Time taken to complete a test
Histograms for “Counting” variables
Example
❑ A bag of M&Ms contains 25 candies:
❑ Raw Data: m m m m m m m m m m
m m m m m m m m m m
m m m m m
❑ Frequency distribution:
Color    Tally    Frequency  Relative Frequency  Percent
Red      mmm      3          3/25 = 0.12         12%
Blue     mmmmmm   6          6/25 = 0.24         24%
Green    mmmm     4          4/25 = 0.16         16%
Orange   mmmmm    5          5/25 = 0.20         20%
Brown    mmm      3          3/25 = 0.12         12%
Yellow   mmmm     4          4/25 = 0.16         16%
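The same frequency distribution can be built with a short script. Because the colours of the raw data did not survive in this text, the list of candies is rebuilt from the frequencies in the table; that reconstruction, and the variable names, are assumptions made for the sketch.

```python
from collections import Counter

# Rebuild the 25 raw observations from the tabulated frequencies
# (the order of the candies in the bag is not known).
counts = {"Red": 3, "Blue": 6, "Green": 4, "Orange": 5, "Brown": 3, "Yellow": 4}
candies = [color for color, n in counts.items() for _ in range(n)]

freq = Counter(candies)    # frequency of each category
total = sum(freq.values())  # number of observations

# colour -> (frequency, relative frequency, percent)
table = {c: (f, f / total, 100 * f / total) for c, f in freq.items()}

for color, (f, rel, pct) in table.items():
    print(f"{color:<8}{f:>3}{rel:>7.2f}{pct:>6.0f}%")
```

The relative frequencies always add up to 1 and the percentages to 100, which is a quick check on any hand tally.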
Histograms for “Measurement” variables
Constructing a histogram for measurement data entails
subdividing the measurement axis into a suitable number of
class intervals or classes, such that each observation is
contained in exactly one class.
❑ Divide the range of the data into 5-20 subintervals of equal
length
❑ Calculate the approximate width of the subinterval as
range/number of subintervals
❑ Round the approximate width UP to a convenient value
❑ Use the method of left inclusion, including the left endpoint,
but not the right in your tally
❑ Create a statistical table including the subintervals, their
frequencies and relative frequencies
Example
The ages of 50 lecturers at UJ:
34 48 70 63 52 52 35 50 37 43 53 43 52 44
42 31 36 48 43 26 58 62 49 34 48 53 39 45
34 59 34 66 40 59 36 41 35 36 62 34 38 28
43 50 30 43 32 44 58 53
Frequency Distribution:
Step 1: Determine the range
Range = max − min
The range = 70 − 26 = 44
Step 2: Determine the number of classes
Between 5 and 12, depending on sample size
We choose to use 6 intervals
Example
Step 3: Determine the class width
Class width = Range / Number of classes
Minimum class width = 44/6 = 7.33
Convenient class width = 10
Step 4: Determine the lower limit of the 1st class
Lower limit of 1st class ≤ min
Start at 20
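Steps 1 to 4 amount to a few lines of arithmetic. In the sketch below, "convenient" is taken to mean a multiple of 10; that rounding rule is an assumption for this example and not a general prescription.

```python
import math

minimum, maximum = 26, 70  # smallest and largest age in the data
n_classes = 6              # chosen number of classes

data_range = maximum - minimum       # Step 1: 70 - 26 = 44
min_width = data_range / n_classes   # Step 3: 44/6 = 7.33...

# Assumed rule: round the width UP, and the lower limit DOWN,
# to the nearest multiple of 10.
step = 10
class_width = math.ceil(min_width / step) * step  # 10
lower_limit = math.floor(minimum / step) * step   # Step 4: 20

print(data_range, round(min_width, 2), class_width, lower_limit)
```

Rounding the width up guarantees that the classes cover the whole range: 6 classes of width 10 starting at 20 reach 80, which is above the maximum of 70.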
Example
Step 5: Determine the class limits
Upper limit of 1st class = lower limit + class width
Lower limit of 2nd class = upper limit of 1st class
Upper limit of 2nd class = lower limit + class width
[20 ; 30)
[30 ; 40) etc.
Step 6: Determine the frequency
A count of the number of values in each class
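Steps 5 and 6 can be checked with a short script. It uses the 50 ages listed in the example and the method of left inclusion, so an age of exactly 30 falls in [30 ; 40) and not in [20 ; 30). The variable names are illustrative.

```python
ages = [
    34, 48, 70, 63, 52, 52, 35, 50, 37, 43, 53, 43, 52, 44,
    42, 31, 36, 48, 43, 26, 58, 62, 49, 34, 48, 53, 39, 45,
    34, 59, 34, 66, 40, 59, 36, 41, 35, 36, 62, 34, 38, 28,
    43, 50, 30, 43, 32, 44, 58, 53,
]

lower, width, n_classes = 20, 10, 6

# Step 6: count the values in each class [lo ; lo + width).
frequencies = [0] * n_classes
for age in ages:
    index = (age - lower) // width  # left endpoint included
    frequencies[index] += 1

n = len(ages)
for i, f in enumerate(frequencies):
    lo = lower + i * width  # Step 5: class limits
    print(f"[{lo} ; {lo + width})  {f:>2}  {f / n:.2f}  {100 * f / n:.0f}%")
```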
Age        Tally                Frequency  Relative Frequency  Percent
[20 ; 30)  ||                   2          2/50 = 0.04         4%
[30 ; 40)  ||||| ||||| ||||| |  16         16/50 = 0.32        32%
[40 ; 50)  ||||| ||||| |||||    15         15/50 = 0.30        30%
[50 ; 60)  ||||| ||||| ||       12         12/50 = 0.24        24%
[60 ; 70)  ||||                 4          4/50 = 0.08         8%
[70 ; 80)  |                    1          1/50 = 0.02         2%
Histogram
[Figure: histogram of the 50 lecturer ages. Horizontal axis: Bin (class limits 30, 40, 50, 60, 70, 80); vertical axis: Frequency (0 to 18).]
Histogram Shapes
Histograms come in a variety of shapes:
➢ Symmetric
➢ Bimodal
➢ Skewed to the left
➢ Skewed to the right
Describing the Distribution
[Figure: histogram of the lecturer ages. Horizontal axis: Bin; vertical axis: Frequency.]
Shape? Skewed right
Outliers? No
What proportion of the lecturers are younger than 40?
(2 + 16)/50 = 18/50 = 0.36
What is the probability that a randomly selected lecturer is 50 or older?
(12 + 4 + 1)/50 = 17/50 = 0.34
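Both answers come straight from the class frequencies, as the sketch below shows. It works only because 40 and 50 are class limits; for a cut-off inside a class, the raw data would be needed. The variable names are illustrative.

```python
# Classes [lower ; upper) and their frequencies from the age table.
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [2, 16, 15, 12, 4, 1]
n = sum(frequencies)  # 50 lecturers

# Younger than 40: every class whose upper limit is at most 40.
younger_than_40 = sum(
    f for (lo, hi), f in zip(classes, frequencies) if hi <= 40
) / n

# 50 or older: every class whose lower limit is at least 50.
fifty_or_older = sum(
    f for (lo, hi), f in zip(classes, frequencies) if lo >= 50
) / n

print(younger_than_40)  # 18/50
print(fifty_or_older)   # 17/50
```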