INTRODUCTION TO STATISTICS
Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
Statistics represents the scientific procedures and methods for collecting, organizing, analyzing, interpreting and presenting data as useful information, in order to draw valid conclusions and make effective decisions.
Data – the values (measurements or observations) that the variables can assume
Data set – a collection of data values
DATA
Student   Gender   Age   CGPA   Study Hours   Program
1         F        19    3.3    3             CS111
2         M        20    3.45   3.5           BA111
...
250
• Statistics helps us turn data into information, that is, data that have been interpreted and understood and are useful to the recipient.
Sports (you can watch Moneyball)
Public health
Quality control
Estimation
Prediction
Marketing
Accounting
Financial planning
Advertising
Statistical techniques are important tools for all managers and decision-makers.
TERM            DEFINITION
Population      A population consists of all subjects (human or otherwise) that are being studied.
Sample          A sample is a group of subjects selected from a population.
Parameter       A summary measure, such as the mean, median, mode or standard deviation, computed from the entire population.
Statistic       A summary measure computed from sample data.
Variable        A characteristic or attribute that can assume different values.
Census          Collection of data from all subjects in the population (a study that is carried out on the whole population).
Sample survey   A study that involves a subgroup (or sample) of the population.
Pilot study     A small-scale study done before the actual study is carried out.
[Figure: a sample drawn from a population]
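A minimal Python sketch of the parameter/statistic distinction. The 250 study-hour values are simulated (hypothetical, not real data) and the sample size of 30 is an arbitrary choice for illustration: the mean over the whole list plays the role of a parameter, and the mean over a random subset is a statistic.

```python
import random
import statistics

# Hypothetical population: weekly study hours for 250 students (simulated, not real data).
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [round(rng.uniform(1.0, 6.0), 1) for _ in range(250)]

# Parameter: a summary measure computed from the ENTIRE population.
mu = statistics.mean(population)

# Statistic: the same summary measure computed from a SAMPLE only.
sample = rng.sample(population, 30)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

The two means are typically close but not equal, because the sample covers only part of the population.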
INFERENTIAL STATISTICS
• Consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables and making predictions.
• Used to draw conclusions or inferences about characteristics of a population based on data from a sample.
• Make inferences from samples to populations
• Example: Suppose we want to test whether Bayer Aspirin is better than Tylenol at relieving pain. We could not give these drugs to everyone in the population; it is not practical, since the general population is so large. Instead, we might give them to a couple of hundred people and see which one works better for them. With inferential statistics, we can generalize what was observed for a few hundred people to a very large population of hundreds of thousands of people, while accounting for sampling uncertainty.
DESCRIPTIVE STATISTICS
• Consists of the collection, organization, summarization and presentation of data using numerical, graphical or tabular methods
• Presenting the data in a meaningful form
• Example: presenting the data from the national census conducted by the Malaysian government every 10 years
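As a small illustration of numerical summarization, the sketch below computes the mean, median, mode and sample standard deviation with Python's standard `statistics` module. The ten CGPA values are made up for illustration and are not taken from the data set above.

```python
import statistics

# Hypothetical CGPA values for 10 students (illustrative only, not from the study).
cgpa = [3.30, 3.45, 2.90, 3.10, 3.45, 3.75, 2.60, 3.45, 3.20, 3.00]

summary = {
    "n": len(cgpa),
    "mean": statistics.mean(cgpa),
    "median": statistics.median(cgpa),
    "mode": statistics.mode(cgpa),
    "stdev": statistics.stdev(cgpa),  # sample standard deviation (divisor n - 1)
}

for name, value in summary.items():
    # Integers (the count) are printed as-is; floats are rounded to 3 decimals.
    print(f"{name:>6}: {value:.3f}" if isinstance(value, float) else f"{name:>6}: {value}")
```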
PRIMARY DATA
• Specific information collected by the person who is doing the research (the researcher)
• Collected through surveys, interviews, direct observations and experiments
SECONDARY DATA
• Data collected by others
• Any material that has been collected from published records such as newspapers, journals, research papers and so on
Source of data    Advantages                              Disadvantages
Primary data      Collects more accurate information      Consumes a lot of time, effort and cost
Secondary data    Requires less time, effort and cost     The accuracy of secondary data can be questionable
VARIABLES
Variables are classified as qualitative or quantitative, and quantitative variables are further classified as discrete or continuous.
• Qualitative variables are variables that have distinct categories according to some characteristic or attribute.
  Example: gender (male, female), place of birth
• Quantitative variables are variables that can be counted or measured.
  Example: heights, weights, body temperatures
  • Discrete variables assume values that can be counted.
  • Continuous variables can assume an infinite number of values between any two specific values. They are obtained by measuring and often include fractions and decimals.
[Figure: the four levels of measurement, from lowest to highest: nominal, ordinal, interval, ratio]
Nominal
• The nominal level of measurement classifies data into mutually exclusive (non-overlapping) categories in which no order or ranking can be imposed on the data.
• Example: marital status (single, married, divorced), political party (BN, PR)
Ordinal
• The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
• Example: exam grade (A, B, C, D, F), level of education (degree, master's, PhD)
Interval
• The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
• Example: IQ score, temperature (in °C or °F)
Ratio
• The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population.
• Example: height, weight
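A short Python check of the difference between the interval and ratio levels, using temperature and height from the examples above. The conversion formulas (°F = °C × 9/5 + 32, inches = cm / 2.54) are standard; the specific values are arbitrary. A ratio of two heights stays the same in any unit, because 0 cm and 0 in both mean "no height", while a ratio of two temperatures changes with the unit, because 0 °C and 0 °F are arbitrary points.

```python
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32

def cm_to_inches(cm: float) -> float:
    return cm / 2.54

# Interval scale (temperature): the zero point is arbitrary, so ratios depend on the unit.
t_low, t_high = 10.0, 20.0
temp_ratio_c = t_high / t_low                                                  # 20 / 10 = 2.0
temp_ratio_f = celsius_to_fahrenheit(t_high) / celsius_to_fahrenheit(t_low)   # 68 / 50 = 1.36

# Ratio scale (height): zero means "no height", so ratios do not depend on the unit.
h_short, h_tall = 90.0, 180.0
height_ratio_cm = h_tall / h_short
height_ratio_in = cm_to_inches(h_tall) / cm_to_inches(h_short)

print(f"temperature: {temp_ratio_c:.2f} in °C vs {temp_ratio_f:.2f} in °F")
print(f"height:      {height_ratio_cm:.2f} in cm vs {height_ratio_in:.2f} in inches")
```

So "20 °C is twice as hot as 10 °C" is not a meaningful statement, whereas "180 cm is twice as tall as 90 cm" is.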
Classify each of the following as nominal, ordinal, interval or ratio:
Zip code
IQ score
Judging (first place, second place, etc.)
Major field (mathematics, computers, etc.)
Salary
Gender (male, female)
Ranking of tennis players
Age
Eye color (blue, brown, green, hazel)
Rating scale (poor, good, excellent)
WHAT IS SAMPLING?
Sampling is the scientific procedure of selecting a representative sample from a population. The steps are:
1. Identify the population.
2. Establish the sampling frame.
   • Sampling frame – the list of all population elements or sampling units
   • Sampling unit – the unit listed in the sampling frame
3. Specify the sampling techniques.
   • Sampling techniques – the scientific methods of selecting samples from populations
4. Determine the sample size.
5. Select the sample.
SAMPLING TECHNIQUES
• Probability sampling techniques: simple random sampling, systematic sampling, stratified sampling, cluster sampling, multistage sampling
• Non-probability sampling techniques: convenience sampling, judgmental sampling, quota sampling, snowball sampling
Note:
• Probability sampling is a sampling technique in which the probability of getting any particular sample can be calculated.
• Probability sampling uses random selection, while non-probability sampling does not.
• Non-probability sampling does not rely on probability theory.
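The note above says that under probability sampling the probability of getting any particular sample can be calculated. For simple random sampling without replacement (order ignored), there are C(N, n) possible samples, each with probability 1/C(N, n). The sketch below checks this for a hypothetical population of five labelled units and n = 2, using `math.comb` (Python 3.8+) and `itertools.combinations`.

```python
import math
from itertools import combinations

population = ["A", "B", "C", "D", "E"]   # hypothetical population, N = 5
n = 2                                     # sample size

possible_samples = list(combinations(population, n))  # every distinct unordered sample
num_samples = math.comb(len(population), n)           # C(N, n)
p_particular = 1 / num_samples                         # chance of any one specific sample

# Chance that a given member (here "A") ends up in the sample: equals n / N.
p_member_a = sum("A" in s for s in possible_samples) / num_samples

print(num_samples, p_particular, p_member_a)
```

Each member's inclusion probability, n/N = 2/5, is the same for every unit, which is the "equal chance of being selected" property of simple random sampling.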
Sampling Techniques Description
Simple Random Sampling    A simple random sample is a sample in which all members of the population have an equal chance of being selected.
Random samples are selected by using chance methods or random numbers.
Chance method:
• Number each subject in the population.
• Place numbered cards in a bowl, mix thoroughly, and select as many cards as needed.
• The subjects whose numbers are selected constitute the sample.
Random numbers:
• Using a random number table
• Using a computer
Process:
• Obtain a sampling frame in which each unit has a unique number.
• Choose units one by one (with or without replacement) until n units have been selected; this is done by generating random numbers.
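The computer-based process can be sketched with Python's standard `random` module. The frame of 100 numbered units and n = 10 are hypothetical; `random.Random.sample` draws without replacement, and repeated `choice` calls draw with replacement.

```python
import random

frame = list(range(1, 101))   # hypothetical sampling frame: units numbered 1..100
n = 10                        # hypothetical sample size

rng = random.Random(2024)     # seeded only so the example is reproducible

# Without replacement: each numbered unit can be selected at most once.
srs_without = rng.sample(frame, n)

# With replacement: the same unit may be selected more than once.
srs_with = [rng.choice(frame) for _ in range(n)]

print("without replacement:", sorted(srs_without))
print("with replacement:   ", srs_with)
```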
Systematic Sampling    A systematic sample is a sample obtained by selecting every kth member of the population, where k is a counting number.
Process:
• Get a sampling frame and sort it alphabetically.
• Number the units in the population from 1 to N.
• Find the sampling interval, k = N/n.
• Select the first unit, r, at random from 1 to k.
• Take every kth unit after it: r, r + k, r + 2k, and so on.
Example:
Twenty students are to be selected from Part 1 CS111, which consists of 100 students, using systematic sampling. Describe how to select the sample.
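A minimal sketch of the systematic procedure applied to the CS111 example (N = 100, n = 20, so k = 5). The student names are placeholders and the helper `systematic_sample` is hypothetical. When N is not a multiple of n, this sketch assumes k = N // n and keeps only the first n units, which is just one possible convention.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Hypothetical helper: every kth unit of `frame`, starting at a random r in 1..k."""
    N = len(frame)
    k = N // n                          # sampling interval k = N/n (rounded down)
    r = rng.randint(1, k)               # random start between 1 and k inclusive
    positions = range(r, N + 1, k)      # 1-based positions r, r + k, r + 2k, ...
    return [frame[p - 1] for p in positions][:n]

# Placeholder frame: 100 students, assumed already sorted alphabetically and numbered 1..100.
students = [f"Student {i:03d}" for i in range(1, 101)]
sample = systematic_sample(students, 20, random.Random(7))
print(len(sample), sample[:3])
```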
Sampling Techniques Description
Stratified Random Sampling A stratified sample is a sample obtained by dividing the population into subgroups
or strata according to some characteristics relevant to the study. Then subjects are
selected from each subgroup.
Process:
• Divide the population into strata
• Select samples from each stratum randomly (by using simple random sampling or systematic sampling).
Example:
A population may consist of males and females who are smokers or non-smokers. The researcher will want to include in the sample people from each group, that is, males who smoke, males who do not smoke, females who smoke, and females who do not smoke. The researcher divides the population into 4 subgroups and then selects a random sample from each subgroup. This method ensures that the sample is representative on the basis of the characteristics of gender and smoking.
N – population size
n – sample size
Sample size for each stratum = (population size of the stratum / N) × n
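The allocation formula and the random selection inside each stratum can be sketched as follows. The four strata mirror the gender/smoking example above, but their sizes (400, 300, 200, 100) and the total sample size n = 50 are hypothetical, as are the helper names. Each stratum size is rounded to the nearest whole number; with other numbers the rounded sizes may not add up exactly to n and would need a small adjustment.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Sample size per stratum = (stratum size / N) * n, rounded to the nearest unit."""
    N = sum(strata_sizes.values())
    return {name: round(size / N * n) for name, size in strata_sizes.items()}

def stratified_sample(strata, n, rng=random):
    sizes = {name: len(units) for name, units in strata.items()}
    alloc = proportional_allocation(sizes, n)
    # Simple random sampling (without replacement) inside each stratum.
    return {name: rng.sample(units, alloc[name]) for name, units in strata.items()}

# Hypothetical population of 1000 split into the four strata from the example.
strata = {
    "male smokers":       [f"MS{i}" for i in range(400)],
    "male non-smokers":   [f"MN{i}" for i in range(300)],
    "female smokers":     [f"FS{i}" for i in range(200)],
    "female non-smokers": [f"FN{i}" for i in range(100)],
}
sample = stratified_sample(strata, 50, random.Random(1))
print({name: len(units) for name, units in sample.items()})
```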
Star Cruises, the Leading Cruise Line in Asia-Pacific, offers special rates to Malaysian senior citizens. The agency wishes to conduct a survey on the ages of this group of customers. A sample of 150 senior citizens who took the offer is selected at random.
a) Explain the population for this survey
b) State the sampling frame
c) Describe the variable of interest and state its type.
d) Identify the most appropriate sampling technique
and describe how it should be carried out.
e) Give the best data collection technique for this
survey and state the advantage(s) and
disadvantage(s).
Parking at College Excellence has become a very big problem as the number of students is now reaching 1000. College administrators are interested in determining the average parking time of students (the time it takes a student to find a parking spot). An administrator inconspicuously followed 250 students and carefully recorded their parking times.
a) Identify the population of interest to the college
administration.
b) State the variable to be measured and name the type of
variable.
c) What is a possible sampling frame for this study? If the systematic sampling method is used, explain how a sample of 250 students can be selected.
d) What is the data collection technique employed in the
study? State ONE disadvantage of using this data
collection method.
MAC15
Mutiara Pharmacy is planning to conduct a survey to determine the effectiveness of its new health product, AA Plus, as a dietary supplement. A sampling frame consisting of the names, addresses, ages and telephone numbers of its 2000 clients is used to select a sample of 100 clients. From the sampling frame, the clients can be divided into three subgroups according to age, as summarized in the table below.
Age Group                      Number of Clients
Less than 40 years old         435
Between 40 and 50 years old    665
More than 50 years old         900
Total                          2000
a) State the population and sample of the study.
b) State the variable in this study and determine its type.
c) Suggest the most suitable sampling method and state one of its advantages.
d) Calculate the sample size for each group.
e) Describe how to select the clients among those aged more than 50 using systematic sampling
technique.
f) Suggest a suitable data collection method and state one reason for using this method.
g) Give TWO (2) reasons why sampling is preferred to a census in this survey.
A researcher wishes to conduct an English newspaper readership survey based on race in a town with a population of 100,000. The population breakdown is: 50% Chinese, 30% Malay, 18% Indian and 2% others. A random sample of 500 will be selected for this study.
a) State the population of interest for this study.
b) Describe the sampling frame.
c) Recommend an appropriate sampling technique to be used in this study, and describe how it should
be carried out.
d) State a suitable data collection method for this study, give one advantage and one disadvantage.
e) Suppose the researcher is unable to locate a sampling frame. Suggest an alternative sampling method and briefly describe it.
Sampling Techniques Description
Cluster Sampling A cluster sample is obtained by dividing the population into sections or clusters
and then selecting one or more clusters and using all the members in the
cluster(s) as the members of the sample.
Cluster sampling is used when the population is large or when it involves subjects
residing in a large geographical area.
Process:
• Divide the population into clusters
• Randomly sample clusters
• Measure all the subjects within sampled clusters
Example:
• Suppose a researcher wishes to survey apartment dwellers in a large city. If there are 10 apartment buildings in the city, the researcher can select 2 buildings at random from the 10 and interview all the residents of these buildings.
• If one wanted to do a study involving the patients in the hospitals in New York City, it would be very costly and time-consuming to try to obtain a random sample of patients, since they would be spread over a large area. Instead, a few hospitals could be selected at random, and the patients in these hospitals would be interviewed in a cluster.
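A sketch of the apartment example: the 10 buildings are the clusters, 2 are chosen at random, and every resident of the chosen buildings enters the sample. The building names and resident counts are invented for illustration.

```python
import random

# Invented city: 10 apartment buildings (the clusters) with 21 to 30 residents each.
buildings = {
    f"Building {b}": [f"B{b}-R{r}" for r in range(1, 21 + b)]
    for b in range(1, 11)
}

rng = random.Random(3)                         # seeded for reproducibility
chosen = rng.sample(sorted(buildings), 2)      # step 1: randomly sample 2 of the 10 clusters

# Step 2: every resident of each chosen building is included in the sample.
sample = [resident for b in chosen for resident in buildings[b]]

print(chosen, len(sample))
```

Unlike stratified sampling, randomness here operates only at the cluster level; no selection happens inside a chosen building.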
A large firm is considering implementing a new salary scheme and wishes to determine the proportion of employees who agree with the new policy. The firm has 20 branches located throughout Malaysia. A sample of five branches was selected, and the opinions of all employees in the selected branches regarding the new scheme were obtained.
a) Describe the population and sample for the study.
b) Explain the sampling frame for the study.
c) Identify the type of variable and the scale of measurement
used in the study.
d) Describe the sampling technique used in the study.
e) If the systematic sampling technique were employed to select five branches from 20, explain how it would be conducted.
f) Determine the most appropriate data collection method to be
used in the study, give one advantage and one disadvantage of
using this method.
For Exercise 1-2, identify the sampling method that was used.
1. To check the accuracy of a machine that is used for filling ice cream containers, every 20th container is selected and weighed.
2. In a large school district, a researcher numbers all the full-time teachers and then randomly selects 30 teachers to be interviewed.
For Exercise 1-2, classify each sample as random, systematic, stratified, cluster or other.
1. In a large school district, all teachers from two buildings are interviewed to
determine whether they believe the students have less homework to do now than
in previous years.
2. Mail carriers of a large city are divided into four groups according to gender (male
or female) and according to whether they walk or ride on their routes. Then 10 are
selected from each group and interviewed to determine whether they have been
bitten by a dog in the last year.
Non-Probability Sampling Technique    Description
Convenience Sampling    As its name implies, the researcher selects the respondents at his own convenience. The respondents are selected because they happen to be in the right place at the right time, where the researcher is conducting the survey.
Example: the researcher is conducting a survey at the entrance of a shopping complex at 10 am; the customers who arrive there at 10 am will be selected as respondents.
Judgmental Sampling    The procedure of selecting respondents for research based solely on the judgement of the researcher.
The researcher selects a respondent who, in his judgement, possesses certain characteristics that represent the population of interest.
Example: a researcher is doing a study on illegal traders who sell pirated CDs/DVDs on the streets. He will go to the street and look for respondents based on certain characteristics, namely "illegal trader" and "pirated CD/DVD". (What counts as an "illegal trader" selling "pirated CDs/DVDs" depends on the researcher's experience, e.g. not having a proper shop, selling items at a low price and looking suspicious all the time.)
Non-Probability Sampling Technique    Description
Snowball Sampling    The best method when the respondents are difficult to identify or have to meet certain criteria to participate.
Find one person who qualifies to participate and ask him/her to recommend several other people who have the knowledge you are looking for; the participant list can grow from there.
Examples: a study on drug users, a study on the victims of "get-rich-quick" schemes
Quota Sampling    The procedure of selecting respondents who possess certain characteristics determined by the study. The characteristics of respondents could be their gender, etc.
DATA COLLECTION METHODS
• Face-to-face interview (personal interview)
• Telephone interview
• Direct questionnaire (questionnaires are distributed and collected personally)
• Mail or postal questionnaire (questionnaires are sent and received back through the post)
• Direct observation (respondents are observed and data recorded)
• Other methods (e-mail, internet survey, WhatsApp application)
In designing a questionnaire, the following points should be taken into consideration:
• Questions should be short and simple.
• Begin with simple and less controversial questions.
• Avoid sensitive questions.
• Questions should not be biased.
(Refer to the textbook.)