Introduction to Statistics
1. Statistics is used in everyday activities and in many different areas such as:
a. Research
b. Weather
c. Medicine
d. Politics
e. Banking
f. Finance
g. Natural and Social Sciences
2. Scientific Process of Statistics
a. Data Collection
b. Data Organization
c. Data Analysis
d. Data Presentation
e. Data Interpretation
3. Two Types of Statistics
a. Descriptive Statistics – collection, organization, summary, and presentation
of data.
– the analysis of data that helps describe, show, or summarize data in a
meaningful way so that patterns might emerge from the data. It does not allow
conclusions beyond the data that have been analyzed, nor conclusions regarding
any hypothesis that might have been made.
b. Inferential Statistics – generalizing, concluding, and predicting from sample
data.
– instead of using the entire population to gather the data, the statistician
collects a sample to represent the population.
i. Sample – a portion of the population that is representative of the
population from which it was selected.
ii. Population – represents all possible elements or outcomes that are of
interest in a particular study.
4. Types of Variables
a. Quantitative Variable – any data that can be counted or measured;
numerical data.
i. Discrete Data – values that are countable and can only take
on specific, separate values. No fractions or decimals (e.g., sales).
ii. Continuous Data – values that can be measured and divided infinitely. Can take
any value within a given range, including fractions and decimals (e.g., grade).
b. Qualitative Variable – any data that can be observed but not measured;
descriptive data.
i. Nominal Data – categories that have no intrinsic order or ranking.
ii. Ordinal Data – categories that do have a meaningful order or ranking.
iii. Dichotomous Data – categorical data that has only two distinct
categories or options (Y/N, T/F, M/F).
5. Sampling – the selection of a sample that represents the total population as closely as possible.
a. Random/Probability Sampling Technique – a random sample is drawn in a
way that each element of the population has an equal chance of being selected.
i. Simple Random Sampling Technique – every element has an equal chance
of being selected (lottery method, fish-bowl technique).
ii. Systematic Sampling Technique – ordered and systematic way of
selecting the sample (e.g., every 10th element).
iii. Stratified Sampling Technique – population divided into different groups
(strata), from each of which a sample is chosen randomly.
iv. Cluster Sampling Technique – population divided into clusters; the
sample is chosen randomly from selected clusters.
b. Non-Random/Non-Probability Sampling Technique – a sample is drawn in a
way that not every element of the population has an equal chance of being selected.
i. Purposive Sampling – most commonly used non-random sampling
technique.
– samples are chosen based on pre-determined research criteria, the
characteristics of the population, and the objective of the study.
ii. Quota Sampling – researchers collect information from a
pre-determined number of samples based on their choice of a specific
subgroup (20% male, 20% female).
iii. Convenience Sampling – samples are chosen based on accessibility. Less
time-consuming but more subjective compared to other techniques.
iv. Snowball Sampling – subsequent respondents are selected based on
the referral of previous respondents (recruitment).
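The four random/probability sampling techniques listed above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function names, the numbered population of 100 elements, the two strata, and the ten clusters are all hypothetical and chosen for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements so that every element has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Take every k-th element, beginning at index `start`."""
    return population[start::k]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a random sample of fixed size from every stratum (group)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly select clusters, then sample randomly inside the selected ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical population: 100 elements numbered 1 to 100
population = list(range(1, 101))
strata = {"A": population[:50], "B": population[50:]}
clusters = {f"c{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, seed=1)
sys_sample = systematic_sample(population, 10, start=9)  # every 10th element
strat = stratified_sample(strata, 5, seed=1)
clus = cluster_sample(clusters, 2, 3, seed=1)
```

Note the difference between the last two functions: stratified sampling draws from every group, while cluster sampling draws only from the groups that were randomly selected.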
Unit 1. Data Collection
1. Data Collection – process of gathering data systematically via valid and reliable tools.
– most essential stage in conducting research.
2. Steps in Data Collection
a. Select the appropriate data collection method.
b. Use a valid and reliable data-gathering method.
c. Gather data systematically.
3. Commonly used Methods of Data Collection
Method of Data Collection | Description | Tool/Instrument Used
--------------------------|-------------|---------------------
Survey | A group of respondents answers the items in the instrument using pen and paper | Questionnaire
Interview | One-on-one discussion between interviewer and interviewee | Interview guide/protocol
Focus Group Discussion | Dynamic group discussion between a group of people and an interviewer | Interview guide
Observation | To see or notice someone or something | Observation guide
Documentary Analysis | Culling data from desired sources such as existing documents, records, or archival sources | Data collection form
Experiment | A test under controlled conditions made to demonstrate a known truth, examine the validity of a hypothesis, or determine the efficacy of something untried | Observation guide
4. Sources of Data
a. Primary data – data collected by the researchers.
b. Secondary data – data that already exists
c. Triangulation/Dual Methodology – use of both primary and secondary data.
5. Qualities of a Good Instrument/Tool
a. Using Quantitative Research
i. Validity – extent or degree to which the instrument measures what it is
supposed to measure.
• Content Validity – assesses whether a test is representative of
all aspects of the construct. It must cover all relevant parts of the
subject it aims to measure.
• Face Validity – considers how suitable the content of a test
seems to be on the surface.
• Construct Validity – ensures that the method of measurement
matches the construct the researchers want to measure.
• Criterion Validity – how closely the results of the test
correspond to the results of a different test.
ii. Reliability – the consistency of responses.
• Test-retest Reliability – consistency over time.
o Parallel form technique
o Split half
• Internal-consistency method – consistency across items.
o Kuder-Richardson formula
o Cronbach’s Alpha
• Inter-rater reliability – consistency across different researchers.
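As an illustration of the internal-consistency method, Cronbach's Alpha can be computed from a table of item scores with the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below is a minimal example under stated assumptions: the function name and the 5-respondent, 3-item data set are hypothetical, and sample variances are used throughout.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one list per respondent, each holding that respondent's item scores."""
    k = len(scores[0])                           # number of items
    items = list(zip(*scores))                   # transpose: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical responses: 5 respondents x 3 items
responses = [
    [3, 4, 3],
    [4, 5, 5],
    [2, 2, 3],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values closer to 1 indicate that the items vary together, that is, that responses are consistent across items.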
Unit 2. Data Organization
1. Raw Data – data recorded in the sequence in which it becomes available as
information is collected from each member of the population or sample. The sequence of
data is random and unranked.
a. Quantitative Raw Data
b. Qualitative/Categorical Raw Data
2. Commonly Used Methods of Organizing Raw Data
a. Ungrouped Frequency Distributions – a representation, either in a graphical
or tabular format, that displays the number of observations within a given
interval.
i. Variable – categories in the frequency distribution table.
ii. Frequency (f) – number of values in a certain category.
iii. Frequency Table – how frequencies are distributed over various
categories.
b. Grouped Frequency Distributions
i. Class – interval that includes all the values that fall within two
numbers. (lower limit, upper limit of the nth class)
ii. Frequency – number of values that belong to different classes.
3. Constructing Ungrouped Frequency Distribution Table
a. List the values of the data, called scores, in the first column from lowest to highest
value or vice versa;
b. Create a second column with the frequency of each score, obtained by tallying the
scores;
c. Create a third column where the relative frequency of each score is entered;
i. Relative Frequency (RF) – the frequency divided by the sum of all
frequencies. To check the correctness of the calculations, the sum of RF should
equal 1.
d. Create a fourth column with the relative frequency expressed as a percentage;
e. Create a fifth column for the cumulative frequency (CF).
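The five columns described above can be built in a few lines of Python. This is a minimal sketch: the function name and the ten quiz scores are hypothetical, and scores are listed from lowest to highest.

```python
from collections import Counter

def frequency_table(scores):
    """Return one row per score: (score, f, RF, percent, CF)."""
    counts = Counter(scores)          # tally of each score
    n = len(scores)                   # sum of all frequencies
    rows, cumulative = [], 0
    for score in sorted(counts):      # lowest to highest
        f = counts[score]
        cumulative += f
        rows.append((score, f, f / n, 100 * f / n, cumulative))
    return rows

scores = [7, 8, 8, 9, 10, 8, 7, 10, 9, 8]  # hypothetical quiz scores
table = frequency_table(scores)

print(f"{'Score':>5} {'f':>3} {'RF':>5} {'%':>6} {'CF':>3}")
for score, f, rf, pct, cf in table:
    print(f"{score:>5} {f:>3} {rf:>5.2f} {pct:>6.1f} {cf:>3}")

# Check: the relative frequencies should add up to 1
rf_total = sum(row[2] for row in table)
```

The last CF value should equal the number of observations, which gives a second check alongside the RF total.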
4. Constructing Grouped Frequency Distribution Table
a. Determine number of classes (c) – can be chosen arbitrarily by the researchers
or by using Sturges' formula. It is preferable to have more classes as the size of
the data increases.
b. Arrange the data in ascending order.
c. Determine the range (R) – highest value minus lowest value in the data set.
d. Determine class width/interval (i) – range divided by number of classes.
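The four steps above can be sketched as follows. Several choices here are assumptions and not part of the notes: Sturges' formula is taken as c = 1 + log2(n) and rounded up, classes are treated as half-open intervals with the maximum placed in the last class, and the 20 data values are hypothetical. The sketch also assumes the data are not all equal, since the width would then be zero.

```python
import math

def grouped_frequency_table(data, num_classes=None):
    """Return (lower limit, upper limit, frequency) for each equal-width class."""
    values = sorted(data)                        # arrange in ascending order
    if num_classes is None:
        # Sturges' formula, c = 1 + log2(n); rounding up is an assumption
        num_classes = math.ceil(1 + math.log2(len(values)))
    data_range = values[-1] - values[0]          # R = highest - lowest
    width = data_range / num_classes             # i = R / c
    freqs = [0] * num_classes
    for x in values:
        # Classes are [lower, upper); the maximum falls in the last class
        index = min(int((x - values[0]) / width), num_classes - 1)
        freqs[index] += 1
    return [(values[0] + k * width, values[0] + (k + 1) * width, f)
            for k, f in enumerate(freqs)]

# Hypothetical data set of 20 scores
data = [52, 55, 58, 60, 61, 63, 65, 66, 68, 70,
        71, 72, 74, 75, 77, 80, 82, 85, 88, 94]
grouped = grouped_frequency_table(data)
for lower, upper, f in grouped:
    print(f"{lower:5.1f} to {upper:5.1f}: {f}")
```

Many textbooks round the class width up to a convenient whole number; that variation is left out to keep the sketch close to the definition of width as range divided by number of classes.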
Unit 3. Data Presentation
1. Methods of Presenting Data
a. Textual Presentation – gives emphasis to significant figures and
comparisons. Simplest and most appropriate approach when there are only a
few numbers to be presented.
b. Tabular Presentation – a systematic organization of data in rows and
columns. More concise and easier to understand than textual presentation,
facilitating comparison and analysis of relationships among different
categories.
i. Array – a matrix of columns of numbers arranged in ascending order
which includes number of observations, minimum and maximum
observations, median, mode.
ii. Simple Tables – column heading and names of involved variables.
iii. Compound Tables – an extension of a simple table where more than
one variable is distributed among its sub-variables.
c. Graphical Presentation
i. Line Graph – shows trends over a period of time by illustrating
frequencies and various values of a variable.
Single Line Graph
Multiple Line Graph
ii. Pie Graph – a circular graph showing how a total quantity is
distributed among a group of categories with pieces of the pie
representing proportions of the total that fall in each category.
iii. Bar Graph – series of rectangular bars where the length of the bar
represents the quantity/frequency for each category with the bars
arranged horizontally/vertically.
Simple Bar Chart
Multiple Bar Charts
iv. Pictograph – a pictorial chart with each symbol representing a definite
and uniform value.
v. Ogive
vi. Stem-and-Leaf Display – presents only quantitative data, in condensed
form, where the “leaves” for each “stem” are shown separately in a display.
vii. Histogram – a diagram consisting of rectangles whose area is
proportional to the frequency of a variable and whose width is the class
interval.
Symmetric Histogram – identical on both sides of its central
point.
Skewed Histogram – not symmetric; one tail is longer than the other, often
because of outliers.
Uniform/Rectangular Histogram – same frequency (f) for each class.
viii. Polygon/Frequency Polygon
ix. Frequency Distribution Curve – useful for large data sets: as the number of
classes increases and the class width decreases, the frequency polygon becomes a
smooth curve.
x. Box Plot/Box and Whisker Diagram – displays the minimum and maximum values,
together with the quartiles and median, giving information about the
distribution and spread of data.
xi. Scatter Plot – uses Cartesian coordinates to display the values of two
variables of a data set.
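Of the displays listed above, the stem-and-leaf display is simple enough to produce as plain text. The sketch below is only an illustration: it assumes non-negative integers, uses the units digit as the leaf and the remaining digits as the stem, and the function name and data values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens and above) to its sorted list of leaves (units digit)."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)

data = [12, 15, 21, 23, 23, 28, 30, 34, 35, 41]  # hypothetical values
plot = stem_and_leaf(data)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Each printed row keeps every original value readable (stem 2 with leaves 1 3 3 8 stands for 21, 23, 23, 28), which is what makes the display a condensed form of the data.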
Unit 4. Measures of Central Tendency
1. Measure of Central Tendency – a value describing a set of data by identifying the
central position within that set; also known as a measure of central location.
2. Mean – sum of all values divided by the number of values in the data set.
– for grouped data: mean = Σfx / Σf, where f is the class frequency and x is the
class midpoint.
3. Median – middle value for a set of data that has been arranged in order of magnitude.
4. Mode – most frequently occurring value in the data set. A data set can have no mode.
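The three measures can be checked with a short Python sketch. The data values, class limits, and function names below are hypothetical. Two conventions are assumed: x in the grouped-data formula is taken to be the class midpoint, and a data set in which no value repeats is treated as having no mode.

```python
from collections import Counter
from statistics import mean, median

def find_modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:          # assumed convention: no value repeats, so no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

def grouped_mean(classes):
    """classes: (lower limit, upper limit, frequency); x is the class midpoint."""
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    sum_f = sum(f for _, _, f in classes)
    return sum_fx / sum_f

data = [3, 5, 5, 6, 8, 9]                            # hypothetical ungrouped data
data_mean = mean(data)                               # sum of values / number of values
data_median = median(data)                           # middle of the ordered values
data_modes = find_modes(data)

classes = [(10, 19, 2), (20, 29, 5), (30, 39, 3)]    # hypothetical grouped data
classes_mean = grouped_mean(classes)

print(data_mean, data_median, data_modes, classes_mean)
```

Because the example has an even number of values, the median is the average of the two middle values (5 and 6).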