Data Analysis Planning Guide

The document outlines the process of planning data analyses using statistics, emphasizing the importance of data quality and a structured analysis plan. It describes various data analysis strategies, including exploratory, descriptive, and inferential data analysis, along with different levels of measurement scales. Additionally, it explains key statistical measures such as mean, median, and mode, providing examples for better understanding.

Uploaded by

jerwincoc04

LESSON 5

PLANNING DATA
ANALYSES USING STATISTICS

ARABELA S. DOMINGO
MARY JANE MAGBANUA
INTRODUCTION
When the necessary data have been collected, the next step is to
organize the raw data for analysis. The researcher must be assured of the
quality of the data, that is, its accuracy, consistency, completeness and
systematic arrangement, to facilitate coding and tabulation.

Every research methodology requires a data analysis plan. The plan
specifies the statistical measures to be used to address the research
questions. The appropriate methods of data analysis are determined by the
type of data, the variables to be used, the number of cases and the distribution
of the variables.
PURPOSE OF DATA ANALYSIS PLAN
The purpose of a data analysis plan is to gather useful information to find
solutions to the research questions of interest. It may be used to:

- describe data sets;
- determine the degree of relationship of variables;
- determine differences between variables;
- predict outcomes; and
- compare variables.

All of the above can be accomplished by using any one or a combination of
the following data analysis strategies:
Exploratory Data Analysis
This type of data analysis is used when it is not clear what to expect from the data. The
strategy uses numerical and visual presentations such as graphs. Because the research of interest
is new, the data may contain irregularities such as missing values, unexpected distributions,
unusually small or large values, or invalid entries.
Descriptive Data Analysis
This type of data analysis is used to describe, show or summarize data in a meaningful way,
leading to a simple interpretation of the data. Descriptive data analyses do not allow you to
draw conclusions beyond the data that you have described. The commonly used
descriptive statistics are those that analyze the distribution of data, such as frequency,
percentage, measures of central tendency and measures of dispersion.
Inferential Data Analysis
Inferential statistics test hypotheses about a set of data to reach conclusions or make
generalizations beyond merely describing the data. Inferential statistics include tests of
significance of difference, such as the t-test and Analysis of Variance (ANOVA), and tests of
relationship, such as the Pearson product-moment correlation coefficient (Pearson r), Spearman rho,
linear regression and the chi-square test.
Quantitative Analysis in Evaluation
Determining the level of measurement of the quantitative data is important before
proceeding with the analysis. The choice of statistical measures depends on the
level of measurement of the data. The following are the levels of measurement scales:
> Nominal Scale
> Ordinal Scale
> Interval Scale
> Ratio Scale
Nominal Scale
A nominal scale of measurement is used for labelling variables. Data on this scale are sometimes called
categorical data. A Yes or No scale is an example of nominal data. The numbers assigned to
the categories have no quantitative value. Some examples of variables measured on a nominal
scale are gender, religious affiliation, and race or ethnic group.
Ordinal Scale
An ordinal scale of measurement assigns an order to items according to the characteristic being measured. It
involves the ranking of individuals, attitudes and characteristics.
Numerical scores such as first, second, third and so on are assigned, but the numbers have no meaning
beyond establishing a ranking among a set of data. You can talk about ordering, but
the sizes of the differences between ranks are not specified.

Interval Scale
The interval scale has equal units of measurement, thereby making it possible to interpret the order of
the scale scores and the distance between them. However, an interval scale does not have a "true zero".
With interval data, addition and subtraction are meaningful, but multiplication and division are not.
For example, 20 °C is not "twice as hot" as 10 °C, because 0 °C does not mean the absence of temperature.

Ratio Scale
The ratio scale is considered the highest level of measurement. It has the characteristics of an interval scale,
but it also has a true zero point. Because of this property, all statistical operations can be performed on ratio scales.
All descriptive and inferential statistics may be applied. Values can be added, subtracted, multiplied
and divided.
Descriptive Data Analysis
Suppose senior high school students were asked how many hours they spent
on the computer, and for what subject they most often used the computer.
Results of the survey could indicate that, on the average, the senior high
school students spent two (2) or more hours, with a range of one (1) to
four (4) hours. A typical senior high school student spent more than two
hours studying his/her research subject using the computer.

In the above example, the findings are presented as averages. The use of the
phrase "on the average" and the word "typical" denotes that one is interested
in determining the center or middle of a set of data.
The common measures of central tendency, sometimes called measures of
location or center, include the mean, median and mode.
Mean
Often called the arithmetic average of a set of data, the mean is the sum of the observed values (items, heights,
scores or responses) in the distribution divided by the number of observations. It is frequently used for interval
or ratio data. The symbol x̄ (x bar) is used to denote the arithmetic mean.

The formula is

$$\bar{x} = \frac{\sum x}{n}$$

where Σx is the sum of the observations and n is the number of observations.

The following examples show the calculation of the mean for ungrouped data, that is, a list of data that is not
organized in any way.
A. For Ungrouped Data

Example 1:

Find the mean of the measurements

18, 26, 27, 29, 30

Solution:

Substitute the measurements into the formula:

$$\bar{x} = \frac{18 + 26 + 27 + 29 + 30}{5} = \frac{130}{5} = 26$$

Note that the value of the mean, 26, falls near the middle of the data set.

Answer: x̄ = 26

Suppose the 3rd measurement was 17 (rather than 27). The mean would be 120/5 = 24.

Thus, the mean changes when one of the values in the set of observations is changed.
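
The calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean` and the variable names are chosen here for illustration and are not part of the lesson.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


# Measurements from Example 1
measurements = [18, 26, 27, 29, 30]
print(mean(measurements))  # 26.0

# Same data with the 3rd measurement changed from 27 to 17
changed = [18, 26, 17, 29, 30]
print(mean(changed))  # 24.0
```

Changing a single value moves the mean from 26 to 24, which illustrates that the mean is sensitive to every observation.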

Example 2:
B. For Grouped Data
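When the observations are grouped into classes, the formula for grouped data is as follows:

$$\bar{x} = \frac{\sum f x}{n}$$

where x is the class midpoint, f is the class frequency and n is the total number of observations.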

1.2 The Weighted Mean


The weighted average or weighted mean is necessary in some situations. Suppose that you are
given the means of two or more sets of measurements and you wish to find the mean of all the
measurements combined into one group. The formula for the weighted mean is given by:

$$\bar{x} = \frac{\sum f x}{n}$$

Where:
f = frequency
x = numerical value or item in a set of data
n = number of observations in the data set
Example 1:
Find the mean of the heights of 50 senior high school students summarized as follows:

Solution:
Using the above data, the weighted mean is equal to the sum of the column fx divided by the
total number of observations.

When the data are grouped into classes, the class midpoint represents the "x" in the formula.
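
The weighted mean can be sketched in Python as follows. The heights and frequencies below are hypothetical values invented for illustration; they are not the figures from the lesson's table. The function name `weighted_mean` is also an assumption.

```python
def weighted_mean(values, frequencies):
    """Weighted mean: sum of f * x divided by n, where n is the sum of the frequencies."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be greater than zero")
    return sum(f * x for x, f in zip(values, frequencies)) / n


# Hypothetical data: heights in cm (x) and number of students (f), n = 50
heights_cm = [150, 155, 160, 165, 170]
counts = [5, 10, 20, 10, 5]

print(sum(counts))                        # 50
print(weighted_mean(heights_cm, counts))  # 160.0
```

For data grouped into classes, the same function applies if `values` holds the class midpoints.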
Example 2:
1.3 Median
The median is the midpoint of the distribution. It represents the point in the data where 50% of the values fall
below that point and 50% fall above it. When the distribution has an even number of observations, the median is the
average of the two middle scores. The median is the most appropriate measure of central tendency for ordinal data.
A. For Ungrouped Data
The median may be calculated from ungrouped data by doing the following steps:
1. Arrange the items (scores, responses, observations) from lowest to highest.
2. Count to the middle value. For an odd number of values arranged from lowest to highest, the median is
the middle value. If the array contains an even number of observations, the median is the average of the two middle values.
Example 1:
Consider this set with an odd number of numerical values:
7, 8, 8, 9, 10, 12, 23
By inspection, the median is 9 because half of the remaining values (7, 8, 8) are below 9 and half (10, 12, 23) are above 9. Since
n = 7 is odd, the median has rank (n + 1)/2 = (7 + 1)/2 = 4, that is, it is the 4th value in the ordered list.
Example 2:
Consider this set with an even number of numerical values:
12, 15, 18, 22, 30, 32
The two middle values are 18 and 22. Taking the average of the two middle numbers gives
18 + 22 = 40
40 ÷ 2 = 20
Answer: The median is 20.
Example 3:
Find the median for the set of measurements
15, 20, 12, 26, 3, 30, 14
Solution:
We first rank the measurements from smallest to largest: 3, 12, 14, 15, 20, 26, 30. Since the number of cases is odd, the
median has rank (n + 1)/2 = (7 + 1)/2 = 4, so the median is the 4th value.
Answer: The median is 15.
Suppose the last number is 32 (rather than 30); the median is still 15. Unlike the mean, the median is not affected by
extreme values in the distribution.
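
The two steps for ungrouped data (sort, then take the middle value or the average of the two middle values) can be sketched in Python. The function name `median` is chosen here for illustration.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the value at rank (n + 1) / 2
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 8, 8, 9, 10, 12, 23]))      # 9
print(median([12, 15, 18, 22, 30, 32]))      # 20.0
print(median([15, 20, 12, 26, 3, 30, 14]))   # 15
print(median([15, 20, 12, 26, 3, 32, 14]))   # 15
```

The last two calls reproduce Example 3: replacing 30 with 32 leaves the median at 15.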
B. For Grouped Data
If the data are grouped into classes, the median falls in the class that contains the (n/2)th value. The process
involves several steps, and its general formula is the following:

$$\text{Median} = L + i\left(\frac{\frac{n}{2} - F}{f}\right)$$

Where:
L = exact lower limit of the class containing the median (median class)
i = interval size
n = total number of items or observations
F = cumulative frequency of the classes preceding the median class
f = frequency of the median class
In the following example, the use of the step by step procedure will be illustrated:
Example 4:
The following data show the distribution of the ages of people interviewed for a survey on a topic about climate change.

Solution:
(n/2)th = (100/2)th = 50th value

L=30.5
n=100
F=34
f=22
i=10
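Formula:

$$\text{Median} = L + i\left(\frac{\frac{n}{2} - F}{f}\right)$$

Substituting the values:

$$\text{Median} = 30.5 + 10\left(\frac{50 - 34}{22}\right)$$
$$= 30.5 + 10\left(\frac{16}{22}\right)$$
$$= 30.5 + \frac{160}{22}$$
$$\approx 30.5 + 7.27$$
$$= 37.77$$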
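
The substitution above can be verified with a short Python sketch. The function and parameter names are illustrative assumptions; they map directly onto the symbols L, i, n, F and f defined earlier.

```python
def grouped_median(lower_limit, interval, n, cum_freq_before, class_freq):
    """Median for grouped data: L + i * ((n / 2 - F) / f)."""
    if class_freq <= 0:
        raise ValueError("frequency of the median class must be positive")
    return lower_limit + interval * ((n / 2 - cum_freq_before) / class_freq)


# Values from Example 4: L = 30.5, i = 10, n = 100, F = 34, f = 22
result = grouped_median(30.5, 10, 100, 34, 22)
print(round(result, 2))  # 37.77
```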
1.4 Mode
The mode is the most frequently occurring value in a set of observations. When two or more values share the
highest frequency, the distribution is bimodal (two such values) or multimodal (more than two such values).
When every value occurs the same number of times, there is no mode. The mode is appropriate for nominal data.
The mode, or modal score, is the score or scores that occur most often
in the distribution.

A distribution is classified as unimodal, bimodal, trimodal or multimodal.

A unimodal distribution of scores has only one mode.

A bimodal distribution of scores has two modes.
A trimodal distribution of scores has three modes, and a
multimodal distribution of scores has more than two
modes.
Example 1:
The ages of fifteen persons assembled in a room are as follows.
16, 18, 18, 25, 25, 30, 34, 36 and 38

Solution:
An age of 25 is the mode because it has been recorded three times in the sample, more than any other age.
Answer: Mode =25
Example 2:
The number of hours spent by 10 students in an internet café was as follows:
2,2,2,3,3,4,4,4,5,5
Solution:
Both 2 and 4 have a frequency of 3. The data is therefore bimodal.
Answer: Mode =2 and 4
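
Finding the mode amounts to counting how often each value occurs and keeping the values with the highest count. The sketch below uses `collections.Counter` from the Python standard library; the function name `modes` is an assumption, and it returns an empty list when every value occurs equally often, following the "no mode" rule stated above.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency, sorted."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # If several distinct values all occur equally often, there is no mode
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


# Hours spent by 10 students in an internet café (Example 2)
hours = [2, 2, 2, 3, 3, 4, 4, 4, 5, 5]
print(modes(hours))      # [2, 4]
print(modes([1, 2, 3]))  # []
```

A result with two entries, as for `hours`, corresponds to a bimodal distribution.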
Example 3:
Referring to the data on the distribution of the ages of 100 people interviewed for a survey on a topic of national interest,
the modal class is 31-40. The mode, which corresponds to the class midpoint, would be (31 + 40)/2 = 35.5.
