Data Analysis
in Education
A Practical Guide for Students and Researchers
Table of Contents
Unit 1: Introduction to Statistics
Statistics in Education
Importance of Statistics in Education
Unit 2: Graphic Representation of Data
Histogram
Polygon
Frequency Curve
Pie Chart/Graph
Mean
Median
Mode
Range
Quartile Deviation
Standard Deviation
Data Analysis B.Ed (Hons) QAED KOT ADDU |2
Correlation
Normal Distribution
Percentiles & Percentile Ranks
Tests of Significance
Parametric Tests
Non-Parametric Tests
Nominal
Ordinal/Ranking
Interval
Ratio
Random Sampling
Random Variables and Their Distribution
Binomial Distribution
Unit 8: Normal and Sampling Distributions
Normal Distribution
Interpreting Scores in terms of Z-Scores and Percentile Ranks
Assumption-Free Tests
Mann-Whitney U Test for Two Independent Samples
Wilcoxon Test for Dependent Samples
Unit 1: Introduction to Statistics
Learning Objectives
By the end of this unit, students will be able to:
The term Statistics comes from the Latin word status meaning "state." It originally
referred to data collected by governments, but today it is widely used in almost every
field.
Definition:
Statistics is the science of collecting, organizing, presenting, analyzing, and
interpreting data to assist in decision-making.
✅ Example:
If we want to know the average marks of students in a class, we use statistics to calculate
it.
1. Descriptive Statistics
o Deals with collecting, summarizing, and presenting data.
o Examples: mean, median, mode, graphs, and charts.
2. Inferential Statistics
o Deals with drawing conclusions and making predictions based on data.
o Uses hypothesis testing, t-tests, ANOVA, chi-square, etc.
✅ Example:
Inferential: "If we repeated this test, we would expect similar averages in the population."
1.3 Statistics in Education
In education, statistics plays a vital role in:
✅ Example:
If the dropout rate in schools is 25%, statistics helps identify causes and propose
solutions.
Teachers use statistics for grading, classroom assessments, and student progress.
Researchers use statistics to test hypotheses, analyze surveys, and interpret
findings.
Administrators use statistics for decision-making in budgets, policies, and
planning.
1.6 Limitations of Statistics
Statistics deals only with quantitative data, not qualitative aspects like emotions or attitudes (unless measured by scales).
Misuse or misinterpretation of data may lead to wrong conclusions.
Requires careful collection and analysis; otherwise, results may be biased.
Imagine a teacher wants to check whether a new teaching method improves performance:
Summary of Unit 1
Self-Assessment Questions
Draw and interpret histograms, frequency polygons, frequency curves, and pie
charts.
Select appropriate graphs for different types of data.
Apply graphic representation in educational research and classroom contexts.
2.1 Introduction
Numerical data in tables may be difficult to understand. Graphs and charts present data in
a visual form that makes interpretation easier.
✅ Example: A table showing exam scores may look complex, but a bar chart or pie chart
2.2 Histogram
Definition
A histogram is a bar graph that represents the frequency distribution of continuous data.
Features
✅ Example:
Marks scored by students in a test:
Marks (class interval)    Number of students
0–10                      5
10–20                     8
20–30                     12
30–40                     10
40–50                     5
A histogram will show 5 bars, each representing the frequency of students in each range.
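As a rough illustration of the example above, the following Python sketch prints a text version of the histogram and computes the class midpoints that a frequency polygon would later use. The variable and function names are illustrative assumptions, not part of the original text.

```python
# Frequency table from the example above: (lower limit, upper limit, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]


def midpoint(lower, upper):
    """Return the midpoint of a class interval."""
    return (lower + upper) / 2


# Midpoints are the x-positions used when a frequency polygon is drawn.
midpoints = [midpoint(lower, upper) for lower, upper, _ in classes]
total_students = sum(freq for _, _, freq in classes)

# Print one text bar per class interval; the bar length equals the frequency.
for lower, upper, freq in classes:
    print(f"{lower:>2}-{upper:<2} | {'#' * freq} ({freq})")

print("Midpoints:", midpoints)
print("Total students:", total_students)
```

The tallest bar belongs to the 20–30 class, which matches the largest frequency (12) in the table.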
Definition
A frequency polygon is a line graph that shows the shape of a distribution.
Steps to Draw
1. Draw a histogram first.
2. Mark the midpoints of each bar.
3. Connect the midpoints with straight lines.
Use in Education
Useful for comparing two or more distributions (e.g., boys' vs. girls' exam scores).
A frequency curve is a smooth curve drawn through the points of a frequency polygon.
Features
✅ Example: Students’ IQ scores often form a normal curve, where most students are near the average and a few are very high or very low.
A pie chart is a circular diagram divided into slices to represent proportions of a whole.
Steps to Draw
1. Calculate the angle for each category:
\[ \text{Angle} = \frac{\text{Frequency}}{\text{Total Frequency}} \times 360^\circ \]
2. Draw a circle and divide it into sectors accordingly.
Example:
Survey of students’ favorite subjects:
Subject    Students
English    20
Math       30
Science    25
History    15
Total      90
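To make the angle calculation concrete, here is a minimal Python sketch that applies the angle formula to the survey table above. The dictionary name and layout are assumptions made for illustration.

```python
# Survey data from the table above: subject -> number of students.
subjects = {"English": 20, "Math": 30, "Science": 25, "History": 15}

total = sum(subjects.values())  # total frequency (90)

# Angle = (Frequency / Total Frequency) x 360 degrees.
# Multiplying before dividing keeps the arithmetic exact for these values.
angles = {name: count * 360 / total for name, count in subjects.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")

print("Sum of angles:", sum(angles.values()))
```

The four sectors come to 80°, 120°, 100°, and 60°, which add up to the full 360° of the circle.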
Summary of Unit 2
Self-Assessment Questions
1. What is the difference between a histogram and a bar chart?
2. Explain the steps of drawing a frequency polygon.
3. Why is the normal curve important in educational research?
4. Construct a pie chart for this data: Boys = 40, Girls = 60.
5. Give two advantages of using graphs in teaching statistics.
Here are four illustrative graphs that visually capture key concepts from Unit 2: Graphic
Representation of Data:
Captions & Explanation: Each graph should have a caption and be referenced in your book text, e.g.,
o Figure 2.1: Histogram showing score distribution,
o Figure 2.2: Frequency polygon illustrating score trends, etc.
Combined Examples: Present a dataset and show how it is represented in each graph type; this helps readers see how different visuals emphasize different aspects of the data.
DIY Activity: Include an exercise. Provide a small frequency table and ask students to draw a histogram, then overlay a polygon, then convert the data into a pie chart.
Unit 3: Measures of Central Tendency
Learning Objectives
3.1 Introduction
In statistics, we often deal with large sets of numbers. To make sense of the data, we need
a single representative value that describes the "center" of the data.
✅ Examples:
The three most commonly used measures are: Mean, Median, and Mode.
Definition
The mean is the sum of all values divided by the number of values.
Formula
\[ \bar{X} = \frac{\sum X}{N} \]
Where:
∑X = Sum of all values
N = Number of values
Example (Raw Data):
Marks: 10, 20, 30, 40, 50
Class Interval    Midpoint (X)    Frequency (f)    fX
0–10              5               4                20
10–20             15              6                90
20–30             25              10               250
30–40             35              5                175
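The two calculations above can be sketched in Python as follows. The raw-data mean uses the formula given earlier. For the grouped rows, this sketch assumes the columns are class interval, midpoint (X), frequency (f), and the product fX, and that the grouped mean is the sum of fX divided by the sum of f. That reading is an assumption, since the surrounding text does not spell it out.

```python
from statistics import mean

# Raw data example: marks of five students.
marks = [10, 20, 30, 40, 50]
raw_mean = mean(marks)  # sum of values divided by number of values

# Grouped data rows as (midpoint X, frequency f); assumed column meaning.
grouped = [(5, 4), (15, 6), (25, 10), (35, 5)]

total_f = sum(f for _, f in grouped)        # sum of frequencies
total_fx = sum(x * f for x, f in grouped)   # sum of midpoint x frequency
grouped_mean = total_fx / total_f           # assumed formula: sum(fX) / sum(f)

print("Raw mean:", raw_mean)
print("Sum f:", total_f, "Sum fX:", total_fx)
print("Grouped mean:", grouped_mean)
```

With these figures the raw mean is 30, and the grouped rows give 535 / 25 = 21.4.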
3.3 Median
Definition
The median is the middle value when the data is arranged in order.
Example:
Data: 5, 7, 8, 10, 12
Median = 8 (middle value).
Data: 5, 7, 8, 10, 12, 15
Median = (8+10)/2 = 9.
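Both cases can be checked with Python's standard library. This is only a quick sketch; the variable names are illustrative.

```python
from statistics import median

odd_data = [5, 7, 8, 10, 12]        # odd count: the middle value is the median
even_data = [5, 7, 8, 10, 12, 15]   # even count: average of the two middle values

median_odd = median(odd_data)
median_even = median(even_data)

print("Median (odd count):", median_odd)
print("Median (even count):", median_even)
```

The function sorts the data internally, so the values do not need to be arranged in order first.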
Formula (Grouped Data):
\[ \text{Median} = L + \left(\frac{\frac{N}{2} - CF}{f}\right) \times h \]
Where:
✅ Use in Education: Median income of families; shows the central tendency without
being affected by extreme values.
3.4 Mode
Definition
Data: 2, 4, 4, 5, 6 → Mode = 4
Data: 10, 15, 20, 20, 25, 25, 25, 30 → Mode = 25
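A short Python sketch of the two examples above, using the standard library. The `multimode` call is included as an aside because it returns every most-frequent value, which matters when a data set has more than one mode.

```python
from statistics import mode, multimode

small_data = [2, 4, 4, 5, 6]
large_data = [10, 15, 20, 20, 25, 25, 25, 30]

mode_small = mode(small_data)       # most frequent value
mode_large = mode(large_data)
all_modes_large = multimode(large_data)  # list of all most-frequent values

print("Mode of first data set:", mode_small)
print("Mode of second data set:", mode_large)
print("All modes of second data set:", all_modes_large)
```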
Where:
✅ Use in Education: Most common grade obtained in an exam.
3.5 Comparison of Mean, Median, and Mode
Measure    Advantages    Limitations    Best Use
Summary of Unit 3
Self-Assessment Questions
Unit 4: Measures of Dispersion
Learning Objectives
By the end of this unit, students will be able to:
So far, we have learned how to calculate central tendency (mean, median, mode). But
these measures do not tell us how spread out the data is.
✅ Example:
Although both classes have the same mean, the spread of marks is very different.
This spread is measured by dispersion.
Dispersion = the degree to which values deviate from the central value.
4.2 Range
Definition
The range is the difference between the largest and smallest value.
Formula
\[ \text{Range} = L - S \]
Where:
L = Largest value
S = Smallest value
Example
Data: 15, 18, 20, 25, 30
Range = 30 – 15 = 15
✅ Use in Education: Quick measure of score variation in a test.
Limitations: Only considers extreme values, ignores all others.
Definition
Quartile Deviation (Q.D.) measures the spread of the middle 50% of data.
Formula
\[ Q.D. = \frac{Q_3 - Q_1}{2} \]
Where:
Example
Definition
Standard Deviation is the most important and widely used measure of dispersion. It shows how far, on average, the values lie from the mean (technically, it is the square root of the average squared deviation from the mean).
Where:
X = each value
\(\bar{X}\) = mean
N = number of values
Example
Data: 2, 4, 6
\(\bar{X} = 4\)
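The remaining steps of this example can be sketched in Python. Because the formula itself is not reproduced here, the sketch shows both common versions and labels them: dividing by N (population SD, which matches "N = number of values" above) and dividing by N − 1 (sample SD). Which one the course intends is an assumption.

```python
from statistics import mean, pstdev, stdev

data = [2, 4, 6]
data_mean = mean(data)  # 4

# Squared deviation of each value from the mean: (X - mean)^2
squared_deviations = [(x - data_mean) ** 2 for x in data]  # [4, 0, 4]

# Population SD: square root of (sum of squared deviations / N).
population_sd = pstdev(data)

# Sample SD: square root of (sum of squared deviations / (N - 1)).
sample_sd = stdev(data)

print("Squared deviations:", squared_deviations)
print("Population SD (divide by N):", round(population_sd, 3))
print("Sample SD (divide by N - 1):", round(sample_sd, 3))
```

For this data the sum of squared deviations is 8, so the population SD is √(8/3) ≈ 1.633 and the sample SD is √(8/2) = 2.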
✅ Use in Education:
S.D.    Most reliable, uses all data    Complex calculation    Research, exam analysis
✅ Example:
Class A mean = 70, SD = 5 → scores are consistent.
Class B mean = 70, SD = 20 → scores vary widely.
Summary of Unit 4
Self-Assessment Questions
5.1 Introduction
In educational research, we often want to know whether two or more variables are
related. For example:
Does study time affect exam performance?
Is there a relationship between attendance and grades?
Measures of Relationship help us answer such questions scientifically.
5.2 Correlation
Definition
Types of Correlation
Definition
The normal distribution is a bell-shaped curve that shows how scores are distributed
around the mean.
Characteristics
Symmetrical about the mean.
Mean = Median = Mode.
About 68% of data lie within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD.
✅ Educational Use:
Standardized testing.
IQ distribution (most students average, few very high/low).
Grading on a curve.
Definition
Formula
\[ P_k = L + \left(\frac{\frac{kN}{100} - CF}{f}\right) \times h \]
Where:
Statistical tests help us decide whether the observed results are due to chance or reflect a real relationship.
Parametric Tests
Examples: t-test, z-test, ANOVA.
Used for interval/ratio scale data.
Non-Parametric Tests
Do not assume normal distribution.
Examples: Chi-square, Mann-Whitney U test.
Used for nominal/ordinal data.
✅ Educational Example:
Summary of Unit 5
Self-Assessment Questions
Unit 6: Measurement Scales
Learning Objectives
By the end of this unit, students will be able to:
Define and differentiate between the four scales of measurement.
Identify examples of nominal, ordinal, interval, and ratio data.
Understand the importance of measurement scales in educational research.
Select appropriate statistical techniques based on measurement scales.
6.1 Introduction
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Each scale provides different levels of information and determines which statistical
methods can be used.
Definition
Examples
Educational Use
Classifying students into sections (A, B, C).
Recording types of learning styles (Visual, Auditory, Kinesthetic).
✅ Statistics Used: Mode, Chi-square.
6.3 Ordinal Scale
Definition
Data are arranged in order or rank, but the difference between ranks is not equal.
Examples
Educational Use
Data are measured on a scale with equal intervals but no true zero point.
Examples
Educational Use
6.5 Ratio Scale
Definition
Highest level of measurement.
Data have equal intervals and a true zero point.
Examples
Educational Use
✅ Statistics Used: All statistical techniques, including geometric mean and coefficient of
variation.
Interval    Equal intervals, no true zero    IQ, Temperature       Mean, SD, t-test, ANOVA
Ratio       Equal intervals, true zero       Age, Marks, Height    All statistical methods
Ensures accurate interpretation of student data.
Provides clarity in research methodology.
✅ Example:
If student scores are recorded as "Pass/Fail," that is Nominal.
If ranked 1st, 2nd, 3rd → Ordinal.
If measured on a test with equal intervals (0–100) → Interval/Ratio.
Summary of Unit 6
Self-Assessment Questions
7.1 Random Sampling
Definition
Random sampling is a method in which each member of a population has an equal
chance of being selected.
from each.
4. Cluster Sampling – Dividing into clusters, then randomly selecting clusters.
Types
Definition
Properties
1. Probabilities are between 0 and 1.
2. The sum of all probabilities = 1.
✅ Example: Probability distribution of tossing a coin:
Outcome Probability
Head 0.5
Tail 0.5
Definition
Formula
\[ P(X = r) = \binom{n}{r} p^r q^{n-r} \]
Where:
Example
If a student has a 0.6 probability of passing a test, find the probability that he passes
exactly 2 times in 3 attempts.
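One way to work this example is to substitute n = 3, r = 2, p = 0.6, and q = 1 − p = 0.4 into the binomial formula. The Python sketch below does exactly that; the function name `binomial_pmf` is an illustrative choice, not something defined in the text.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p  # probability of failure on a single trial
    return comb(n, r) * p**r * q ** (n - r)


# Student passes with probability 0.6; exactly 2 passes in 3 attempts.
prob_two_passes = binomial_pmf(r=2, n=3, p=0.6)

print("P(X = 2) =", round(prob_two_passes, 3))
```

By hand this is 3 × 0.6² × 0.4 = 3 × 0.36 × 0.4 = 0.432.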
Random Sampling ensures fairness in educational surveys.
Random Variables model outcomes like test scores.
Probability Distributions help in predicting student performance.
Binomial Distribution is useful for analyzing yes/no outcomes, e.g., pass/fail, correct/incorrect answers.
Summary of Unit 7
Self-Assessment Questions
Definition
The normal distribution is a bell-shaped curve that describes how data values are distributed around the mean.
Characteristics
1. Symmetrical about the mean.
2. Mean = Median = Mode.
3. About 68% of values lie within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD.
4. Total area under the curve = 1 (100%).
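The percentages in characteristic 3 are rounded. As a quick check, the sketch below uses Python's `statistics.NormalDist` to compute the area under a standard normal curve within 1, 2, and 3 standard deviations of the mean. The helper name `area_within` is an illustrative assumption.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Proportion of values lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"Within {k} SD: {area_within(k) * 100:.2f}%")
```

The computed areas are roughly 68.27%, 95.45%, and 99.73%, which round to the familiar 68–95–99.7 rule.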
Example (Education):
In a large class, most students score around the average mark, and fewer score very high or very low.
Definition
A Z-score indicates how many standard deviations a value is from the mean.
Formula
\[ Z = \frac{X - \bar{X}}{SD} \]
Where:
Example
✅ Student scored 1.5 SD above the mean, better than about 93% of students.
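To show where "about 93%" comes from, here is a minimal Python sketch. The score, mean, and SD below are hypothetical values chosen only so that the Z-score comes out to 1.5; they are not taken from the text. The conversion to a percentage assumes the scores are normally distributed.

```python
from statistics import NormalDist

# Hypothetical values (assumptions for illustration only).
score = 85
class_mean = 70
class_sd = 10

# Z = (X - mean) / SD
z_score = (score - class_mean) / class_sd

# Under a normal distribution, the area to the left of Z is the
# proportion of students scoring below this student.
proportion_below = NormalDist().cdf(z_score)

print("Z-score:", z_score)
print(f"Scored better than about {proportion_below * 100:.0f}% of students")
```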
Definition
The percentile rank tells us the percentage of scores below a particular score.
Example: If a student is in the 80th percentile, they performed better than 80% of students.
✅ Educational Use: Used in standardized tests (e.g., SAT, GRE) to compare
performance.
Definition
Summary of Unit 8
Self-Assessment Questions
1. Define normal distribution and list its characteristics.
2. What does a Z-score of –2.0 mean in exam results?
3. A student scored in the 90th percentile. What does this indicate?
4. Explain the Central Limit Theorem in your own words.
5. Why are sampling distributions important in educational research?
Learning Objectives
Statistics not only describes data but also allows us to make inferences about
populations based on samples.
1. Hypothesis Testing
2. Confidence Intervals
Steps in Hypothesis Testing
1. State hypotheses:
o Null Hypothesis (H₀): No difference or no effect.
o Alternative Hypothesis (H₁): There is a difference/effect.
2. Set significance level (α): Common values = 0.05 or 0.01.
3. Select appropriate test (z-test, t-test).
4. Compute test statistic.
5. Make decision: Reject or fail to reject H₀.
Definition
Used when we want to test whether the mean of a sample differs from a known
population mean.
Formula
\[ t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}} \]
Where:
Example
A sample of 25 students scored a mean of 68 on a math test. The population mean is 70,
and sample SD = 10. Test at 0.05 level.
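A possible working of this example in Python is sketched below. It assumes a two-tailed test, which the example does not state explicitly, and takes the critical value 2.064 from a standard t-table for df = 24 at the 0.05 level rather than computing it.

```python
from math import sqrt

sample_mean = 68       # mean of the 25 students
population_mean = 70   # known population mean
sample_sd = 10
n = 25

standard_error = sample_sd / sqrt(n)                      # s / sqrt(n)
t_stat = (sample_mean - population_mean) / standard_error
df = n - 1

# Table value for a two-tailed test at alpha = 0.05 with df = 24 (assumed two-tailed).
critical_t = 2.064
reject_h0 = abs(t_stat) > critical_t

print("Standard error:", standard_error)
print("t =", t_stat, "with df =", df)
print("Reject H0?", reject_h0)
```

Here t = (68 − 70) / 2 = −1.0. Since |−1.0| is smaller than 2.064, H₀ is not rejected: the sample mean does not differ significantly from 70 at the 0.05 level.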
Definition
A confidence interval (CI) provides a range of values within which the population mean is likely to fall.
Formula
\[ CI = \bar{X} \pm Z \cdot \frac{\sigma}{\sqrt{n}} \]
or if population SD unknown:
Example
Formula
\[ z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} \]
Where:
Example
Suppose 60% of students nationally pass an exam. In a sample of 100 students from one
school, 70 passed.
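This example can be worked through with the proportion formula above, taking p = 0.60, q = 0.40, n = 100, and a sample proportion of 70/100. The Python sketch below assumes a two-tailed test at the 0.05 level (critical z = 1.96), which the example does not state; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.60      # national pass proportion (value under H0)
n = 100        # sample size
passed = 70    # students who passed in the sample

p_hat = passed / n   # sample proportion
q0 = 1 - p0

# z = (p_hat - p) / sqrt(p * q / n)
z_stat = (p_hat - p0) / sqrt(p0 * q0 / n)

# Two-tailed p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

critical_z = 1.96    # two-tailed critical value at alpha = 0.05 (assumed)
reject_h0 = abs(z_stat) > critical_z

print("z =", round(z_stat, 2))
print("p-value =", round(p_value, 4))
print("Reject H0?", reject_h0)
```

The statistic comes to about z = 2.04, which exceeds 1.96, so under these assumptions the school's pass rate differs significantly from the national rate.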
9.6 Educational Applications
Testing whether the average score of a class is equal to the national average.
Determining whether a new teaching method significantly changes test
performance.
Evaluating whether the proportion of students passing is significantly different
from a benchmark.
Summary of Unit 9
Hypothesis testing involves stating H₀ and H₁, setting α, calculating test statistic,
and making a decision.
One-sample t-test → Used when population SD unknown.
One-sample z-test → Used for large samples or known population variance.
Confidence intervals give a range of likely population values.
Self-Assessment Questions
Understand the need for ANOVA when comparing more than two groups.
Explain the basic concepts of between-group and within-group variance.
Conduct and interpret a one-way ANOVA.
Explain the principle of ANCOVA and its importance.
Apply ANOVA/ANCOVA in educational research.
10.1 Introduction
When we want to compare the means of two groups, we can use a t-test.
But what if there are three or more groups? Conducting multiple t-tests inflates the chance of a Type I error (a false positive).
ANOVA tests whether there are significant differences among group means by comparing the variance between groups with the variance within groups.
Between-group variance: How much the group means differ from the overall
mean.
Within-group variance: How much scores differ inside each group.
If the between-group variance is sufficiently larger than the within-group variance (that is, the F-ratio exceeds the critical value), the group means are judged significantly different.
Where:
Steps
1. State hypotheses:
o H₀: All group means are equal.
o H₁: At least one group mean is different.
2. Calculate SSB (Sum of Squares Between) and SSW (Sum of Squares Within).
3. Find MSB = SSB / dfB, MSW = SSW / dfW.
4. Compute F = MSB / MSW.
5. Compare with critical F-value.
Example
A 70
B 75
C 85
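The ANOVA steps can be sketched in Python as follows. Only the group means (70, 75, 85) appear in the example, so the individual scores below are hypothetical values invented to reproduce those means; the resulting F-ratio therefore illustrates the procedure and is not a result from the text. The critical value 5.14 is the standard table value for F(2, 6) at the 0.05 level.

```python
# Hypothetical scores (assumption): three students per group, chosen so the
# group means equal 70, 75, and 85 as in the example.
groups = {
    "A": [68, 70, 72],
    "B": [73, 75, 77],
    "C": [83, 85, 87],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)
group_means = {name: sum(s) / len(s) for name, s in groups.items()}

# Step 2: sums of squares between and within groups.
ssb = sum(len(s) * (group_means[name] - grand_mean) ** 2
          for name, s in groups.items())
ssw = sum((x - group_means[name]) ** 2
          for name, s in groups.items() for x in s)

# Step 3: mean squares, using dfB = k - 1 and dfW = N - k.
df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
msb = ssb / df_between
msw = ssw / df_within

# Step 4: F-ratio. Step 5: compare with the critical value.
f_ratio = msb / msw
critical_f = 5.14  # table value for F(2, 6) at alpha = 0.05
reject_h0 = f_ratio > critical_f

print("Group means:", group_means)
print("SSB =", round(ssb, 2), "SSW =", round(ssw, 2))
print("F =", round(f_ratio, 2), "Reject H0?", reject_h0)
```

With these made-up scores, SSB = 350, SSW = 24, and F = 175 / 4 = 43.75, far above 5.14. Real data with more spread inside each group would give a much smaller F.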
Definition
It compares group means after controlling for the effect of a covariate.
Example
Suppose two classes are taught with different methods, but one class has higher pre-test
scores. ANCOVA adjusts for pre-test scores before comparing post-test means.
Summary of Unit 10
Self-Assessment Questions
Learning Objectives
By the end of this unit, students will be able to:
Define the Chi-Square test and its types.
Apply the Chi-Square test for goodness of fit.
Use the Chi-Square test for independence.
Test equality of proportions with Chi-Square.
Understand the importance of Chi-Square in educational research.
11.1 Introduction
Many times, research data are in the form of frequencies (counts) rather than continuous
scores.
Examples:
Where:
Steps
11.3 Chi-Square Test of Goodness of Fit
Definition
Used to test if observed frequencies match expected frequencies.
Example
A teacher believes students’ preferences for subjects are equally distributed (Math,
English, Science, Social Studies).
\[ \chi^2 = \frac{(25-25)^2}{25} + \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(25-25)^2}{25} = 0 + 1 + 1 + 0 = 2 \]
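The same calculation can be written as a short Python sketch. The observed counts (25, 30, 20, 25) are read off the numerators in the calculation above; pairing them with the subjects in the order listed is an assumption. The critical value 7.815 is the standard table value for df = 3 at the 0.05 level.

```python
# Observed counts as they appear in the calculation above; the pairing with
# subject names follows the order listed in the example (assumed).
observed = {"Math": 25, "English": 30, "Science": 20, "Social Studies": 25}

total = sum(observed.values())
expected_each = total / len(observed)  # equal preference -> 100 / 4 = 25

# Chi-square = sum of (O - E)^2 / E over all categories.
chi_square = sum((o - expected_each) ** 2 / expected_each
                 for o in observed.values())

df = len(observed) - 1
critical_value = 7.815  # table value for df = 3 at alpha = 0.05
reject_h0 = chi_square > critical_value

print("Expected per subject:", expected_each)
print("Chi-square =", chi_square, "df =", df)
print("Reject H0?", reject_h0)
```

Since 2 is well below 7.815, the observed preferences are consistent with an equal distribution across the four subjects.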
Example
Male      40    20    60
Female    30    30    60
Total     70    50    120
Expected frequency (Male-Science) = (60 × 70) / 120 = 35.
After calculations, χ² ≈ 3.43, df = 1, critical χ² = 3.84 → Fail to reject H₀.
✅ Conclusion: At the 0.05 level, the data do not show a significant relationship between gender and subject choice.
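So that the arithmetic can be checked, the sketch below recomputes the test of independence directly from the four observed cells of the table (40, 20, 30, 30). Expected frequencies use (row total × column total) / grand total, as in the Male-Science calculation above. The variable names are illustrative, and no continuity correction is applied.

```python
# Observed 2 x 2 table: rows = Male, Female; columns as in the table above.
observed = [
    [40, 20],
    [30, 30],
]

row_totals = [sum(row) for row in observed]          # [60, 60]
col_totals = [sum(col) for col in zip(*observed)]    # [70, 50]
grand_total = sum(row_totals)                        # 120

# Expected frequency for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.84  # table value for df = 1 at alpha = 0.05
reject_h0 = chi_square > critical_value

print("Expected frequencies:", expected)
print("Chi-square =", round(chi_square, 2), "df =", df)
print("Reject H0?", reject_h0)
```

The four contributions are 25/35, 25/25, 25/35, and 25/25, giving χ² ≈ 3.43, which is below the critical value of 3.84.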
✅ Educational Example:
Testing if the proportion of students passing an exam is the same in four schools.
Summary of Unit 11
Self-Assessment Questions
5. Why is the Chi-Square test called a non-parametric test?
Unit 12: Statistical Inference for Ranked Data
Learning Objectives
12.1 Introduction
Many times, data collected in education are not precise measurements, but ranks or
ordered categories.
Examples:
For such data, we use non-parametric tests, which do not assume normal distribution
and are suitable for ordinal (ranked) data.
Definition
Normal distribution.
Equal variances.
Interval or ratio scales.
12.3 Mann-Whitney U Test (Two Independent Samples)
Purpose
Tests whether two independent groups differ in their ranks.
Steps
Where:
Example
A researcher compares achievement ranks of students taught with Method A (n=5) and
Method B (n=5).
After ranking, Mann-Whitney U is calculated → U = 4, critical value = 2.
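The example gives only the final U value, so the Python sketch below uses hypothetical scores, invented so that the calculation also produces U = 4 with five students per method. It ranks all ten scores together, sums the ranks for Method A, and applies U₁ = R₁ − n₁(n₁ + 1)/2. Tied scores are not handled in this simplified version.

```python
# Hypothetical achievement scores (assumption, not from the text), no ties.
method_a = [45, 48, 50, 52, 70]
method_b = [55, 58, 60, 65, 72]


def mann_whitney_u(sample_1, sample_2):
    """Return (U1, U2) for two independent samples, assuming no tied values."""
    combined = sorted(sample_1 + sample_2)
    # Rank 1 goes to the smallest value in the combined list.
    rank_of = {value: rank for rank, value in enumerate(combined, start=1)}
    rank_sum_1 = sum(rank_of[v] for v in sample_1)
    n1, n2 = len(sample_1), len(sample_2)
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2


u_a, u_b = mann_whitney_u(method_a, method_b)
u_stat = min(u_a, u_b)       # the smaller U is compared with the table value

critical_u = 2               # critical value quoted in the example above
reject_h0 = u_stat <= critical_u  # reject H0 only if U <= critical value

print("U for Method A:", u_a, "U for Method B:", u_b)
print("Test statistic U =", u_stat, "Reject H0?", reject_h0)
```

For the Mann-Whitney test, H₀ is rejected when the calculated U is less than or equal to the critical value. With U = 4 and a critical value of 2, H₀ is not rejected, so the two methods do not differ significantly in rank.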
Purpose
Steps
Example
Ten students take a pre-test before, and a post-test after, a new teaching method is introduced.
The Wilcoxon test gives W = 8, critical value = 6.
✅ Since 8 > 6, fail to reject H₀ → No significant improvement.
12.5 Educational Applications
Advantages
Fewer assumptions.
Suitable for ordinal data.
Useful with small samples.
Limitations
Summary of Unit 12
Self-Assessment Questions
1. Why are non-parametric tests suitable for ranked data?
2. Differentiate between Mann-Whitney U and Wilcoxon tests.
3. A teacher wants to test whether students’ rankings in Math differ by gender.
Which test should be used?
4. In a before-and-after training program, which test is most appropriate?
5. List two advantages and two limitations of non-parametric tests.