📊 STATISTICAL ANALYSIS OF DATA
✅ Definition:
Statistical analysis of data refers to the process of collecting, organizing, summarizing,
interpreting, and presenting numerical data to discover patterns, relationships, or trends, and
to make informed decisions.
In simple terms:
It means examining numbers using statistical methods to extract meaningful insights and
conclusions.
🎯 Objectives of Statistical Analysis:
1. To summarize large data sets in a meaningful way
2. To identify relationships and trends
3. To support decision-making with evidence
4. To test hypotheses and draw conclusions
5. To determine reliability and validity of data
🧮 Types of Statistical Analysis:
1. Descriptive Statistics
Used to describe and summarize the main features of a data set.
📘 Includes:
Mean, Median, Mode (measures of central tendency)
Standard Deviation, Variance, Range (measures of dispersion)
Frequency distributions, Percentages, Graphs (e.g., bar charts, pie charts)
2. Inferential Statistics
Used to make predictions or inferences about a population based on a sample.
📘 Includes:
Hypothesis testing (t-test, z-test, chi-square test)
Confidence intervals
Correlation and regression analysis
Analysis of variance (ANOVA)
📊 Common Statistical Techniques:
Technique | Purpose
Mean, Median, Mode | Identify the center or average of data
Standard Deviation | Show how spread out the data is
Correlation | Show relationship between two variables
Regression | Predict value of one variable based on another
Chi-square Test | Test association between categorical variables
ANOVA | Compare means of three or more groups
t-Test / z-Test | Compare means of two groups
Steps in Statistical Analysis:
1. Define the objective of analysis (What do you want to know?)
2. Collect data using valid tools (questionnaires, experiments, records)
3. Organize the data (tables, charts)
4. Analyze using statistical methods (use software like SPSS, Excel, R, etc.)
5. Interpret the results in relation to your research question
6. Present findings in a clear, understandable format (tables, graphs, summaries)
📌 Importance of Statistical Analysis:
Enhances accuracy and objectivity in research
Supports evidence-based conclusions
Helps identify patterns and predictions
Facilitates effective communication of results
🧾 Applications:
Education research (test scores, achievement levels)
Medical studies (drug effectiveness, patient recovery)
Social sciences (survey analysis, behavioral trends)
Business and economics (market research, sales forecasting)
📊 PARAMETRIC AND NON-PARAMETRIC TESTS
Statistical tests are used to analyze data and draw conclusions. They are broadly classified
into parametric and non-parametric tests based on the nature of the data and the
assumptions made about the population.
✅ 1. Parametric Tests
📘 Definition:
Parametric tests are statistical tests that assume the data follows a known distribution
(usually the normal distribution) described by fixed parameters such as the mean and standard deviation.
🔑 Key Features:
Based on interval or ratio scale data (quantitative).
Assumes normal distribution of data.
Requires homogeneity of variance.
More powerful and precise if assumptions are met.
📊 Common Parametric Tests:
Test | Purpose
t-test | Compare means between two groups
z-test | Compare means or proportions with known variance
ANOVA | Compare means among three or more groups
Pearson correlation | Measure linear relationship between variables
Regression analysis | Predict dependent variable based on one or more independent variables
✅ 2. Non-Parametric Tests
📘 Definition:
Non-parametric tests are statistical tests that do not assume a specific distribution. They are
used when data violates assumptions required for parametric tests.
🔑 Key Features:
Used for ordinal or nominal data.
Makes no assumptions about population distribution.
Suitable for small sample sizes or skewed data.
Less powerful but more flexible than parametric tests.
📊 Common Non-Parametric Tests:
Test | Purpose
Chi-square test | Test association between categorical variables
Mann–Whitney U test | Compare two independent groups (non-normal)
Wilcoxon signed-rank test | Compare two related samples
Kruskal–Wallis test | Compare more than two independent groups
Spearman's rank correlation | Measure monotonic relationship between variables
📌 Comparison Table:
Criteria | Parametric Test | Non-Parametric Test
Data type | Interval/ratio (quantitative) | Ordinal/nominal (qualitative or ranked)
Distribution assumption | Yes (usually normal) | No
Sample size | Usually large | Can be small
Test examples | t-test, z-test, ANOVA | Chi-square, Mann–Whitney U, Wilcoxon
Power and precision | More powerful if assumptions are met | Less powerful, but safer for non-normal data
✅ Conclusion:
Use parametric tests when data meets statistical assumptions (normality,
homogeneity, etc.).
Use non-parametric tests when data is non-normal, ordinal, or when sample size is
small.
✅ PARAMETRIC TEST EXAMPLE: Independent Samples t-test
📘 Scenario:
You want to compare the average test scores of students from two different classes (Class A
and Class B) to see if there's a significant difference.
🎯 Hypotheses:
H₀ (Null): There is no significant difference in the means of Class A and Class B.
H₁ (Alternative): There is a significant difference in the means.
📊 Data:
Class A Scores Class B Scores
85 78
88 82
90 80
92 85
87 83
✅ Step 1: Calculate the Means
\bar{X}_A = \frac{85 + 88 + 90 + 92 + 87}{5} = 88.4
\bar{X}_B = \frac{78 + 82 + 80 + 85 + 83}{5} = 81.6
✅ Step 2: Calculate the Standard Deviations (SD)
Using the formula:
SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}
For Class A: sum of squared deviations = 29.2, so SD = √(29.2/4) = √7.3 ≈ 2.70
For Class B: sum of squared deviations = 29.2, so SD = √(29.2/4) = √7.3 ≈ 2.70
(You can also use Excel or a calculator to simplify this.)
✅ Step 3: Apply the t-test formula
t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\frac{SD_A^2}{n_A} + \frac{SD_B^2}{n_B}}} = \frac{88.4 - 81.6}{\sqrt{\frac{7.3}{5} + \frac{7.3}{5}}} = \frac{6.8}{\sqrt{2.92}} \approx \frac{6.8}{1.71} \approx 3.98
✅ Step 4: Compare with Critical Value
Degrees of freedom = n_A + n_B − 2 = 8
t-critical (two-tailed, α = 0.05, df = 8) ≈ 2.306
Since 3.98 > 2.306, we reject the null hypothesis
✅ Conclusion: There is a significant difference in the average scores.
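The hand calculation above can be cross-checked with a short script. This is a minimal sketch using only Python's standard library; the function name `independent_t` is an illustrative choice, and the degrees of freedom follow the simple n_A + n_B − 2 rule used in Step 4. It recomputes every step from the raw scores, so rounding or arithmetic slips in a manual calculation become visible.

```python
import math
import statistics

# Scores copied from the data table in this example.
class_a = [85, 88, 90, 92, 87]
class_b = [78, 82, 80, 85, 83]


def independent_t(sample_1, sample_2):
    """Return (t, df) using the Step 3 formula: difference of means
    divided by sqrt(SD1^2/n1 + SD2^2/n2)."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = statistics.mean(sample_1)
    mean_2 = statistics.mean(sample_2)
    # statistics.variance divides by n - 1 (sample variance), matching Step 2.
    var_1 = statistics.variance(sample_1)
    var_2 = statistics.variance(sample_2)
    standard_error = math.sqrt(var_1 / n1 + var_2 / n2)
    t = (mean_1 - mean_2) / standard_error
    df = n1 + n2 - 2  # degrees of freedom as used in Step 4
    return t, df


t_value, df = independent_t(class_a, class_b)
print("Mean A:", statistics.mean(class_a), "Mean B:", statistics.mean(class_b))
print("SD A:", round(statistics.stdev(class_a), 2),
      "SD B:", round(statistics.stdev(class_b), 2))
print("t =", round(t_value, 2), "df =", df)
```

Running it prints the two means, the two sample standard deviations, and the t statistic, which can then be compared with the critical value for the chosen α.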
✅ NON-PARAMETRIC TEST EXAMPLE: Mann–Whitney U Test
📘 Scenario:
You want to test if there's a difference in reaction times (in seconds) between two groups
who had different training programs.
📊 Data:
Group A Group B
12 8
15 10
14 11
✅ Step 1: Combine and Rank the Data
Combined data:
8, 10, 11, 12, 14, 15
Ranks:
8 (1), 10 (2), 11 (3), 12 (4), 14 (5), 15 (6)
Assign ranks:
Group A (12, 15, 14) → Ranks: 4, 6, 5 → Total Rank A = 15
Group B (8, 10, 11) → Ranks: 1, 2, 3 → Total Rank B = 6
✅ Step 2: Use Mann–Whitney U Formula
U_A = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_A = 3 \times 3 + \frac{3(3+1)}{2} - 15 = 9 + 6 - 15 = 0
U_B = n_1 n_2 - U_A = 9 - 0 = 9
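✅ Step 3: Use the Smaller U Value
U = 0
With n1 = n2 = 3 there are only 20 possible ways to assign the six ranks, so the most extreme result (U = 0) has an exact two-tailed p-value of 2/20 = 0.10. No two-tailed critical value exists at α = 0.05 for samples this small.
Since 0.10 > 0.05, we fail to reject the null hypothesis (in a one-tailed test the p-value would be 1/20 = 0.05).
✅ Conclusion: Every Group A time is higher than every Group B time, but three observations per group are too few to show a significant difference at α = 0.05 (two-tailed). A larger sample is needed.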
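The ranking and U calculation can be sketched in a few lines of Python. This is an illustrative sketch, not a full implementation: the function names are hypothetical, it assumes there are no tied values, and the exact p-value is obtained by brute-force enumeration of every possible rank assignment, which is only practical for very small samples.

```python
from itertools import combinations

# Reaction times copied from the data table in this example.
group_a = [12, 15, 14]
group_b = [8, 10, 11]


def mann_whitney_u(sample_1, sample_2):
    """Return (U_1, U_2) for two samples, assuming no tied values."""
    combined = sorted(sample_1 + sample_2)
    # Rank 1 goes to the smallest value.
    rank = {value: position for position, value in enumerate(combined, start=1)}
    n1, n2 = len(sample_1), len(sample_2)
    rank_sum_1 = sum(rank[value] for value in sample_1)
    u1 = n1 * n2 + n1 * (n1 + 1) // 2 - rank_sum_1
    u2 = n1 * n2 - u1
    return u1, u2


def exact_two_tailed_p(n1, n2, u_observed):
    """Share of all rank assignments whose smaller U is <= u_observed."""
    total = 0
    extreme = 0
    for chosen_ranks in combinations(range(1, n1 + n2 + 1), n1):
        u1 = n1 * n2 + n1 * (n1 + 1) // 2 - sum(chosen_ranks)
        total += 1
        if min(u1, n1 * n2 - u1) <= u_observed:
            extreme += 1
    return extreme / total


u_a, u_b = mann_whitney_u(group_a, group_b)
u_small = min(u_a, u_b)
p_value = exact_two_tailed_p(len(group_a), len(group_b), u_small)
print("U_A =", u_a, "U_B =", u_b, "exact two-tailed p =", p_value)
```

Enumerating the rank assignments makes the limits of tiny samples concrete: with three observations per group there are only 20 arrangements, so the smallest achievable two-tailed p-value is 2/20.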
📊 Difference Between Parametric and Non-Parametric Tests
Parametric and non-parametric tests are two major categories of statistical tests used for
analyzing data. The key difference lies in the assumptions made about the underlying data
distribution.
🔁 Comparison Table:
Criteria | Parametric Test | Non-Parametric Test
Data Type | Quantitative (Interval or Ratio scale) | Qualitative or Quantitative (Ordinal or Nominal scale)
Assumption of Distribution | Assumes data follows a normal distribution | No assumption about data distribution
Use of Parameters | Based on known parameters like mean and standard deviation | Does not involve parameters of the population
Sample Size Requirement | Requires larger samples to be reliable | Suitable for small samples
Measurement Scale | Requires interval or ratio scale data | Accepts ordinal, nominal, or ranked data
Statistical Power | Generally more powerful if assumptions are met | Less powerful, but more robust when assumptions are not met
Examples | t-test, z-test, ANOVA, Pearson’s correlation, Linear regression | Chi-square test, Mann–Whitney U test, Wilcoxon test, Kruskal–Wallis, Spearman's rho
Accuracy of Results | More accurate with normally distributed data | Less precise, but more flexible with non-normal or unknown distributions
Application | When the population is known and well-defined | When dealing with unknown or non-normal populations
✅ In Summary:
Use parametric tests when:
✅ Data is numerical (interval/ratio)
✅ Sample size is large
✅ Data is normally distributed
Use non-parametric tests when:
✅ Data is categorical or ordinal
✅ Distribution is unknown or non-normal
✅ Sample size is small
🧠 Tip for Students & Researchers:
"If your data doesn't meet the assumptions of parametric tests (especially normality), go with
a non-parametric alternative!"
✅ Applications of Parametric and Non-Parametric Tests
Parametric and non-parametric tests are essential tools in research for analyzing data and
making informed decisions. Their applications vary depending on the type of data, research
objectives, and assumptions about the data distribution.
📊 Applications of Parametric Tests
Parametric tests are widely used when data meets the assumptions of normal distribution,
equal variances, and is measured on an interval or ratio scale.
🔹 1. Education Research
Comparing average test scores between two or more student groups (using t-test or
ANOVA)
Evaluating effectiveness of teaching methods
🔹 2. Medical and Clinical Trials
Analyzing effects of a drug on patients (pre-test vs post-test scores using paired t-test)
Comparing blood pressure or glucose levels between control and treatment groups
🔹 3. Business and Economics
Comparing mean sales across regions or periods
Studying impact of a new marketing strategy using regression analysis
🔹 4. Psychology
Measuring differences in IQ scores between genders or age groups
Studying memory test scores before and after therapy
🔹 5. Engineering & Agriculture
Testing performance of machines or fertilizers using ANOVA
Evaluating crop yield under different treatments
📊 Applications of Non-Parametric Tests
Non-parametric tests are used when data is ordinal, nominal, or does not meet parametric
assumptions. They are especially useful for small sample sizes or ranked data.
🔹 1. Social Sciences
Testing association between gender and voting preference (Chi-square test)
Studying public opinion or satisfaction levels (ordinal data)
🔹 2. Psychology and Behavior Studies
Comparing anxiety levels (ranked scores) between treated and untreated groups
(Mann–Whitney U)
Analyzing before-after scores on non-normally distributed psychological scales
(Wilcoxon test)
🔹 3. Health and Medical Fields
Evaluating pain relief scores using ranked data
Analyzing survival data or treatment outcomes with unknown distributions
🔹 4. Market Research
Analyzing customer satisfaction rankings (Spearman's rank correlation)
Studying preference patterns across different product designs
🔹 5. Education
Testing changes in student rankings after an intervention
Comparing ranked performance across groups with non-normal distributions
📌 Summary Table:
Test Type | Data Type | Common Fields of Application
Parametric Tests | Interval/Ratio (Normal) | Medicine, Education, Business, Agriculture
Non-Parametric Tests | Ordinal/Nominal (Non-normal) | Psychology, Sociology, Health Sciences, Market Research
✅ Final Note:
Parametric tests: High power, best when assumptions are met
Non-parametric tests: More flexible, ideal for real-world data that is messy or
limited
📊 DESCRIPTIVE STATISTICAL ANALYSIS
✅ Definition:
Descriptive statistical analysis refers to the process of summarizing, organizing, and
presenting data in a meaningful way. It helps to describe the main features of a dataset and
provides simple summaries about the sample and the measures.
🔹 In simple terms:
It tells you what the data shows, without drawing conclusions beyond the data itself.
🎯 Objectives of Descriptive Statistics:
1. To summarize large volumes of data
2. To describe central values (e.g., average, typical)
3. To show variability or spread in data
4. To make data easier to interpret through charts and tables
📘 Types of Descriptive Statistics
🔹 1. Measures of Central Tendency
They describe the center or average of a dataset.
Measure | Description | Formula / Example
Mean | Arithmetic average | \bar{x} = \frac{\sum x}{n}
Median | Middle value in ordered data | Middle number (odd n)
Mode | Most frequently occurring value | Most common number in dataset
🔹 2. Measures of Dispersion (Variability)
They describe how widely the values in a dataset are spread.
Measure | Description
Range | Difference between highest and lowest values
Variance | Average of squared deviations from the mean
Standard Deviation (SD) | Square root of variance (shows spread around the mean)
Interquartile Range (IQR) | Range between Q1 (25%) and Q3 (75%) values
🔹 3. Measures of Position
These locate a specific data point in relation to the entire dataset.
Measure | Description
Percentiles | Values below which a given % of observations fall
Quartiles | Divide data into four equal parts
Deciles | Divide data into ten equal parts
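Quartiles and deciles can be computed with Python's standard library, as in the sketch below. The five values are illustrative (an assumption for this example), and `statistics.quantiles` is used with its default "exclusive" method. Other textbooks and software use slightly different interpolation rules, so quartiles computed by hand may differ a little.

```python
import statistics

# Illustrative data set (an assumption for this sketch).
data = [10, 12, 15, 18, 20]

# n=4 returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# n=10 returns the nine cut points that split the data into ten parts.
deciles = statistics.quantiles(data, n=10)

print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("IQR:", iqr)
print("Number of decile cut points:", len(deciles))
```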
🔹 4. Graphical Representation
Used to visualize data for easier understanding.
Tool | Used For
Bar chart | Categorical data
Histogram | Continuous/numerical data
Pie chart | Percentage/proportional comparison
Line graph | Trend over time
Box plot | Shows median, IQR, and outliers
Example:
Data Set: 10, 12, 15, 18, 20
Mean: (10 + 12 + 15 + 18 + 20)/5 = 75/5 = 15
Median: 15 (middle value)
Mode: None (no repeats)
Range: 20 − 10 = 10
Standard Deviation: √[(25 + 9 + 0 + 9 + 25)/5] = √13.6 ≈ 3.69
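The same summary can be reproduced in Python. This is a minimal sketch using the standard library; the helper `modes` is a hypothetical name introduced here so that a data set with no repeated value reports no mode. `statistics.pstdev` divides by n, matching the standard deviation calculation in this example.

```python
from collections import Counter
import statistics

# Data set from the example.
data = [10, 12, 15, 18, 20]


def modes(values):
    """Return the most frequent value(s); an empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]


summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": modes(data),
    "range": max(data) - min(data),
    # pstdev divides by n (population formula), as in the example.
    "standard deviation": round(statistics.pstdev(data), 2),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```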
📌 Applications of Descriptive Statistical Analysis:
Education: Summarizing test scores of students
Health: Describing patient recovery times or symptom frequency
Business: Analyzing monthly sales data
Social Research: Summarizing survey results
Finance: Reporting average investment returns
✅ Conclusion:
Descriptive statistics help to make raw data understandable by reducing it to a manageable
form. They form the foundation for further analysis such as inferential statistics.
📊 GRAPHICAL AND DIAGRAMMATIC REPRESENTATION OF DATA
✅ Definition:
Graphical and diagrammatic representation refers to the use of visual tools like charts,
graphs, and diagrams to present statistical data in a way that is easily understood,
interpreted, and compared.
📌 In simple words: It is the visual presentation of data to make trends, patterns, and
comparisons clear at a glance.
🎯 Objectives:
To present complex data in a simplified, visual form
To facilitate quick understanding and comparison
To highlight trends, relationships, and variations in data
To support decision-making and reporting
🔷 Types of Graphical and Diagrammatic Representations:
🔹 1. Line Graph
Used to show trends over time (continuous data).
Points are plotted and connected by a line.
✅ Example: Temperature changes over a week.
🔹 2. Bar Graph / Bar Diagram
Shows comparisons between categories using rectangular bars.
Bars can be vertical or horizontal.
✅ Example: Number of students in different classes.
🔹 3. Pie Chart (Circular Diagram)
A circle divided into sectors representing proportions or percentages.
Best for showing part-to-whole relationships.
✅ Example: Market share of different brands.
🔹 4. Histogram
A type of bar chart for continuous data grouped into intervals (classes).
Bars touch each other to indicate continuity.
✅ Example: Distribution of student marks.
🔹 5. Frequency Polygon
Line graph made by joining the midpoints of the top of histogram bars.
Useful for comparing frequency distributions.
🔹 6. Ogive (Cumulative Frequency Curve)
Represents cumulative frequency distribution.
Helps to determine median, quartiles, percentiles.
🔹 7. Pictogram
Uses pictures or icons to represent data.
Often used in elementary education or public displays.
🔹 8. Dot Plot / Scatter Plot
Dot plot: Shows frequency by stacking dots above a number line.
Scatter plot: Shows relationship between two variables (correlation).
✅ Example: Relationship between study time and test scores.
📌 Comparison: Graphs vs Diagrams
Feature | Graphs | Diagrams
Data type | Quantitative (numerical) | Quantitative or Qualitative
Shows trends? | Yes | Not always
Examples | Line graph, histogram | Pie chart, pictogram, bar diagram
Best used for | Time series, comparisons | Proportions, categories
📝 Best Practices:
Choose the right type of graph/diagram for the data.
Use proper labels, scales, and titles.
Avoid clutter—keep it clean and readable.
Use colors and legends wisely for clarity.
✅ Conclusion:
Graphical and diagrammatic representations are powerful tools in statistics and research.
They enhance communication, support data interpretation, and help in making informed
decisions.
📊 MEASURES OF CENTRAL TENDENCY
✅ Definition:
Measures of central tendency are statistical tools used to identify the central or average
value of a dataset. They give a single value that represents the entire distribution, helping to
understand where the data is centered.
🔹 In simple words: It shows the typical or average value in a dataset.
🔺 Types of Measures of Central Tendency:
1. Mean (Arithmetic Average)
📘 Definition:
The sum of all values divided by the number of values.
🧮 Formula:
\text{Mean} = \frac{\sum x}{n}
✅ Example:
Marks: 10, 15, 20
\text{Mean} = \frac{10 + 15 + 20}{3} = \frac{45}{3} = 15
2. Median
📘 Definition:
The middle value in a sorted (ordered) dataset.
If n is odd: Median = Middle value
If n is even: Median = Average of two middle values
✅ Example:
Data: 5, 8, 11, 13, 16
Median = 11 (middle value)
Data: 6, 8, 10, 12
Median = (8 + 10)/2 = 9
3. Mode
📘 Definition:
The value that occurs most frequently in a dataset.
✅ Example:
Data: 4, 7, 7, 9, 10
Mode = 7 (it appears twice)
A dataset can have no mode, one mode (unimodal), two modes (bimodal), or more
(multimodal).
🧾 Comparison Table:
Measure | Usefulness | Best For
Mean | Considers all values; affected by outliers | Normally distributed data
Median | Not affected by extreme values | Skewed data or open-ended distributions
Mode | Identifies most common value | Categorical or qualitative data
📌 Applications:
Field | Use of Central Tendency Measures
Education | Average marks of students
Health | Median recovery time
Business | Average sales or income
Sociology | Most common family size
Agriculture | Average yield of crops per acre
🧠 Tips:
Mean is used when the data is symmetrical.
Median is better when there are outliers or skewed data.
Mode is ideal for non-numerical data (e.g., favorite fruit, most used transport).
Practice problems on Measures of Central Tendency (Mean, Median, and Mode),
covering both ungrouped and grouped data.
🔹 A. Ungrouped Data Problems
🧮 1. Mean – Basic Level
Question:
Find the mean of the following numbers:
8, 12, 15, 10, 5
Solution:
\text{Mean} = \frac{8 + 12 + 15 + 10 + 5}{5} = \frac{50}{5} = 10
🧮 2. Median – Odd Number of Values
Question:
Find the median of:
3, 9, 11, 15, 17
Solution:
Ordered Data = 3, 9, 11, 15, 17
Median = 11 (middle value)
🧮 3. Median – Even Number of Values
Question:
Find the median of:
6, 10, 14, 18
Solution:
Ordered Data = 6, 10, 14, 18
Median = (10 + 14) / 2 = 12
🧮 4. Mode
Question:
Find the mode of:
2, 5, 7, 5, 9, 5, 8
Solution:
Mode = 5 (occurs 3 times)
🔷 B. Grouped Data Problems
🧮 5. Mean – Grouped Data
Class Interval Frequency
0 – 10 4
10 – 20 6
20 – 30 10
30 – 40 5
Steps:
1. Find midpoints (x) of each class:
5, 15, 25, 35
2. Multiply midpoints by frequency (f × x)
3. Apply the formula:
\text{Mean} = \frac{\sum f \cdot x}{\sum f}
\sum f \cdot x = (4 \times 5) + (6 \times 15) + (10 \times 25) + (5 \times 35) = 20 + 90 + 250 + 175 = 535
\sum f = 4 + 6 + 10 + 5 = 25
\text{Mean} = \frac{535}{25} = 21.4
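The midpoint method above translates directly into code. The sketch below is illustrative: the function name `grouped_mean` and the representation of each class as a `(lower, upper, frequency)` tuple are assumptions made for this example.

```python
# Each class is (lower limit, upper limit, frequency), from the table above.
classes = [(0, 10, 4), (10, 20, 6), (20, 30, 10), (30, 40, 5)]


def grouped_mean(table):
    """Mean of grouped data: sum(f * midpoint) / sum(f)."""
    total_fx = sum(((lower + upper) / 2) * freq for lower, upper, freq in table)
    total_f = sum(freq for _, _, freq in table)
    return total_fx / total_f


mean_value = grouped_mean(classes)
print("Grouped mean:", mean_value)
```

Because each class is represented by its midpoint, the result is an estimate of the mean of the underlying raw values, not an exact figure.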
🧮 6. Median – Grouped Data
Class Interval Frequency
0 – 10 5
10 – 20 8
20 – 30 12
30 – 40 10
Steps:
N = 35, N/2 = 17.5
Cumulative Frequencies:
0–10: 5
10–20: 13
20–30: 25
30–40: 35
Median class = 20–30 (the first class whose cumulative frequency reaches 17.5)
Use Formula:
\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f}\right) \times h
Where:
L = lower boundary = 20
N = 35
F = cumulative frequency before median class = 13
f = frequency of median class = 12
h = class width = 10
\text{Median} = 20 + \left(\frac{17.5 - 13}{12}\right) \times 10 = 20 + \left(\frac{4.5}{12}\right) \times 10 = 20 + 3.75 = 23.75
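The grouped median formula can be sketched as a small function. The name `grouped_median` and the `(lower, upper, frequency)` tuple layout are assumptions for this illustration; the function walks through the classes, accumulating frequencies until it reaches the class containing N/2, and then interpolates within that class.

```python
# Each class is (lower boundary, upper boundary, frequency), from the table above.
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10)]


def grouped_median(table):
    """Median = L + ((N/2 - F) / f) * h for the median class."""
    n = sum(freq for _, _, freq in table)
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, upper, freq in table:
        if cumulative + freq >= half:
            width = upper - lower  # h: class width
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("frequency table is empty")


median_value = grouped_median(classes)
print("Grouped median:", median_value)
```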
🧮 7. Mode – Grouped Data
Class Interval Frequency
10 – 20 5
20 – 30 10
30 – 40 12
40 – 50 8
Steps:
Modal class = 30–40 (highest frequency = 12)
Use Formula:
\text{Mode} = L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h
Where:
L = 30,
f₁ = 12 (modal class freq)
f₀ = 10 (preceding class)
f₂ = 8 (succeeding class)
h = 10
\text{Mode} = 30 + \left(\frac{12 - 10}{2 \times 12 - 10 - 8}\right) \times 10 = 30 + \left(\frac{2}{24 - 18}\right) \times 10 = 30 + \left(\frac{2}{6}\right) \times 10 = 30 + 3.33 = 33.33
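The grouped mode formula can be sketched the same way. The function name `grouped_mode` is hypothetical, and two simplifying assumptions are made: if several classes share the highest frequency, the first one is treated as the modal class, and a missing neighbouring class is given a frequency of 0.

```python
# Each class is (lower boundary, upper boundary, frequency), from the table above.
classes = [(10, 20, 5), (20, 30, 10), (30, 40, 12), (40, 50, 8)]


def grouped_mode(table):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for the modal class."""
    freqs = [freq for _, _, freq in table]
    i = freqs.index(max(freqs))  # first class with the highest frequency
    lower, upper, f1 = table[i]
    f0 = freqs[i - 1] if i > 0 else 0               # preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # succeeding class
    # The denominator is zero only when both neighbours equal f1;
    # this sketch does not handle that case.
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


mode_value = grouped_mode(classes)
print("Grouped mode:", round(mode_value, 2))
```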
📘 Practice Exercise (Solve Yourself):
🔹 A. Ungrouped
1. Find the mean, median, and mode of:
14, 18, 20, 15, 18, 19, 22
2. Find the median of:
34, 25, 18, 45, 32, 27 (Hint: Arrange in order first)
🔹 B. Grouped
Class Interval Frequency
0–10 3
10–20 5
20–30 8
30–40 4
3. Find the mean of the data.
4. Find the median class and calculate the median.
📊 MEASURES OF DISPERSION
✅ Definition:
Measures of dispersion are statistical tools used to describe the spread, variability, or
consistency of a dataset. They indicate how much the values in a dataset differ from the
average (central tendency).
🔍 In simple terms: They tell us how scattered or close together the data points are.
🎯 Objectives of Measures of Dispersion:
To understand the degree of variation in data
To compare uniformity or consistency across different datasets
To complement measures of central tendency
To assess risk or uncertainty in business, finance, science, etc.
🔢 Types of Measures of Dispersion
🔹 1. Range
Definition:
Difference between the highest and lowest value.
\text{Range} = \text{Maximum value} - \text{Minimum value}
✅ Example:
Data: 10, 15, 20, 25
Range = 25 − 10 = 15
🔹 2. Interquartile Range (IQR)
Definition:
Spread of the middle 50% of the data.
\text{IQR} = Q_3 - Q_1
Where:
Q_1: First Quartile (25%)
Q_3: Third Quartile (75%)
✅ Use: Reduces the effect of extreme values/outliers
🔹 3. Mean Deviation (MD)
Definition:
Average of the absolute deviations from the mean or median.
\text{MD} = \frac{\sum |x - \bar{x}|}{n}
✅ Shows: How much, on average, each data point differs from the mean.
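A short sketch of the mean deviation calculation follows. The function name `mean_deviation` and the sample values are assumptions for illustration. The optional `center` argument lets the same function measure deviations from the median instead of the mean, since the definition allows either.

```python
import statistics


def mean_deviation(values, center=None):
    """Average absolute deviation from `center` (the mean by default)."""
    if center is None:
        center = statistics.mean(values)
    return sum(abs(x - center) for x in values) / len(values)


# Illustrative data (an assumption for this sketch).
data = [5, 7, 8, 10, 10]

md_from_mean = mean_deviation(data)
md_from_median = mean_deviation(data, statistics.median(data))
print("MD about the mean:", md_from_mean)
print("MD about the median:", md_from_median)
```

For this data the mean and the median are both 8, so the two results coincide; on skewed data they generally differ.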
🔹 4. Variance (σ²)
Definition:
Average of the squared differences from the mean.
\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}
✅ Used for: Advanced statistical analysis and risk assessment.
🔹 5. Standard Deviation (SD or σ)
Definition:
Square root of variance. Shows how spread out the values are from the mean.
\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}
✅ Most widely used measure of dispersion.
📘 Comparison Table:
Measure | Best For | Sensitive to Outliers | Simple to Calculate
Range | Quick estimate of spread | Yes | ✅
IQR | Spread of middle 50% | No | ✅
Mean Deviation | Consistent deviations | Less | Moderate
Variance | Statistical formulas and models | Yes | ❌
Standard Deviation | Risk, reliability, consistency | Yes | ❌
🧾 Applications:
Field | Application Example
Education | Variation in students' scores
Finance | Risk in investment returns (standard deviation)
Manufacturing | Product quality variation
Agriculture | Yield fluctuation over seasons
Healthcare | Patient response variability to treatments
🧠 Example (Standard Deviation):
Data: 5, 7, 8, 10, 10
Mean = (5 + 7 + 8 + 10 + 10)/5 = 40/5 = 8
Deviations: -3, -1, 0, 2, 2
Squared deviations: 9, 1, 0, 4, 4
Variance: (9 + 1 + 0 + 4 + 4)/5 = 18/5 = 3.6
Standard Deviation: √3.6 ≈ 1.9
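The example divides by n, which is the population formula. The sketch below, using Python's standard `statistics` module, reproduces that result and also shows the sample version that divides by n − 1, which is the form used when a standard deviation is estimated from a sample (as in a t-test). The variable names are illustrative.

```python
import statistics

# Data from the example.
data = [5, 7, 8, 10, 10]

# Population formulas: divide the sum of squared deviations by n.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Sample formulas: divide by n - 1 instead.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print("Population variance:", population_variance,
      "SD:", round(population_sd, 2))
print("Sample variance:", sample_variance,
      "SD:", round(sample_sd, 2))
```

The sample values are always somewhat larger than the population values for the same data, and the gap shrinks as n grows.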
Simple, step-by-step sums on Standard Deviation using small datasets, suitable for
learning and practicing the concept.
✅ Formula for Standard Deviation (Ungrouped Data):
\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}
Where:
x = each value
\bar{x} = mean of the data
n = number of observations
🔹 Sum 1:
Data: 5, 7, 9
Step 1: Find the Mean
\bar{x} = \frac{5 + 7 + 9}{3} = \frac{21}{3} = 7
Step 2: Find deviations and square them
(5 − 7)² = 4
(7 − 7)² = 0
(9 − 7)² = 4
Step 3: Find Variance and Standard Deviation
\sigma^2 = \frac{4 + 0 + 4}{3} = \frac{8}{3} \approx 2.67
\sigma = \sqrt{2.67} \approx 1.63
🔹 Sum 2:
Data: 10, 12, 14, 16
Step 1: Mean
\bar{x} = \frac{10 + 12 + 14 + 16}{4} = \frac{52}{4} = 13
Step 2: Squared deviations
(10 − 13)² = 9
(12 − 13)² = 1
(14 − 13)² = 1
(16 − 13)² = 9
Step 3: Variance and SD
\sigma^2 = \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5
\sigma = \sqrt{5} \approx 2.24
🔹 Sum 3:
Data: 4, 4, 4, 4
Step 1: Mean
\bar{x} = \frac{4+4+4+4}{4} = 4
Step 2: All deviations are 0 → variance = 0
\sigma = 0
✅ Standard Deviation is zero when all values are the same.
🔹 Sum 4:
Data: 2, 4, 6, 8, 10
Step 1: Mean
\bar{x} = \frac{2+4+6+8+10}{5} = \frac{30}{5} = 6
Step 2: Squared deviations
(2−6)² = 16
(4−6)² = 4
(6−6)² = 0
(8−6)² = 4
(10−6)² = 16
\sigma^2 = \frac{16 + 4 + 0 + 4 + 16}{5} = \frac{40}{5} = 8
\sigma = \sqrt{8} \approx 2.83
📝 Worksheet: Simple Sums on Standard Deviation
🔢 Q1.
Find the standard deviation for the following data:
5, 7, 9
🔢 Q2.
Calculate the standard deviation for the numbers:
10, 12, 14, 16
🔢 Q3.
Find the standard deviation of:
4, 4, 4, 4
🔢 Q4.
Compute the standard deviation for:
2, 4, 6, 8, 10
🔢 Q5.
A student got the following marks in tests:
55, 60, 65, 70, 75
Find the standard deviation.
🔢 Q6.
Determine the standard deviation of the dataset:
3, 6, 9
🔢 Q7.
Find the variance and standard deviation of:
7, 9, 11, 13, 15
🔢 Q8.
A worker produced the following number of items in 5 days:
18, 20, 22, 24, 26
Calculate the standard deviation.
🔢 Q9.
Find the standard deviation for the following small dataset:
1, 2, 3, 4, 5
🔢 Q10.
The ages of a group of children are:
5, 6, 7, 8
Find the standard deviation.
✅ Answer Key for the Simple Standard Deviation Worksheet, with step-by-step solutions using the formula:
\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}
🔢 Q1. Data: 5, 7, 9
Mean = 7
Deviations: -2, 0, 2
Squared Deviations: 4, 0, 4
Variance = 8 ÷ 3 ≈ 2.67
SD ≈ 1.63
🔢 Q2. Data: 10, 12, 14, 16
Mean = 13
Deviations: -3, -1, 1, 3
Squared: 9, 1, 1, 9
Variance = 20 ÷ 4 = 5
SD = √5 ≈ 2.24
🔢 Q3. Data: 4, 4, 4, 4
Mean = 4
All deviations = 0
SD = 0
🔢 Q4. Data: 2, 4, 6, 8, 10
Mean = 6
Squared deviations: 16, 4, 0, 4, 16
Variance = 40 ÷ 5 = 8
SD = √8 ≈ 2.83
🔢 Q5. Data: 55, 60, 65, 70, 75
Mean = 65
Squared deviations: 100, 25, 0, 25, 100
Variance = 250 ÷ 5 = 50
SD = √50 ≈ 7.07
🔢 Q6. Data: 3, 6, 9
Mean = 6
Squared deviations: 9, 0, 9
Variance = 18 ÷ 3 = 6
SD = √6 ≈ 2.45
🔢 Q7. Data: 7, 9, 11, 13, 15
Mean = 11
Squared deviations: 16, 4, 0, 4, 16
Variance = 40 ÷ 5 = 8
SD = √8 ≈ 2.83
🔢 Q8. Data: 18, 20, 22, 24, 26
Mean = 22
Squared deviations: 16, 4, 0, 4, 16
Variance = 40 ÷ 5 = 8
SD = √8 ≈ 2.83
🔢 Q9. Data: 1, 2, 3, 4, 5
Mean = 3
Squared deviations: 4, 1, 0, 1, 4
Variance = 10 ÷ 5 = 2
SD = √2 ≈ 1.41
🔢 Q10. Data: 5, 6, 7, 8
Mean = 6.5
Squared deviations: 2.25, 0.25, 0.25, 2.25
Variance = 5 ÷ 4 = 1.25
SD = √1.25 ≈ 1.12
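The worksheet answers can be checked programmatically. The sketch below is an illustrative check, not part of the worksheet: it stores the ten data sets in a dictionary (the name `worksheet` is an assumption) and applies `statistics.pstdev`, which divides by n like the formula used in the answer key.

```python
import statistics

# Data sets Q1 to Q10 copied from the worksheet.
worksheet = {
    "Q1": [5, 7, 9],
    "Q2": [10, 12, 14, 16],
    "Q3": [4, 4, 4, 4],
    "Q4": [2, 4, 6, 8, 10],
    "Q5": [55, 60, 65, 70, 75],
    "Q6": [3, 6, 9],
    "Q7": [7, 9, 11, 13, 15],
    "Q8": [18, 20, 22, 24, 26],
    "Q9": [1, 2, 3, 4, 5],
    "Q10": [5, 6, 7, 8],
}

# Population standard deviation (divide by n), rounded to two decimals.
answers = {
    question: round(statistics.pstdev(values), 2)
    for question, values in worksheet.items()
}

for question, sd in answers.items():
    print(question, "SD ≈", sd)
```

Q4, Q7 and Q8 share the same standard deviation because their values have the same spacing; adding a constant to every value shifts the mean but leaves the spread unchanged.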
📘 CORRELATION – DEFINITION AND TYPES
✅ Definition of Correlation:
Correlation is a statistical tool that measures the degree and direction of a relationship
between two or more variables.
In simple terms: Correlation tells us how strongly and in what direction two variables are
related.
🔁 Example:
If weight increases as height increases, there is a positive correlation.
If demand decreases as price increases, there is a negative correlation.
📊 Types of Correlation
Correlation can be classified on the basis of:
I. Direction of Relationship
🔹 1. Positive Correlation
Both variables increase or decrease together.
Example: Income ↑ → Expenditure ↑
🔹 2. Negative Correlation
One variable increases, the other decreases.
Example: Price ↑ → Demand ↓
🔹 3. Zero (No) Correlation
No consistent pattern between variables.
Example: Shoe size and intelligence.
II. Number of Variables
🔹 1. Simple Correlation
Relationship between two variables only.
Example: Temperature and electricity usage.
🔹 2. Multiple Correlation
Relationship between one variable and two or more others.
Example: Crop yield and rainfall, fertilizer use, and temperature.
🔹 3. Partial Correlation
Relationship between two variables while controlling other variables.
Example: Income and expenditure, controlling for family size.
III. Form of Relationship
🔹 1. Linear Correlation
Change in one variable is proportional to the change in the other.
Represented by a straight line in graph.
Example: Salary and years of experience.
🔹 2. Non-linear (Curvilinear) Correlation
The rate of change in one variable is not constant with respect to the other, so the points follow a curve and not a straight line.
Example: Stress and productivity.
IV. Method of Measurement
🔹 1. Pearson’s Correlation Coefficient (r)
Measures strength and direction of linear relationship.
Value ranges from –1 to +1.
🔹 2. Spearman’s Rank Correlation
Based on ranking of data.
Used when data is ordinal or not normally distributed.
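The difference between the two coefficients can be seen in a small sketch. The functions below are written from the textbook definitions using only the standard library; the names `pearson_r`, `average_ranks` and `spearman_rho`, and the data values, are assumptions made for illustration. Spearman's coefficient is computed here as Pearson's r applied to the ranks, with tied values sharing the average of their ranks.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance term divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y always rises with x, but along a curve.
x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 25]

r = pearson_r(x_values, y_values)
rho = spearman_rho(x_values, y_values)
print("Pearson r:", round(r, 3))
print("Spearman rho:", round(rho, 3))
```

Because the relationship is perfectly monotonic but not linear, the rank-based coefficient reaches its maximum while Pearson's r falls slightly short of it.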
📈 Summary Table
Type | Description | Example
Positive Correlation | ↑X → ↑Y or ↓X → ↓Y | Height and Weight
Negative Correlation | ↑X → ↓Y | Price and Demand
Zero Correlation | No relation | Shoe size and Intelligence
Simple Correlation | Between two variables | Study time and exam marks
Multiple Correlation | Between one dependent and many independent variables | Crop yield and inputs
Partial Correlation | Between two variables, controlling others | Income vs. Spending (controlling age)
Linear Correlation | Constant rate of change | Distance and time
Non-linear Correlation | Changing rate of relation | Age and strength