
Statistical Analysis of Data

Uploaded by

Prema Kumari
Copyright
© © All Rights Reserved

📊 STATISTICAL ANALYSIS OF DATA

✅ Definition:

Statistical analysis of data refers to the process of collecting, organizing, summarizing,
interpreting, and presenting numerical data to discover patterns, relationships, or trends, and
to make informed decisions.

In simple terms:
It means examining numbers using statistical methods to extract meaningful insights and
conclusions.

🎯 Objectives of Statistical Analysis:

1. To summarize large data sets in a meaningful way


2. To identify relationships and trends
3. To support decision-making with evidence
4. To test hypotheses and draw conclusions
5. To determine reliability and validity of data

🧮 Types of Statistical Analysis:

1. Descriptive Statistics

Used to describe and summarize the main features of a data set.

📘 Includes:

 Mean, Median, Mode (measures of central tendency)


 Standard Deviation, Variance, Range (measures of dispersion)
 Frequency distributions, Percentages, Graphs (e.g., bar charts, pie charts)

2. Inferential Statistics

Used to make predictions or inferences about a population based on a sample.

📘 Includes:

 Hypothesis testing (t-test, z-test, chi-square test)


 Confidence intervals
 Correlation and regression analysis
 Analysis of variance (ANOVA)

📊 Common Statistical Techniques:

Technique Purpose
Mean, Median, Mode Identify the center or average of data
Standard Deviation Show how spread out the data is
Correlation Show relationship between two variables
Regression Predict value of one variable based on another
Chi-square Test Test association between categorical variables
ANOVA Compare means of three or more groups
t-Test / z-Test Compare means of two groups

Steps in Statistical Analysis:

1. Define the objective of analysis (What do you want to know?)


2. Collect data using valid tools (questionnaires, experiments, records)
3. Organize the data (tables, charts)
4. Analyze using statistical methods (use software like SPSS, Excel, R, etc.)
5. Interpret the results in relation to your research question
6. Present findings in a clear, understandable format (tables, graphs, summaries)

📌 Importance of Statistical Analysis:

 Enhances accuracy and objectivity in research


 Supports evidence-based conclusions
 Helps identify patterns and predictions
 Facilitates effective communication of results

🧾 Applications:

 Education research (test scores, achievement levels)


 Medical studies (drug effectiveness, patient recovery)
 Social sciences (survey analysis, behavioral trends)
 Business and economics (market research, sales forecasting)

📊 PARAMETRIC AND NON-PARAMETRIC TESTS


Statistical tests are used to analyze data and draw conclusions. They are broadly classified
into parametric and non-parametric tests based on the nature of the data and the
assumptions made about the population.

✅ 1. Parametric Tests

📘 Definition:

Parametric tests are statistical tests that assume the data follow a known distribution
(usually the normal distribution) described by fixed parameters such as the mean and
standard deviation.

🔑 Key Features:

 Based on interval or ratio scale data (quantitative).


 Assumes normal distribution of data.
 Requires homogeneity of variance.
 More powerful and precise if assumptions are met.

📊 Common Parametric Tests:

Test Purpose
t-test Compare means between two groups
z-test Compare means or proportions with known variance
ANOVA Compare means among three or more groups
Pearson correlation Measure linear relationship between variables
Regression analysis Predict dependent variable based on one or more independent variables

✅ 2. Non-Parametric Tests

📘 Definition:

Non-parametric tests are statistical tests that do not assume a specific distribution. They are
used when data violates assumptions required for parametric tests.

🔑 Key Features:

 Used for ordinal or nominal data.
 Make no assumption about the specific form (e.g., normality) of the population distribution.
 Suitable for small sample sizes or skewed data.
 Less powerful but more flexible than parametric tests.

📊 Common Non-Parametric Tests:

Test Purpose
Chi-square test Test association between categorical variables
Mann–Whitney U test Compare two independent groups (non-normal)
Wilcoxon signed-rank test Compare two related samples
Kruskal–Wallis test Compare more than two independent groups
Spearman's rank correlation Measure monotonic relationship between variables

📌 Comparison Table:

Criteria | Parametric Test | Non-Parametric Test
Data type | Interval/ratio (quantitative) | Ordinal/nominal (qualitative or ranked)
Distribution assumption | Yes (usually normal) | No
Sample size | Usually large | Can be small
Test examples | t-test, z-test, ANOVA | Chi-square, Mann–Whitney U, Wilcoxon
Power and precision | More powerful if assumptions are met | Less powerful, but safer for non-normal data

✅ Conclusion:

 Use parametric tests when data meets statistical assumptions (normality, homogeneity, etc.).
 Use non-parametric tests when data is non-normal, ordinal, or when sample size is small.

✅ PARAMETRIC TEST EXAMPLE: Independent Samples t-test

📘 Scenario:

You want to compare the average test scores of students from two different classes (Class A
and Class B) to see if there's a significant difference.

🎯 Hypotheses:

 H₀ (Null): There is no significant difference in the means of Class A and Class B.


 H₁ (Alternative): There is a significant difference in the means.

📊 Data:
Class A Scores Class B Scores

85 78

88 82

90 80

92 85

87 83

✅ Step 1: Calculate the Means

$$\bar{X}_A = \frac{85 + 88 + 90 + 92 + 87}{5} = 88.4$$

$$\bar{X}_B = \frac{78 + 82 + 80 + 85 + 83}{5} = 81.6$$

✅ Step 2: Calculate the Standard Deviations (SD)

Using the formula:

$$SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$

 For Class A: the squared deviations sum to 29.2, so SD = √(29.2/4) = √7.3 ≈ 2.70
 For Class B: the squared deviations sum to 29.2, so SD = √(29.2/4) = √7.3 ≈ 2.70
(You can also use Excel or a calculator to simplify this.)

✅ Step 3: Apply the t-test formula

$$t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\frac{SD_A^2}{n_A} + \frac{SD_B^2}{n_B}}} = \frac{88.4 - 81.6}{\sqrt{\frac{7.3}{5} + \frac{7.3}{5}}} = \frac{6.8}{\sqrt{2.92}} \approx \frac{6.8}{1.71} \approx 3.98$$

✅ Step 4: Compare with Critical Value

 Degrees of freedom = n_A + n_B − 2 = 8
 t-critical (two-tailed, α = 0.05, df = 8) ≈ 2.306
 Since 3.98 > 2.306, we reject the null hypothesis

✅ Conclusion: There is a significant difference in the average scores.
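
The four steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a replacement for a statistics package: the names `class_a`, `class_b`, `t_statistic`, and `T_CRITICAL` are introduced here for the example, and the critical value is simply the one quoted in Step 4.

```python
from math import sqrt
from statistics import mean, stdev

class_a = [85, 88, 90, 92, 87]
class_b = [78, 82, 80, 85, 83]


def t_statistic(x, y):
    """t statistic using the unpooled standard error from Step 3."""
    # stdev() is the sample standard deviation (divides by n - 1).
    standard_error = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / standard_error


T_CRITICAL = 2.306  # two-tailed, alpha = 0.05, df = 8 (value quoted in Step 4)

t_value = t_statistic(class_a, class_b)
degrees_of_freedom = len(class_a) + len(class_b) - 2
reject_null = abs(t_value) > T_CRITICAL

print(f"SD A = {stdev(class_a):.2f}, SD B = {stdev(class_b):.2f}")
print(f"t = {t_value:.2f}, df = {degrees_of_freedom}, reject H0: {reject_null}")
```

Because the standard deviations are computed directly from the raw scores, the printed values are a useful check on any hand calculation that rounds intermediate results.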

✅ NON-PARAMETRIC TEST EXAMPLE: Mann–Whitney U Test

📘 Scenario:

You want to test if there's a difference in reaction times (in seconds) between two groups
who had different training programs.

📊 Data:

Group A Group B

12 8

15 10

14 11

✅ Step 1: Combine and Rank the Data

Combined data:
8, 10, 11, 12, 14, 15
Ranks:
8 (1), 10 (2), 11 (3), 12 (4), 14 (5), 15 (6)

Assign ranks:

 Group A (12, 15, 14) → Ranks: 4, 6, 5 → Total Rank A = 15


 Group B (8, 10, 11) → Ranks: 1, 2, 3 → Total Rank B = 6

✅ Step 2: Use Mann–Whitney U Formula

$$U_A = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_A = 3 \times 3 + \frac{3(3+1)}{2} - 15 = 9 + 6 - 15 = 0$$

$$U_B = n_1 n_2 - U_A = 9 - 0 = 9$$

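✅ Step 3: Use the Smaller U Value

 U = 0
 With n1 = n2 = 3 there are only 20 possible ways to assign the ranks, so the smallest
attainable two-tailed p-value is 2/20 = 0.10. No two-tailed critical value exists at
α = 0.05; the one-tailed critical U at α = 0.05 is 0.

Since U = 0 ≤ 0, we reject the null hypothesis in a one-tailed test at α = 0.05 (exact
p = 0.05). A two-tailed test cannot reach significance at α = 0.05 with samples this small
(exact p = 0.10).

✅ Conclusion: The data suggest longer reaction times in Group A than in Group B, but a
larger sample is needed to confirm a difference between the two training programs with a
two-tailed test.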
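
The ranking and U calculation above can be sketched in Python as follows. The helper names `u_statistics` and `exact_two_sided_p` are made up for this illustration, and the code assumes there are no tied values. As an extra check for very small samples, it also enumerates every possible assignment of ranks to Group A to obtain an exact two-sided p-value instead of relying on a table lookup.

```python
from itertools import combinations

group_a = [12, 15, 14]
group_b = [8, 10, 11]


def u_statistics(a, b):
    """Return (U_a, U_b) from the rank sum of group a; assumes no ties."""
    combined = sorted(a + b)
    rank = {value: position + 1 for position, value in enumerate(combined)}
    rank_sum_a = sum(rank[value] for value in a)
    n1, n2 = len(a), len(b)
    u_a = n1 * n2 + n1 * (n1 + 1) // 2 - rank_sum_a
    return u_a, n1 * n2 - u_a


def exact_two_sided_p(a, b):
    """Share of all rank assignments whose smaller U is <= the observed one."""
    n1, n2 = len(a), len(b)
    observed = min(u_statistics(a, b))
    hits = total = 0
    for ranks_a in combinations(range(1, n1 + n2 + 1), n1):
        u_a = n1 * n2 + n1 * (n1 + 1) // 2 - sum(ranks_a)
        total += 1
        if min(u_a, n1 * n2 - u_a) <= observed:
            hits += 1
    return hits / total


u_a, u_b = u_statistics(group_a, group_b)
p_two_sided = exact_two_sided_p(group_a, group_b)
print(f"U_A = {u_a}, U_B = {u_b}, exact two-sided p = {p_two_sided:.2f}")
```

Full enumeration is only practical for small groups; for larger samples a statistics package or a normal approximation is the usual choice.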

📊 Difference Between Parametric and Non-Parametric Tests

Parametric and non-parametric tests are two major categories of statistical tests used for
analyzing data. The key difference lies in the assumptions made about the underlying data
distribution.

🔁 Comparison Table:

Criteria | Parametric Test | Non-Parametric Test
Data Type | Quantitative (Interval or Ratio scale) | Qualitative or Quantitative (Ordinal or Nominal scale)
Assumption of Distribution | Assumes data follows a normal distribution | No assumption about data distribution
Use of Parameters | Based on known parameters like mean and standard deviation | Does not involve parameters of the population
Sample Size Requirement | Requires larger samples to be reliable | Suitable for small samples
Measurement Scale | Requires interval or ratio scale data | Accepts ordinal, nominal, or ranked data
Statistical Power | Generally more powerful if assumptions are met | Less powerful, but more robust when assumptions are not met
Examples | t-test, z-test, ANOVA, Pearson's correlation, Linear regression | Chi-square test, Mann–Whitney U test, Wilcoxon test, Kruskal–Wallis, Spearman's rho
Accuracy of Results | More accurate with normally distributed data | Less precise, but more flexible with non-normal or unknown distributions
Application | When the population is known and well-defined | When dealing with unknown or non-normal populations

✅ In Summary:

 Use parametric tests when:


✅ Data is numerical (interval/ratio)
✅ Sample size is large
✅ Data is normally distributed
 Use non-parametric tests when:
✅ Data is categorical or ordinal
✅ Distribution is unknown or non-normal
✅ Sample size is small

🧠 Tip for Students & Researchers:

"If your data doesn't meet the assumptions of parametric tests (especially normality), go with
a non-parametric alternative!"

✅ Applications of Parametric and Non-Parametric Tests

Parametric and non-parametric tests are essential tools in research for analyzing data and
making informed decisions. Their applications vary depending on the type of data, research
objectives, and assumptions about the data distribution.

📊 Applications of Parametric Tests

Parametric tests are widely used when data meets the assumptions of normal distribution,
equal variances, and is measured on an interval or ratio scale.

🔹 1. Education Research

 Comparing average test scores between two or more student groups (using t-test or
ANOVA)
 Evaluating effectiveness of teaching methods

🔹 2. Medical and Clinical Trials

 Analyzing effects of a drug on patients (pre-test vs post-test scores using paired t-test)
 Comparing blood pressure or glucose levels between control and treatment groups

🔹 3. Business and Economics

 Comparing mean sales across regions or periods


 Studying impact of a new marketing strategy using regression analysis

🔹 4. Psychology

 Measuring differences in IQ scores between genders or age groups


 Studying memory test scores before and after therapy

🔹 5. Engineering & Agriculture


 Testing performance of machines or fertilizers using ANOVA
 Evaluating crop yield under different treatments

📊 Applications of Non-Parametric Tests

Non-parametric tests are used when data is ordinal, nominal, or does not meet parametric
assumptions. They are especially useful for small sample sizes or ranked data.

🔹 1. Social Sciences

 Testing association between gender and voting preference (Chi-square test)


 Studying public opinion or satisfaction levels (ordinal data)

🔹 2. Psychology and Behavior Studies

 Comparing anxiety levels (ranked scores) between treated and untreated groups
(Mann–Whitney U)
 Analyzing before-after scores on non-normally distributed psychological scales
(Wilcoxon test)

🔹 3. Health and Medical Fields

 Evaluating pain relief scores using ranked data


 Analyzing survival data or treatment outcomes with unknown distributions

🔹 4. Market Research

 Analyzing customer satisfaction rankings (Spearman's rank correlation)


 Studying preference patterns across different product designs

🔹 5. Education

 Testing changes in student rankings after an intervention


 Comparing ranked performance across groups with non-normal distributions

📌 Summary Table:
Test Type | Data Type | Common Fields of Application
Parametric Tests | Interval/Ratio (Normal) | Medicine, Education, Business, Agriculture
Non-Parametric Tests | Ordinal/Nominal (Non-normal) | Psychology, Sociology, Health Sciences, Market Research

✅ Final Note:

 Parametric tests: High power, best when assumptions are met


 Non-parametric tests: More flexible, ideal for real-world data that is messy or
limited

📊 DESCRIPTIVE STATISTICAL ANALYSIS

✅ Definition:

Descriptive statistical analysis refers to the process of summarizing, organizing, and
presenting data in a meaningful way. It helps to describe the main features of a dataset and
provides simple summaries about the sample and the measures.

🔹 In simple terms:
It tells you what the data shows, without drawing conclusions beyond the data itself.

🎯 Objectives of Descriptive Statistics:

1. To summarize large volumes of data


2. To describe central values (e.g., average, typical)
3. To show variability or spread in data
4. To make data easier to interpret through charts and tables

📘 Types of Descriptive Statistics

🔹 1. Measures of Central Tendency

They describe the center or average of a dataset.

Measure | Description | Formula / Example
Mean | Arithmetic average | $\bar{x} = \frac{\sum x}{n}$
Median | Middle value in ordered data | Middle number (odd n)
Mode | Most frequently occurring value | Most common number in dataset

🔹 2. Measures of Dispersion (Variability)

They describe how spread out the data are around the center.


Measure Description

Range Difference between highest and lowest values

Variance Average of squared deviations from the mean

Standard Deviation (SD) Square root of variance (shows spread around the mean)

Interquartile Range (IQR) Range between Q1 (25%) and Q3 (75%) values

🔹 3. Measures of Position

These locate a specific data point in relation to the entire dataset.

Measure Description

Percentiles Values below which a given % of observations fall

Quartiles Divide data into four equal parts

Deciles Divide data into ten equal parts

🔹 4. Graphical Representation

Used to visualize data for easier understanding.

Tool Used For

Bar chart Categorical data

Histogram Continuous/numerical data

Pie chart Percentage/proportional comparison

Line graph Trend over time

Box plot Shows median, IQR, and outliers

Example:

Data Set: 10, 12, 15, 18, 20

 Mean: $\frac{10+12+15+18+20}{5} = 15$
 Median: 15 (middle value)
 Mode: None (no repeats)
 Range: 20 − 10 = 10
 Standard Deviation (population formula, dividing by n): √[(25 + 9 + 0 + 9 + 25)/5] = √13.6 ≈ 3.69
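
The same summary can be reproduced with Python's built-in `statistics` module. This is a small sketch; the `summary` dictionary is just a convenient container chosen for the example. Note that `pstdev` divides by n, matching the calculation above, and that `multimode` returns every value when nothing repeats, so the "no mode" case is handled explicitly.

```python
from statistics import mean, median, multimode, pstdev

data = [10, 12, 15, 18, 20]

# If every value occurs exactly once, report no mode at all.
has_repeats = len(set(data)) < len(data)

summary = {
    "mean": mean(data),
    "median": median(data),
    "modes": multimode(data) if has_repeats else [],
    "range": max(data) - min(data),
    "sd": pstdev(data),  # population standard deviation (divides by n)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```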

📌 Applications of Descriptive Statistical Analysis:

 Education: Summarizing test scores of students


 Health: Describing patient recovery times or symptom frequency
 Business: Analyzing monthly sales data
 Social Research: Summarizing survey results
 Finance: Reporting average investment returns

✅ Conclusion:

Descriptive statistics help to make raw data understandable by reducing it to a manageable


form. They form the foundation for further analysis such as inferential statistics.

📊 GRAPHICAL AND DIAGRAMMATIC REPRESENTATION OF DATA

✅ Definition:

Graphical and diagrammatic representation refers to the use of visual tools like charts,
graphs, and diagrams to present statistical data in a way that is easily understood,
interpreted, and compared.

📌 In simple words: It is the visual presentation of data to make trends, patterns, and
comparisons clear at a glance.

🎯 Objectives:

 To present complex data in a simplified, visual form


 To facilitate quick understanding and comparison
 To highlight trends, relationships, and variations in data
 To support decision-making and reporting

🔷 Types of Graphical and Diagrammatic Representations:


🔹 1. Line Graph

 Used to show trends over time (continuous data).


 Points are plotted and connected by a line.

✅ Example: Temperature changes over a week.

🔹 2. Bar Graph / Bar Diagram

 Shows comparisons between categories using rectangular bars.


 Bars can be vertical or horizontal.

✅ Example: Number of students in different classes.

🔹 3. Pie Chart (Circular Diagram)

 A circle divided into sectors representing proportions or percentages.


 Best for showing part-to-whole relationships.

✅ Example: Market share of different brands.

🔹 4. Histogram

 A type of bar chart for continuous data grouped into intervals (classes).
 Bars touch each other to indicate continuity.

✅ Example: Distribution of student marks.

🔹 5. Frequency Polygon

 Line graph made by joining the midpoints of the top of histogram bars.
 Useful for comparing frequency distributions.

🔹 6. Ogive (Cumulative Frequency Curve)

 Represents cumulative frequency distribution.


 Helps to determine median, quartiles, percentiles.
🔹 7. Pictogram

 Uses pictures or icons to represent data.


 Often used in elementary education or public displays.

🔹 8. Dot Plot / Scatter Plot

 Dot plot: Shows frequency by stacking dots above a number line.


 Scatter plot: Shows relationship between two variables (correlation).

✅ Example: Relationship between study time and test scores.

📌 Comparison: Graphs vs Diagrams


Feature Graphs Diagrams

Data type Quantitative (numerical) Quantitative or Qualitative

Shows trends? Yes Not always

Examples Line graph, histogram Pie chart, pictogram, bar diagram

Best used for Time series, comparisons Proportions, categories

📝 Best Practices:

 Choose the right type of graph/diagram for the data.


 Use proper labels, scales, and titles.
 Avoid clutter—keep it clean and readable.
 Use colors and legends wisely for clarity.

✅ Conclusion:

Graphical and diagrammatic representations are powerful tools in statistics and research.
They enhance communication, support data interpretation, and help in making informed
decisions.

📊 MEASURES OF CENTRAL TENDENCY

✅ Definition:
Measures of central tendency are statistical tools used to identify the central or average
value of a dataset. They give a single value that represents the entire distribution, helping to
understand where the data is centered.

🔹 In simple words: It shows the typical or average value in a dataset.

🔺 Types of Measures of Central Tendency:

1. Mean (Arithmetic Average)

📘 Definition:

The sum of all values divided by the number of values.

🧮 Formula:

$$\text{Mean} = \frac{\sum x}{n}$$

✅ Example:

Marks: 10, 15, 20

$$\text{Mean} = \frac{10 + 15 + 20}{3} = \frac{45}{3} = 15$$

2. Median

📘 Definition:

The middle value in a sorted (ordered) dataset.

 If n is odd: Median = Middle value


 If n is even: Median = Average of two middle values

✅ Example:

Data: 5, 8, 11, 13, 16


Median = 11 (middle value)

Data: 6, 8, 10, 12
Median = (8 + 10)/2 = 9
3. Mode

📘 Definition:

The value that occurs most frequently in a dataset.

✅ Example:

Data: 4, 7, 7, 9, 10
Mode = 7 (it appears twice)

 A dataset can have no mode, one mode (unimodal), two modes (bimodal), or more
(multimodal).

🧾 Comparison Table:
Measure Usefulness Best For

Mean Considers all values; affected by outliers Normally distributed data

Median Not affected by extreme values Skewed data or open-ended distributions

Mode Identifies most common value Categorical or qualitative data

📌 Applications:
Field Use of Central Tendency Measures

Education Average marks of students

Health Median recovery time

Business Average sales or income

Sociology Most common family size

Agriculture Average yield of crops per acre

🧠 Tips:

 Mean is used when the data is symmetrical.


 Median is better when there are outliers or skewed data.
 Mode is ideal for non-numerical data (e.g., favorite fruit, most used transport).

Practice problems on Measures of Central Tendency (Mean, Median, and Mode), covering both
ungrouped and grouped data.

🔹 A. Ungrouped Data Problems

🧮 1. Mean – Basic Level

Question:
Find the mean of the following numbers:
8, 12, 15, 10, 5

Solution:

$$\text{Mean} = \frac{8 + 12 + 15 + 10 + 5}{5} = \frac{50}{5} = 10$$

🧮 2. Median – Odd Number of Values

Question:
Find the median of:
3, 9, 11, 15, 17

Solution:
Ordered Data = 3, 9, 11, 15, 17
Median = 11 (middle value)

🧮 3. Median – Even Number of Values

Question:
Find the median of:
6, 10, 14, 18

Solution:
Ordered Data = 6, 10, 14, 18
Median = (10 + 14) / 2 = 12

🧮 4. Mode

Question:
Find the mode of:
2, 5, 7, 5, 9, 5, 8
Solution:
Mode = 5 (occurs 3 times)

🔷 B. Grouped Data Problems

🧮 5. Mean – Grouped Data

Class Interval Frequency

0 – 10 4

10 – 20 6

20 – 30 10

30 – 40 5

Steps:

1. Find midpoints (x) of each class:


5, 15, 25, 35
2. Multiply midpoints by frequency (f × x)
3. Apply the formula:

$$\text{Mean} = \frac{\sum f \cdot x}{\sum f}$$

$$\sum f \cdot x = (4 \times 5) + (6 \times 15) + (10 \times 25) + (5 \times 35) = 20 + 90 + 250 + 175 = 535$$

$$\sum f = 4 + 6 + 10 + 5 = 25$$

$$\text{Mean} = \frac{535}{25} = 21.4$$
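
The grouped-mean steps can be checked with a few lines of Python. In this sketch each class is written as a `(lower, upper, frequency)` tuple, a layout chosen only for the example, and `grouped_mean` is a hypothetical helper name.

```python
# Each class interval is (lower limit, upper limit, frequency).
classes = [(0, 10, 4), (10, 20, 6), (20, 30, 10), (30, 40, 5)]


def grouped_mean(intervals):
    """Mean of grouped data: sum(f * midpoint) / sum(f)."""
    total_frequency = sum(freq for _, _, freq in intervals)
    weighted_sum = sum(freq * (lower + upper) / 2 for lower, upper, freq in intervals)
    return weighted_sum / total_frequency


mean_value = grouped_mean(classes)
print(f"Grouped mean = {mean_value}")
```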

🧮 6. Median – Grouped Data

Class Interval Frequency

0 – 10 5

10 – 20 8

20 – 30 12

30 – 40 10
Steps:

 N = 35, N/2 = 17.5


 Cumulative Frequencies:
o 0–10: 5
o 10–20: 13
o 20–30: 25
 Median class = 20–30 (because 17.5 lies here)

Use Formula:

$$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f}\right) \times h$$

Where:
L = lower boundary = 20
N = 35
F = cumulative frequency before median class = 13
f = frequency of median class = 12
h = class width = 10

$$\text{Median} = 20 + \left(\frac{17.5 - 13}{12}\right) \times 10 = 20 + \left(\frac{4.5}{12}\right) \times 10 = 20 + 3.75 = 23.75$$
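
As a sketch of the same procedure in Python, the function below walks through the classes, keeps a running cumulative frequency, and applies the median formula once it reaches the class containing N/2. The `(lower, upper, frequency)` tuple layout and the name `grouped_median` are assumptions made for this example.

```python
# Each class interval is (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10)]


def grouped_median(intervals):
    """Median = L + ((N/2 - F) / f) * h, applied to the median class."""
    half_total = sum(freq for _, _, freq in intervals) / 2
    cumulative_before = 0  # F: cumulative frequency before the current class
    for lower, upper, freq in intervals:
        if cumulative_before + freq >= half_total:
            width = upper - lower  # h: class width
            return lower + (half_total - cumulative_before) / freq * width
        cumulative_before += freq
    raise ValueError("no classes supplied")


median_value = grouped_median(classes)
print(f"Grouped median = {median_value}")
```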

🧮 7. Mode – Grouped Data

Class Interval Frequency

10 – 20 5

20 – 30 10

30 – 40 12

40 – 50 8

Steps:

 Modal class = 30–40 (highest frequency = 12)

Use Formula:

$$\text{Mode} = L + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$

Where:
L = 30,
f₁ = 12 (modal class freq)
f₀ = 10 (preceding class)
f₂ = 8 (succeeding class)
h = 10

$$\text{Mode} = 30 + \left(\frac{12 - 10}{2 \times 12 - 10 - 8}\right) \times 10 = 30 + \left(\frac{2}{24 - 18}\right) \times 10 = 30 + \left(\frac{2}{6}\right) \times 10 = 30 + 3.33 = 33.33$$
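
The modal-class formula can be sketched in Python in the same way. The name `grouped_mode` and the tuple layout are illustrative assumptions. Where the modal class is the first or last class, the sketch treats the missing neighbouring frequency as 0, which is a common convention but not one stated in the text.

```python
# Each class interval is (lower boundary, upper boundary, frequency).
classes = [(10, 20, 5), (20, 30, 10), (30, 40, 12), (40, 50, 8)]


def grouped_mode(intervals):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for the modal class."""
    frequencies = [freq for _, _, freq in intervals]
    modal_index = frequencies.index(max(frequencies))
    lower, upper, f1 = intervals[modal_index]
    # Neighbouring frequencies; assumed 0 when the modal class is at an edge.
    f0 = frequencies[modal_index - 1] if modal_index > 0 else 0
    f2 = frequencies[modal_index + 1] if modal_index < len(frequencies) - 1 else 0
    # Note: the denominator is 0 if f1 == f0 == f2 (no clear modal class).
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


mode_value = grouped_mode(classes)
print(f"Grouped mode = {mode_value:.2f}")
```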

📘 Practice Exercise (Solve Yourself):

🔹 A. Ungrouped

1. Find the mean, median, and mode of:


14, 18, 20, 15, 18, 19, 22
2. Find the median of:
34, 25, 18, 45, 32, 27 (Hint: Arrange in order first)

🔹 B. Grouped

Class Interval Frequency

0–10 3

10–20 5

20–30 8

30–40 4

3. Find the mean of the data.


4. Find the median class and calculate the median.


📊 MEASURES OF DISPERSION
✅ Definition:

Measures of dispersion are statistical tools used to describe the spread, variability, or
consistency of a dataset. They indicate how much the values in a dataset differ from the
average (central tendency).

🔍 In simple terms: They tell us how scattered or close together the data points are.

🎯 Objectives of Measures of Dispersion:

 To understand the degree of variation in data


 To compare uniformity or consistency across different datasets
 To complement measures of central tendency
 To assess risk or uncertainty in business, finance, science, etc.

🔢 Types of Measures of Dispersion

🔹 1. Range

Definition:
Difference between the highest and lowest value.

$$\text{Range} = \text{Maximum value} - \text{Minimum value}$$

✅ Example:
Data: 10, 15, 20, 25
Range = 25 − 10 = 15

🔹 2. Interquartile Range (IQR)

Definition:
Spread of the middle 50% of the data.

$$\text{IQR} = Q_3 - Q_1$$

Where:

 $Q_1$: First Quartile (25%)
 $Q_3$: Third Quartile (75%)
✅ Use: Reduces the effect of extreme values/outliers
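
A short Python sketch of the IQR, reusing the data from the Range example purely for illustration. Several conventions exist for locating quartiles, so hand methods and software can give slightly different values; this sketch uses the `"inclusive"` method of `statistics.quantiles`, which interpolates linearly between data points.

```python
from statistics import quantiles

data = [10, 15, 20, 25]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```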

🔹 3. Mean Deviation (MD)

Definition:
Average of the absolute deviations from the mean or median.

$$\text{MD} = \frac{\sum |x - \bar{x}|}{n}$$

✅ Shows: How much, on average, each data point differs from the mean.

🔹 4. Variance (σ²)

Definition:
Average of the squared differences from the mean.

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$

✅ Used for: Advanced statistical analysis and risk assessment.

🔹 5. Standard Deviation (SD or σ)

Definition:
Square root of variance. Shows how spread out the values are from the mean.

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

✅ Most widely used measure of dispersion.

📘 Comparison Table:
Measure | Best For | Sensitive to Outliers | Simple to Calculate
Range | Quick estimate of spread | Yes | ✅
IQR | Spread of middle 50% | No | ✅
Mean Deviation | Consistent deviations | Less | Moderate
Variance | Statistical formulas and models | Yes | ❌
Standard Deviation | Risk, reliability, consistency | Yes | ❌

🧾 Applications:
Field Application Example

Education Variation in students' scores

Finance Risk in investment returns (standard deviation)

Manufacturing Product quality variation

Agriculture Yield fluctuation over seasons

Healthcare Patient response variability to treatments

🧠 Example (Standard Deviation):

Data: 5, 7, 8, 10, 10

 Mean = $\frac{5 + 7 + 8 + 10 + 10}{5} = 8$
 Deviations: −3, −1, 0, 2, 2
 Squared deviations: 9, 1, 0, 4, 4
 Variance: $\frac{9+1+0+4+4}{5} = 3.6$
 Standard Deviation: $\sqrt{3.6} \approx 1.9$
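
The example above divides by n, which is the population form of the formula. The sketch below reproduces it with `pvariance` and `pstdev` from Python's `statistics` module and, for contrast, also computes the sample standard deviation with `stdev`, which divides by n − 1 (the form used in the t-test formula). The variable names are chosen only for this illustration.

```python
from statistics import pstdev, pvariance, stdev

data = [5, 7, 8, 10, 10]

population_variance = pvariance(data)  # sum of squared deviations / n
population_sd = pstdev(data)           # square root of the population variance
sample_sd = stdev(data)                # divides by n - 1 instead of n

print(f"Population variance = {population_variance:.2f}")
print(f"Population SD = {population_sd:.2f}")
print(f"Sample SD = {sample_sd:.2f}")
```

The sample value is always the larger of the two, and the gap shrinks as n grows.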

The sums below work through Standard Deviation step by step using small datasets.

✅ Formula for Standard Deviation (Ungrouped Data):

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

Where:

 $x$ = each value
 $\bar{x}$ = mean of the data
 $n$ = number of observations

🔹 Sum 1:

Data: 5, 7, 9

Step 1: Find the Mean

$$\bar{x} = \frac{5 + 7 + 9}{3} = \frac{21}{3} = 7$$

Step 2: Find deviations and square them

 (5 − 7)² = 4
 (7 − 7)² = 0
 (9 − 7)² = 4

Step 3: Find Variance and Standard Deviation

$$\sigma^2 = \frac{4 + 0 + 4}{3} = \frac{8}{3} \approx 2.67$$

$$\sigma = \sqrt{2.67} \approx 1.63$$

🔹 Sum 2:

Data: 10, 12, 14, 16

Step 1: Mean

$$\bar{x} = \frac{10 + 12 + 14 + 16}{4} = \frac{52}{4} = 13$$

Step 2: Squared deviations

 (10 − 13)² = 9
 (12 − 13)² = 1
 (14 − 13)² = 1
 (16 − 13)² = 9

Step 3: Variance and SD

$$\sigma^2 = \frac{9 + 1 + 1 + 9}{4} = \frac{20}{4} = 5$$

$$\sigma = \sqrt{5} \approx 2.24$$

🔹 Sum 3:

Data: 4, 4, 4, 4

Step 1: Mean

$$\bar{x} = \frac{4+4+4+4}{4} = 4$$

Step 2: All deviations are 0 → variance = 0

$$\sigma = 0$$

✅ Standard Deviation is zero when all values are the same.

🔹 Sum 4:

Data: 2, 4, 6, 8, 10

Step 1: Mean

$$\bar{x} = \frac{2+4+6+8+10}{5} = \frac{30}{5} = 6$$

Step 2: Squared deviations

 (2−6)² = 16
 (4−6)² = 4
 (6−6)² = 0
 (8−6)² = 4
 (10−6)² = 16

$$\sigma^2 = \frac{16 + 4 + 0 + 4 + 16}{5} = \frac{40}{5} = 8$$

$$\sigma = \sqrt{8} \approx 2.83$$

📝 Worksheet: Simple Sums on Standard Deviation

🔢 Q1.

Find the standard deviation for the following data:


5, 7, 9

🔢 Q2.

Calculate the standard deviation for the numbers:


10, 12, 14, 16

🔢 Q3.
Find the standard deviation of:
4, 4, 4, 4

🔢 Q4.

Compute the standard deviation for:


2, 4, 6, 8, 10

🔢 Q5.

A student got the following marks in tests:


55, 60, 65, 70, 75
Find the standard deviation.

🔢 Q6.

Determine the standard deviation of the dataset:


3, 6, 9

🔢 Q7.

Find the variance and standard deviation of:


7, 9, 11, 13, 15

🔢 Q8.

A worker produced the following number of items in 5 days:


18, 20, 22, 24, 26
Calculate the standard deviation.

🔢 Q9.

Find the standard deviation for the following small dataset:


1, 2, 3, 4, 5
🔢 Q10.

The ages of a group of children are:


5, 6, 7, 8
Find the standard deviation.

✅ Answer Key for the Simple Standard Deviation Worksheet, with step-by-step solutions
using the formula:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

🔢 Q1. Data: 5, 7, 9

 Mean $\bar{x} = 7$
 Deviations: −2, 0, 2
 Squared Deviations: 4, 0, 4
 Variance = $\frac{8}{3} \approx 2.67$
 SD ≈ 1.63

🔢 Q2. Data: 10, 12, 14, 16

 Mean $\bar{x} = 13$
 Deviations: −3, −1, 1, 3
 Squared: 9, 1, 1, 9
 Variance = $\frac{20}{4} = 5$
 SD = √5 ≈ 2.24

🔢 Q3. Data: 4, 4, 4, 4

 Mean = 4
 All deviations = 0
 SD = 0

🔢 Q4. Data: 2, 4, 6, 8, 10

 Mean = 6
 Squared deviations: 16, 4, 0, 4, 16
 Variance = $\frac{40}{5} = 8$
 SD = √8 ≈ 2.83

🔢 Q5. Data: 55, 60, 65, 70, 75

 Mean = 65
 Squared deviations: 100, 25, 0, 25, 100
 Variance = $\frac{250}{5} = 50$
 SD = √50 ≈ 7.07

🔢 Q6. Data: 3, 6, 9

 Mean = 6
 Squared deviations: 9, 0, 9
 Variance = $\frac{18}{3} = 6$
 SD = √6 ≈ 2.45

🔢 Q7. Data: 7, 9, 11, 13, 15

 Mean = 11
 Squared deviations: 16, 4, 0, 4, 16
 Variance = $\frac{40}{5} = 8$
 SD = √8 ≈ 2.83

🔢 Q8. Data: 18, 20, 22, 24, 26

 Mean = 22
 Squared deviations: 16, 4, 0, 4, 16
 Variance = 40 ÷ 5 = 8
 SD = √8 ≈ 2.83

🔢 Q9. Data: 1, 2, 3, 4, 5

 Mean = 3
 Squared deviations: 4, 1, 0, 1, 4
 Variance = 10 ÷ 5 = 2
 SD = √2 ≈ 1.41

🔢 Q10. Data: 5, 6, 7, 8

 Mean = 6.5
 Squared deviations: 2.25, 0.25, 0.25, 2.25
 Variance = 5 ÷ 4 = 1.25
 SD = √1.25 ≈ 1.12
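
The whole answer key can be checked mechanically. The sketch below stores each worksheet dataset with its stated answer in a dictionary named `answer_key` (a name made up for this example) and flags any entry whose stated SD differs from `statistics.pstdev` by more than rounding error.

```python
from statistics import pstdev

# Worksheet datasets mapped to the SD stated in the answer key (2 decimals).
answer_key = {
    (5, 7, 9): 1.63,
    (10, 12, 14, 16): 2.24,
    (4, 4, 4, 4): 0.0,
    (2, 4, 6, 8, 10): 2.83,
    (55, 60, 65, 70, 75): 7.07,
    (3, 6, 9): 2.45,
    (7, 9, 11, 13, 15): 2.83,
    (18, 20, 22, 24, 26): 2.83,
    (1, 2, 3, 4, 5): 1.41,
    (5, 6, 7, 8): 1.12,
}

# An answer counts as correct if it matches pstdev to within rounding (0.005).
mismatches = [
    data for data, stated_sd in answer_key.items()
    if abs(pstdev(data) - stated_sd) >= 0.005
]

print("All answers match" if not mismatches else f"Check these: {mismatches}")
```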
📘 CORRELATION – DEFINITION AND TYPES

✅ Definition of Correlation:

Correlation is a statistical tool that measures the degree and direction of a relationship
between two or more variables.

In simple terms: Correlation tells us how strongly and in what direction two variables are
related.

🔁 Example:

 If height increases, and weight also increases, there is positive correlation.


 If price increases and demand decreases, there is negative correlation.

📊 Types of Correlation

Correlation can be classified on the basis of:

I. Direction of Relationship

🔹 1. Positive Correlation

 Both variables increase or decrease together.


 Example: Income ↑ → Expenditure ↑

🔹 2. Negative Correlation

 One variable increases, the other decreases.


 Example: Price ↑ → Demand ↓

🔹 3. Zero (No) Correlation

 No consistent pattern between variables.


 Example: Shoe size and intelligence.

II. Number of Variables

🔹 1. Simple Correlation

 Relationship between two variables only.


 Example: Temperature and electricity usage.
🔹 2. Multiple Correlation

 Relationship between one variable and two or more others.


 Example: Crop yield and rainfall, fertilizer use, and temperature.

🔹 3. Partial Correlation

 Relationship between two variables while controlling other variables.


 Example: Income and expenditure, controlling for family size.

III. Form of Relationship

🔹 1. Linear Correlation

 Change in one variable is proportional to the change in the other.


 Represented by a straight line in graph.
 Example: Salary and years of experience.

🔹 2. Non-linear (Curvilinear) Correlation

 Change in one variable is not constant with respect to the other.


 Example: Stress and productivity.

IV. Method of Measurement

🔹 1. Pearson’s Correlation Coefficient (r)

 Measures strength and direction of linear relationship.


 Value ranges from –1 to +1.

🔹 2. Spearman’s Rank Correlation

 Based on ranking of data.


 Used when data is ordinal or not normally distributed.
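
To make the two coefficients concrete, here is a minimal Python sketch. The study-hours and exam-score figures are invented purely for illustration, and `pearson_r`, `ranks`, and `spearman_rho` are hypothetical helper names. Spearman's coefficient is computed as Pearson's r applied to the ranks, and the ranking helper assumes there are no tied values.

```python
from math import sqrt

# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 90]


def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the spreads."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank each value from 1 (smallest); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(value) + 1 for value in values]


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


r = pearson_r(hours, scores)
rho = spearman_rho(hours, scores)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the scores rise with every extra hour but not at a constant rate, the rank-based coefficient is exactly 1 while Pearson's r is slightly below 1. That is the difference between a monotonic and a strictly linear relationship.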

📈 Summary Table
Type | Description | Example
Positive Correlation | ↑X → ↑Y or ↓X → ↓Y | Height and Weight
Negative Correlation | ↑X → ↓Y | Price and Demand
Zero Correlation | No relation | Shoe size and Intelligence
Simple Correlation | Between two variables | Study time and exam marks
Multiple Correlation | Between one dependent and many independent vars | Crop yield and inputs
Partial Correlation | Between two vars, controlling others | Income vs. Spending (controlling age)
Linear Correlation | Constant rate of change | Distance and time
Non-linear Correlation | Changing rate of relation | Age and strength
