Descriptive Analytics - Unit 3: Detailed Notes
Measures of Location (Central Tendency)
Definition:
- Statistical values that represent the center or typical value of a dataset.
- Includes mean, median, mode.
Types of Mean:
- Arithmetic Mean:
Formula: Mean = (Sumx) / n
Sum of observations divided by the number of observations.
Sensitive to outliers.
Excel: =AVERAGE(number1, [number2], ...)
- Geometric Mean:
Used for growth rates or ratios.
Formula: (x1 * x2 * ... * xn)^(1/n)
Excel: =GEOMEAN(number1, [number2], ...)
- Harmonic Mean:
Suitable for rates or ratios (e.g., speed).
Formula: n / (Sum (1/xi))
Excel: =HARMEAN(number1, [number2], ...)
Median:
- Middle value when data is ordered.
- Robust to outliers.
- For odd number of observations: Middle value.
- For even number: Average of two middle values.
- Excel: =MEDIAN(number1, [number2], ...)
Mode:
- Most frequent value in the dataset.
- Useful for categorical data.
- Excel: =MODE.SNGL(number1, [number2], ...)
Midrange:
- Average of the smallest and largest values.
- Formula: (Min + Max) / 2
Practical Application in Excel:
Example: Data = 100, 120, 130, 120, 140
- Mean: =AVERAGE(A1:A5) 122
- Median: =MEDIAN(A1:A5) 120
- Mode: =MODE.SNGL(A1:A5) 120
Measures of Variability:
Definition:
- Quantify the spread or dispersion of data points.
- Includes range, variance, standard deviation, interquartile range.
Range:
- Formula: Max - Min
- Excel: =MAX(A1:A5) - MIN(A1:A5)
Variance:
- Population: sigma^2 = Sum(xi - mu)^2 / N
- Sample: s^2 = Sum(xi - x)^2 / (n - 1)
- Excel: =VAR.P(A1:A5) or =VAR.S(A1:A5)
Standard Deviation:
- Population: sigma = sqrtsigma^2
- Sample: s = sqrts^2
- Excel: =STDEV.P(A1:A5) or =STDEV.S(A1:A5)
Interquartile Range (IQR):
- Formula: Q3 - Q1
- Excel: =QUARTILE.INC(A1:A5, 3) - QUARTILE.INC(A1:A5, 1)
Example:
Data = 10, 20, 30, 40, 50
- Range: 40
- Variance: 250
- Std Dev: ~15.81
- IQR: 30
Measures of Shape:
Skewness:
- Positive skew: Mean > Median
- Negative skew: Mean < Median
- Excel: =SKEW(A1:A5)
Kurtosis:
- High kurtosis: Sharp peak, heavy tails.
- Low kurtosis: Flat distribution.
- Excel: =KURT(A1:A5)
Percentiles and Quartiles:
- Percentiles: e.g., 90th percentile = 90% of data is below.
Excel: =PERCENTILE.INC(A1:A5, 0.9)
- Quartiles:
Q1 = 25th, Q2 = 50th, Q3 = 75th percentile.
Excel: =QUARTILE.INC(A1:A5, n)
Correlation Analysis:
Definition:
- Measures strength and direction of relationship between variables.
- Pearson Correlation Coefficient (r):
Formula: r = Sum[(xi - x)(yi - y_bar)] / sqrt[Sum(xi - x)^2 * Sum(yi - y_bar)^2]
- Excel: =CORREL(A1:A5, B1:B5)
Interpretation:
- r > 0: Positive relationship
- r < 0: Negative relationship
- |r| 1: Strong relationship; 0: Weak
Exercises:
Central Tendency:
Data: 50, 60, 55, 70, 65
- Mean: =AVERAGE(A1:A5) 60
- Median: =MEDIAN(A1:A5) 60
- Mode: =MODE.SNGL(A1:A5) None
Variability:
Data: 85, 90, 88, 92, 87
- Range: 7
- Std Dev: ~2.74
- IQR: 4
Correlation:
Hours Studied: 2, 4, 6, 8, 10
Scores: 60, 70, 80, 90, 95
- =CORREL(A1:A5, B1:B5) ~0.99
Quiz Questions:
1. Median of 10, 20, 30, 40, 50? 30
2. r = -0.8 indicates? Strong negative relationship
3. Sample Std Dev in Excel? =STDEV.S