0% found this document useful (0 votes)
20 views5 pages

Descriptive Analytics Unit 3 Notes

The document provides detailed notes on descriptive analytics, focusing on measures of central tendency, variability, and shape, including definitions and formulas for mean, median, mode, range, variance, and standard deviation. It also covers correlation analysis, explaining the Pearson Correlation Coefficient and its interpretation. Practical applications in Excel are included for each statistical measure, along with example exercises and quiz questions for reinforcement.

Uploaded by

Abhishek Mishra
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views5 pages

Descriptive Analytics Unit 3 Notes

The document provides detailed notes on descriptive analytics, focusing on measures of central tendency, variability, and shape, including definitions and formulas for mean, median, mode, range, variance, and standard deviation. It also covers correlation analysis, explaining the Pearson Correlation Coefficient and its interpretation. Practical applications in Excel are included for each statistical measure, along with example exercises and quiz questions for reinforcement.

Uploaded by

Abhishek Mishra
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Descriptive Analytics - Unit 3: Detailed Notes

Measures of Location (Central Tendency)


Definition:
- Statistical values that represent the center or typical value of a dataset.
- Includes mean, median, mode.

Types of Mean:

- Arithmetic Mean:
Formula: Mean = (Sumx) / n
Sum of observations divided by the number of observations.
Sensitive to outliers.
Excel: =AVERAGE(number1, [number2], ...)

- Geometric Mean:
Used for growth rates or ratios.
Formula: (x1 * x2 * ... * xn)^(1/n)
Excel: =GEOMEAN(number1, [number2], ...)

- Harmonic Mean:
Suitable for rates or ratios (e.g., speed).
Formula: n / (Sum (1/xi))
Excel: =HARMEAN(number1, [number2], ...)

Median:
- Middle value when data is ordered.
- Robust to outliers.
- For odd number of observations: Middle value.
- For even number: Average of two middle values.
- Excel: =MEDIAN(number1, [number2], ...)

Mode:
- Most frequent value in the dataset.
- Useful for categorical data.
- Excel: =MODE.SNGL(number1, [number2], ...)

Midrange:
- Average of the smallest and largest values.
- Formula: (Min + Max) / 2

Practical Application in Excel:


Example: Data = 100, 120, 130, 120, 140
- Mean: =AVERAGE(A1:A5) 122
- Median: =MEDIAN(A1:A5) 120
- Mode: =MODE.SNGL(A1:A5) 120

Measures of Variability:

Definition:
- Quantify the spread or dispersion of data points.
- Includes range, variance, standard deviation, interquartile range.

Range:
- Formula: Max - Min
- Excel: =MAX(A1:A5) - MIN(A1:A5)

Variance:
- Population: sigma^2 = Sum(xi - mu)^2 / N
- Sample: s^2 = Sum(xi - x)^2 / (n - 1)
- Excel: =VAR.P(A1:A5) or =VAR.S(A1:A5)

Standard Deviation:
- Population: sigma = sqrtsigma^2
- Sample: s = sqrts^2
- Excel: =STDEV.P(A1:A5) or =STDEV.S(A1:A5)
Interquartile Range (IQR):
- Formula: Q3 - Q1
- Excel: =QUARTILE.INC(A1:A5, 3) - QUARTILE.INC(A1:A5, 1)

Example:
Data = 10, 20, 30, 40, 50
- Range: 40
- Variance: 250
- Std Dev: ~15.81
- IQR: 30

Measures of Shape:

Skewness:
- Positive skew: Mean > Median
- Negative skew: Mean < Median
- Excel: =SKEW(A1:A5)

Kurtosis:
- High kurtosis: Sharp peak, heavy tails.
- Low kurtosis: Flat distribution.
- Excel: =KURT(A1:A5)

Percentiles and Quartiles:


- Percentiles: e.g., 90th percentile = 90% of data is below.
Excel: =PERCENTILE.INC(A1:A5, 0.9)
- Quartiles:
Q1 = 25th, Q2 = 50th, Q3 = 75th percentile.
Excel: =QUARTILE.INC(A1:A5, n)

Correlation Analysis:
Definition:
- Measures strength and direction of relationship between variables.
- Pearson Correlation Coefficient (r):
Formula: r = Sum[(xi - x)(yi - y_bar)] / sqrt[Sum(xi - x)^2 * Sum(yi - y_bar)^2]
- Excel: =CORREL(A1:A5, B1:B5)

Interpretation:
- r > 0: Positive relationship
- r < 0: Negative relationship
- |r| 1: Strong relationship; 0: Weak

Exercises:

Central Tendency:
Data: 50, 60, 55, 70, 65
- Mean: =AVERAGE(A1:A5) 60
- Median: =MEDIAN(A1:A5) 60
- Mode: =MODE.SNGL(A1:A5) None

Variability:
Data: 85, 90, 88, 92, 87
- Range: 7
- Std Dev: ~2.74
- IQR: 4

Correlation:
Hours Studied: 2, 4, 6, 8, 10
Scores: 60, 70, 80, 90, 95
- =CORREL(A1:A5, B1:B5) ~0.99

Quiz Questions:
1. Median of 10, 20, 30, 40, 50? 30
2. r = -0.8 indicates? Strong negative relationship
3. Sample Std Dev in Excel? =STDEV.S

You might also like