Statistical Analysis and Data Analysis - Detailed Notes
1. Hypothesis
Think of a hypothesis as a claim or a guess about the world that we want to check with data.
- In daily life: You think “My friend is always late.” That’s your hypothesis. You start observing his arrival times to see if it’s true.
- In business: A manager says, “The average customer spends ₹500 per visit.” That’s also a hypothesis.
In statistics:
- Null hypothesis (H0): the status quo, no change, no difference.
- Alternative hypothesis (H1): the opposite claim, where some effect or difference exists.
Example: A restaurant chain claims its average delivery time is 30 minutes. If we suspect deliveries take longer, we test:
H0: μ = 30 minutes
H1: μ > 30 minutes
2. Hypothesis Testing
This is the formal process of checking whether the sample data give enough evidence to reject H0 in favour of H1.
Like a courtroom:
- H0 is “innocent until proven guilty.”
- Data is the evidence.
- Judge = statistical test.
Steps:
1. Write down H0 and H1 clearly.
2. Choose a significance level (α, usually 5%).
3. Collect sample data and compute the test statistic.
4. Compare it with the critical value, OR find the p-value.
5. Decide: reject or fail to reject H0.
6. Interpret in business language.
Example: A restaurant’s average delivery time is claimed to be 30 minutes. A sample of 40 deliveries has mean = 32 minutes. If the p-value < 0.05, reject H0 and conclude that delivery times are significantly longer than claimed.
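The steps above can be sketched in Python for the delivery example. The notes give no standard deviation, so s = 6 minutes is a hypothetical value; with n = 40 the sketch uses the large-sample normal approximation (a z-test with s in place of σ) and a right-tailed p-value to match H1: μ > 30. The function name `one_sided_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(x_bar, mu0, s, n, alpha=0.05):
    """Right-tailed test of H0: mu = mu0 vs H1: mu > mu0.

    Uses the sample SD s in place of sigma (large-sample normal approximation).
    """
    se = s / sqrt(n)                       # standard error of the mean
    z = (x_bar - mu0) / se                 # test statistic
    p_value = 1 - NormalDist().cdf(z)      # area in the right tail
    reject = p_value <= alpha              # decision rule: reject if p <= alpha
    return z, p_value, reject

# Delivery example: n = 40, sample mean = 32, claimed mean = 30.
# s = 6 minutes is a HYPOTHETICAL value; the notes do not give one.
z, p, reject = one_sided_z_test(x_bar=32, mu0=30, s=6, n=40)
print(f"z = {z:.3f}, p = {p:.4f}, reject H0: {reject}")
```

With these assumed numbers z ≈ 2.11 and p ≈ 0.018, so H0 would be rejected; a larger s would weaken the evidence.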
3. Errors in Hypothesis Testing
Two types of errors can occur:
- Type I Error (probability α): Rejecting H0 when it is true. Like punishing an innocent person. Business: Concluding a new ad works when it doesn’t.
- Type II Error (probability β): Failing to reject H0 when H1 is true. Like freeing a guilty person. Business: Missing out on a campaign that actually works.
Power = 1 − β: the probability that the test detects a real effect (rejects H0 when H1 is true).
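Power can be computed directly for a right-tailed z-test. This is a minimal sketch assuming the delivery example with a hypothetical true mean of 32 minutes and σ = 6 minutes (neither is given in the notes); `power_right_tailed_z` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power and beta of a right-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)
    # H0 is rejected when the sample mean exceeds this cut-off.
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta = P(sample mean stays below the cut-off | true mean is mu_true)
    beta = NormalDist(mu_true, se).cdf(x_crit)
    return 1 - beta, beta

# Hypothetical scenario: true mean 32, sigma 6, n = 40, alpha = 0.05.
power, beta = power_right_tailed_z(mu0=30, mu_true=32, sigma=6, n=40)
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

Under these assumptions power is about 0.68 (β ≈ 0.32). Increasing n shrinks the standard error, which raises power for the same true effect.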
4. Ways of Testing
Two methods:
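- Critical Value Method: If the test statistic falls beyond the critical value for the chosen α, reject H0.
- p-value Method: If p ≤ α, reject H0.
Both methods give the same decision, provided they use the same α and the same tail (one- or two-tailed).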
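A quick check that the two methods agree for a two-tailed z-test at α = 0.05. The z values are arbitrary and `decide_two_tailed` is a hypothetical helper.

```python
from statistics import NormalDist

def decide_two_tailed(z, alpha=0.05):
    """Return (critical-value decision, p-value decision) for a two-tailed z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # about 1.96 when alpha = 0.05
    by_critical = abs(z) >= z_crit
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    by_p_value = p_value <= alpha
    return by_critical, by_p_value

for z in [0.5, 1.5, 1.9, 2.0, 2.5, -3.1]:
    crit, pval = decide_two_tailed(z)
    print(f"z = {z:5.2f}: critical-value reject={crit}, p-value reject={pval}")
```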
5. Types of Tests
Choose test based on data type and situation:
- Z-test: Mean, when σ is known or n is large.
- T-test: Mean, when σ is unknown (especially with small n).
- ANOVA: Compare the means of 3 or more groups.
- Chi-square: Categorical data (counts).
- Regression: Relationship between variables.
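The choice rules above can be written as a small rule-of-thumb function. It is a simplification: the n ≥ 30 cut-off for "large n" is a common convention assumed here, not a rule from the notes, and `suggest_test` is a hypothetical name.

```python
def suggest_test(data_type, n_groups=1, sigma_known=False, n=None):
    """Rule-of-thumb test picker mirroring the list above (a simplification)."""
    if data_type == "categorical":
        return "Chi-square"
    if data_type == "relationship":
        return "Regression"
    if data_type != "numeric":
        raise ValueError("data_type must be 'numeric', 'categorical' or 'relationship'")
    if n_groups >= 3:
        return "ANOVA"
    # ASSUMPTION: n >= 30 is treated as "large n".
    if sigma_known or (n is not None and n >= 30):
        return "Z-test"
    return "T-test"

print(suggest_test("numeric", n=16))               # small n, sigma unknown
print(suggest_test("numeric", n_groups=3))         # three ad campaigns
```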
6. Z Test
Use when population σ is known or sample size is large.
Formula: z = (x̄ − μ0) / (σ / √n)
Example: A company claims the average salary is 50,000. Sample mean = 52,000, n = 36, σ = 6,000.
SE = 6000 / √36 = 1000
z = (52000 − 50000) / 1000 = 2
Critical z (two-tailed, α = 0.05) = ±1.96. Since z = 2 > 1.96, reject H0: the average salary differs significantly from 50,000.
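The salary calculation, reproduced with Python’s standard library (`statistics.NormalDist` for the standard normal distribution). The two-tailed p-value is included to show that the p-value method reaches the same decision.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n = 52_000, 50_000, 6_000, 36

se = sigma / sqrt(n)                               # 6000 / 6 = 1000
z = (x_bar - mu0) / se                             # 2000 / 1000 = 2.0
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
z_crit = NormalDist().inv_cdf(0.975)               # about 1.96 for alpha = 0.05

print(f"SE = {se:.0f}, z = {z:.2f}, p = {p_two_tailed:.4f}")
print("Reject H0" if abs(z) > z_crit else "Fail to reject H0")
```

This gives p ≈ 0.046, just under 0.05, consistent with rejecting H0.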
7. T Test
Used when σ is unknown, especially with small n.
Formula: t = (x̄ − μ0) / (s / √n), df = n − 1
Types:
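- One-sample t: one sample mean vs a claimed value
- Two-sample t: means of two independent groups
- Paired t: two measurements on the same subjects (e.g. before vs after)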
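The paired and two-sample variants compute the statistic differently. This sketch uses hypothetical data: the paired test is a one-sample t-test on the differences, and for two independent samples it uses Welch’s unequal-variance form, one common choice (the pooled-variance version is another). Degrees of freedom for Welch’s test (Welch-Satterthwaite formula) are not computed here; the function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t: a one-sample t-test on the differences (after - before) against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1                                # statistic, degrees of freedom

def welch_t(group_a, group_b):
    """Two-sample t statistic without assuming equal variances (Welch's version)."""
    se = sqrt(stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical data, for illustration only.
before = [12, 15, 11, 14, 13]
after = [14, 16, 13, 15, 16]
print(paired_t(before, after))                     # t ≈ 4.81 with df = 4
print(welch_t([10, 12, 14], [6, 8, 10]))           # t ≈ 2.45
```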
Example: Claimed mean productivity = 70. Sample mean = 74, s = 8, n = 16.
SE = 8 / √16 = 2. t = (74 − 70) / 2 = 2. df = 15. Critical t (two-tailed, α = 0.05) ≈ 2.13. Since 2 < 2.13, fail to reject H0.
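Python’s standard library has no t distribution, so this sketch computes the two-tailed p-value by integrating the Student’s t density numerically with Simpson’s rule. That approach is chosen here only to keep the example self-contained; a statistics library would normally be used. The helper names are hypothetical.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value = 1 - 2 * (area from 0 to |t|), via Simpson's rule."""
    b = abs(t)
    h = b / steps                                  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3                                  # area under the density between 0 and |t|
    return 1 - 2 * area

x_bar, mu0, s, n = 74, 70, 8, 16
se = s / sqrt(n)                                   # 8 / 4 = 2
t = (x_bar - mu0) / se                             # 4 / 2 = 2.0
df = n - 1
p = t_two_tailed_p(t, df)
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
print("Reject H0" if p <= 0.05 else "Fail to reject H0")
```

The p-value comes out around 0.064, above 0.05, matching the critical-value decision. A one-tailed test (H1: μ > 70) would halve it to about 0.032 and reject H0, so the choice of tail matters.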
8. ANOVA
Used when comparing 3 or more group means.
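Idea: Split the total variation into between-group and within-group variation.
F = MSB / MSW (mean square between groups / mean square within groups)
Example: Compare average sales from 3 ad campaigns. If p < 0.05, conclude that at least one campaign’s mean sales differ from the others.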
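A one-way ANOVA F statistic can be computed from its definition: MSB = SSB / (k − 1) and MSW = SSW / (N − k), for k groups and N observations in total. The sales figures are hypothetical and `one_way_anova_f` is an illustrative name.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return F = MSB / MSW and its degrees of freedom (df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group variation: how far each group mean sits from the grand mean.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of observations around their own group mean.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    msb, msw = ssb / df_between, ssw / df_within
    return msb / msw, df_between, df_within

# Hypothetical weekly sales for three ad campaigns.
f, df1, df2 = one_way_anova_f([20, 22, 24], [25, 27, 29], [30, 32, 34])
print(f"F = {f:.2f} with df = ({df1}, {df2})")     # F = 18.75 with df = (2, 6)
```

With df = (2, 6), the tabulated critical F at α = 0.05 is about 5.14, so F = 18.75 would indicate that at least one campaign differs.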
9. Chi-Square Test
For categorical data (counts in categories).
Formula: χ² = Σ (O − E)² / E, where O = observed count and E = expected count.
Types:
- Goodness of Fit: observed counts vs expected counts for one variable.
- Independence: whether two categorical variables are related.
Example: 100 customers: Coke = 50, Pepsi = 30, Others = 20. If preferences were equal, each expected count would be 100 / 3 ≈ 33.3. If χ² is significant, conclude that preferences are not equal.
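The goodness-of-fit calculation for the cola example, assuming equal expected preference for the three options:

```python
from math import exp

observed = {"Coke": 50, "Pepsi": 30, "Others": 20}
total = sum(observed.values())
expected = total / len(observed)                   # equal preference: 100 / 3 each

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                             # 3 categories -> df = 2

# For df = 2 only, the chi-square upper-tail probability is exactly exp(-chi2 / 2).
p_value = exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.5f}")
```

χ² = 14.0 exceeds the critical value 5.99 (df = 2, α = 0.05), so the data suggest preferences are not equal. The `exp(-chi2 / 2)` shortcut holds only for df = 2; other df need a chi-square CDF from a statistics library.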
10. Regression
Regression fits a line to predict y from x (simple linear regression, usually fitted by least squares).
Equation: y = b0 + b1x
- b0 = intercept
- b1 = slope
Example: Ad spend (in ₹1,000s) vs Sales. b1 = 2.3 means each additional ₹1,000 of ad spend is associated with an increase of 2.3 units in predicted sales.
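The intercept and slope can be computed with the least-squares formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1x̄. The data below are hypothetical; they are not the data behind the b1 = 2.3 example, and `least_squares_line` is an illustrative name.

```python
from statistics import mean

def least_squares_line(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx                   # slope
    b0 = y_bar - b1 * x_bar            # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data: ad spend in ₹1,000s vs units sold.
ad_spend = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = least_squares_line(ad_spend, sales)
print(f"sales ≈ {b0:.2f} + {b1:.2f} × ad_spend")
```

With these made-up numbers the fit is about sales ≈ 1.09 + 1.97 × ad_spend.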
Business applications: Forecasting, pricing, demand estimation.