Statistical Methods
1. What is a population and a sample?
A population is the entire group of individuals or observations we are interested in. A sample is a smaller
subset taken from the population, used to make inferences about the whole.
2. Define parameters and statistics.
A parameter is a numerical measure that describes a population, like the population mean μ. A statistic is
a measure calculated from a sample, like the sample mean 𝑥̄.
3. What is a random variable?
A random variable assigns a numerical value to each outcome of a random process, so its value is not
known until the outcome is observed. For example, the number of heads in a fixed number of coin tosses.
4. Difference between discrete and continuous variables?
Discrete variables take countable values (like number of children). Continuous variables take any value
within an interval (like height, weight, income).
5. What is a probability distribution?
It’s a function that assigns a probability (or, for continuous variables, a probability density) to the
possible values of a random variable. Examples include the binomial distribution and normal distribution.
6. What is the mean and variance of a distribution?
The mean is the expected value, or average, of the distribution. The variance measures how spread out the
values are around the mean.
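As a minimal sketch, both quantities can be computed directly from the definition for a discrete distribution. The fair six-sided die below is an assumed example, not part of the original notes.

```python
# Hypothetical example: a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# Mean: probability-weighted average of the outcomes.
mean = sum(x * p for x, p in zip(outcomes, probs))

# Variance: probability-weighted average squared distance from the mean.
variance = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))

print(mean, variance)
```

For the die, the mean is 3.5 and the variance is 35/12, roughly 2.92.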
7. What is the Central Limit Theorem (CLT)?
It says that for independent draws from a population with finite variance, the sampling distribution of the
sample mean approaches a normal distribution as the sample size increases, even if the population is not
normal.
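A small simulation can illustrate the idea. This is only a sketch: the exponential population, the sample size of 50, and the 5000 repetitions are all assumed values chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    # Exponential population with mean 1 and standard deviation 1
    # (strongly right-skewed, so clearly not normal).
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(5000)]

# If the normal approximation is good, about 95% of sample means
# should fall within 1.96 standard errors of the population mean.
se = 1.0 / n ** 0.5
share_within = sum(abs(m - 1.0) < 1.96 * se for m in means) / len(means)
print(round(share_within, 3))
```

The printed share should be close to 0.95 even though the individual observations are far from normal.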
8. What is hypothesis testing?
It’s a statistical procedure to test a claim about a population using sample data. We set up a null
hypothesis (H₀) and an alternative hypothesis (H₁).
9. What is a p-value?
It’s the probability of observing results at least as extreme as those in the sample, assuming the null
hypothesis is true. A small p-value indicates strong evidence against H₀.
10. When do we reject the null hypothesis?
If the p-value is less than the chosen significance level (say 0.05), we reject H₀ in favor of H₁.
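The decision rule can be sketched with a two-sided z-test. All numbers here are hypothetical, and the population standard deviation is assumed to be known so that the normal distribution applies.

```python
from statistics import NormalDist

# Hypothetical numbers: H0 says mu = 100; a sample of n = 36 has mean 103,
# and the population standard deviation is assumed known to be 9.
mu0, x_bar, sigma, n = 100, 103, 9, 36

z = (x_bar - mu0) / (sigma / n ** 0.5)         # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

alpha = 0.05                                   # chosen significance level
reject = p_value < alpha
print(z, round(p_value, 4), reject)
```

Here z is 2.0 and the p-value is about 0.0455, so H₀ is rejected at the 5% level but would not be rejected at the 1% level.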
11. What are Type I and Type II errors?
Type I error is rejecting a true H₀ (false positive). Type II error is failing to reject a false H₀ (false
negative).
Safwan
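12. What is a confidence interval?
It’s a range of values, computed from the sample, that is constructed to contain the population parameter
with a given level of confidence (e.g., 95%). The 95% refers to the procedure: across repeated samples,
about 95% of intervals built this way would contain the true parameter.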
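A minimal sketch of a 95% interval for a mean, assuming hypothetical inputs and a known population standard deviation (with an unknown standard deviation and a small sample, a t critical value would replace the z value):

```python
from statistics import NormalDist

# Hypothetical inputs: sample mean 50 from n = 100 observations,
# population standard deviation assumed known to be 10.
x_bar, sigma, n = 50.0, 10.0, 100
confidence = 0.95

z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
margin = z_crit * sigma / n ** 0.5
lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 2), round(upper, 2))
```

With these numbers the interval is roughly 48.04 to 51.96.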
13. What is the difference between correlation and regression?
Correlation measures the strength and direction of the relationship between two variables. Regression
explains how one variable (dependent) changes with another (independent).
14. What is the least squares method?
It’s a method of estimating regression coefficients by minimizing the sum of squared errors between
observed and predicted values.
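For a single regressor, the minimizing coefficients have a closed form. The sketch below applies it to five made-up data points; the data and variable names are assumptions for illustration only.

```python
# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least squares estimates for the line y = a + b*x.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx            # slope
a = y_bar - b * x_bar    # intercept

# Sum of squared errors at the fitted line (the quantity being minimized).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
print(a, b, sse)
```

For this data the fitted line is approximately y = 2.2 + 0.6x.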
15. What is R-squared?
It measures the proportion of variation in the dependent variable explained by the independent variables
in a regression model.
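As a sketch, R-squared can be computed as one minus the ratio of unexplained to total variation. The data are hypothetical and the line is fitted by simple least squares inside the example.

```python
# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Fit y = a + b*x by least squares.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

ss_total = sum((yi - y_bar) ** 2 for yi in y)                       # total variation
ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))    # unexplained part
r_squared = 1 - ss_resid / ss_total
print(r_squared)
```

Here about 60% of the variation in y is explained by x.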
16. What is multicollinearity?
It occurs when independent variables in a regression are highly correlated, making it difficult to isolate
their individual effects.
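One common diagnostic, not mentioned in the notes above and introduced here as an assumption, is the variance inflation factor (VIF). With only two regressors it reduces to 1 / (1 - r²), where r is their correlation. The data below are made up so that x2 is roughly twice x1.

```python
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical: roughly 2 * x1

def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

r = pearson_r(x1, x2)
vif = 1 / (1 - r ** 2)  # two-regressor case only
print(round(r, 4), round(vif, 1))
```

A VIF far above 10, as here, is often read as a sign that the coefficient estimates will be imprecise.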
17. What is the law of large numbers (LLN)?
It states that as sample size increases, the sample mean tends to converge to the population mean.
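A quick simulation sketch, assuming a fair six-sided die (population mean 3.5) and a fixed random seed:

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]

# Sample mean of the first n rolls for increasing n.
for n in (10, 1_000, 100_000):
    print(n, fmean(rolls[:n]))

large_sample_mean = fmean(rolls)
```

The printed means should settle closer to 3.5 as n grows.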
18. Why do we standardize variables?
Standardization makes variables comparable, especially when measured on different scales, and allows
use of the standard normal distribution (z-scores).
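A minimal sketch of computing z-scores, using a small hypothetical data set and the population standard deviation:

```python
from statistics import fmean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

mean = fmean(data)
sd = pstdev(data)  # population standard deviation

# z-score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean) / sd for x in data]
print(z_scores)
```

After standardization the values have mean 0 and standard deviation 1, whatever the original units were.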
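19. What are the assumptions of the classical linear regression model?
Linearity in parameters, errors with zero mean conditional on the regressors, independence of errors,
homoscedasticity (constant variance), and no perfect multicollinearity. Normality of errors is usually added
so that t- and F-tests are exact in small samples.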
20. What is a sampling distribution?
It’s the probability distribution of a statistic (like sample mean) over repeated random samples from the
same population.
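For a very small population the sampling distribution can be listed exactly. The sketch below assumes a hypothetical population of three values and samples of size 2 drawn with replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
n = 2                   # sample size, drawn with replacement

# Every possible ordered sample of size n is equally likely.
samples = list(product(population, repeat=n))
counts = Counter(Fraction(sum(s), n) for s in samples)

# Sampling distribution of the sample mean: value -> probability.
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}
for m, p in sampling_dist.items():
    print(float(m), p)
```

The sample mean takes the values 1, 1.5, 2, 2.5 and 3 with probabilities 1/9, 2/9, 3/9, 2/9 and 1/9.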
21. Why do we study sampling distributions?
Because they help us understand the behavior of estimators and make inferences about population
parameters from samples.
22. What is the Central Limit Theorem and why is it important?
It states that sample means become approximately normally distributed as sample size grows, whatever
the shape of the population (provided its variance is finite). This is important because it allows us to use
normal distribution tools for inference.
23. Why is the sample mean unbiased?
Because the expected value of the sample mean equals the population mean. On average, it doesn’t
overestimate or underestimate μ.
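Unbiasedness can be checked exactly on a toy case by averaging the sample mean over every possible sample. The population below is hypothetical, and samples of size 3 are drawn without replacement.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 7, 8, 12, 15]  # hypothetical population
mu = Fraction(sum(population), len(population))

# All equally likely samples of size 3 drawn without replacement.
sample_means = [Fraction(sum(s), 3) for s in combinations(population, 3)]

# The expected value of the sample mean is its average over all samples.
expected_sample_mean = sum(sample_means) / len(sample_means)
print(mu, expected_sample_mean)
```

Individual sample means differ from μ, but their average over all possible samples equals μ exactly.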
24. What are Type I and Type II errors?
Type I: rejecting a true H₀. Type II: failing to reject a false H₀. They represent the two possible wrong
decisions in hypothesis testing.
25. What does power of a test mean?
It’s the probability of correctly rejecting a false H₀. A higher power means the test is more effective at
detecting real effects.
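Power can be computed directly for a simple case. The sketch below assumes a one-sided z-test with hypothetical values for the null mean, the true mean, a known standard deviation, and the sample size.

```python
from statistics import NormalDist

# Hypothetical setup: H0: mu = 100 versus H1: mu > 100,
# true mean 103, known sigma 9, n = 36, alpha = 0.05.
mu0, mu_true, sigma, n, alpha = 100, 103, 9, 36, 0.05

se = sigma / n ** 0.5
# Reject H0 when the sample mean exceeds this cutoff.
cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
# Power: probability of landing in the rejection region when mu = mu_true.
power = 1 - NormalDist(mu_true, se).cdf(cutoff)
type_ii_error = 1 - power
print(round(power, 3), round(type_ii_error, 3))
```

With these numbers the power is about 0.64, so the probability of a Type II error is about 0.36. A larger sample or a larger true effect would raise the power.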
26. Why do we use t-distribution instead of z in small samples?
Because when the population standard deviation is unknown and has to be estimated from a small sample,
the t-distribution, with its heavier tails, accounts for the extra uncertainty.
27. Why is Chi-Square distribution not symmetric?
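Because it is the distribution of a sum of squared standard normal variables, so it cannot take negative
values and has a long right tail (positive skew), especially for small degrees of freedom. As the degrees
of freedom increase, it becomes more symmetric.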
28. Why do F-tests require both numerator and denominator degrees of freedom?
Because the F-statistic is the ratio of two variance estimates, each with its own degrees of freedom, and
both determine the shape of the distribution.
29. Why do we use ANOVA?
Analysis of Variance (ANOVA) is used to test whether the means of three or more groups are significantly
different.
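A minimal sketch of the one-way ANOVA F-statistic, computed by hand on three hypothetical groups. It stops at the F-statistic; the p-value would come from the F distribution with the two degrees of freedom shown.

```python
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]  # hypothetical data

all_values = [v for g in groups for v in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

df_between = len(groups) - 1                 # numerator degrees of freedom
df_within = len(all_values) - len(groups)    # denominator degrees of freedom
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(df_between, df_within, f_stat)
```

Here the variation between group means is about 19 times the variation within groups, with 2 and 6 degrees of freedom.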
30. Why is convergence in probability important?
Because it ensures that as sample size grows, the probability that an estimator differs from the true
parameter value by more than any fixed margin shrinks to zero.
31. What does consistency of an estimator mean?
It means the estimator converges in probability to the true population parameter as sample size increases.
32. What is asymptotic normality?
It means that as sample size becomes large, the sampling distribution of a suitably centered and scaled
estimator approaches a normal distribution.
33. Why do we prefer Maximum Likelihood Estimation (MLE) in large samples?
Because, under standard regularity conditions, MLEs are consistent, asymptotically efficient, and
asymptotically normal, making them very reliable for inference.
34. Why might we use least squares estimation?
Because it is simple, widely applicable, and under the classical regression assumptions gives the best
linear unbiased estimates (the Gauss-Markov result).
35. Why is hypothesis testing useful?
Because it provides a formal method to make decisions and inferences from data, distinguishing between
random chance and real effects.
36. What is the significance level?
It’s the probability of making a Type I error, commonly set at 5%. It represents the risk we’re willing to
take of rejecting a true null hypothesis.
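The link between the significance level and the Type I error rate can be seen in a simulation where H₀ is true by construction. This is a sketch under assumed settings: normal data with known standard deviation 1, samples of size 30, and 4000 repetitions.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(1)  # fixed seed so the run is repeatable
alpha, n, trials = 0.05, 30, 4000
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value

rejections = 0
for _ in range(trials):
    # H0 is true here: data come from N(0, 1) and H0 says mu = 0.
    sample = [rng.gauss(0, 1) for _ in range(n)]
    z = fmean(sample) / (1 / n ** 0.5)  # known sigma = 1
    if abs(z) > z_crit:
        rejections += 1

type_i_rate = rejections / trials
print(type_i_rate)
```

The share of (wrong) rejections should come out close to the chosen 5%.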
37. Why do confidence intervals give more information than point estimates?
Because they show both the estimate and the uncertainty around it, giving a range of plausible values for
the true parameter rather than a single number.
38. What does convergence in distribution mean?
It means that the distribution of a sequence of random variables (for example, a suitably scaled
estimator) approaches a fixed probability distribution, often the normal, as sample size increases.
39. Why use software like Stata or Excel in inference?
Because they allow us to handle large datasets, run simulations, estimate models, and test hypotheses
efficiently.
40. Why is LLN important in econometrics?
Because it justifies using sample averages to estimate population values, ensuring that our sample-based
results are meaningful in large samples.