Simple Linear Regression
Juvaria Tariq
Fundamentals of Econometrics — Fall 2025
Class Objectives
▶ Introduce the simple linear regression model
▶ Define regression-related terms and notation
▶ Understand residuals and the idea of the “best fit” line
▶ Derive OLS estimators
▶ Interpret estimated coefficients and visualize the fitted model
The Simple Linear Regression Model
▶ The population model:
y = β0 + β1 x + u
▶ y: dependent variable (outcome we want to explain)
▶ x: independent variable (explanatory factor)
▶ β0, β1: unknown parameters
▶ u: error term (other unobserved factors)
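To make these roles concrete, here is a minimal simulation sketch of the population model; the parameter values, sample size, and normal error distribution are illustrative assumptions only, not features of any real data:

```python
import numpy as np

# Simulate the population model y = beta0 + beta1*x + u.
# beta0, beta1, and the error distribution are hypothetical choices
# made purely for illustration.
rng = np.random.default_rng(0)

beta0, beta1 = 18.0, 1.4          # assumed population parameters
x = rng.uniform(0, 6, size=100)   # independent variable
u = rng.normal(0, 1.0, size=100)  # unobserved error term, E[u|x] = 0
y = beta0 + beta1 * x + u         # dependent variable
```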
Terminology: Model vs Estimates
▶ Population model:
y = β0 + β1 x + u
▶ Data: (xi, yi)
▶ Estimated equation (sample):
ŷ = β̂0 + β̂1 x
▶ Residual:
ûi = yi − ŷi
Population              Sample (Estimate)
Parameter: β0, β1       Coefficient: β̂0, β̂1
Error: u                Residual: û
What is Regression?
▶ Regression is a statistical method for modeling relationships
between variables.
▶ In simple linear regression, we ask:
▶ How much does y change when x changes?
▶ We fit a line that best represents the relationship between x
and y.
Basic Assumptions of the Linear Model
▶ Linearity in parameters
▶ E[u|x] = 0: zero conditional mean
▶ Random sampling
Sample Dataset (Experience vs. Income)
Obs   Experience (x, years)   Income (y, $)
1     1                       19
2     2                       22
3     3                       21
4     4                       26
5     5                       24
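For the computations that follow, the table can be entered as two NumPy arrays (a minimal sketch; the variable names are our own):

```python
import numpy as np

# The five observations from the table above.
x = np.array([1, 2, 3, 4, 5], dtype=float)       # experience (years)
y = np.array([19, 22, 21, 26, 24], dtype=float)  # income ($)
```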
Scatter Plot
[Figure: scatter plot of the five observations; Income ($) on the vertical axis (18 to 28), Experience (years) on the horizontal axis (0 to 6)]
Finding Best Fit
[Figure: the same scatter plot of Income ($) vs. Experience (years), with no line drawn yet]
Multiple Possible Lines
[Figure: the scatter plot with two candidate lines drawn through the data, y = 18 + 1.2x and y = 17 + 1.6x; Income ($) vs. Experience (years)]
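The two candidate lines in the figure can be compared with the criterion developed below, the sum of squared errors; a small sketch using the dataset above:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([19, 22, 21, 26, 24], dtype=float)

def ssr(b0, b1):
    """Sum of squared vertical distances from the data to y = b0 + b1*x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

print(ssr(18, 1.2))  # about 13.2
print(ssr(17, 1.6))  # about 11.8: the second line fits these data better
```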
What is the “Line of Best Fit”?
▶ Visualize data as points in a scatterplot
▶ A line attempts to capture the general trend in the data
▶ But which line is “best”?
▶ The line that minimizes the sum of squared vertical distances
(errors) between actual and predicted values
Residuals: The Vertical Gaps
▶ Residual: the prediction error for each observation
▶ ûi = yi − ŷi
▶ These are the vertical distances from the data points to the
fitted line
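Residuals are easy to compute once a line is chosen. As a sketch, here are the residuals for one of the candidate lines from the earlier figure (not yet the best-fit line):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([19, 22, 21, 26, 24], dtype=float)

# Residuals u_hat = y - y_hat for the candidate line y = 18 + 1.2x.
y_hat = 18 + 1.2 * x
u_hat = y - y_hat
print(u_hat)  # [-0.2  1.6 -0.6  3.2  0. ]
```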
The Least Squares Idea
▶ We estimate ŷi = β̂0 + β̂1 xi
▶ Goal: minimize the sum of squared residuals:
Σᵢ₌₁ⁿ (yi − (β̂0 + β̂1 xi))²
▶ Why square?
▶ Avoid cancelling out positive/negative errors
▶ Penalize large errors more
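A tiny numeric illustration of both points: errors of opposite sign cancel in a plain sum, while squaring counts every error and weights large ones more heavily:

```python
errors = [2.0, -2.0, 0.5, -0.5]

# Plain sum: positive and negative errors cancel, falsely suggesting
# a perfect fit.
print(sum(errors))                  # 0.0

# Squared sum: every error contributes, and the large errors dominate.
print(sum(e ** 2 for e in errors))  # 8.5
```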
Key Assumption: E[u|x] = 0
▶ Model: y = β0 + β1 x + u
▶ If E[u|x] = 0, then:
E[y|x] = β0 + β1 x
▶ This is the population regression function (PRF)
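The step from model to PRF is a one-line derivation: condition the model on x and apply the assumption:

E[y|x] = E[β0 + β1 x + u | x] = β0 + β1 x + E[u|x] = β0 + β1 x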
From Model to Data: Sample Version
▶ We observe (xi, yi) and assume:
yi = β0 + β1 xi + ui
OLS Estimators
β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

β̂0 = ȳ − β̂1 x̄
▶ These are the slope and intercept estimates computed from the data
▶ Easy to compute and interpret!
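A direct translation of the two formulas, cross-checked against NumPy's built-in degree-1 least-squares fit (a minimal sketch on the class dataset):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([19, 22, 21, 26, 24], dtype=float)

# OLS slope: sum of cross-deviations over sum of squared x-deviations.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# OLS intercept: forces the line through the point of means (x_bar, y_bar).
b0 = y.mean() - b1 * x.mean()
print(round(float(b0), 3), round(float(b1), 3))  # 18.2 1.4

# Cross-check: np.polyfit returns [slope, intercept] for degree 1.
print(np.polyfit(x, y, 1))  # approximately [ 1.4 18.2]
```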
Worked Example (Using Our Dataset)
▶ Compute x̄, ȳ, then apply the formulas (worked through below)
▶ Calculate fitted values ŷi and residuals ûi = yi − ŷi
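Carrying the arithmetic through for the five observations:

x̄ = 3,  ȳ = (19 + 22 + 21 + 26 + 24)/5 = 22.4
Σ(xi − x̄)(yi − ȳ) = 6.8 + 0.4 + 0 + 3.6 + 3.2 = 14
Σ(xi − x̄)² = 4 + 1 + 0 + 1 + 4 = 10
β̂1 = 14/10 = 1.4
β̂0 = 22.4 − 1.4 × 3 = 18.2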
Visualizing the Fitted Line
▶ The fitted line from our worked example: ŷi = 18.2 + 1.4 xi
Terminology Recap
▶ Model: population equation y = β0 + β1 x + u
▶ Estimates: fitted equation ŷi = β̂0 + β̂1 xi
▶ Residuals: ûi = yi − ŷi
▶ Line of best fit: minimizes squared residuals
Interpretation of Coefficients
▶ Slope β̂1: the expected change in y for a one-unit increase in x
▶ Intercept β̂0: the predicted value of y when x = 0 (not always
meaningful on its own)
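Applied to the fitted line from our worked example (ŷ = 18.2 + 1.4x), a sketch of a point prediction for a hypothetical sixth year of experience:

```python
b0, b1 = 18.2, 1.4  # estimates from the worked example

# Predicted income for a hypothetical worker with 6 years of experience:
# each extra year adds b1 = 1.4 to the prediction.
y_hat = b0 + b1 * 6
print(round(y_hat, 1))  # 26.6
```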
Wrap-Up and What's Next
▶ Today’s focus: simple regression, derivation, interpretation
▶ Key concepts: model vs. estimates, residuals, least squares
▶ Next time: multiple regression and assumption testing