SIMPLE LINEAR REGRESSION EXPLANATION IN EASY ENGLISH
Definition:
Simple Linear Regression is a statistical method used to find the relationship between two variables:
- One independent variable (X): the input or predictor
- One dependent variable (Y): the output or target
It fits a straight line (called a regression line) through the data to predict Y from X.
Basic Idea:
If we know that X (like study hours) affects Y (like marks), then we try to draw a straight line that
best fits the data points to make predictions.
Equation of the Line:
Y = b0 + b1X
- Y: Predicted value (dependent variable)
- X: Given value (independent variable)
- b0: Intercept (where the line cuts the Y-axis)
- b1: Slope (change in Y for every 1 unit change in X)
Example:
Let's say we want to predict marks (Y) based on study hours (X).
If we calculate and get:
Y = 30 + 5X
This means:
- If a student studies 0 hours, the predicted marks are 30 (intercept).
- For each extra hour of study, the predicted marks increase by 5 (slope).
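The example line can be tried out with a few values of X. This is a minimal sketch; the function name predict_marks is made up for illustration, and the coefficients 30 and 5 are simply the ones from the example above.

```python
def predict_marks(hours, b0=30.0, b1=5.0):
    """Predicted marks from the example line Y = b0 + b1*X (hypothetical helper)."""
    return b0 + b1 * hours


# Predicted marks for 0, 1 and 4 hours of study.
for hours in (0, 1, 4):
    print(f"{hours} hours -> {predict_marks(hours)} marks")
```

Going from 0 to 1 hour raises the prediction by exactly the slope (5), which is what the slope means.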
Graphical View:
A scatter plot of data is made with X and Y values. Then a straight line is drawn to minimize the
difference between actual Y values and predicted Y values. This line is called the "best fit line".
How is the line decided? (Using Least Squares Method):
The model tries to minimize the Sum of Squared Errors (SSE):
SSE = Σ (Yi - Ŷi)^2   (summed over all data points i)
Where:
- Yi: Actual value
- Ŷi: Predicted value from the line (b0 + b1Xi)
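For one input variable, the line that minimizes SSE has a well-known closed form: the slope b1 is the sum of (Xi - mean of X)(Yi - mean of Y) divided by the sum of (Xi - mean of X)^2, and the intercept is b0 = mean of Y - b1 * mean of X. The sketch below applies these formulas to the five data points used in this document's Python code example. The helper names fit_line and sse are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) of the least squares line for one predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: forces the line through the point (mean of X, mean of Y).
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"SSE of fitted line = {sse(xs, ys, b0, b1):.2f}")
# Any other line gives a larger SSE, for example a slightly steeper one.
print(f"SSE of steeper line = {sse(xs, ys, b0, b1 + 0.1):.2f}")
```

Comparing the two SSE values shows what "best fit" means here: changing the slope or intercept away from the fitted values can only increase the error.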
Assumptions of Simple Linear Regression:
1. Linearity: Relationship between X and Y is linear.
2. Independence: Observations are independent of each other.
3. Homoscedasticity: Errors have constant variance for all values of X.
4. Normality: Errors are normally distributed.
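Assumptions 2 to 4 are statements about the errors, which in practice are estimated by the residuals (actual value minus predicted value). The sketch below only shows how residuals are computed; it is an informal illustration, not a statistical test. The data are the five points from this document's Python code example, and the coefficients 2.2 and 0.6 are the least squares fit for those points.

```python
# Five sample points (the same values used in this document's Python example).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares coefficients for these five points (intercept, slope).
b0, b1 = 2.2, 0.6

# Residual = actual value minus predicted value.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(f"X={x}: residual={r:+.2f}")

# With an intercept in the model, least squares residuals sum to about zero.
print("sum of residuals is close to zero:", abs(sum(residuals)) < 1e-9)
```

A common informal check is to plot the residuals against X: a curved pattern suggests the relationship is not linear, and a spread that widens or narrows suggests the variance is not constant. Five points are far too few to judge either.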
Advantages:
- Easy to understand and implement
- Requires very little computation
- Useful for simple predictive analysis
Limitations:
- Works well only when the relationship is approximately linear
- Sensitive to outliers (one extreme point can shift the line noticeably)
- Assumes constant error variance, which may not hold for real data
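The effect of an outlier can be seen by fitting the same data twice, once with an extreme point added. This is a minimal sketch: the helper fit_line is made up for illustration, and the outlier (X=6, Y=20) is an invented value chosen only to make the effect visible.

```python
def fit_line(xs, ys):
    """Least squares (intercept, slope) for one predictor (hypothetical helper)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Five ordinary points.
clean_x = [1, 2, 3, 4, 5]
clean_y = [2, 4, 5, 4, 5]

# Same points plus one made-up extreme value.
outlier_x = clean_x + [6]
outlier_y = clean_y + [20]

_, slope_clean = fit_line(clean_x, clean_y)
_, slope_outlier = fit_line(outlier_x, outlier_y)

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with outlier:    {slope_outlier:.2f}")
```

Because errors are squared, a point far from the others contributes a very large error, so the line is pulled strongly toward it. Here a single added point changes the slope from about 0.6 to about 2.6.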
Applications:
- Predicting sales from advertising budget
- Estimating marks based on study hours
- Predicting salary based on experience
Python Code Example:
from sklearn.linear_model import LinearRegression
import numpy as np
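# Input (X) must be 2-D: one row per observation, one column per feature
X = np.array([[1], [2], [3], [4], [5]])
# Output (Y) is 1-D: one target value per observation
Y = np.array([2, 4, 5, 4, 5])
# Create and train the model (ordinary least squares)
model = LinearRegression()
model.fit(X, Y)
# Fitted line: Y = b0 + b1X
print("Intercept (b0):", model.intercept_)
print("Slope (b1):", model.coef_[0])
# Predict Y for a new value, X = 6
predicted = model.predict([[6]])
print("Predicted value for X=6 is:", predicted[0])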
Conclusion (Short Exam Answer Format):
Simple Linear Regression is a method to model the relationship between one independent and one
dependent variable using a straight line. The model predicts the output by fitting the best line using
the least squares method. It is useful for understanding and forecasting outcomes based on a single
input feature.