Polynomial Linear Regression Notes
Grok 3
June 22, 2025
1 Introduction to Polynomial Linear Regression
Polynomial Linear Regression is an extension of Linear Regression used in machine
learning to model non-linear patterns in data. While Linear Regression fits a straight
line (y = mx + c), Polynomial Linear Regression fits a curve using a polynomial
equation, such as y = a + b_1 x + b_2 x^2, to capture more complex relationships. It is
still called "linear" because the model remains linear in its coefficients; only the input
features are raised to higher powers.
1.1 Simple Explanation
• Linear Regression: Fits a straight line to data.
• Polynomial Linear Regression: Fits a curved line to capture non-linear patterns.
• Example: Useful when data points form a curve (e.g., parabola) instead of a straight
line.
2 How It Works
Polynomial Linear Regression models data using a polynomial equation:
y = a + b_1 x + b_2 x^2 + b_3 x^3 + ... + b_n x^n    (1)
where:
• y: Output (dependent variable)
• x: Input (independent variable)
• a: Constant (intercept)
• b_1, b_2, ..., b_n: Coefficients
• n: Degree of the polynomial
2.1 Steps
1. Collect data with input (x) and output (y) values.
2. Choose the polynomial degree (e.g., 2 for quadratic).
3. Train the model to find coefficients using a machine learning algorithm.
4. Predict y for new x values.
5. Evaluate the model using metrics like R^2 or Mean Squared Error (MSE).
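To make Equation (1) concrete before training anything, here is a minimal sketch that
evaluates a degree-2 polynomial by hand. The coefficient values a, b_1, b_2 are invented
purely for illustration; in practice they come from training in step 3.

import numpy as np

# Invented example coefficients (not fitted): a = 50, b1 = 4, b2 = -0.1
a, b1, b2 = 50.0, 4.0, -0.1

# A few input values
x = np.array([20.0, 25.0, 30.0, 35.0])

# Equation (1) with n = 2: y = a + b1*x + b2*x^2
y = a + b1 * x + b2 * x**2
print(y)  # one predicted output per input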
3 Comparison: Polynomial vs. Linear Regression
Feature          | Linear Regression           | Polynomial Linear Regression
Equation         | y = mx + c (straight line)  | y = a + b_1 x + b_2 x^2 + ... (curve)
Data Pattern     | Best for linear data        | Best for curved/complex patterns
Complexity       | Simple                      | More complex
Overfitting Risk | Low                         | High (if degree is too high)
4 Real-Life Example
Suppose a company wants to predict ice cream sales (y) based on temperature (x). If
sales increase with temperature but follow a curved pattern (e.g., sales drop slightly at
very high temperatures), Polynomial Linear Regression is ideal.
Dataset Example:
Temperature (x) | Sales (y)
20°C            | 200
25°C            | 300
30°C            | 350
35°C            | 320
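Before building the full model in the next section, a quick, hedged sketch shows why a
straight line struggles here: fitting degree 1 and degree 2 to this dataset and comparing
their R^2 scores (computed on the training data, only for illustration) favors the curve.

from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = [[20], [25], [30], [35]]  # temperature
y = [200, 300, 350, 320]      # sales

# Fit a straight line (degree 1) and a parabola (degree 2) to the same data
for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X, y)
    # score() returns R^2 on the given data; higher means a closer fit
    print("degree", degree, "R^2:", model.score(X, y))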
5 Python Implementation
Below is a Python code example using scikit-learn to implement Polynomial Linear
Regression:
# Import libraries
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Data (example)
X = [[20], [25], [30], [35]]  # Temperature
y = [200, 300, 350, 320]      # Sales

# Polynomial Regression (degree 2)
poly = PolynomialFeatures(degree=2)
polyreg = make_pipeline(poly, LinearRegression())
polyreg.fit(X, y)

# Predict for new value
new_temp = [[32]]
prediction = polyreg.predict(new_temp)
print("Predicted sales for 32°C:", prediction)
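To connect the fitted pipeline back to Equation (1), the learned intercept and
coefficients can be read from the pipeline's steps. A short sketch (make_pipeline names
each step after its lowercased class name):

# Inspect the fitted model from the code above
lin = polyreg.named_steps["linearregression"]
print("Intercept (a):", lin.intercept_)
# coef_[0] multiplies the bias column added by PolynomialFeatures and is
# typically ~0; coef_[1] and coef_[2] correspond to b1 and b2
print("Coefficients (b):", lin.coef_)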
6 Important Points
• Polynomial Degree:
– Degree 1: Linear Regression.
– Degree 2: Quadratic (parabola).
– Degree 3+: Complex curves.
– High degrees may cause overfitting.
• Overfitting: High-degree polynomials may fit noise in the data, reducing accuracy on
new data. Use cross-validation or regularization (e.g., Ridge Regression); see the
sketch after this list.
• Underfitting: Low-degree polynomials may fail to capture complex patterns.
• Evaluation Metrics:
– R^2 Score: Measures how well the model explains the variation in the data (closer to 1 is better).
– Mean Squared Error (MSE): Measures prediction error (lower is better).
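The sketch below illustrates these points on a small synthetic dataset invented here for
illustration: Ridge regularization inside the pipeline to tame a high-degree fit,
cross-validated R^2 as an honest performance estimate, and the two metrics above.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up noisy quadratic data, large enough for cross-validation
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 30).reshape(-1, 1)
y = 3 + 2 * X.ravel() - 0.3 * X.ravel() ** 2 + rng.normal(scale=1.0, size=30)

# Ridge adds an L2 penalty on the coefficients, which tames a degree-5 fit
model = make_pipeline(PolynomialFeatures(degree=5), Ridge(alpha=1.0))

# 5-fold cross-validated R^2 estimates performance on unseen data
cv = KFold(n_splits=5, shuffle=True, random_state=0)
print("CV R^2:", cross_val_score(model, X, y, cv=cv).mean())

# Fit once and compute the two metrics from these notes
model.fit(X, y)
pred = model.predict(X)
print("R^2:", r2_score(y, pred))
print("MSE:", mean_squared_error(y, pred))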
7 Advantages
• Captures complex, non-linear patterns.
• Flexible (adjustable by changing degree).
• Easy to implement with libraries like scikit-learn.
8 Disadvantages
• High risk of overfitting with high degrees.
• Computationally expensive for high degrees.
• Complex polynomials are hard to interpret.
9 Applications
• Business: Sales prediction, customer behavior analysis.
• Science: Modeling temperature, pressure, or chemical reactions.
• Finance: Predicting stock prices or market trends.
• Engineering: Analyzing complex system behaviors.
10 Tips
• Visualize the data (using matplotlib or seaborn) to decide whether Polynomial
Regression is needed; a plotting sketch follows below.
• Start with low degrees (2 or 3) to avoid overfitting.
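A minimal matplotlib sketch of the first tip, reusing the Section 4 data: plotting the
raw points next to degree-1 and degree-2 fits usually makes the choice of degree obvious.

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# The ice cream dataset from Section 4
X = np.array([[20], [25], [30], [35]])
y = np.array([200, 300, 350, 320])

# A smooth grid of temperatures for drawing the fitted curves
grid = np.linspace(18, 37, 100).reshape(-1, 1)

plt.scatter(X, y, label="data")
for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X, y)
    plt.plot(grid, model.predict(grid), label=f"degree {degree}")
plt.xlabel("Temperature (°C)")
plt.ylabel("Sales")
plt.legend()
plt.show()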