CHAPTER 11 Intermediate

Polynomial Regression

Updated: May 16, 2026

1. Introduction

So far, we have forced our models to draw perfectly straight lines through data. But the real world is rarely a straight line. If you plot the spread of a virus over time, it shoots up exponentially. If you plot a car's fuel efficiency against its speed, it forms a curve (terrible at 10 mph, great at 55 mph, terrible at 100 mph). If you fit a straight Linear Regression line to curved data, the model will underfit badly: it is too rigid to follow the pattern, so its errors stay large no matter how much data you give it. In this chapter, we introduce Polynomial Regression, giving our model the mathematical flexibility to draw curves.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Identify non-linear relationships in data.
  • Understand the math behind Polynomial terms ($x^2$, $x^3$).
  • Transform raw features using PolynomialFeatures.
  • Fit a Polynomial Regression model in scikit-learn.
  • Recognize the severe risks of Overfitting with high-degree polynomials.

3. The Math: Bending the Line

In Simple Linear Regression, the formula is a straight line: $y = m_1 x + b$

To make the line curve, we simply add a "squared" version of our input feature to the equation! $y = m_1 x + m_2 x^2 + b$

This is a Degree-2 Polynomial (a Parabola/U-shape). If the data bends twice (like an S-shape), we add a cubed term: $y = m_1 x + m_2 x^2 + m_3 x^3 + b$

*The magic:* Even though the line is curving, the algorithm to find the weights ($m_1, m_2, m_3$) is the exact same Linear Regression algorithm! The equation is still linear in the weights; only the inputs ($x^2$, $x^3$) are non-linear. We are just feeding it engineered features.
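A minimal sketch of this idea, assuming a small made-up dataset (the `x` and `y` values below are illustrative and not taken from the chapter): the squared column is built by hand with NumPy and handed to an ordinary `LinearRegression`, which recovers the weights of the curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data (an assumption for this sketch): points on y = 2x + 0.5x^2 + 3
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 0.5 * x**2 + 3.0

# Engineer the features by hand: one column for x, one column for x^2
X_engineered = np.column_stack([x, x**2])

# Ordinary linear regression finds m1, m2, and b for the curved relationship
model = LinearRegression()
model.fit(X_engineered, y)

m1, m2 = model.coef_
b = model.intercept_
print(f"m1={m1:.3f}, m2={m2:.3f}, b={b:.3f}")
```

Nothing in the fitting step knows that the second column is a square of the first. From the algorithm's point of view it is simply a second feature.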

4. Mini Project: Student Score Prediction Curve

Let's predict a student's Exam Score based on the Hours Studied. Notice the data curves: scores climb steeply in the middle hours and then level off. Going from 3 to 4 hours adds 20 points, but going from 6 to 7 hours adds only 2 (diminishing returns).
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# 1. The Curved Data
X = np.array([[1], [2], [3], [4], [5], [6], [7]]) # Hours Studied
y = np.array([45, 50, 60, 80, 85, 88, 90])       # Exam Score

# 2. Transform the Data! (The Magic Step)
# We tell scikit-learn to create the x^2 column
poly_transformer = PolynomialFeatures(degree=2)
X_poly = poly_transformer.fit_transform(X)

# Look at the transformed data: It created [1, x, x^2]
# print(X_poly)

# 3. Fit a STANDARD Linear Regression model on the NEW polynomial data
model = LinearRegression()
model.fit(X_poly, y)

# 4. Visualize the Curve
# Create 100 smooth points between 1 and 7 for a smooth line
X_smooth = np.linspace(1, 7, 100).reshape(-1, 1)
X_smooth_poly = poly_transformer.transform(X_smooth)
y_smooth_pred = model.predict(X_smooth_poly)

plt.scatter(X, y, color='blue', label='Actual Scores')
plt.plot(X_smooth, y_smooth_pred, color='red', label='Polynomial Curve')
plt.title('Hours Studied vs Exam Score (Degree=2)')
plt.legend()
plt.show()
```
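To check numerically whether the curve helps, one option is to compare the training R² of a straight line with that of the degree-2 fit on the same data. The sketch below is a rough check rather than a full evaluation; the variable names `line_model`, `curve_model`, `r2_line`, and `r2_curve` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same data as the mini project
X = np.array([[1], [2], [3], [4], [5], [6], [7]])  # Hours Studied
y = np.array([45, 50, 60, 80, 85, 88, 90])         # Exam Score

# Degree 1: a plain straight line on the raw feature
line_model = LinearRegression().fit(X, y)
r2_line = line_model.score(X, y)

# Degree 2: the same algorithm on the columns [1, x, x^2]
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
curve_model = LinearRegression().fit(X_poly, y)
r2_curve = curve_model.score(X_poly, y)

print(X_poly[:3])  # first three rows of the transformed matrix
print(f"R^2 straight line:  {r2_line:.3f}")
print(f"R^2 degree-2 curve: {r2_curve:.3f}")
```

Training R² can never decrease when a polynomial term is added, so a higher value here shows only that the curve follows these seven points more closely. It does not show that the model will predict new students better.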

5. The Danger of Degrees (Overfitting)

The degree parameter is the setting that most directly controls how far a polynomial model can bend, which makes it the easiest way to overfit.
  • Degree = 1: A straight line. (Underfitting a curve).
  • Degree = 2 or 3: A smooth, logical curve. (The Sweet Spot).
  • Degree = 15: The model creates a hyper-complex, chaotic squiggle that perfectly touches every single data point, but goes wildly off the chart in between points. This is extreme Overfitting (High Variance). It has memorized the training data and will fail miserably in the real world.
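One way to see this effect is to compare the error on the training points with the error on held-out points. The sketch below reuses the seven points from the mini project and uses leave-one-out cross-validation as the held-out estimate. The degrees tried (1, 2, 3, 6) and the use of leave-one-out are assumptions made for illustration; degree 6 stands in for "very high" because seven points can be matched exactly by a degree-6 polynomial.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1], [2], [3], [4], [5], [6], [7]])  # Hours Studied
y = np.array([45, 50, 60, 80, 85, 88, 90])         # Exam Score

train_mse = {}
loo_mse = {}
for degree in (1, 2, 3, 6):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())

    # Error on the same points the model was trained on
    model.fit(X, y)
    train_mse[degree] = mean_squared_error(y, model.predict(X))

    # Error on each point when that point is held out of training
    scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    loo_mse[degree] = -scores.mean()

    print(f"degree={degree}: train MSE={train_mse[degree]:.2f}, "
          f"held-out MSE={loo_mse[degree]:.2f}")
```

The training error can only shrink as the degree rises and is essentially zero at degree 6, so training error alone cannot reveal overfitting. The held-out error is the number to watch.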

6. Pipeline Implementation

In production, you should combine the transformation and the regression into a single Pipeline. This keeps the code tidy and guarantees that the same transformation is applied during training and prediction.
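```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Create a single object that does the polynomial transformation AND fits the model
poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())

# Train it directly on the raw X data! (X and y are the arrays from the mini project)
poly_model.fit(X, y)
```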
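Once fitted, the pipeline can be used like any other model. The sketch below repeats the setup so it runs on its own, predicts from a raw input, and looks up the fitted weights. The value 4.5 hours is a hypothetical new student, and the step name `"linearregression"` relies on `make_pipeline` naming each step after its lowercased class name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1], [2], [3], [4], [5], [6], [7]])  # Hours Studied
y = np.array([45, 50, 60, 80, 85, 88, 90])         # Exam Score

poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_model.fit(X, y)

# Predict from raw hours; the pipeline applies the polynomial transform itself
new_hours = np.array([[4.5]])  # hypothetical new student
predicted_score = poly_model.predict(new_hours)
print(predicted_score)

# Inspect the regression step inside the pipeline
regression_step = poly_model.named_steps["linearregression"]
print(regression_step.coef_)       # weights for the columns [1, x, x^2, x^3]
print(regression_step.intercept_)
```

The weight for the column of 1s is typically zero here, because `LinearRegression` fits its own intercept and the constant column adds no extra information.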

7. Common Mistakes

  • Forgetting to transform Test Data: If you use PolynomialFeatures on your Training Data, your model is now trained to expect 3 columns ($1, x, x^2$). If you try to run model.predict() on raw Test Data that only has 1 column ($x$), scikit-learn will raise a ValueError because the number of features does not match. You MUST transform the test data first with the same fitted transformer (call transform(), not fit_transform()), or use a Pipeline!
  • Extrapolation is catastrophic: Linear regression goes straight off the chart. Polynomial regression curves off the chart. Outside the training range, the highest-degree term dominates, so a Degree-3 polynomial's predictions grow in magnitude like $x^3$ and can shoot toward huge positive or negative values that make no sense for the problem.
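Both mistakes can be reproduced on the mini project data. In the sketch below, the test inputs (2.5 and 5.5 hours) and the extrapolation point (20 hours) are hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([[1], [2], [3], [4], [5], [6], [7]])  # Hours Studied
y_train = np.array([45, 50, 60, 80, 85, 88, 90])         # Exam Score
X_test = np.array([[2.5], [5.5]])                        # hypothetical test inputs

# Mistake 1: the model is trained on 3 columns, then given 1 raw column
poly_transformer = PolynomialFeatures(degree=2)
model = LinearRegression().fit(poly_transformer.fit_transform(X_train), y_train)

try:
    model.predict(X_test)  # raw data, not transformed
    raw_predict_failed = False
except ValueError as error:
    raw_predict_failed = True
    print("Raw test data rejected:", error)

# Fix: reuse the already-fitted transformer with transform(), not fit_transform()
test_predictions = model.predict(poly_transformer.transform(X_test))
print(test_predictions)

# Mistake 2: extrapolating far beyond the 1-7 hour training range
cubic_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
cubic_model.fit(X_train, y_train)
inside_range = cubic_model.predict(np.array([[6.0]]))[0]
far_outside = cubic_model.predict(np.array([[20.0]]))[0]
print(f"6 hours: {inside_range:.1f}, 20 hours: {far_outside:.1f}")
```

For this data the fitted cubic has a negative $x^3$ weight, so the 20-hour prediction falls far below zero, which is impossible for an exam score. The prediction at 6 hours, inside the training range, stays close to the observed scores.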

8. Best Practices

  • Keep Degrees Low: In most real-world scenarios, a Degree of 2 or 3 is as high as you should need to go. If your data seems to require a Degree of 10 to fit, you should probably be using a different kind of algorithm, such as a Random Forest.

9. Exercises

  1. If your raw input feature $X$ is [3], what will the output array look like after passing it through PolynomialFeatures(degree=3)? (Hint: don't forget the bias column of 1s).
  2. Write a Python snippet using make_pipeline to create a Degree-4 polynomial regression model.

10. MCQ Quiz with Answers

Question 1

Why is Polynomial Regression still considered a "Linear" Regression model under the hood?

Question 2

What is the immediate risk of setting the polynomial degree parameter very high (e.g., degree=20)?

11. Interview Questions

  • Q: Explain how setting the degree hyperparameter relates to the Bias-Variance tradeoff in Polynomial Regression.
  • Q: Why is Extrapolation (predicting outside the training range) particularly dangerous when using Polynomial Regression compared to Simple Linear Regression?

12. FAQs

Q: Can I use Polynomial Regression with multiple features (e.g., Size and Bedrooms)?
A: Yes! With degree=2, PolynomialFeatures will not only square Size and square Bedrooms, but it will also create an interaction term (Size * Bedrooms), allowing the model to capture how the features act together.

13. Summary

The real world is rarely linear. By artificially engineering squared and cubed features, Polynomial Regression grants our linear algorithms the mathematical flexibility to draw smooth curves. However, with this flexibility comes the risk of Overfitting, a reminder that the simplest model that captures the pattern usually generalizes best.

14. Next Chapter Recommendation

What happens if we accidentally use too many features, or a polynomial degree that is slightly too high, and our model starts to overfit? How do we rein it in? In Chapter 12: Ridge Regression and Lasso Regression, we will introduce Regularization, a mathematical penalty that forces models to stay simple and robust.
