Simple Linear Regression

Updated: May 16, 2026

# CHAPTER 6

1. Introduction

We have discussed the "Line of Best Fit" in theory. Now it is time to build it. Simple Linear Regression is one of the most fundamental machine learning algorithms. It is called "Simple" because it uses exactly *one* independent variable (X) to predict the dependent variable (y). In this chapter, we will look inside scikit-learn's LinearRegression, review the high school algebra that powers it, and build a salary prediction model.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Understand the mathematical equation: $y = mx + b$.
  • Explain the role of the Slope (Coefficient) and Intercept.
  • Train a LinearRegression model using scikit-learn.
  • Extract the mathematical formula from a trained model.
  • Visualize the regression line using matplotlib.

3. The Math: $y = mx + b$

Simple Linear Regression relies entirely on the equation of a straight line: $$y = (m \times X) + b$$
  • $y$: The prediction (e.g., Estimated Salary).
  • $X$: The input feature (e.g., Years of Experience).
  • $m$ (Slope/Coefficient): The weight assigned to $X$. It answers: *"For every 1 year increase in experience, how much does salary go up?"*
  • $b$ (Intercept): The baseline. If someone has 0 years of experience ($X=0$), what is their starting salary?

When you call model.fit(), scikit-learn's LinearRegression finds the values of $m$ and $b$ that minimize the sum of squared differences between the predicted and actual $y$ values (ordinary least squares).
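For a single feature, the least-squares line has a well-known closed form: $m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b = \bar{y} - m\bar{x}$. The sketch below applies those two formulas with plain NumPy to the same toy data used in this chapter's mini project. It is meant as an illustration of what fitting amounts to for one feature, not as a description of scikit-learn's internal solver; the variable names are illustrative.

```python
import numpy as np

# Same toy data as this chapter's mini project
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)  # years of experience
y = np.array([45000, 50000, 60000, 80000, 110000, 150000], dtype=float)  # salary in $

x_mean, y_mean = x.mean(), y.mean()

# Slope: how x and y vary together, divided by how much x varies on its own
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: chosen so the line passes through the point (x_mean, y_mean)
b = y_mean - m * x_mean

# Sum of squared errors for this line; any other (m, b) pair gives a larger value
sse = np.sum((y - (m * x + b)) ** 2)

print(f"m = {m:.2f}, b = {b:.2f}, SSE = {sse:.2f}")
```

Nudging `m` or `b` by hand and recomputing `sse` is a quick way to convince yourself that these values really are the minimum.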

4. Mini Project: Salary Prediction Model

Let's build a model that predicts a software engineer's salary based on their years of experience.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# 1. Provide the Data
# X must be a 2D array in scikit-learn (hence the double brackets)
X_train = np.array([[1], [2], [3], [4], [5], [6]]) # Years of Experience
y_train = np.array([45000, 50000, 60000, 80000, 110000, 150000]) # Salary in $

# 2. Initialize the Model
model = LinearRegression()

# 3. Train the Model (Find the optimal 'm' and 'b')
model.fit(X_train, y_train)

# 4. Make a Prediction!
years = 7
X_test = np.array([[years]])
prediction = model.predict(X_test)
print(f"Predicted salary for {years} years: ${prediction[0]:.2f}")
# Output: Predicted salary for 7 years: $155000.00
```

5. Extracting the Math Formula

Let's prove that this is just algebra. We can extract the Slope ($m$) and Intercept ($b$) directly from the trained model!
```python
# Extract 'm' (Slope/Coefficient)
slope = model.coef_[0]

# Extract 'b' (Y-Intercept)
intercept = model.intercept_

print(f"Math Formula: Salary = ({slope:.2f} * Years) + {intercept:.2f}")
# Output: Math Formula: Salary = (20714.29 * Years) + 10000.00
```

*The model determined that the baseline pay at 0 years of experience is $10,000, and that each additional year of experience adds about $20,714 to the predicted salary.*
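To check that predict() really is just this algebra, the sketch below refits the same model and compares a by-hand evaluation of $y = mx + b$ against the model's own prediction. It repeats the training data so that it runs on its own; the names `manual` and `from_model` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same training data as the mini project
X_train = np.array([[1], [2], [3], [4], [5], [6]])
y_train = np.array([45000, 50000, 60000, 80000, 110000, 150000])
model = LinearRegression().fit(X_train, y_train)

slope = model.coef_[0]        # m
intercept = model.intercept_  # b

years = 7
manual = slope * years + intercept                  # y = m*x + b, computed by hand
from_model = model.predict(np.array([[years]]))[0]  # the model's own answer

print(f"By hand:   {manual:.2f}")
print(f"predict(): {from_model:.2f}")
print("Match:", np.isclose(manual, from_model))
```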

6. Visualizing the Line of Best Fit

Let's draw the scatter plot of our actual data, and overlay the "Line of Best Fit" that the model generated to see how accurate it is.
```python
# Plot the actual historical data points (Blue dots)
plt.scatter(X_train, y_train, color='blue', label='Actual Salaries')

# Generate the model's predictions for every point in X_train
predicted_salaries = model.predict(X_train)

# Plot the model's "Line of Best Fit" (Red line)
plt.plot(X_train, predicted_salaries, color='red', label='Regression Line')

plt.title('Salary vs Experience')
plt.xlabel('Years of Experience')
plt.ylabel('Salary ($)')
plt.legend()
plt.show()
```
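The plot gives a visual sense of the fit. One way to put numbers on it is to look at the residuals, the vertical gaps between each blue dot and the red line. The sketch below is a small, self-contained illustration that refits the same model; `residuals` is an illustrative name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same training data as the mini project
X_train = np.array([[1], [2], [3], [4], [5], [6]])
y_train = np.array([45000, 50000, 60000, 80000, 110000, 150000])
model = LinearRegression().fit(X_train, y_train)

# Residual = actual salary minus the salary on the regression line
residuals = y_train - model.predict(X_train)

for years, actual, res in zip(X_train.ravel(), y_train, residuals):
    print(f"{int(years)} yrs: actual ${int(actual):,} | residual {res:+,.2f}")

# For a least-squares line with an intercept, the residuals cancel out
print("Residuals sum to ~0:", np.isclose(residuals.sum(), 0.0, atol=1e-6))
```

On this data the residuals are positive at both ends and negative in the middle, a hint that the salaries curve upward faster than a straight line can follow.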

7. Common Mistakes

  • Passing a 1D array for X: scikit-learn requires X to be a 2D array (rows and columns), even if there is only one feature column. If you pass X = np.array([1, 2, 3]), fit() raises a ValueError. It must be np.array([[1], [2], [3]]) or reshaped using X.reshape(-1, 1). y can remain a 1D array.
  • Extrapolation: Our model was trained on 1 to 6 years of experience. If we ask it to predict the salary for someone with 50 years of experience, it will output about $1,045,714. The arithmetic is valid, but the result is meaningless: nothing in the training data says salaries keep rising at the same rate for 50 years. Linear models blindly follow the straight line to infinity; they do not possess common sense. Never trust predictions that are far outside the bounds of your training data.
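Both mistakes can be reproduced in a few lines. The sketch below first passes a 1D array to fit() and catches the resulting ValueError, then reshapes the array and adds a simple range check before predicting. The range check is only an illustrative guard written for this example, not a scikit-learn feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

y_train = np.array([45000, 50000, 60000, 80000, 110000, 150000])
X_1d = np.array([1, 2, 3, 4, 5, 6])  # shape (6,): fit() rejects this

model = LinearRegression()
try:
    model.fit(X_1d, y_train)
    fit_failed = False
except ValueError as err:
    fit_failed = True
    print("fit() rejected the 1D array:", str(err).splitlines()[0])

X_2d = X_1d.reshape(-1, 1)  # shape (6, 1): six rows, one feature column
model.fit(X_2d, y_train)

# Illustrative guard: flag any input outside the range seen during training
x_min, x_max = X_2d.min(), X_2d.max()
for years in (4.5, 50):
    pred = model.predict(np.array([[years]]))[0]
    in_range = x_min <= years <= x_max
    note = "" if in_range else "  <-- outside training range, do not trust"
    print(f"{years} years -> ${pred:,.2f}{note}")
```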

8. Best Practices

  • Inspect Coefficients: Always print out model.coef_ and model.intercept_ (note the trailing underscores). Explaining *why* the model made a prediction (e.g., "The model adds about $21k per year of experience") is critical for business stakeholders to trust your AI.

9. Exercises

  1. In the equation $y = mx + b$, what attribute in scikit-learn holds the value for $m$?
  2. Modify the code block above to predict the salary for someone with 8.5 years of experience.

10. MCQ Quiz with Answers

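Question 1

What is the defining characteristic of a "Simple" Linear Regression model?

Answer: It uses exactly one independent variable (X) to predict the dependent variable (y).

Question 2

When training a model with scikit-learn, which format must the input features X_train take?

Answer: A 2D array (rows and columns), even when there is only one feature column.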

11. Interview Questions

  • Q: Explain what the "Intercept" means in a Simple Linear Regression model from a business perspective.
  • Q: What is "Extrapolation" in predictive modeling, and why is it dangerous?

12. FAQs

Q: Can Linear Regression draw curved lines?
A: No. Standard linear regression with a single feature can only fit a straight line. If your data forms a U-curve, a straight line will give poor predictions (underfitting). We will fix this in Chapter 11 with Polynomial Regression.

13. Summary

You have built your first functional machine learning model! By fitting scikit-learn's LinearRegression to historical data, you recovered the Slope and Intercept of the straight line that best describes the relationship between experience and salary, and used that line to predict salaries for new inputs.

14. Next Chapter Recommendation

Predicting a house price using *only* Square Footage is too simple. In reality, prices depend on Square Footage, Bedrooms, Age, and Zip Code simultaneously. In Chapter 7: Multiple Linear Regression, we will upgrade our algorithm to handle dozens of variables at the same time.
