CHAPTER 06 Intermediate

Simple Linear Regression

Updated: May 16, 2026

1. Introduction

We have discussed the "Line of Best Fit" in theory. Now it is time to build it. Simple Linear Regression is one of the most foundational machine learning algorithms. It is called "Simple" because it uses exactly *one* independent variable (X) to predict the dependent variable (y). In this chapter, we will open up the black box of scikit-learn, understand the high-school algebra that powers it, and build a salary prediction model.

2. Learning Objectives

By the end of this chapter, you will be able to:

Understand the mathematical equation: $y = mx + b$.

Explain the role of the Slope (Coefficient) and Intercept.

Train a LinearRegression model using scikit-learn.

Extract the mathematical formula from a trained model.

Visualize the regression line using matplotlib.

3. The Math: $y = mx + b$

Simple Linear Regression relies entirely on the equation of a straight line: $$y = (m \times X) + b$$

$y$: The prediction (e.g., Estimated Salary).

$X$: The input feature (e.g., Years of Experience).

$m$ (Slope/Coefficient): The weight assigned to $X$. It answers: *"For every 1-year increase in experience, how much does salary go up?"*

$b$ (Intercept): The baseline. If someone has 0 years of experience ($X=0$), what is their starting salary?

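When you call model.fit(), the algorithm calculates the values of $m$ and $b$ that minimize the error, measured as the sum of squared differences between the actual and predicted $y$ values (ordinary least squares).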
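
To make "minimize the error" concrete, here is a minimal sketch of the closed-form least-squares solution computed by hand with NumPy. It assumes the error being minimized is the sum of squared residuals, in which case $m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b = \bar{y} - m\bar{x}$. The data is borrowed from the mini project in the next section, and the variable names are illustrative only.

```python
import numpy as np

# Illustrative data: the same six points used in the mini project
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)  # years of experience
y = np.array([45000, 50000, 60000, 80000, 110000, 150000], dtype=float)  # salary

x_mean, y_mean = x.mean(), y.mean()

# Slope: sum of cross-deviations divided by the sum of squared x-deviations
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: chosen so the line passes through the point (x_mean, y_mean)
b = y_mean - m * x_mean

print(f"m = {m:.2f}, b = {b:.2f}")
# Output: m = 20714.29, b = 10000.00
```

These are the same two numbers that scikit-learn stores after fitting, which is why the library call can be read as a shortcut for this arithmetic rather than a separate technique.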

4. Mini Project: Salary Prediction Model

Let's build a model that predicts a software engineer's salary based on their years of experience.

python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# 1. Provide the Data
# X must be a 2D array in scikit-learn (hence the double brackets)
X_train = np.array([[1], [2], [3], [4], [5], [6]]) # Years of Experience
y_train = np.array([45000, 50000, 60000, 80000, 110000, 150000]) # Salary in $

# 2. Initialize the Model
model = LinearRegression()

# 3. Train the Model (Find the optimal 'm' and 'b')
model.fit(X_train, y_train)

# 4. Make a Prediction!
years = 7
X_test = np.array([[years]])
prediction = model.predict(X_test)
print(f"Predicted salary for {years} years: ${prediction[0]:.2f}")
# Output: Predicted salary for 7 years: $155000.00

5. Extracting the Math Formula

Let's prove that this is just algebra. We can extract the Slope ($m$) and Intercept ($b$) directly from the trained model!

python

# Extract 'm' (Slope/Coefficient)
slope = model.coef_[0]

# Extract 'b' (Y-Intercept)
intercept = model.intercept_

print(f"Math Formula: Salary = ({slope:.2f} * Years) + {intercept:.2f}")
# Output: Math Formula: Salary = (20714.29 * Years) + 10000.00

*The fitted line has an intercept of $10,000 (the predicted salary at 0 years of experience), and each additional year of experience adds about $20,714 to the predicted salary.*
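
As a quick sanity check that the model really is just this formula, the sketch below recomputes a few predictions by hand and compares them with model.predict(). It refits the same training data so the block runs on its own; the three input values in `years` are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same training data as the mini project
X_train = np.array([[1], [2], [3], [4], [5], [6]])
y_train = np.array([45000, 50000, 60000, 80000, 110000, 150000])

model = LinearRegression().fit(X_train, y_train)

slope = model.coef_[0]
intercept = model.intercept_

# Hypothetical inputs, chosen for illustration
years = np.array([2.0, 4.5, 7.0])

manual = slope * years + intercept                # y = m * X + b, by hand
from_model = model.predict(years.reshape(-1, 1))  # what scikit-learn returns

print(np.allclose(manual, from_model))
# Output: True
```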

6. Visualizing the Line of Best Fit

Let's draw the scatter plot of our actual data, and overlay the "Line of Best Fit" that the model generated to see how accurate it is.

python

# Plot the actual historical data points (Blue dots)
plt.scatter(X_train, y_train, color='blue', label='Actual Salaries')

# Generate the model's predictions for every point in X_train
predicted_salaries = model.predict(X_train)

# Plot the model's "Line of Best Fit" (Red line)
plt.plot(X_train, predicted_salaries, color='red', label='Regression Line')

plt.title('Salary vs Experience')
plt.xlabel('Years of Experience')
plt.ylabel('Salary ($)')
plt.legend()
plt.show()
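
The plot gives a visual impression of accuracy. One possible way to put numbers on it is to look at the residuals (actual minus predicted) and the R² score that LinearRegression.score() returns. The sketch below is an optional addition rather than part of the chapter's original walkthrough, and it evaluates on the training data only, which tends to flatter the model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same training data as the mini project
X_train = np.array([[1], [2], [3], [4], [5], [6]])
y_train = np.array([45000, 50000, 60000, 80000, 110000, 150000])

model = LinearRegression().fit(X_train, y_train)

# Residual = actual salary minus the salary on the regression line
residuals = y_train - model.predict(X_train)

# R^2: share of the variance in y explained by the line (1.0 is a perfect fit)
r_squared = model.score(X_train, y_train)

for years, res in zip(X_train.ravel(), residuals):
    print(f"{years} yr: residual {res:+.0f}")
print(f"R^2 on training data: {r_squared:.3f}")
```

For this dataset the R² comes out at roughly 0.91. The residuals are positive at 1 and 6 years and negative in between, which hints that the real relationship bends upward and a straight line only approximates it.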

7. Common Mistakes

Passing a 1D array for X: scikit-learn requires X to be a 2D array (rows and columns), even if there is only one feature column. If you pass X = np.array([1, 2, 3]), fit() and predict() raise a ValueError. It must be np.array([[1], [2], [3]]), or reshaped using X.reshape(-1, 1). y can remain a 1D array.

Extrapolation: Our model was trained on 1 to 6 years of experience. If we ask it to predict the salary for someone with 50 years of experience, it will output about $1,045,714. This is not a credible salary. Linear models blindly follow the straight line to infinity; they do not possess common sense. Never trust predictions that are far outside the bounds of your training data.
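
Both mistakes can be reproduced in a few lines. The sketch below refits the mini project's data so it runs on its own; `X_flat`, `rejected`, and `far_outside` are illustrative names, not part of scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same training data as the mini project
X_train = np.array([[1], [2], [3], [4], [5], [6]])
y_train = np.array([45000, 50000, 60000, 80000, 110000, 150000])
model = LinearRegression().fit(X_train, y_train)

# Mistake 1: a 1D feature array is rejected with a ValueError
X_flat = np.array([7, 8, 9])
rejected = False
try:
    model.predict(X_flat)
except ValueError:
    rejected = True
print("1D input rejected:", rejected)

# Fix: reshape into a single column (3 rows, 1 feature)
X_fixed = X_flat.reshape(-1, 1)
fixed_predictions = model.predict(X_fixed)
print("Shape after reshape:", X_fixed.shape)

# Mistake 2: extrapolating far beyond the 1 to 6 year training range
far_outside = model.predict(np.array([[50]]))[0]
print(f"Predicted salary at 50 years: ${far_outside:,.0f}")
```

The model returns the 50-year figure without any warning, so a range check has to come from the person using the model.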

8. Best Practices

Inspect Coefficients: Always print out model.coef_ and model.intercept_ (note the trailing underscores). Explaining *why* the model made a prediction (e.g., "The model adds about $21k per year of experience") is critical for business stakeholders to trust your model.

9. Exercises

1. In the equation $y = mx + b$, what attribute in scikit-learn holds the value for $m$?

2. Modify the code block above to predict the salary for someone with 8.5 years of experience.

10. MCQ Quiz with Answers

Question 1

What is the defining characteristic of a "Simple" Linear Regression model?

Question 2

When training a model with `scikit-learn`, which format must the input features `X_train` take?

11. Interview Questions

Q: Explain what the "Intercept" means in a Simple Linear Regression model from a business perspective.

Q: What is "Extrapolation" in predictive modeling, and why is it dangerous?

12. FAQs

Q: Can Linear Regression draw curved lines? A: No. With a single input feature used as-is, linear regression can only draw a straight line. If your data forms a U-curve, a straight line will give poor predictions (underfitting). We will fix this in Chapter 11 with Polynomial Regression.
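
To see what underfitting on a U-curve looks like, here is a small sketch using made-up data where y equals x squared on a symmetric range. The dataset and the names `X_curve`, `y_curve`, and `line` are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical U-shaped data: y = x^2 for x = -3, -2, ..., 3
X_curve = np.arange(-3, 4).reshape(-1, 1)
y_curve = (X_curve.ravel() ** 2).astype(float)

line = LinearRegression().fit(X_curve, y_curve)

curve_slope = line.coef_[0]
curve_intercept = line.intercept_
curve_r2 = line.score(X_curve, y_curve)

print(f"slope = {curve_slope:.2f}")
print(f"intercept = {curve_intercept:.2f}")
print(f"R^2 = {curve_r2:.2f}")
```

On this data the fitted slope is essentially zero, so the "line of best fit" is a flat line at the mean of y (4), and R² is about 0. The straight line captures none of the U-shape, even though x determines y exactly.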

13. Summary

You have built your first functional machine learning model! Given historical data, scikit-learn estimated the Slope and Intercept of the line that best fits the relationship between experience and salary, allowing us to predict salaries for new inputs.

14. Next Chapter Recommendation

Predicting a house price using *only* Square Footage is too simple. In reality, prices depend on Square Footage, Bedrooms, Age, and Zip Code simultaneously. In Chapter 7: Multiple Linear Regression, we will upgrade our algorithm to handle dozens of variables at the same time.
