CHAPTER 06
Intermediate
Simple Linear Regression
Updated: May 16, 2026
1. Introduction
We have discussed the "Line of Best Fit" in theory. Now, it is time to build it. Simple Linear Regression is one of the most foundational machine learning algorithms. It is called "Simple" because it uses exactly *one* independent variable (X) to predict the dependent variable (y). In this chapter, we will open up the black box of `scikit-learn`, understand the high school algebra that powers it, and build a salary prediction model.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Understand the mathematical equation: $y = mx + b$.
- Explain the role of the Slope (Coefficient) and Intercept.
- Train a `LinearRegression` model using `scikit-learn`.
- Extract the mathematical formula from a trained model.
- Visualize the regression line using `matplotlib`.
3. The Math: $y = mx + b$
Simple Linear Regression relies entirely on the equation of a straight line:
$$y = (m \times X) + b$$
- $y$: The prediction (e.g., Estimated Salary).
- $X$: The input feature (e.g., Years of Experience).
- $m$ (Slope/Coefficient): The weight assigned to $X$. It answers: *"For every 1 year increase in experience, how much does salary go up?"*
- $b$ (Intercept): The baseline. If someone has 0 years of experience ($X=0$), what is their starting salary?
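When you call `model.fit()`, the algorithm calculates the values of $m$ and $b$ that minimize the error, measured as the sum of squared differences between the actual and predicted $y$ values (ordinary least squares).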
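To make the fitting step less mysterious, here is a minimal sketch of the closed-form least-squares solution for a single feature, written with plain NumPy. The data is a hypothetical toy set, not the chapter's dataset, and the variable names are illustrative. The solver inside `scikit-learn` is implemented differently, but for one feature it should arrive at the same $m$ and $b$ up to floating-point error.

```python
import numpy as np

# Hypothetical toy data (an assumption, not the chapter's dataset).
X = np.array([1, 2, 3, 4, 5, 6], dtype=float)  # years of experience
y = np.array([31000, 48000, 71000, 91000, 108000, 131000], dtype=float)  # salary

# Closed-form least-squares solution for one feature:
#   m = sum((X - mean(X)) * (y - mean(y))) / sum((X - mean(X))**2)
#   b = mean(y) - m * mean(X)
x_mean, y_mean = X.mean(), y.mean()
m = np.sum((X - x_mean) * (y - y_mean)) / np.sum((X - x_mean) ** 2)
b = y_mean - m * x_mean

# Sum of squared errors of the fitted line; no other (m, b) pair gives a smaller value.
sse = np.sum((y - (m * X + b)) ** 2)

print(f"slope m = {m:.2f}, intercept b = {b:.2f}, SSE = {sse:.0f}")
```

Nudging `m` or `b` by hand and recomputing `sse` is a quick way to see that the fitted values really do sit at the minimum.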
4. Mini Project: Salary Prediction Model
Let's build a model that predicts a software engineer's salary based on their years of experience.
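The chapter's original listing and dataset are not reproduced here, so the following is a minimal sketch under stated assumptions: six hypothetical data points covering 1 to 6 years of experience, with made-up salaries. Because the data is illustrative, the numbers it produces will differ from the figures quoted elsewhere in this chapter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (an assumption, not the chapter's dataset).
# X must be 2D: one row per person, one column per feature.
X = np.array([[1], [2], [3], [4], [5], [6]])                  # years of experience
y = np.array([31000, 48000, 71000, 91000, 108000, 131000])    # salary in dollars

# Create the model and fit it to the historical data.
model = LinearRegression()
model.fit(X, y)

# Predict the salary for an engineer with 7 years of experience.
predicted = model.predict(np.array([[7]]))
print(f"Predicted salary for 7 years: ${predicted[0]:,.0f}")
```

Note that `predict` also expects a 2D array, which is why the single value `7` is wrapped in two pairs of brackets.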
5. Extracting the Math Formula
Let's prove that this is just algebra. We can extract the Slope ($m$) and Intercept ($b$) directly from the trained model!
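As a sketch of how that extraction might look, the snippet below refits a model on hypothetical toy data (not the chapter's dataset, so the printed values will not match the figures quoted in this chapter) and reads the fitted parameters from the `coef_` and `intercept_` attributes. It then rebuilds a prediction by hand to show that `predict` is applying $y = mx + b$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (an assumption, not the chapter's dataset).
X = np.array([[1], [2], [3], [4], [5], [6]])                  # years of experience
y = np.array([31000, 48000, 71000, 91000, 108000, 131000])    # salary in dollars

model = LinearRegression().fit(X, y)

slope = model.coef_[0]        # coef_ holds one entry per feature
intercept = model.intercept_  # a single number when y is 1D

print(f"salary = {slope:,.0f} * years + {intercept:,.0f}")

# Rebuild a prediction by hand and compare it with model.predict().
years = 7
manual = slope * years + intercept
from_model = model.predict(np.array([[years]]))[0]
print(f"manual: {manual:,.0f}   model: {from_model:,.0f}")
```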
*The model estimated that starting base pay is about $8,285, and that every year of experience adds about $21,285 to the salary!*
6. Visualizing the Line of Best Fit
Let's draw the scatter plot of our actual data, and overlay the "Line of Best Fit" that the model generated to see how accurate it is.
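One way to draw this with `matplotlib` is sketched below, again on hypothetical toy data rather than the chapter's dataset. The colors, labels, and title are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (an assumption, not the chapter's dataset).
X = np.array([[1], [2], [3], [4], [5], [6]])                  # years of experience
y = np.array([31000, 48000, 71000, 91000, 108000, 131000])    # salary in dollars

model = LinearRegression().fit(X, y)

# Predictions on the training inputs give the points that lie on the fitted line.
y_line = model.predict(X)
years = X[:, 0]  # flatten the single feature column for plotting

fig, ax = plt.subplots()
ax.scatter(years, y, color="blue", label="Actual salaries")
ax.plot(years, y_line, color="red", label="Line of best fit")
ax.set_xlabel("Years of experience")
ax.set_ylabel("Salary (USD)")
ax.set_title("Salary vs. experience")
ax.legend()
plt.show()
```

The vertical gap between each blue point and the red line is that point's error; the fit is the line that makes the sum of those squared gaps as small as possible.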
7. Common Mistakes
- Passing a 1D array for X: `scikit-learn` strictly requires `X` to be a 2D matrix (rows and columns), even if there is only one feature column. If you pass `X = np.array([1, 2, 3])`, `fit()` raises a `ValueError`. It must be `np.array([[1], [2], [3]])` or reshaped using `X.reshape(-1, 1)`. `y` can remain a 1D array.
- Extrapolation: Our model was trained on 1 to 6 years of experience. If we ask it to predict the salary for someone with 50 years of experience, it will output $1,072,571. The arithmetic is consistent, but the result is nonsense. Linear models blindly follow the straight line to infinity; they do not possess common sense. Never trust predictions that are far outside the bounds of your training data.
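Both mistakes can be reproduced in a few lines. The sketch below uses hypothetical toy data (so the extrapolated figure differs from the one quoted above), and `predict_with_range_check` is an illustrative helper written for this example, not part of `scikit-learn`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (an assumption, not the chapter's dataset).
years = np.array([1, 2, 3, 4, 5, 6])                            # 1D, shape (6,)
salary = np.array([31000, 48000, 71000, 91000, 108000, 131000])

# Mistake 1: fitting with a 1D X is rejected with a ValueError.
try:
    LinearRegression().fit(years, salary)
    raised = False
except ValueError:
    raised = True
print("1D X raised ValueError:", raised)

# Fix: reshape into a single-column 2D array, shape (6, 1).
X = years.reshape(-1, 1)
model = LinearRegression().fit(X, salary)


# Mistake 2: extrapolation. This hypothetical helper still returns the
# prediction, but also reports whether the input lies inside the training range.
def predict_with_range_check(model, x_value, x_min, x_max):
    in_range = bool(x_min <= x_value <= x_max)
    prediction = model.predict(np.array([[x_value]]))[0]
    return prediction, in_range


pred, in_range = predict_with_range_check(model, 50, X.min(), X.max())
print(f"50 years -> ${pred:,.0f} (inside training range: {in_range})")
```

A range check like this does not make the prediction any better; it only makes it visible that the model is being asked about a region it never saw.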
8. Best Practices
- Inspect Coefficients: Always print out `model.coef_` and `model.intercept_`. Explaining *why* the model made a prediction (e.g., "The model adds $21k per year of experience") is critical for business stakeholders to trust your AI.
9. Exercises
1. In the equation $y = mx + b$, what attribute in `scikit-learn` holds the value for $m$?
2. Modify the code block above to predict the salary for someone with `8.5` years of experience.
10. MCQ Quiz with Answers
Question 1
What is the defining characteristic of a "Simple" Linear Regression model?
Question 2
When training a model with scikit-learn, which format must the input features X_train take?
11. Interview Questions
- Q: Explain what the "Intercept" means in a Simple Linear Regression model from a business perspective.
- Q: What is "Extrapolation" in predictive modeling, and why is it dangerous?
12. FAQs
Q: Can Linear Regression draw curved lines?
A: No. Standard linear regression can only draw perfectly straight lines. If your data forms a U-curve, a straight line will result in terrible predictions (Underfitting). We will fix this in Chapter 11 with Polynomial Regression.
13. Summary
You have built your first functional machine learning model! By feeding historical data into `scikit-learn`, we let the algorithm recover the Slope and Intercept governing the relationship between experience and salary, which lets us predict salaries for new inputs.