# CHAPTER 23
Regression Algorithms
1. Chapter Introduction
We are ready to predict the future. One of the most fundamental tasks in supervised machine learning is regression: predicting a continuous number. How much will this house sell for? What will our revenue be next month? This chapter covers Linear Regression, how to train it using Scikit-Learn, and how to evaluate whether its predictions are actually accurate.
2. What is Linear Regression?
Linear Regression fits a straight "line of best fit" through your data points.
If you plot Square Footage on the X-axis and House Price on the Y-axis, the algorithm finds the line that minimizes the sum of the squared vertical distances (the residuals) between the line and the data points. This method is called ordinary least squares. Once the line is drawn, you can use it to predict the price of a house whose size isn't in your dataset.
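To see what "line of best fit" means in numbers, here is a minimal sketch that computes the least-squares slope and intercept for one feature by hand with NumPy, then cross-checks them with `np.polyfit`. The five houses are made up for illustration.

```python
import numpy as np

# Hypothetical toy data: square footage (x) and sale price in dollars (y).
sqft = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
price = np.array([200_000.0, 260_000.0, 330_000.0, 380_000.0, 450_000.0])

# Ordinary least squares with one feature:
#   slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
#   intercept = y_mean - slope * x_mean
x_mean, y_mean = sqft.mean(), price.mean()
slope = np.sum((sqft - x_mean) * (price - y_mean)) / np.sum((sqft - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# NumPy's degree-1 polynomial fit returns [slope, intercept] for comparison.
check = np.polyfit(sqft, price, 1)

# Use the fitted line on a size that is not in the data.
predicted = intercept + slope * 1800.0
print(f"price = {intercept:,.0f} + {slope:.1f} * sqft")
print(f"polyfit says: {check}")
print(f"Predicted price for 1,800 sqft: {predicted:,.0f}")
```

Here the line works out to price = 76,000 + 124 × sqft, so a 1,800 sqft house, which is not in the data, is predicted at $299,200.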
3. Training a Linear Regression Model
Let's assume our data has already been preprocessed (cleaned, encoded, split, and scaled) as in Chapter 22.
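A minimal sketch of the training step. To keep it self-contained, it assumes a hypothetical stand-in for the Chapter 22 output: synthetic houses with two features (square footage and bedrooms) and an invented pricing rule, split and scaled before being passed to `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data (not from Chapter 22): square footage and
# bedrooms, with a made-up linear pricing rule plus noise.
rng = np.random.default_rng(42)
n = 200
sqft = rng.uniform(800, 3500, n)
bedrooms = rng.integers(1, 6, n)
X = np.column_stack([sqft, bedrooms])
y = 50_000 + 120 * sqft + 10_000 * bedrooms + rng.normal(0, 15_000, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply it to both splits.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create the model, then learn its weights from the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# The learned "rules": one weight per (scaled) feature plus an intercept.
print("Weights (coef_):", model.coef_)
print("Intercept (intercept_):", model.intercept_)
```

`model.coef_` holds one learned weight per feature and `model.intercept_` holds the bias term. Because the features were scaled, each weight is the predicted price change for a one-standard-deviation increase in that feature.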
4. Making Predictions
Now that the model has learned the rules, we test it. We give it the X_test data (which it has never seen before) and ask it to guess the house prices.
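A minimal sketch of the prediction step, again on hypothetical synthetic data (one feature, square footage) so that it runs on its own. Scaling is left out here because rescaling the features does not change a plain linear regression's predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: price depends linearly on square footage plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(800, 3500, size=(200, 1))  # one column: square footage
y = 50_000 + 120 * X[:, 0] + rng.normal(0, 15_000, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# .predict() takes a 2D array (rows = houses, columns = features)
# and returns one predicted price per row.
y_pred = model.predict(X_test)
for actual, guess in zip(y_test[:3], y_pred[:3]):
    print(f"actual {actual:,.0f}   predicted {guess:,.0f}")

# A single new house still needs the 2D shape: [[sqft]].
new_price = model.predict([[1800]])[0]
print(f"1,800 sqft -> {new_price:,.0f}")
```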
5. Evaluating Regression Models (Metrics)
How do we know if the model is good? We calculate the error between the predictions and the real y_test answers.
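1. Mean Absolute Error (MAE): The average absolute amount by which the predictions miss, in the target's own units (here, dollars). Lower is better.
2. R-Squared (R²): The proportion of the variance in the target that the model explains. An R² of 0.85 means the model explains 85% of the variance in house prices. 1.0 is perfect; 0 means the model does no better than always predicting the average price, and on test data R² can even be negative.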
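Both metrics live in `sklearn.metrics`. This sketch scores four made-up predictions with `mean_absolute_error` and `r2_score`, recomputes each from its formula, and shows that a model worse than always guessing the mean gets a negative R².

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical actual and predicted prices (dollars) for four test houses.
y_test = np.array([200_000.0, 250_000.0, 300_000.0, 350_000.0])
y_pred = np.array([210_000.0, 240_000.0, 310_000.0, 330_000.0])

mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# The same numbers from the formulas:
#   MAE = mean(|y - y_hat|)
#   R^2 = 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
mae_manual = np.mean(np.abs(y_test - y_pred))
r2_manual = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)

print(f"MAE: ${mae:,.0f}")  # MAE: $12,500
print(f"R²:  {r2:.3f}")     # R²:  0.944

# Guessing the prices in reverse order is worse than guessing the mean.
bad_r2 = r2_score(y_test, y_test[::-1])
print(f"Reversed guesses R²: {bad_r2:.1f}")  # -3.0
```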
6. Polynomial Regression (Non-Linear Data)
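What if the relationship isn't a straight line? What if it curves? A straight line will underfit: Linear Regression on the raw feature cannot follow the bend, however well it is trained. Instead, we can use Polynomial Regression, which fits a curve rather than a straight line.
In Scikit-Learn, we do this by transforming the features *before* feeding them to Linear Regression: PolynomialFeatures adds powers of each feature (such as x²) as new columns. The model is still linear in its weights, which is why plain Linear Regression can fit it.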
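A sketch of Polynomial Regression, assuming hypothetical data generated from a quadratic curve. `PolynomialFeatures` and `make_pipeline` are Scikit-Learn utilities; `degree=2` is chosen only because the made-up data is quadratic. For real data you would compare a few degrees on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical curved data: y = 2 + 3x - 0.5x^2 plus a little noise.
rng = np.random.default_rng(1)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 2 + 3 * X[:, 0] - 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.5, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Plain Linear Regression: a straight line, which underfits the curve.
line = LinearRegression().fit(X_train, y_train)

# Polynomial Regression: PolynomialFeatures expands [x] into [x, x^2] before
# the LinearRegression step, so the fitted curve can bend.
poly = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
poly.fit(X_train, y_train)

line_r2 = r2_score(y_test, line.predict(X_test))
poly_r2 = r2_score(y_test, poly.predict(X_test))
print(f"Straight line test R²:  {line_r2:.3f}")
print(f"Degree-2 curve test R²: {poly_r2:.3f}")
```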
7. Mini Project: House Price Predictor
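A minimal end-to-end sketch of the project, assuming a hypothetical synthetic dataset of 500 houses (square footage, bedrooms, age). The pricing rule that generates the target is invented; only the workflow (split, scale, fit, predict, evaluate) follows this chapter. Wrapping the scaler and the regressor in a pipeline makes sure the scaler is fit on the training data only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset: 500 synthetic houses (no real data is used here).
# Columns: square footage, bedrooms, age in years.
rng = np.random.default_rng(7)
n = 500
sqft = rng.uniform(600, 4000, n)
bedrooms = rng.integers(1, 6, n)
age = rng.uniform(0, 60, n)
X = np.column_stack([sqft, bedrooms, age])
# Made-up pricing rule plus noise, used only to generate the target.
y = 40_000 + 130 * sqft + 8_000 * bedrooms - 900 * age + rng.normal(0, 20_000, n)

# Split first, so the test houses stay unseen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7
)

# The pipeline fits the scaler on X_train only, then trains the regressor.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Test MAE: ${mae:,.0f}")
print(f"Test R²:  {r2:.3f}")

# Price a new (hypothetical) house: 2,000 sqft, 3 bedrooms, 15 years old.
new_house = np.array([[2000, 3, 15]])
price = model.predict(new_house)[0]
print(f"Predicted price: ${price:,.0f}")
```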
8. Common Mistakes
- Evaluating on Training Data: If you calculate your error metrics using model.predict(X_train), you will usually get an impressive score. This is a lie: the model was fitted to that exact data, so the score overstates how well it will do on houses it has never seen. You MUST evaluate using model.predict(X_test).
- Ignoring the MAE scale: An MAE of 5,000 is terrible if you are predicting the price of a $20 book. An MAE of 5,000 is incredible if you are predicting the price of a $1,000,000 house. Context matters.
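The first mistake is easy to see if you compute MAE on both splits. This sketch deliberately uses an overfit model (a degree-7 polynomial on 8 hypothetical training points) to make the gap obvious; with a plain straight-line fit the gap is usually much smaller, but the training score still tends to be the more optimistic one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy data from a simple linear trend.
rng = np.random.default_rng(3)
X_train = np.linspace(0, 1, 8).reshape(-1, 1)
y_train = 2 * X_train[:, 0] + rng.normal(0, 0.3, 8)
X_test = rng.uniform(0, 1, 50).reshape(-1, 1)
y_test = 2 * X_test[:, 0] + rng.normal(0, 0.3, 50)

# A degree-7 polynomial has 8 parameters (7 weights + intercept), enough to
# pass through all 8 training points exactly, noise included.
model = make_pipeline(PolynomialFeatures(degree=7, include_bias=False), LinearRegression())
model.fit(X_train, y_train)

train_mae = mean_absolute_error(y_train, model.predict(X_train))
test_mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE on training data: {train_mae:.4f}")  # near zero: misleading
print(f"MAE on test data:     {test_mae:.4f}")   # the honest number
```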
9. MCQs
What is the goal of Regression in Machine Learning?
Which Scikit-Learn method asks the model to learn from the data?
Which Scikit-Learn method asks the trained model to make a guess on unseen data?
What does Mean Absolute Error (MAE) represent?
What is the best possible R-Squared (R²) score?
If your data has a curved relationship (like exponential growth), what should you use?
To evaluate the true accuracy of your model, which data should you use?
What attribute holds the learned rules/weights of a Linear Regression model?
Is Linear Regression an example of Supervised or Unsupervised learning?
10. Interview Questions
- Q: Explain how R-Squared and Mean Absolute Error differ. Which one is easier to explain to a non-technical business manager?
- Q: You train a Linear Regression model, but the R-Squared is very low (0.3). The data looks curved on a scatter plot. How do you fix this pipeline?
11. Summary
Regression predicts numbers. The workflow is standard across Scikit-Learn models: initialize LinearRegression(), teach it the rules using .fit(X_train, y_train), and ask it to guess the test data using .predict(X_test). Finally, calculate the mean_absolute_error to see how many dollars (or units) your model's predictions are off by on average.