CHAPTER 17
Intermediate
Model Evaluation Metrics for Regression
Updated: May 16, 2026
1. Introduction
If you build a house price predictor and tell your boss, "The model looks pretty good on the graph," you will be fired. Data science demands numbers, not impressions. How wrong is the model on average? Does it make small mistakes often, or massive mistakes rarely? To compare a Random Forest against a Linear Regression model objectively, we use Model Evaluation Metrics. In this chapter, we will learn how to grade our algorithms with them.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Define a "Residual" (Error).
- Calculate and interpret Mean Absolute Error (MAE).
- Calculate and interpret Mean Squared Error (MSE).
- Understand why Root Mean Squared Error (RMSE) is so widely used.
- Interpret the $R^2$ (R-Squared) Score.
- Analyze Residual plots to detect model flaws.
3. What is a Residual?
A Residual is simply the difference between the *True Value* and the *Predicted Value* (true minus predicted).
- True House Price: $300,000
- Model Prediction: $280,000
- Residual (Error): $300,000 - $280,000 = $20,000
To grade a model, we must aggregate the residuals of *every single prediction* it made on the Test Set into a single number.
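As a minimal sketch of the definition above, the snippet below computes one residual per prediction. The names `y_true` and `y_pred` and the second and third price pairs are assumptions made up for illustration; only the first pair comes from the example above.

```python
# Hypothetical test-set prices; only the first pair comes from the example above.
y_true = [300_000, 250_000, 410_000]   # actual sale prices
y_pred = [280_000, 265_000, 400_000]   # model predictions

# Residual = true value - predicted value, one per prediction.
residuals = [t - p for t, p in zip(y_true, y_pred)]

print(residuals)  # [20000, -15000, 10000]
```

A negative residual means the model predicted too high; a positive one means it predicted too low.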
4. Mean Absolute Error (MAE)
The easiest metric to understand. We calculate the absolute error of every prediction (making sure negative errors don't cancel out positive ones), add them all up, and divide by the number of predictions.
Interpretation: *"On average, our model's predictions are off by $20,000."*
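In symbols, $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$. The sketch below computes it by hand in plain Python so the steps are visible. The helper name `mae_score` and the prices are assumptions for illustration; in a scikit-learn workflow, `sklearn.metrics.mean_absolute_error` computes the same quantity.

```python
def mae_score(y_true, y_pred):
    """Average of the absolute residuals (hypothetical helper, not a library call)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up prices for illustration.
y_true = [300_000, 250_000, 410_000]
y_pred = [280_000, 265_000, 400_000]

mae = mae_score(y_true, y_pred)
print(f"MAE: ${mae:,.0f}")  # MAE: $15,000
```

The second prediction overshoots by $15,000, but the absolute value keeps it from cancelling out the two undershoots.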
5. Mean Squared Error (MSE)
MSE squares every single error before averaging them.
- *Why square them?* Squaring mathematically punishes massive mistakes heavily. If an error is 2, the square is 4. If an error is 10, the square is 100!
- *The Problem:* The resulting number is no longer in dollars; it is in "squared dollars," which makes no logical sense to a human.
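In symbols, $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. The sketch below reuses the errors of 2 and 10 from the bullets above to show how one large miss dominates the result. The helper name `mse_score` and the data are assumptions for illustration; scikit-learn provides `sklearn.metrics.mean_squared_error` for the same calculation.

```python
def mse_score(y_true, y_pred):
    """Average of the squared residuals (hypothetical helper, not a library call)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

# Made-up data: three small errors of size 2 and one large error of size 10.
y_true = [10, 10, 10, 10]
y_pred = [8, 12, 8, 20]

mse = mse_score(y_true, y_pred)
print(mse)  # 28.0

# Share of the total squared error caused by the single large miss.
big_share = 10 ** 2 / sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
print(f"{big_share:.0%}")  # 89%
```

One prediction out of four accounts for roughly nine tenths of the squared error, which is exactly the "punishment" the metric is designed to deliver.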
6. Root Mean Squared Error (RMSE)
RMSE is one of the most widely reported regression metrics. We take the Mean Squared Error (which heavily punishes large errors) and apply a Square Root to it. This brings the final number back into the original unit (e.g., normal Dollars) so humans can read it!
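A minimal sketch of that last step, assuming the same made-up prices used for illustration elsewhere in this chapter. The helper name `rmse_score` is hypothetical; with scikit-learn, taking the square root of `sklearn.metrics.mean_squared_error` gives the same value.

```python
import math

def rmse_score(y_true, y_pred):
    """Square root of the mean squared error, in the target's own units."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up prices for illustration.
y_true = [300_000, 250_000, 410_000]
y_pred = [280_000, 265_000, 400_000]

rmse = rmse_score(y_true, y_pred)
print(f"RMSE: ${rmse:,.0f}")
```

For these prices the RMSE lands a little above the MAE of $15,000, because the $20,000 miss weighs more once it is squared. RMSE can never be smaller than MAE on the same predictions.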
7. R-Squared ($R^2$) Score
MAE and RMSE measure the *error*. R-Squared measures the *goodness of fit*. It represents the proportion of variance in the target variable that is explained by the input features (often quoted as a percentage).
- $R^2 = 1.0$ (100%): A perfect model. It correctly predicted every single dot.
- $R^2 = 0.0$ (0%): A terrible model. It is no better than just guessing the average price every time.
- $R^2 < 0.0$ (Negative): A catastrophic model. It is mathematically worse than just guessing the average!
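The three cases above follow from the usual definition $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, where $\bar{y}$ is the average true value. The sketch below reproduces each case with tiny made-up data. The helper name `r2_score_by_hand` is hypothetical; scikit-learn's `sklearn.metrics.r2_score` is the library equivalent.

```python
def r2_score_by_hand(y_true, y_pred):
    """1 minus (model's squared error / squared error of always guessing the mean)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when every true value is identical")
    return 1 - ss_res / ss_tot

y_true = [1, 2, 3, 4, 5]

perfect = r2_score_by_hand(y_true, [1, 2, 3, 4, 5])    # every dot predicted exactly
mean_guess = r2_score_by_hand(y_true, [3, 3, 3, 3, 3])  # always guess the average
backwards = r2_score_by_hand(y_true, [5, 4, 3, 2, 1])   # worse than the average

print(perfect, mean_guess, backwards)  # 1.0 0.0 -3.0
```

The negative score appears because the backwards predictions have four times the squared error of simply guessing the average.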
8. Mini Project: Compare Two Models
Let's train a Linear Regression model and a Decision Tree, and objectively prove which one is better using metrics.
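The comparison can be sketched without any libraries, under some loud assumptions: the data is synthetic (a noisy straight line), the "Linear Regression" is a one-feature least-squares line, and the "Decision Tree" is a depth-1 tree (a single split, often called a stump). The helpers `fit_line` and `fit_stump` are hypothetical stand-ins; in a real project you would more likely reach for scikit-learn's `LinearRegression` and `DecisionTreeRegressor`.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def fit_line(xs, ys):
    """Least-squares line for one feature; returns a predict function."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_stump(xs, ys):
    """Depth-1 regression tree: one split, predict the mean on each side."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small noise.
noise = [0.5, -0.5, -0.5, 0.5]
xs = list(range(20))
ys = [2 * x + 1 + noise[x % 4] for x in xs]

# Hold out every second point as the test set the models never see.
x_train, y_train = xs[0::2], ys[0::2]
x_test, y_test = xs[1::2], ys[1::2]

results = {}
for name, fit in [("linear", fit_line), ("stump", fit_stump)]:
    predict = fit(x_train, y_train)
    preds = [predict(x) for x in x_test]
    results[name] = (rmse(y_test, preds), r2(y_test, preds))
    print(f"{name:>6}: RMSE={results[name][0]:.2f}  R^2={results[name][1]:.3f}")

best_model = min(results, key=lambda name: results[name][0])
print("Lower test RMSE:", best_model)
```

Because the synthetic data is a straight line, the linear model wins here by construction. On data with sharp jumps or strong non-linear patterns the ranking could flip, which is why the comparison has to be run on held-out data rather than assumed.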
9. Common Mistakes
- Evaluating on the Training Data: If you calculate RMSE by asking the model to predict the data it was trained on, the score will look better than the model deserves, and a flexible model such as a Decision Tree can score near-perfectly. This is cheating. You MUST calculate metrics on a separate `y_test` holdout set that the model has never seen.
- Only looking at $R^2$: A model can have a great $R^2$ score but still be off by $50,000 on every prediction. You must always report both $R^2$ (the fit) and RMSE (the actual financial/business error).
10. Best Practices
- Residual Analysis: Calculate `residuals = y_true - predictions` and plot them on a scatter plot. If the dots are a random cloud around the 0-line, your model is healthy. If the dots form a U-shape, your model missed a massive non-linear pattern!
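The U-shape can be seen even without a plot. The sketch below fits a straight line to deliberately curved, made-up data ($y = x^2$) and prints the residuals in order of $x$. The data and variable names are assumptions for illustration, and for brevity the residuals are computed on the same points used for fitting.

```python
# Made-up curved data: a straight line cannot capture y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# Fit a one-feature least-squares line by hand.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

predictions = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predictions)]

for x, r in zip(xs, residuals):
    print(f"x={x:+d}  residual={r:+.1f}")
```

The residuals run from +5 at the left edge down to -4 in the middle and back up to +5 at the right edge: a clear U rather than a random cloud. To see it as a picture, the same `xs` and `residuals` can be passed to a scatter plot (for example matplotlib's `plt.scatter`) with a horizontal line drawn at 0.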
11. Exercises
- 1. If your model's MAE is $5,000 and its RMSE is $45,000, what does this tell you about the types of mistakes the model is making?
- 2. What does an R-Squared score of `-0.5` mean?
12. MCQ Quiz with Answers
Question 1
Why is Root Mean Squared Error (RMSE) generally preferred over Mean Squared Error (MSE) when reporting results to stakeholders?
Question 2
Which metric represents the percentage of variance explained by the model, with 1.0 indicating perfect accuracy?
13. Interview Questions
- Q: Explain the mathematical and practical difference between Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). In what scenario would one be vastly different from the other?
- Q: Describe how you would use a Residual Plot to detect Heteroscedasticity in a trained model.
14. FAQs
Q: What is a "good" RMSE score?
A: It depends entirely on the context! An RMSE of 20 is terrible if you are predicting human body temperature in Celsius. An RMSE of 20,000 is incredible if you are predicting the price of a multi-million dollar mansion.
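One rough way to make that comparison concrete is to divide each RMSE by a typical value of the target. The typical values below (37 for body temperature, 3,000,000 for a mansion) are assumptions chosen for illustration, not figures from this chapter.

```python
# (RMSE, assumed typical target value) for the two situations in the answer above.
cases = {
    "body temperature (Celsius)": (20, 37),
    "mansion price (dollars)": (20_000, 3_000_000),
}

# Express each RMSE as a fraction of the typical value it is measured against.
ratios = {name: rmse / typical for name, (rmse, typical) in cases.items()}

for name, ratio in ratios.items():
    print(f"{name}: RMSE is {ratio:.1%} of a typical value")
```

Under these assumptions the temperature model is off by more than half of a typical reading, while the mansion model is off by less than one percent of a typical price.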
15. Summary
You can now back up claims about your models with numbers. By using MAE for straightforward interpretation, RMSE to penalize large errors heavily, and the R-Squared score to grade the overall goodness of fit, you give business stakeholders the evidence they need to decide whether to deploy your model.
16. Next Chapter Recommendation
We know how to grade our models, but what if the grade is bad? How do we find the best `max_depth` for a Tree, or the best `alpha` for Ridge Regression? We don't guess. In Chapter 18: Hyperparameter Tuning and Cross Validation, we will make the computer test 1,000 combinations automatically to find the strongest setup.