CHAPTER 17 Intermediate

Model Evaluation Metrics for Regression

Updated: May 16, 2026



1. Introduction

If you build a house price predictor and tell your boss, "The model looks pretty good on the graph," you will be fired. Data science requires quantitative evidence. How wrong is the model on average? Does it make small mistakes often, or massive mistakes rarely? To compare a Random Forest against a Linear Regression model objectively, we use model evaluation metrics. In this chapter, we will learn how to grade our algorithms with numbers.

2. Learning Objectives

By the end of this chapter, you will be able to:

Define a "Residual" (Error).

Calculate and interpret Mean Absolute Error (MAE).

Calculate and interpret Mean Squared Error (MSE).

Understand why Root Mean Squared Error (RMSE) is so widely used.

Interpret the $R^2$ (R-Squared) Score.

Analyze Residual plots to detect model flaws.

3. What is a Residual?

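A Residual is simply the difference between the *True Value* and the *Predicted Value*: residual = true value - predicted value. A positive residual means the model predicted too low; a negative residual means it predicted too high.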

True House Price: $300,000

Model Prediction: $280,000

Residual (Error): $20,000

To grade a model, we must aggregate the residuals of *every single prediction* it made on the Test Set into a single number.
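As a minimal sketch of this step, the snippet below computes one residual per prediction with NumPy. The arrays `y_test` and `predictions` are made-up values for illustration only; the first pair matches the house example above.

```python
import numpy as np

# Hypothetical test-set values (illustrative only)
y_test = np.array([300_000, 250_000, 410_000, 180_000])
predictions = np.array([280_000, 265_000, 400_000, 185_000])

# Residual = true value - predicted value, one per prediction
residuals = y_test - predictions
print(residuals.tolist())   # [20000, -15000, 10000, -5000]

# A plain average lets positive and negative residuals cancel out
print(residuals.mean())     # 2500.0
```

The plain average looks deceptively small here, which is why the metrics in the following sections take absolute values or squares before averaging.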

4. Mean Absolute Error (MAE)

MAE is the easiest metric to understand. We take the absolute value of every residual (so that negative errors cannot cancel out positive ones), add them all up, and divide by the number of predictions.

Interpretation: *"On average, our model's predictions are off by $20,000."*

python

from sklearn.metrics import mean_absolute_error

# Mock data: y_test holds the real answers, predictions holds the model's guesses
y_test = [300_000, 250_000, 410_000]
predictions = [280_000, 265_000, 400_000]

mae = mean_absolute_error(y_test, predictions)
print(f"MAE: ${mae:,.2f}")
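To make the calculation concrete, here is a rough sketch that computes MAE by hand with NumPy and checks it against scikit-learn's `mean_absolute_error`. The names `y_true` and `y_pred` and their values are invented for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical true prices and model predictions
y_true = np.array([300_000, 250_000, 410_000, 180_000])
y_pred = np.array([280_000, 265_000, 400_000, 185_000])

# Step 1: absolute error of every prediction -> 20000, 15000, 10000, 5000
abs_errors = np.abs(y_true - y_pred)

# Step 2: add them up and divide by the number of predictions
mae_manual = abs_errors.sum() / len(abs_errors)

# The library function should agree with the manual calculation
mae_sklearn = mean_absolute_error(y_true, y_pred)
print(mae_manual)  # 12500.0
```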

5. Mean Squared Error (MSE)

MSE squares every single error before averaging them.

*Why square them?* Squaring punishes large mistakes disproportionately. If an error is 2, the square is 4. If an error is 10, the square is 100! An error that is 5 times larger costs 25 times more.

*The Problem:* The resulting number is no longer in dollars; it is in "squared dollars," which makes no logical sense to a human.

python

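from sklearn.metrics import mean_squared_error

# Mock data: y_test holds the real answers, predictions holds the model's guesses
y_test = [300_000, 250_000, 410_000]
predictions = [280_000, 265_000, 400_000]
mse = mean_squared_error(y_test, predictions)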
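The effect of squaring is easiest to see on two sets of errors that share the same MAE. The sketch below uses two invented error arrays (`errors_steady` and `errors_spiky` are hypothetical names): one with many small mistakes, one with a single large mistake.

```python
import numpy as np

errors_steady = np.array([5, 5, 5, 5])   # many small mistakes
errors_spiky = np.array([0, 0, 0, 20])   # one large mistake

for name, errors in [("steady", errors_steady), ("spiky", errors_spiky)]:
    mae = np.mean(np.abs(errors))   # average absolute error
    mse = np.mean(errors ** 2)      # average squared error
    print(f"{name}: MAE={mae:.1f}  MSE={mse:.1f}")
# steady: MAE=5.0  MSE=25.0
# spiky: MAE=5.0  MSE=100.0
```

MAE cannot tell the two apart, while MSE is four times larger for the set containing the single big miss.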

6. Root Mean Squared Error (RMSE)

RMSE is one of the most widely reported regression metrics. We take the Mean Squared Error (which heavily punishes large errors) and apply a square root to it. This brings the final number back into the original unit (e.g., normal dollars) so humans can read it!

python

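import numpy as np
from sklearn.metrics import mean_squared_error

# Mock data: y_test holds the real answers, predictions holds the model's guesses
y_test = [300_000, 250_000, 410_000]
predictions = [280_000, 265_000, 400_000]
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"RMSE: ${rmse:,.2f}")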
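For reference, the three error metrics covered so far can be written compactly. This is the usual textbook notation rather than anything specific to this chapter's code: $y_i$ is the true value, $\hat{y}_i$ is the prediction, and $n$ is the number of predictions on the Test Set.

$$
\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
\text{RMSE} = \sqrt{\text{MSE}}
$$

Because RMSE is the square root of an average of squared errors, it is never smaller than MAE on the same predictions.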

7. R-Squared ($R^2$) Score

MAE and RMSE measure the *error*. R-Squared measures the *goodness of fit*. It represents the proportion of the variance in the target variable that the model's predictions explain.

$R^2 = 1.0$ (100%): A perfect model. It correctly predicted every single data point.

$R^2 = 0.0$ (0%): A terrible model. It is no better than just guessing the average price every time.

$R^2 < 0.0$ (Negative): A catastrophic model. Its squared errors are larger than those of just guessing the average!

python

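from sklearn.metrics import r2_score

# Mock data: y_test holds the real answers, predictions holds the model's guesses
y_test = [300_000, 250_000, 410_000]
predictions = [280_000, 265_000, 400_000]
r2 = r2_score(y_test, predictions)
print(f"R-Squared Score: {r2 * 100:.1f}%")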
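The three cases above can be reproduced with a short manual calculation. This is a sketch under the usual definition $R^2 = 1 - SS_{res} / SS_{tot}$; the helper `r2_manual` and the three prediction arrays are hypothetical names introduced only for this example.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([100, 150, 200, 250, 300])

def r2_manual(y_true, y_pred):
    """R-squared as 1 - (model's squared error / mean-guess squared error)."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # squared error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # squared error of guessing the mean
    return 1 - ss_res / ss_tot

perfect = y_true.copy()                            # every prediction is exact
mean_guess = np.full(y_true.shape, y_true.mean())  # always predicts 200
reversed_guess = y_true[::-1]                      # predicts high when truth is low

print(r2_manual(y_true, perfect))         # 1.0
print(r2_manual(y_true, mean_guess))      # 0.0
print(r2_manual(y_true, reversed_guess))  # -3.0
```

The manual values should match `r2_score(y_true, ...)` for each of the three arrays.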

8. Mini Project: Compare Two Models

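Let's compare the test-set predictions of a Linear Regression model and a Decision Tree, and use metrics to decide objectively which one is better. To keep the example short, the predictions below are mock values rather than the output of trained models.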

python

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Mock Test Data
y_true = np.array([100, 150, 200, 250, 300])

# Model 1 Predictions (Linear)
preds_linear = np.array([110, 140, 210, 240, 320])

# Model 2 Predictions (Tree)
preds_tree = np.array([105, 145, 195, 255, 305])

# Evaluate Linear Model
rmse_linear = np.sqrt(mean_squared_error(y_true, preds_linear))
r2_linear = r2_score(y_true, preds_linear)

# Evaluate Tree Model
rmse_tree = np.sqrt(mean_squared_error(y_true, preds_tree))
r2_tree = r2_score(y_true, preds_tree)

print(f"Linear -> RMSE: {rmse_linear:.2f} | R2: {r2_linear:.4f}")
print(f"Tree   -> RMSE: {rmse_tree:.2f} | R2: {r2_tree:.4f}")

# The Tree model has the lower RMSE (5.00 vs 12.65) and the higher R2 (0.9950 vs 0.9680). It wins!

9. Common Mistakes

Evaluating on the Training Data: If you calculate RMSE by asking the model to predict the data it was trained on, the score will look far better than the model deserves, and a flexible model can score near-perfectly simply by memorizing the answers. This is cheating. You MUST calculate metrics on a separate holdout set (e.g., `y_test`) that the model has never seen.

Only looking at $R^2$: A model can have a great $R^2$ score but still be off by $50,000 on every prediction. You must always report both $R^2$ (the fit) and RMSE (the actual financial/business error).
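Both mistakes can be illustrated in one rough sketch. The first part trains an unrestricted `DecisionTreeRegressor` on synthetic data (an assumption made for this example, generated from a seeded random generator) and scores it on both the training and the holdout data. The second part uses invented house prices where every prediction is off by exactly $50,000.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# --- Mistake 1: scoring on the training data ---
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 2.0, size=200)  # linear signal plus noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
# No depth limit, so the tree is free to memorize the training rows
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
rmse_test = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"Train RMSE: {rmse_train:.2f} | Test RMSE: {rmse_test:.2f}")

# --- Mistake 2: looking only at R2 ---
prices = np.array([200_000, 600_000, 1_000_000, 1_400_000, 1_800_000])
off_by_50k = prices + np.array([50_000, -50_000, 50_000, -50_000, 50_000])

r2 = r2_score(prices, off_by_50k)
rmse = np.sqrt(mean_squared_error(prices, off_by_50k))
print(f"R2: {r2:.4f} | RMSE: ${rmse:,.0f}")  # R2: 0.9922 | RMSE: $50,000
```

The training RMSE comes out lower than the test RMSE because the tree has already seen those rows. In the second part, $R^2$ is above 0.99 only because the prices span a very wide range; the RMSE still exposes the $50,000 error.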

10. Best Practices

Residual Analysis: Calculate `residuals = y_true - predictions` and plot them against the predicted values on a scatter plot. If the dots are a random cloud around the 0-line, your model is healthy. If the dots form a U-shape, your model missed a massive non-linear pattern!
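The U-shape case can be reproduced with a small sketch, assuming matplotlib is available. The data here is synthetic: the true relationship is deliberately curved, and a straight line is fitted with `np.polyfit` so that the model misses the curve.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: the real relationship is curved (quadratic)
x = np.linspace(0, 10, 50)
y_true = x ** 2

# Fit a straight line, which cannot capture the curve
slope, intercept = np.polyfit(x, y_true, deg=1)
predictions = slope * x + intercept

residuals = y_true - predictions

# Residual plot: predicted value on the x-axis, residual on the y-axis
fig, ax = plt.subplots()
ax.scatter(predictions, residuals)
ax.axhline(0, color="red", linestyle="--")  # the 0-line
ax.set_xlabel("Predicted value")
ax.set_ylabel("Residual (true - predicted)")
ax.set_title("Residual plot of a straight-line fit on curved data")
plt.show()
```

The residuals are positive at both ends and negative in the middle, which is the U-shape that signals a missed non-linear pattern.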

11. Exercises

1. If your model's MAE is $5,000 and its RMSE is $45,000, what does this tell you about the types of mistakes the model is making?

2. What does an R-Squared score of -0.5 mean?

12. MCQ Quiz with Answers

Question 1

Why is Root Mean Squared Error (RMSE) generally preferred over Mean Squared Error (MSE) when reporting results to stakeholders?

Question 2

Which metric represents the percentage of variance explained by the model, with `1.0` indicating perfect accuracy?

13. Interview Questions

Q: Explain the mathematical and practical difference between Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). In what scenario would one be vastly different from the other?

Q: Describe how you would use a Residual Plot to detect Heteroscedasticity in a trained model.

14. FAQs

Q: What is a "good" RMSE score? A: It depends entirely on the context! An RMSE of 20 is terrible if you are predicting human body temperature in Celsius. An RMSE of 20,000 is incredible if you are predicting the price of a multi-million dollar mansion.
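One rough way to put an RMSE in context is to compare it with a typical value of the target. The helper `relative_rmse` below is a hypothetical name, and the $3,000,000 mansion price is an assumed figure chosen only to illustrate the two cases from the answer above.

```python
def relative_rmse(rmse, typical_value):
    """RMSE as a fraction of a typical target value (illustrative helper)."""
    return rmse / typical_value

# Body temperature: about 37 degrees Celsius is typical
body_temp = relative_rmse(20, 37)

# Mansion price: 3,000,000 dollars is an assumed typical price
mansion = relative_rmse(20_000, 3_000_000)

print(f"Body temperature: {body_temp:.1%} of a typical value")  # 54.1%
print(f"Mansion price:    {mansion:.1%} of a typical value")    # 0.7%
```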

15. Summary

You can now measure the accuracy of your algorithms objectively. By utilizing MAE for straightforward interpretation, RMSE to brutally penalize large errors, and the R-Squared score to grade the overall goodness of fit, you give business stakeholders the quantitative evidence they need to decide whether to deploy your model.

16. Next Chapter Recommendation

We know how to grade our models, but what if the grade is bad? How do we find the perfect max_depth for a Tree, or the perfect alpha for Ridge Regression? We don't guess. In Chapter 18: Hyperparameter Tuning and Cross Validation, we will force the computer to test 1,000 combinations automatically to find the ultimate setup.
