CHAPTER 09 Intermediate

Training and Evaluating Models

Updated: May 16, 2026

1. Introduction

You have designed a neural network and compiled it. But if you call .fit() without understanding how data flows into the model, you will likely build a model that memorizes the training data rather than learning patterns that generalize. Deep Learning training is a delicate balancing act involving Epochs, Batch Sizes, and Validation Data. In this chapter, we will master the training loop and learn how to diagnose sick models using training histories.

2. Learning Objectives

By the end of this chapter, you will be able to:

Define Epochs, Batch Size, and Iterations.

Implement Validation Splits to monitor real-time performance.

Diagnose Overfitting and Underfitting from loss graphs.

Plot training histories using Matplotlib.

Stop training automatically using Early Stopping.

3. Epochs and Batch Size

Imagine a student reading a textbook to prepare for a final exam.

Epoch: Reading the *entire* textbook from cover to cover one time. epochs=10 means the model reads the entire dataset 10 times.

Batch Size: A neural network usually cannot read the entire textbook at once (your GPU doesn't have enough memory). It reads a few pages at a time. batch_size=32 means the model looks at 32 images, calculates the error, updates its Weights, and then grabs the next 32 images.

*If you have 3,200 images and a batch_size of 32, it will take 3,200 / 32 = 100 "steps" (iterations) to complete 1 Epoch.*
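The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch, not Keras code: the helper names steps_per_epoch and iter_batches are made up for illustration, and it assumes the last batch is simply allowed to be smaller when the dataset does not divide evenly.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of weight updates in one epoch (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)


def iter_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# The chapter's example: 3,200 images with a batch size of 32.
print(steps_per_epoch(3200, 32))  # 100

# When the dataset does not divide evenly, the final batch is smaller.
batches = list(iter_batches(list(range(10)), batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Note that a larger batch size means fewer steps per epoch, so the Weights are updated less often for the same amount of data.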

4. The Validation Split

If a student memorizes the textbook, they will score 100% on a practice quiz but fail the real exam. To detect this, we hide a portion of the data during training. In Keras, you can pass validation_split=0.2 to model.fit(). This tells TensorFlow: "Take 20% of the training data and hide it. Do NOT use it to update Weights. At the end of every Epoch, test the model on this hidden 20% to see if it is actually learning, or just memorizing." Keras takes this slice from the end of the arrays you pass in, before any shuffling, so shuffle the data yourself first if it is sorted by class.
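To make the hold-out idea concrete, here is a simplified stand-in written with plain Python lists. It is not the Keras implementation; the function name split_validation is hypothetical, and it only mirrors the documented behaviour of taking the validation samples from the end of the data.

```python
def split_validation(samples, validation_split):
    """Hold out the trailing fraction of samples for validation."""
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be between 0 and 1")
    # Index where the training part ends and the hidden part begins.
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


data = list(range(10))  # stand-in for 10 training examples
train_part, val_part = split_validation(data, 0.2)
print(train_part)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(val_part)    # [8, 9]
```

Only train_part would be used to update Weights; val_part is scored at the end of each Epoch and never trained on.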

5. Mini Project: Train Image Classifier (with Validation)

Let's train a model and properly monitor its validation metrics.

python

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
import matplotlib.pyplot as plt

# 1. Load Data
mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train / 255.0, X_test / 255.0

# 2. Build and Compile
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# 3. Train the Model and save the History!
print("Starting training...")
# batch_size=32 is the Keras default. validation_split hides 20% of data.
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# 4. Evaluate on the completely unseen Test Set
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"Final Test Accuracy: {test_acc*100:.2f}%")

6. Diagnosing Overfitting and Underfitting

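When you assign the return value of model.fit() to a variable (here, history), Keras records the loss and every compiled metric for each epoch in the history.history dictionary, under keys such as loss, accuracy, val_loss, and val_accuracy. You can plot these values to diagnose your model.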

python

# Plotting the Training vs Validation Loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss Diagnosis')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

How to Read the Graph:

Underfitting: The Training Loss and Validation Loss both stay high and barely drop. The model is too simple to learn the data, or has not trained long enough. (Solution: Add more layers or neurons, or train for more epochs).

Overfitting: The Training Loss keeps dropping toward 0, but the Validation Loss bottoms out and then starts going UP. The model has stopped learning general patterns and started memorizing the specific training images. It is failing on unseen data!

Good Fit: Both lines drop smoothly and level out together, with only a small gap between them.
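The same reading can be done numerically. The sketch below is one possible helper, not part of Keras: summarize_history is a hypothetical name, and the loss values are made up for illustration rather than taken from a real run. It expects a dictionary shaped like history.history.

```python
def summarize_history(history_dict):
    """Report the epoch with the lowest validation loss and the final train/val gap."""
    train_loss = history_dict["loss"]
    val_loss = history_dict["val_loss"]
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_epoch": best_index + 1,  # 1-based, like the Keras progress log
        "best_val_loss": val_loss[best_index],
        "final_gap": val_loss[-1] - train_loss[-1],
        "val_loss_rose_after_best": val_loss[-1] > val_loss[best_index],
    }


# Made-up numbers for illustration only (not the output of a real run).
example = {
    "loss":     [0.60, 0.40, 0.30, 0.20, 0.10],
    "val_loss": [0.62, 0.45, 0.40, 0.44, 0.50],
}
summary = summarize_history(example)
print(summary)
```

A validation loss that rose after its best epoch, together with a widening gap, matches the overfitting pattern described above. With a real run you would pass history.history instead of the example dictionary.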

7. Early Stopping

If your model starts overfitting at Epoch 6 but you told it to run for epochs=50, the remaining epochs waste time and make the model worse. We can use a Callback to stop training once Validation Loss has stopped improving for a set number of epochs (the patience).

python

from tensorflow.keras.callbacks import EarlyStopping

# Stop if val_loss doesn't improve for 3 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

model.fit(X_train, y_train, epochs=50, validation_split=0.2, callbacks=[early_stop])

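*Now you can set a generous limit such as epochs=1000. Training stops once val_loss has gone 3 epochs (the patience) without improving, and restore_best_weights=True rolls the model back to the weights from its best epoch.*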
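To see how patience plays out, the following sketch simulates the stopping rule on a list of validation losses. It is a simplified illustration, not the Keras source: simulate_early_stopping is a hypothetical helper, it ignores options such as min_delta, and the loss values are invented.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    Simplified patience logic: stop once the validation loss has failed to
    beat its best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch  # ran out of epochs first


# Made-up validation losses for illustration only.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
stop_epoch, best_epoch = simulate_early_stopping(losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

In this invented sequence, training stops at epoch 6 and the best epoch is 3, so the lower loss at epoch 7 is never reached. That is the trade-off of a small patience: it saves time but can stop before a late improvement.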

8. Common Mistakes

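Setting Batch Size too high: If you set batch_size=10000, each epoch contains only a handful of weight updates, so the model often needs many more epochs to reach the same accuracy, and a batch that large may not fit in GPU memory. Batch sizes such as 32, 64, or 128 are a common starting point; powers of 2 are a convention, not a requirement.

Evaluating on the Validation set: The Validation set is used *during* training to make decisions such as when to stop (Early Stopping) and which hyperparameters to keep, so its score is optimistically biased. You must keep a completely separate X_test set that the model never sees until the very final model.evaluate().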

9. Best Practices

Always use Early Stopping: In practice, data scientists rarely know the right number of epochs in advance. A common approach is to set a generous upper limit (like 500) and let the EarlyStopping callback, with restore_best_weights=True, end training and roll the model back to its best weights.

10. Exercises

1. If you have 10,000 images and a batch size of 100, how many iterations (steps) will it take to complete 1 Epoch?

2. Write the code to import EarlyStopping and configure it to monitor val_accuracy with a patience of 5.

11. MCQ Quiz with Answers

Question 1

What happens during Overfitting?

Question 2

Which parameter in model.fit() dictates how many images the model processes before updating its Weights?

12. Interview Questions

Q: Explain the exact relationship between the Training Set, the Validation Set, and the Test Set in a deep learning workflow.

Q: How does reducing the batch size generally affect the training time and the stability of the loss curve?

13. FAQs

Q: Is 100% accuracy a good thing? A: Usually, no. If you see exactly 1.0000 training accuracy, treat it as a warning sign that the model may have memorized the training data, and check whether validation accuracy is keeping up. Real-world data is noisy; a healthy model usually tops out somewhere below 100%.

14. Summary

Training a model is about controlling its learning curve. By carefully managing batch sizes to protect our RAM, reserving validation data to ensure honesty, and utilizing Early Stopping callbacks to prevent overfitting, we ensure that our models are robust, generalized, and ready for real-world predictions.

15. Next Chapter Recommendation

So far, we have only stacked layers in a straight line using the Sequential API. What if we want to build a model with two different inputs (e.g., an image AND text) that merge together? In Chapter 10: Working with TensorFlow Keras API, we will unlock the true power of the Keras Functional API.
