PyTorch Essentials
CHAPTER 09 Intermediate

Training and Evaluating Models in PyTorch

Updated: May 16, 2026

# CHAPTER 9

1. Introduction

In Keras/TensorFlow, training a model can be a single line of code: model.fit(). Core PyTorch has no built-in fit() method. You write the training loop yourself using standard Python for loops. While this seems daunting at first, it is a major reason researchers like PyTorch: you control, line by line, exactly how the model learns. In this chapter, we will write the 5-step PyTorch training loop.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define an Optimizer (like Adam or SGD).
  • Write a standard PyTorch Training Loop.
  • Understand the 5 critical steps of the training loop (including Backpropagation) in code.
  • Write an Evaluation Loop to test the model on unseen data.
  • Switch layer behavior with model.train() and model.eval(), and save memory with torch.no_grad().

3. The Optimizer

Before looping, we must define the Optimizer. The Optimizer is the engine that looks at the gradients calculated by Autograd and physically updates the Weights of the network.
```python
import torch
import torch.nn as nn
import torch.optim as optim

# Assume `model` is a previously defined nn.Module
# We pass the model's parameters (weights) to the optimizer so it knows what to update
# lr = Learning Rate (the size of the step it takes to fix the error)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Define the Loss Function (Criterion)
criterion = nn.CrossEntropyLoss()
```
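
To make the setup concrete, here is a minimal sketch that can be run on its own. The tiny `nn.Linear(4, 3)` model, the random inputs, and the labels are hypothetical stand-ins chosen for illustration; they are not part of the chapter's project.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in model: 4 input features -> 3 classes
model = nn.Linear(4, 3)

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# The optimizer keeps references to the parameters it was given
tracked = optimizer.param_groups[0]["params"]
print(len(tracked))                      # 2 tensors: weight and bias
print(optimizer.param_groups[0]["lr"])   # 0.001

# CrossEntropyLoss takes raw scores (logits) of shape (N, C)
# and integer class labels of shape (N,)
logits = model(torch.randn(5, 4))
labels = torch.tensor([0, 2, 1, 1, 0])
loss = criterion(logits, labels)
print(loss.dim())                        # 0, the loss is a single scalar
```

The optimizer holds the very same tensor objects as the model, which is why calling its step() later changes the model in place.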

4. The 5-Step Training Loop

Nearly every PyTorch training loop repeats the same 5 steps for each batch of data. Memorize this sequence.
```python
# Set the model to Training Mode
# (Dropout is active and BatchNorm uses the current batch's statistics)
model.train()

epochs = 10

for epoch in range(epochs):
    # Assume X_train and y_train are our Tensors
    
    # STEP 1: Forward Pass (Make a guess)
    predictions = model(X_train)
    
    # STEP 2: Calculate the Loss (How wrong was the guess?)
    loss = criterion(predictions, y_train)
    
    # STEP 3: Zero the Gradients
    # PyTorch accumulates gradients by default. We must clear them from the last loop!
    optimizer.zero_grad()
    
    # STEP 4: Backward Pass (Autograd computes the gradient of the loss for every weight)
    loss.backward()
    
    # STEP 5: Optimizer Step (Update the weights based on the gradients)
    optimizer.step()
    
    if (epoch+1) % 2 == 0:
        print(f"Epoch: {epoch+1} | Loss: {loss.item():.4f}")
```
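
To see what each step actually changes, the sketch below runs a single iteration and inspects the model's state along the way. The toy data, the `nn.Linear(4, 3)` model, and the helper variable names are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy data: 8 samples, 4 features, 3 classes
X_train = torch.randn(8, 4)
y_train = torch.randint(0, 3, (8,))

model = nn.Linear(4, 3)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

model.train()
weights_before = model.weight.detach().clone()

predictions = model(X_train)             # STEP 1: forward pass
loss = criterion(predictions, y_train)   # STEP 2: loss
optimizer.zero_grad()                    # STEP 3: clear old gradients
grad_before_backward = model.weight.grad # still None on the very first iteration
loss.backward()                          # STEP 4: autograd fills .grad
grad_filled = model.weight.grad is not None
optimizer.step()                         # STEP 5: weights are updated in place

weights_changed = not torch.equal(weights_before, model.weight)
print(grad_before_backward, grad_filled, weights_changed)  # None True True
```

Note that the forward pass and the loss alone change nothing in the model; only step() modifies the weights, and it relies on the gradients that backward() stored.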

5. The Evaluation Loop (Testing)

If a student memorizes a textbook, they will score 100% on a practice quiz, but fail the real exam. To prevent this, we test the model on a "Test Set" of data it has never seen before.

During Evaluation, we do NOT want the model to learn. We do not want to calculate gradients or update weights.

```python
# 1. Set the model to Evaluation Mode
# (Dropout is turned OFF and BatchNorm uses its stored running statistics)
model.eval()

# 2. Turn off gradient tracking to save memory and speed up testing
with torch.no_grad():
    # Forward pass on the unseen test data
    test_predictions = model(X_test)
    test_loss = criterion(test_predictions, y_test)
    
    # Calculate Accuracy (Assuming classification)
    # Get the index of the highest score (the largest logit is also the most probable class)
    _, predicted_classes = torch.max(test_predictions, dim=1)
    
    # Count how many predictions match the true labels
    correct = (predicted_classes == y_test).sum().item()
    total = y_test.size(0)
    accuracy = (correct / total) * 100
    
    print(f"Test Loss: {test_loss.item():.4f} | Test Accuracy: {accuracy:.2f}%")
```
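
The accuracy arithmetic is easy to check by hand on a small example. The logits and labels below are made-up numbers, used only to show how the predicted class is picked and counted.

```python
import torch

# Hypothetical model outputs (logits) for 4 samples and 3 classes
test_predictions = torch.tensor([
    [2.0, 0.5, 0.1],   # highest score at index 0
    [0.2, 0.1, 3.0],   # highest score at index 2
    [0.3, 1.5, 0.2],   # highest score at index 1
    [1.0, 0.9, 0.8],   # highest score at index 0
])
y_test = torch.tensor([0, 2, 2, 0])

# argmax returns the same indices as the second output of torch.max(..., dim=1)
predicted_classes = test_predictions.argmax(dim=1)

correct = (predicted_classes == y_test).sum().item()
accuracy = correct / y_test.size(0) * 100
print(predicted_classes.tolist(), accuracy)   # [0, 2, 1, 0] 75.0

# Softmax turns logits into probabilities without changing their order,
# so taking the argmax of the raw logits is enough
probs = torch.softmax(test_predictions, dim=1)
same_classes = torch.equal(probs.argmax(dim=1), predicted_classes)
print(same_classes)                           # True
```

Three of the four predictions match their labels, which gives 75% accuracy.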

6. Mini Project: Putting it Together

A professional workflow runs a training phase and an evaluation phase *inside* the same epoch loop, tracking the loss on held-out data (test_loss in the code below) every epoch to check that the model isn't overfitting.
```python
# The Ultimate Training Loop Structure
epochs = 50

for epoch in range(epochs):
    
    ### TRAINING PHASE ###
    model.train()
    optimizer.zero_grad()
    train_preds = model(X_train)
    train_loss = criterion(train_preds, y_train)
    train_loss.backward()
    optimizer.step()
    
    ### EVALUATION PHASE ###
    model.eval()
    with torch.no_grad():
        test_preds = model(X_test)
        test_loss = criterion(test_preds, y_test)
        
    # Print progress
    if epoch % 10 == 0:
        print(f"Epoch {epoch} | Train Loss: {train_loss.item():.4f} | Test Loss: {test_loss.item():.4f}")
```
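
For readers who want to run this structure end to end, here is a self-contained sketch. The synthetic dataset, the small two-layer network, the learning rate of 0.01, and the `history` dictionary are all assumptions made for this example rather than part of the chapter's project.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical synthetic data: the class is 1 when the features sum to more than 0
X = torch.randn(200, 4)
y = (X.sum(dim=1) > 0).long()
X_train, y_train = X[:160], y[:160]
X_test, y_test = X[160:], y[160:]

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

history = {"train": [], "test": []}   # one loss value per epoch

for epoch in range(50):
    ### TRAINING PHASE ###
    model.train()
    optimizer.zero_grad()
    train_loss = criterion(model(X_train), y_train)
    train_loss.backward()
    optimizer.step()

    ### EVALUATION PHASE ###
    model.eval()
    with torch.no_grad():
        test_loss = criterion(model(X_test), y_test)

    # .item() stores plain Python floats instead of tensors
    history["train"].append(train_loss.item())
    history["test"].append(test_loss.item())

print(f"First epoch train loss: {history['train'][0]:.4f}")
print(f"Last epoch train loss:  {history['train'][-1]:.4f}")
```

Keeping both curves makes the overfitting signal visible: if the training loss keeps falling while the held-out loss starts to rise, the model has begun to memorize the training data.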

7. Common Mistakes

  • Forgetting optimizer.zero_grad(): If you forget this, PyTorch adds each new gradient to the ones left over from earlier iterations. Every update then uses the running sum of all past gradients instead of the current one, so the steps grow steadily larger and the loss can diverge, sometimes all the way to NaN (Not a Number).
  • Evaluating without with torch.no_grad(): The model will still output the correct prediction, but PyTorch will quietly record the computational graph for every forward pass and keep it in memory for as long as the outputs are alive. On a large test set this wastes time and memory, and it can end in an "Out of Memory" (OOM) error.
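
Both mistakes can be reproduced on a single tensor. This is a minimal sketch with a hypothetical one-element weight `w`; it resets the gradient by hand, which for this one tensor plays the role that optimizer.zero_grad() plays for a whole model.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# Mistake 1: calling backward() twice without clearing sums the gradients
(3 * w).sum().backward()
first = w.grad.clone()         # d(3w)/dw = 3
(3 * w).sum().backward()
accumulated = w.grad.clone()   # 3 + 3 = 6, not 3

# Clearing the gradient first gives the correct value again
w.grad = None
(3 * w).sum().backward()
fresh = w.grad.clone()         # 3

# Mistake 2: without no_grad, every result is attached to a graph
tracked = 3 * w
with torch.no_grad():
    untracked = 3 * w

print(first.item(), accumulated.item(), fresh.item())   # 3.0 6.0 3.0
print(tracked.requires_grad, untracked.requires_grad)   # True False
```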

8. Best Practices

  • Use model.train() and model.eval() always: Even if your simple model doesn't currently use Dropout or BatchNorm layers, call these methods anyway. It builds muscle memory for when you start building complex architectures where forgetting them will silently ruin your accuracy.
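
The effect of the two modes is easiest to see on a Dropout layer by itself. The sketch below uses a standalone `nn.Dropout(p=0.5)` and an all-ones input, both chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()
train_out = layer(x)   # roughly half the values are zeroed, the rest are scaled by 1 / (1 - p) = 2

layer.eval()
eval_out = layer(x)    # in eval mode Dropout passes the input through unchanged

print(sorted(train_out.unique().tolist()))   # [0.0, 2.0]
print(torch.equal(eval_out, x))              # True
```

A model left in training mode during testing would therefore keep dropping random activations, and its predictions would change from one call to the next.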

9. Exercises

  1. Write the 5 steps of the PyTorch training loop in order.
  2. What does loss.item() do, and why do we use it in our print statements instead of just printing loss?

10. MCQ Quiz with Answers

Question 1

What is the purpose of torch.no_grad() during the evaluation loop?

Question 2

Which step physically updates the weights and biases inside the neural network?

11. Interview Questions

  • Q: Explain the relationship between loss.backward() and optimizer.step() in PyTorch. What exactly is each function doing mathematically?
  • Q: If you notice your training loss is fluctuating wildly up and down instead of smoothly decreasing, what Hyperparameter in the Optimizer should you likely adjust? (Answer: The Learning Rate. It is probably too high, so try lowering it.)
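
Both questions can be explored on a one-weight problem. This sketch swaps in plain SGD rather than the Adam optimizer used earlier in the chapter, because its update rule is simply w ← w − lr · gradient; the loss w² and the helper `run` are hypothetical choices for illustration.

```python
import torch

def run(lr, steps=5):
    """Minimize loss = w^2 with plain SGD and record w after each step."""
    w = torch.tensor([1.0], requires_grad=True)
    opt = torch.optim.SGD([w], lr=lr)
    trace = []
    for _ in range(steps):
        opt.zero_grad()
        loss = (w ** 2).sum()
        loss.backward()   # stores d(loss)/dw = 2w in w.grad
        opt.step()        # plain SGD: w <- w - lr * w.grad
        trace.append(round(w.item(), 4))
    return trace

small_lr = run(lr=0.1)
large_lr = run(lr=1.1)
print(small_lr)   # approximately [0.8, 0.64, 0.512, 0.4096, 0.3277]
print(large_lr)   # approximately [-1.2, 1.44, -1.728, 2.0736, -2.4883]
```

With lr=0.1 each step multiplies w by 0.8, so it shrinks toward the minimum at 0. With lr=1.1 each step multiplies w by -1.2, so it overshoots, flips sign, and grows, which is the same zig-zag pattern a too-high learning rate produces in a real training loss.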

12. FAQs

Q: My training loop is incredibly slow. What's wrong?
A: In our examples, we passed the *entire* X_train dataset into the model at once. If X_train is 50,000 images, a single forward and backward pass may not fit in memory or may take a very long time. In the real world, we use "Batches." We cover this in the next chapter!
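
As a rough preview of the idea, batches can be cut by hand with tensor slicing. This is only a sketch with hypothetical random data and a hypothetical batch size of 32; it is not the DataLoader approach covered in the next chapter.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical data: 100 samples, 4 features, 3 classes
X_train = torch.randn(100, 4)
y_train = torch.randint(0, 3, (100,))

model = nn.Linear(4, 3)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

batch_size = 32
steps = 0

model.train()
for start in range(0, X_train.size(0), batch_size):
    # Slice out one small batch instead of using the whole dataset
    X_batch = X_train[start:start + batch_size]
    y_batch = y_train[start:start + batch_size]

    # The same 5 steps, applied to one batch at a time
    predictions = model(X_batch)
    loss = criterion(predictions, y_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    steps += 1

print(steps)   # 4 batches: 32 + 32 + 32 + 4 samples
```

Each pass through the data now performs several small weight updates, and only one batch has to fit in memory at a time.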

13. Summary

The PyTorch Training Loop is the beating heart of Deep Learning. By mastering the 5-step sequence (Forward Pass, Loss Calculation, Zeroing Gradients, Backpropagation, and Optimizer Step), you have the core pattern needed to train most neural networks. Furthermore, by strictly separating training from evaluation (model.train() for learning, model.eval() with torch.no_grad() for testing), you get trustworthy test metrics and avoid spending memory on gradients you do not need.

14. Next Chapter Recommendation

Passing the entire dataset into the model at once is mathematically sound, but practically impossible for large datasets. We need a way to spoon-feed data to the GPU in small "batches." In Chapter 10: PyTorch Datasets and DataLoaders, we will learn how to build enterprise-grade data pipelines.
