PyTorch Essentials
CHAPTER 16 Intermediate

Saving, Loading, and Deploying PyTorch Models

Updated: May 16, 2026


1. Introduction

A deep learning model is entirely useless if it remains trapped inside a Jupyter Notebook. If you train a model for 48 hours on a cloud GPU to detect cancer in X-rays, you must be able to save that "brain" to a file, download it, and deploy it onto hospital servers or mobile apps. In this chapter, we transition from Model Training to Model Deployment, learning how to safely serialize PyTorch models and prepare them for production environments.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Save and load model weights using state_dict.
  • Save and load entire models correctly.
  • Understand the importance of model.eval() during inference.
  • Use TorchScript to decouple PyTorch from Python.
  • Understand ONNX for cross-framework deployment.

3. Saving the State Dict

A PyTorch model is a Python class wrapped around a large dictionary of tensors: its learnable parameters (weights and biases) plus buffers such as BatchNorm running statistics. This dictionary is called the state dict (state_dict). The approach recommended in the PyTorch documentation is to save *only* this dictionary, not the class itself.
```python
import torch

# Assume 'model' is a fully trained custom PyTorch model
print("Saving model weights...")

# Save only the parameters (weights and biases)
# .pth and .pt are the conventional PyTorch file extensions
torch.save(model.state_dict(), 'my_model_weights.pth')
```
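
To make "dictionary of numbers" concrete, the sketch below prints the contents of a state_dict. The model here (`tiny_model`) is a hypothetical two-layer stand-in, not the chapter's trained model; any nn.Module would behave the same way.

```python
import torch
from torch import nn

# Hypothetical stand-in for a trained model: two small linear layers.
tiny_model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

# state_dict() maps each parameter name to the tensor holding its values.
state = tiny_model.state_dict()
for name, tensor in state.items():
    print(f"{name:10s} shape={tuple(tensor.shape)}")
```

The ReLU layer has no parameters, so only the two Linear layers contribute entries (a weight and a bias each).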

4. Loading the State Dict

To load the model, the software engineer writing the web backend must have a copy of the exact Python class you used to build the model.
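```python
import torch

# 1. You MUST instantiate the exact same architecture first
#    (CustomImageClassifier must be importable in the serving code)
loaded_model = CustomImageClassifier()

# 2. Load the dictionary of numbers from disk.
#    map_location="cpu" lets weights saved on a GPU load on a CPU-only server;
#    weights_only=True restricts unpickling to tensors and plain containers.
weights_dict = torch.load('my_model_weights.pth', map_location="cpu", weights_only=True)

# 3. Inject the numbers into the empty architecture
loaded_model.load_state_dict(weights_dict)

# 4. CRITICAL: Set to evaluation mode before making predictions!
loaded_model.eval()

print("Model successfully loaded and ready for predictions.")
```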
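
A quick way to convince yourself the round trip works is to save from one instance, load into a fresh one, and compare predictions. This is a minimal sketch: `TinyClassifier` is a hypothetical stand-in for CustomImageClassifier, and the file goes to a temporary directory.

```python
import os
import tempfile

import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for CustomImageClassifier."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


trained = TinyClassifier()  # pretend this instance has been trained
path = os.path.join(tempfile.mkdtemp(), "tiny_weights.pth")
torch.save(trained.state_dict(), path)

# A fresh instance starts with different random weights...
restored = TinyClassifier()
# ...until the saved values are copied in. load_state_dict reports any
# parameter names that did not line up between file and architecture.
weights = torch.load(path, map_location="cpu", weights_only=True)
result = restored.load_state_dict(weights)
restored.eval()

sample = torch.rand(3, 4)
with torch.no_grad():
    outputs_match = torch.equal(trained(sample), restored(sample))
print(result, outputs_match)
```

If the class definition had drifted (a renamed or resized layer), load_state_dict would raise an error by default instead of silently loading a partial model.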
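
5. Saving the Entire Model

You *can* save the entire model (architecture + weights) in one file using torch.save(model, 'entire_model.pth'). However, this is discouraged. It relies on Python's pickle module, which does not store the class itself, only a reference to the module and class name where it was defined. Loading the file therefore requires the same class to be importable from the same module path, so renaming a file, refactoring the project, or moving the file into a different codebase frequently breaks it. Unpickling can also execute arbitrary code, so such files should only be loaded from trusted sources. Stick to state_dict!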
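
The fragility comes from what pickle actually writes to disk. The sketch below uses only the standard library and a hypothetical plain class (`Classifier`, not a real PyTorch model) to show that the pickled bytes contain the module and class names but none of the class's code.

```python
import pickle


class Classifier:
    """Hypothetical stand-in for a model class."""

    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]


payload = pickle.dumps(Classifier())

# The attribute values are stored, but the class itself is stored only as
# "<module name>.<class name>", to be imported again at load time.
stores_module_name = Classifier.__module__.encode() in payload
stores_class_name = b"Classifier" in payload
stores_method_source = b"self.weights" in payload
print(stores_module_name, stores_class_name, stores_method_source)
```

Because only the names travel with the file, the loading side must be able to import a class with that name from a module with that name.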

6. Deployment Basics (REST API)

To allow a mobile app or a website to use your PyTorch model, you usually wrap it in a REST API using a Python web framework like FastAPI or Flask.
  1. The web server boots up and loads your .pth file into memory.
  2. A user uploads a photo on the website. The website sends an HTTP POST request to your Python API.
  3. The API converts the image into a Tensor, calls loaded_model(tensor), and gets the result.
  4. The API sends the result ("It's a Dog!") back to the website via JSON.
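
The four steps can be sketched without committing to a particular web framework. In the example below, `handle_predict` is a hypothetical function that a FastAPI or Flask route would call with the POST body. The JSON field name "pixels", the label list, and the tiny untrained model are all assumptions made for illustration; a real service would typically receive encoded image bytes and load real weights.

```python
import json

import torch
from torch import nn

# Hypothetical stand-ins: a real service would build its own architecture
# and call load_state_dict() with the weights from the .pth file here.
LABELS = ["cat", "dog"]
loaded_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, len(LABELS)))
loaded_model.eval()  # step 1: done once, when the server boots


def handle_predict(request_body: str) -> str:
    """Steps 2-4: parse the POST body, run the model, return JSON."""
    pixels = json.loads(request_body)["pixels"]  # nested lists, values 0-255
    tensor = torch.tensor(pixels, dtype=torch.float32) / 255.0
    batch = tensor.unsqueeze(0)  # add the batch dimension
    with torch.no_grad():  # no gradients needed for inference
        scores = loaded_model(batch)
    label = LABELS[int(scores.argmax(dim=1))]
    return json.dumps({"label": label})


# Simulate one request carrying a 3x8x8 image.
fake_image = torch.randint(0, 256, (3, 8, 8)).tolist()
response = handle_predict(json.dumps({"pixels": fake_image}))
print(response)
```

Loading the model once at startup, rather than inside the handler, keeps each request from paying the cost of reading the weights file.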

7. TorchScript: Decoupling from Python

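What if you want to deploy your model inside a high-speed C++ trading application, or an iOS app, where Python does not exist? You use TorchScript. Tracing runs your model once on an example input, records the tensor operations it performs, and compiles them into an intermediate, optimized format that can run independently of Python. Because only the executed operations are recorded, data-dependent control flow (an if or a loop whose behavior depends on the input values) is not captured by tracing; torch.jit.script handles such models.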
```python
import torch

# Assume 'model' is the trained model; switch it to evaluation mode first
# so that the trace records inference behavior (no active Dropout)
model.eval()

# 1. Create a dummy input matching your model's expected shape
example_input = torch.rand(1, 3, 224, 224)

# 2. Trace the model (PyTorch watches how the data flows through the layers)
traced_script_module = torch.jit.trace(model, example_input)

# 3. Save the TorchScript model
traced_script_module.save("traced_model.pt")
```

*A C++ engineer can now load traced_model.pt directly into their C++ application using LibTorch!*
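
Before handing the file to another team, it is worth checking from Python that the saved TorchScript module reproduces the original model's output. A minimal sketch, assuming a small hypothetical network (`net`) in place of the trained model:

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical small network standing in for the trained model.
net = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.ReLU(), nn.Flatten())
net.eval()  # trace in eval mode so inference-time behavior is recorded

example = torch.rand(1, 3, 16, 16)
traced = torch.jit.trace(net, example)

ts_path = os.path.join(tempfile.mkdtemp(), "traced_net.pt")
traced.save(ts_path)

# torch.jit.load needs only the file, not the Python code that built 'net'.
reloaded = torch.jit.load(ts_path)
with torch.no_grad():
    same_output = torch.allclose(net(example), reloaded(example))
print(same_output)
```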

8. ONNX (Open Neural Network Exchange)

What if the deployment team uses TensorFlow or C#, but you trained the model in PyTorch? You export your model to ONNX. ONNX is an open, framework-neutral file format for neural networks. An ONNX file can be executed by ONNX Runtime and by many other inference engines, including ones targeting edge devices and hardware accelerators, provided the runtime supports the operators your model uses.
```python
torch.onnx.export(model,               # model being run
                  example_input,       # model input
                  "model.onnx",        # where to save the file
                  export_params=True,  # store the trained weights
                  input_names=['input'],
                  output_names=['output'])
```

9. Common Mistakes

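  • Forgetting model.eval(): A freshly constructed model is in training mode. If you load a model and forget this line, Dropout layers keep randomly zeroing activations and BatchNorm layers normalize with the statistics of the current batch instead of their stored running averages. Predictions then vary from call to call and depend on whatever else is in the batch. This is one of the most common bugs in deployed models.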
  • Preprocessing Mismatch: If your training script scales images by dividing by 255.0, your Flask web API MUST also divide incoming user images by 255.0 before passing them to the PyTorch model.
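
The effect of a missing model.eval() is easy to reproduce with a single Dropout layer. In this sketch the same input is passed through the layer twice in each mode; the layer and input sizes are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

layer.train()  # training mode: a random half of the units is zeroed
train_a, train_b = layer(x), layer(x)

layer.eval()  # evaluation mode: dropout passes the input through unchanged
eval_a, eval_b = layer(x), layer(x)

print("train passes identical:", torch.equal(train_a, train_b))
print("eval passes identical: ", torch.equal(eval_a, eval_b))
```

In training mode the two passes disagree because each call draws a new random mask; in evaluation mode the output is deterministic.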

10. Best Practices

  • Checkpointing: During long training runs, save a state_dict at the end of every epoch (e.g., model_epoch_5.pth). If your computer crashes at Epoch 49, you won't lose 48 hours of work!
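
A checkpoint that can actually resume training usually stores more than the model weights. The sketch below also saves the optimizer state and the epoch number; the dictionary keys, the tiny model, and the placeholder training step are assumptions for illustration, not a fixed PyTorch format.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 1)  # hypothetical tiny model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
ckpt_dir = tempfile.mkdtemp()

for epoch in range(1, 4):
    loss = model(torch.rand(8, 4)).pow(2).mean()  # placeholder training step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # One file per epoch; optimizer state is included so training can resume.
    torch.save(
        {"epoch": epoch,
         "model_state_dict": model.state_dict(),
         "optimizer_state_dict": optimizer.state_dict()},
        os.path.join(ckpt_dir, f"model_epoch_{epoch}.pth"),
    )

# After a crash: rebuild the objects, then restore the latest checkpoint.
resumed_model = nn.Linear(4, 1)
resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.1, momentum=0.9)
checkpoint = torch.load(os.path.join(ckpt_dir, "model_epoch_3.pth"), weights_only=True)
resumed_model.load_state_dict(checkpoint["model_state_dict"])
resumed_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
next_epoch = checkpoint["epoch"] + 1
print("resuming at epoch", next_epoch)
```

Without the optimizer state, quantities such as momentum buffers would restart from zero and the resumed run would not continue exactly where it stopped.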

11. Exercises

  1. Write the code to load a saved weights file named sentiment_v1.pth into a newly instantiated model named production_model.
  2. Why is saving the state_dict preferred over saving the entire model?

12. MCQ Quiz with Answers

Question 1

What does model.state_dict() contain?

Question 2

Which technology allows you to compile a PyTorch model so that it can be run in a C++ environment without any Python installed?

13. Interview Questions

  • Q: Explain the exact step-by-step process of loading a PyTorch model into memory for production inference, emphasizing the crucial security/evaluation steps.
  • Q: In what scenario would you export a PyTorch model to the ONNX format rather than keeping it as a .pth file?

14. Summary

Saving and loading models is the bridge between the Data Scientist and the Software Engineer. By utilizing state_dict for safe Python serialization, understanding the necessity of model.eval(), and leveraging compilation tools like TorchScript and ONNX, we ensure our PyTorch models can break free from the training environment and reach the end-user.

15. Next Chapter Recommendation

Writing the for loops for training in PyTorch is great for learning, but writing them hundreds of times for different projects gets tedious and messy. In Chapter 17: PyTorch Lightning and Training Optimization, we will learn how professionals organize and automate PyTorch code at an enterprise level.
