CHAPTER 11
Intermediate
Image Classification with CNNs
Updated: May 16, 2026
1. Introduction
If you feed a high-resolution 4K image into a standard Dense neural network, every neuron in the first layer needs its own weight for every pixel value. A 4K color image has roughly 25 million input values (about 8.3 million pixels times 3 color channels), so even a modest first layer has billions of parameters. The network is likely to run out of memory and, with that much capacity, to overfit badly. Furthermore, Dense layers have no built-in notion of spatial relationships (e.g., that an eye is usually above a nose). To address these problems in Computer Vision, researchers developed the Convolutional Neural Network (CNN). In this chapter, we will build the kind of architecture that helps cars see and doctors detect tumors.
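The parameter count mentioned above can be checked with a few lines of arithmetic. This is only a rough sketch: the 3840x2160 resolution, the 1,000-unit Dense layer, and the 32-filter 3x3 convolution are illustrative assumptions, not figures from this chapter.

```python
# Weights needed by the first layer when a 4K RGB image is the input.
# Assumed for illustration: 3840x2160 pixels, a 1,000-unit Dense layer,
# and a convolutional layer with 32 filters of size 3x3.
width, height, channels = 3840, 2160, 3
inputs = width * height * channels  # one input value per pixel per channel

dense_units = 1000
dense_params = inputs * dense_units + dense_units  # weights + biases

filters, kernel = 32, 3
# A filter's weights are reused at every position, so image size does not matter.
conv_params = kernel * kernel * channels * filters + filters

print(f"input values:  {inputs:,}")
print(f"Dense params:  {dense_params:,}")
print(f"Conv2D params: {conv_params:,}")
```

Under these assumptions the Dense layer needs about 24.9 billion parameters, while the convolutional layer needs 896, because it reuses the same small filters across the whole image.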
2. Learning Objectives
By the end of this chapter, you will be able to:
- Explain why Dense networks fail on complex images.
- Understand the Convolution operation (Filters).
- Understand MaxPooling and feature compression.
- Build a CNN architecture using Conv2D and MaxPooling2D.
- Train a model to classify Cats vs. Dogs.
3. How Convolutions Work
Instead of looking at the entire image at once, a CNN uses a Filter (a small grid of weights, typically 3x3 pixels).
- 1. The 3x3 filter slides (convolves) across the image, pixel by pixel, scanning it like a flashlight.
- 2. Each filter's weights are learned during training so that it responds to a specific feature (like a horizontal edge, a vertical edge, or a curve).
- 3. Where the patch under the filter matches that feature, the filter produces a large output value: it "lights up."
- 4. Each convolutional layer applies dozens of these filters in parallel. The first layer might detect simple edges. Deeper layers combine those edges to detect shapes (circles). The deepest layers combine shapes to detect complex objects (a dog's ear).
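The sliding-filter steps above can be sketched in plain Python. This is a minimal illustration, not how Keras implements it: the helper `convolve2d` and the hand-written vertical-edge filter are assumptions made for the example (a real CNN learns its filter weights), and the function uses no padding and a stride of 1.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the patch under the filter by the filter weights and sum.
            total = sum(
                image[row + i][col + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            out_row.append(total)
        output.append(out_row)
    return output


# 5x5 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Hand-written vertical-edge filter (illustrative; a trained CNN learns such weights).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = convolve2d(image, vertical_edge)
for row in response:
    print(row)  # each row prints [0, 3, 3]
```

The output is 0 where the patch is uniformly dark and 3 where the patch covers the dark-to-bright boundary, which is the "lights up" behavior described in step 3.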
4. MaxPooling (Compression)
After a Convolution layer finds features, the resulting feature maps are still large. We use MaxPooling to shrink them. A MaxPooling layer looks at each 2x2 grid of values and simply keeps the maximum (brightest) one, discarding the other three. This halves both the width and the height, so only a quarter of the values remain, keeping the strongest feature responses and drastically reducing computational requirements.
5. Standard CNN Architecture
A CNN typically follows this pattern:
- 1. Conv2D -> MaxPooling2D (Extract low-level features, compress)
- 2. Conv2D -> MaxPooling2D (Extract mid-level features, compress)
- 3. Flatten (Convert the 2D feature maps into a single 1D vector)
- 4. Dense (Standard network to make the final prediction based on the features)
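To see what the pooling step and the Conv2D -> MaxPooling2D pattern do to the data, here is a small plain-Python sketch. The helpers `max_pool_2x2`, `conv_size`, and `pool_size` are hypothetical names made up for this illustration. The size arithmetic assumes 3x3 filters with no padding and a stride of 1, and 2x2 pooling, which match the Keras defaults for these layers.

```python
def max_pool_2x2(grid):
    """Keep the largest value in each non-overlapping 2x2 block."""
    pooled = []
    for r in range(0, len(grid) - 1, 2):
        pooled.append([
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, len(grid[0]) - 1, 2)
        ])
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
print(max_pool_2x2(feature_map))  # [[6, 2], [2, 9]]


def conv_size(size, kernel=3):
    """Output width/height of a convolution with no padding and stride 1."""
    return size - kernel + 1


def pool_size(size, pool=2):
    """Output width/height of non-overlapping pooling."""
    return size // pool


# Trace a 150x150 input through two Conv2D -> MaxPooling2D blocks.
size = 150
for block in (1, 2):
    size = conv_size(size)
    print(f"block {block} after Conv2D:       {size}x{size}")
    size = pool_size(size)
    print(f"block {block} after MaxPooling2D: {size}x{size}")
```

With these assumptions a 150x150 input shrinks to 36x36 after two blocks, which is the feature map that Flatten then turns into a 1D vector.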
6. Mini Project: Cat vs Dog Classifier
Let's build a CNN to classify color images of Cats and Dogs (150x150 pixels, 3 color channels: Red, Green, Blue).
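The chapter's original listing is not reproduced here, so what follows is only a minimal sketch of a model that matches the pattern from section 5 and the 150x150x3 input described above. The filter counts (32, 64), the 64-unit Dense layer, the Adam optimizer, and the single sigmoid output are assumptions, not the author's exact choices.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),             # 150x150 RGB images
    layers.Conv2D(32, (3, 3), activation="relu"),  # low-level features
    layers.MaxPooling2D((2, 2)),                   # halve width and height
    layers.Conv2D(64, (3, 3), activation="relu"),  # mid-level features
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                              # feature maps -> 1D vector
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),         # probability of the class labeled 1
])

# Binary cross-entropy suits a two-class problem with one sigmoid output.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

A single sigmoid unit is enough here because there are only two classes; a problem with more classes would more commonly use one output unit per class with a softmax activation.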
7. Handling Image Datasets (ImageDataGenerator)
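Loading 10,000 high-resolution images into NumPy arrays all at once can exhaust your RAM. Keras provides ImageDataGenerator to load images from folders on your hard drive in small batches. Note that recent TensorFlow documentation marks ImageDataGenerator as deprecated and recommends tf.keras.utils.image_dataset_from_directory for new code.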
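As a hedged sketch of the batch-loading approach described above: the helper `make_train_generator` and the folder layout it expects (one subfolder per class, for example `cats/` and `dogs/` inside a training directory) are assumptions for illustration. The batch size of 32 is also an arbitrary choice.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def make_train_generator(train_dir, batch_size=32):
    """Yield batches of images read from `train_dir`, one subfolder per class.

    `train_dir` is a hypothetical path such as "data/train" containing
    "cats/" and "dogs/" subfolders.
    """
    datagen = ImageDataGenerator(rescale=1.0 / 255)  # scale pixels to [0, 1]
    return datagen.flow_from_directory(
        train_dir,
        target_size=(150, 150),  # resize every image to the model's input size
        batch_size=batch_size,
        class_mode="binary",     # one 0/1 label per image (two classes)
    )
```

The returned generator can be passed to `model.fit(...)` in place of in-memory arrays, so only one batch of images needs to be held in memory at a time.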
8. Common Mistakes
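- Forgetting Flatten(): If you connect a Conv2D layer directly to a Dense layer without Flatten(), the model will not behave as intended. In Keras, a Dense layer applied to a 3D feature map acts only on the last axis, so the output keeps its height and width and no longer matches your labels; this typically surfaces as a shape error during training. Add Flatten() so the Dense layer receives one flat 1D vector per image.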
- Using too many filters early: Don't start with 512 filters in the first layer. Start small (32), and double it (64, 128) as the image becomes smaller through pooling.
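The advice about filter counts can be illustrated by counting how many output values (activations) a convolutional layer produces per image. The helper `conv_output_values` is a hypothetical name for this sketch; it assumes 3x3 filters with no padding and a stride of 1.

```python
def conv_output_values(input_size, filters, kernel=3):
    """Number of activations one Conv2D layer produces for a square input."""
    out = input_size - kernel + 1  # no padding, stride 1
    return out * out * filters


# First layer on a 150x150 image: 32 filters versus 512 filters.
small_start = conv_output_values(150, filters=32)
large_start = conv_output_values(150, filters=512)

# A later layer on a 36x36 feature map (after two rounds of pooling).
late_layer = conv_output_values(36, filters=128)

print(f"first layer, 32 filters:  {small_start:,}")
print(f"first layer, 512 filters: {large_start:,}")
print(f"later layer, 128 filters: {late_layer:,}")
```

Under these assumptions, 512 filters on the full-size image produce 16 times as many activations as 32 filters, while 128 filters on the smaller 36x36 feature map produce far fewer than either. That is why adding filters is cheaper once pooling has shrunk the image.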
9. Best Practices
- Data Augmentation: Neural networks generally need large amounts of data. If you only have 1,000 pictures of cats, you can use ImageDataGenerator to randomly flip, rotate, and zoom the images during training, so the model sees a slightly different version of each image in every epoch. This acts like a much larger dataset and helps reduce overfitting.
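A minimal sketch of such an augmenting generator is shown below. The specific ranges (20 degrees of rotation, 20% zoom) are illustrative assumptions and would normally be tuned for the dataset.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random transformations are applied on the fly each time an image is loaded.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255,     # scale pixels to [0, 1]
    rotation_range=20,     # rotate by up to +/- 20 degrees
    zoom_range=0.2,        # zoom in or out by up to 20%
    horizontal_flip=True,  # randomly mirror images left to right
)
```

Augmentation is usually applied to the training images only; validation and test images are typically just rescaled so that evaluation reflects unmodified data.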
10. Exercises
- 1. Look at the Conv2D layer. What does the parameter (3, 3) represent?
- 2. If an image enters a MaxPooling2D(2, 2) layer with a size of 100x100 pixels, what size will it be when it exits the layer?
11. MCQ Quiz with Answers
Question 1
Why are CNNs superior to standard Dense networks for processing images?
Question 2
What is the primary purpose of a MaxPooling layer?
12. Interview Questions
- Q: Explain the mathematical operation of Convolution in the context of image processing.
- Q: Describe a standard CNN architecture flow from the Input layer to the Output layer.
13. FAQs
Q: Can I use CNNs for things other than images? A: Yes! 1D Convolutions (Conv1D) are highly effective at processing audio wave signals and even certain types of sequential text and time-series data.