CHAPTER 11 Intermediate

Image Classification with CNNs

Updated: May 16, 2026

1. Introduction

If you feed a high-resolution 4K image into a standard Dense neural network, every neuron in the first layer needs its own weight for every pixel value. A 4K RGB image has roughly 25 million input values, so even a modest Dense layer requires billions of parameters. Such a network is likely to exhaust memory and is prone to overfitting. Furthermore, Dense layers have no built-in notion of spatial relationships (e.g., that an eye is usually above a nose). To address these problems in Computer Vision, researchers developed the Convolutional Neural Network (CNN). In this chapter, we will build the kind of architecture that helps cars to see and doctors to detect tumors.
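To make the scale concrete, here is a small back-of-the-envelope calculation. The Dense layer width of 512 and the choice of 32 filters are assumed example values picked for illustration, not figures from any particular model.

```python
# Count the weights in one Dense layer fed a flattened 4K RGB image.
# The layer width (512) is an assumed example value.
width, height, channels = 3840, 2160, 3
inputs = width * height * channels        # one input per pixel per channel

units = 512                               # hypothetical Dense layer width
dense_params = inputs * units + units     # weights plus one bias per unit

# A 3x3 convolution with 32 filters reuses the same small filters
# at every position, so its size does not depend on the image size.
conv_params = (3 * 3 * channels + 1) * 32

print(f"inputs:       {inputs:,}")
print(f"dense params: {dense_params:,}")
print(f"conv params:  {conv_params:,}")
```

The Dense layer needs over 12 billion parameters, while the convolutional layer needs fewer than a thousand, because its filters are shared across the whole image.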

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Explain why Dense networks fail on complex images.
  • Understand the Convolution operation (Filters).
  • Understand MaxPooling and feature compression.
  • Build a CNN architecture using Conv2D and MaxPooling2D.
  • Train a model to classify Cats vs. Dogs.

3. How Convolutions Work

Instead of looking at the entire image at once, a CNN uses a Filter (a small grid of weights, typically 3x3).
  1. The 3x3 filter slides (convolves) across the image, one position at a time, scanning it like a flashlight.
  2. At each position, the filter multiplies its weights by the pixels underneath and sums the results. The filter values are not designed by hand; they are learned during training, and each filter ends up responding to a specific feature (like a horizontal edge, a vertical edge, or a curve).
  3. If it finds the feature, it "lights up" (produces a large output value).
  4. The CNN applies dozens of these filters in each layer. The first layer might detect simple edges. Deeper layers combine those edges to detect shapes (circles). The deepest layers combine shapes to detect complex objects (a dog's ear).
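The sliding-filter idea can be sketched in a few lines of NumPy. This is a minimal illustration, not how TensorFlow implements Conv2D: the helper `convolve2d_valid`, the toy image, and the hand-written edge filter are all assumptions made for the example. A real CNN learns its filter values during training. Like deep learning libraries, the sketch does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy 6x6 grayscale image: dark top half (0), bright bottom half (1).
image = np.zeros((6, 6))
image[3:, :] = 1.0

# Hand-written horizontal-edge filter, for illustration only.
kernel = np.array([[-1.0, -1.0, -1.0],
                   [ 0.0,  0.0,  0.0],
                   [ 1.0,  1.0,  1.0]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)
print(feature_map)
```

The output is a 4x4 feature map whose two middle rows are 3 and whose outer rows are 0: the filter "lights up" only where the dark region meets the bright one.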

4. MaxPooling (Compression)

After a Convolution layer finds features, the feature maps are still large. We use MaxPooling to shrink them. A MaxPooling layer looks at each 2x2 block of values and keeps only the maximum (brightest) one, discarding the other three. This halves both the height and the width, so only a quarter of the values remain. The strongest responses are kept and the computation required by later layers drops sharply.
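Here is a minimal NumPy sketch of 2x2 max pooling on a single feature map. The helper name `max_pool_2x2` and the sample values are assumptions for illustration; in a Keras model the MaxPooling2D layer does this for you.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    # Drop a trailing odd row or column so the map splits into 2x2 blocks.
    h, w = h - h % 2, w - w % 2
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

The 4x4 map becomes 2x2: each side is halved, and 16 values are reduced to 4.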

5. Standard CNN Architecture

Many simple CNNs follow this pattern:
  1. Conv2D -> MaxPooling2D (Extract low-level features, compress)
  2. Conv2D -> MaxPooling2D (Extract mid-level features, compress)
  3. Flatten (Convert the stack of 2D feature maps into a single 1D vector)
  4. Dense (Standard network to make the final prediction based on the features)

6. Mini Project: Cat vs Dog Classifier

Let's build a CNN to classify color images of Cats and Dogs (150x150 pixels, 3 color channels: Red, Green, Blue).
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

# 1. Build the CNN
model = Sequential([
    # Input: 150x150 pixels, 3 color channels (RGB).
    # Recent Keras versions prefer an explicit Input layer over input_shape=.
    Input(shape=(150, 150, 3)),

    # Layer 1: 32 filters, each 3x3 in size
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),  # Halves the height and width

    # Layer 2: 64 filters (we increase filters as the feature maps get smaller)
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),

    # Layer 3: 128 filters
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),

    # Transition to Dense layers: 3D feature maps -> 1D vector
    Flatten(),

    # Fully connected layer
    Dense(512, activation='relu'),

    # Output layer (1 neuron, sigmoid for binary classification: Cat or Dog)
    Dense(1, activation='sigmoid')
])

# 2. Compile the model
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.summary()
```
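To check your reading of the summary, the output shapes and parameter counts of the model above can be worked out by hand. The sketch below assumes the Keras defaults for Conv2D (no padding, stride 1); the helper names `conv_out` and `conv_params` are made up for this example.

```python
def conv_out(size, kernel=3):
    """Output size of a no-padding, stride-1 convolution."""
    return size - kernel + 1

def conv_params(in_channels, filters, kernel=3):
    """Weights per filter plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size, channels, total = 150, 3, 0
for filters in (32, 64, 128):
    size = conv_out(size)                    # Conv2D with a 3x3 kernel
    params = conv_params(channels, filters)
    total += params
    print(f"Conv2D       -> {size}x{size}x{filters}, params={params:,}")
    size //= 2                               # MaxPooling2D(2, 2)
    channels = filters
    print(f"MaxPooling2D -> {size}x{size}x{channels}")

flat = size * size * channels                # Flatten
dense_512 = (flat + 1) * 512                 # Dense(512)
dense_1 = (512 + 1) * 1                      # Dense(1)
total += dense_512 + dense_1
print(f"Flatten      -> {flat}")
print(f"Dense(512) params={dense_512:,}, Dense(1) params={dense_1:,}")
print(f"Total params: {total:,}")
```

Most of the roughly 19 million parameters sit in the first Dense layer, not in the convolutions, which is one reason pooling before Flatten matters.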

7. Handling Image Datasets (ImageDataGenerator)

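Loading 10,000 high-resolution images into NumPy arrays at once can exhaust your RAM. Keras provides ImageDataGenerator to stream images from folders on disk in small batches. Note that recent TensorFlow releases mark ImageDataGenerator as deprecated and recommend tf.keras.utils.image_dataset_from_directory for new code.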
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from 0-255 to 0-1
train_datagen = ImageDataGenerator(rescale=1./255)

# Load images from a directory with one subfolder per class
# (labels are assigned automatically from the folder names)
train_generator = train_datagen.flow_from_directory(
        'dataset/training_set',
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

# Train using the generator
# model.fit(train_generator, epochs=10)
```
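The reason batching keeps memory bounded can be shown with a plain Python generator. This is a conceptual sketch only, not how ImageDataGenerator is implemented; `iter_batches` and the file names are hypothetical.

```python
def iter_batches(paths, batch_size):
    """Yield successive slices of `paths`, each at most `batch_size` long."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]

# Hypothetical file names standing in for images on disk.
paths = [f"img_{i:05d}.jpg" for i in range(100)]

batch_sizes = [len(batch) for batch in iter_batches(paths, batch_size=32)]
print(batch_sizes)   # [32, 32, 32, 4]
```

Only one batch of images needs to be decoded and held in memory at a time; the last batch is simply smaller when the dataset size is not a multiple of the batch size.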

8. Common Mistakes

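  • Forgetting Flatten(): If you connect a Conv2D layer directly to a Dense layer without Flatten(), Keras does not necessarily raise an error when the model is built. Dense is applied along the last axis only, so the output keeps its height and width dimensions instead of producing one prediction per image, which typically surfaces as a shape mismatch during training. Flatten() turns each image's 3D feature maps into the 1D vector the classifier needs.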
  • Using too many filters early: Don't start with 512 filters in the first layer. Start small (32), and double it (64, 128) as the image becomes smaller through pooling.
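The effect of a missing Flatten() can be imitated with NumPy matrix multiplication, since a Dense layer multiplies its input by a weight matrix along the last axis. This is a shape-only sketch with zero-filled arrays; the (17, 17, 128) feature map size is what the mini project model produces after its third pooling layer, and the variable names are made up for the example.

```python
import numpy as np

batch = 2
# Stand-in for the output of the last pooling layer: (batch, 17, 17, 128).
feature_maps = np.zeros((batch, 17, 17, 128), dtype=np.float32)

# Without Flatten: a 1-unit Dense kernel acts on the last axis only.
kernel_no_flatten = np.zeros((128, 1), dtype=np.float32)
out_no_flatten = feature_maps @ kernel_no_flatten
print(out_no_flatten.shape)   # (2, 17, 17, 1)

# With Flatten: each image becomes one vector, giving one output per image.
flat = feature_maps.reshape(batch, -1)
kernel_flat = np.zeros((flat.shape[1], 1), dtype=np.float32)
out_flat = flat @ kernel_flat
print(flat.shape)             # (2, 36992)
print(out_flat.shape)         # (2, 1)
```

Only the flattened version yields a single prediction per image, which is the shape a binary label expects.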

9. Best Practices

  • Data Augmentation: Neural networks generally need large amounts of data. If you only have 1,000 pictures of cats, you can use ImageDataGenerator to randomly flip, rotate, and zoom the images during training, so the model rarely sees exactly the same image twice. The variations are not truly new data, but they help reduce overfitting. Apply augmentation to the training set only, not to validation or test images.
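The idea behind augmentation can be sketched with NumPy alone. This is a simplified illustration, not what ImageDataGenerator does internally: the `augment` helper is hypothetical, and it only uses horizontal flips and quarter-turn rotations, whereas real augmentation pipelines also rotate by arbitrary angles and zoom.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def augment(image, rng):
    """Return a randomly flipped and rotated copy of an (H, W, C) image."""
    if rng.random() < 0.5:
        image = np.fliplr(image)               # horizontal flip
    quarter_turns = int(rng.integers(0, 4))    # 0, 90, 180 or 270 degrees
    return np.rot90(image, k=quarter_turns, axes=(0, 1)).copy()

# Toy 4x4 RGB "image" with distinct pixel values.
image = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)

variants = [augment(image, rng) for _ in range(5)]
for v in variants:
    print(v.shape, float(v.sum()))
```

Every variant contains the same pixels rearranged, so the label stays valid while the network sees a different arrangement each time.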

10. Exercises

  1. Look at the Conv2D layer. What does the parameter (3, 3) represent?
  2. If an image enters a MaxPooling2D(2, 2) layer with a size of 100x100 pixels, what size will it be when it exits the layer?

11. MCQ Quiz with Answers

Question 1

Why are CNNs superior to standard Dense networks for processing images?

Question 2

What is the primary purpose of a MaxPooling layer?

12. Interview Questions

  • Q: Explain the mathematical operation of Convolution in the context of image processing.
  • Q: Describe a standard CNN architecture flow from the Input layer to the Output layer.

13. FAQs

Q: Can I use CNNs for things other than images?
A: Yes! 1D Convolutions (Conv1D) are effective at processing audio waveforms and certain types of sequential text and time-series data.
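The same sliding-filter idea works in one dimension. Below is a minimal NumPy sketch on a toy signal; the difference filter is hand-written for illustration, whereas a Conv1D layer would learn its values. `np.correlate` is used because, like a CNN layer, it slides the filter without flipping it.

```python
import numpy as np

# Toy 1D signal: a step from 0 to 1.
signal = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Hand-written difference filter that responds to sudden increases.
kernel = np.array([-1.0, 1.0])

# Slide the filter along the signal (no padding).
response = np.correlate(signal, kernel, mode="valid")
print(response)
```

The response is nonzero only at the position where the signal jumps, the 1D counterpart of an edge detector.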

14. Summary

Convolutional Neural Networks transformed Computer Vision. By using sliding filters to extract edges and shapes, and MaxPooling to compress the data, CNNs can "see" complex images without buckling under billions of parameters. They remain one of the most widely used architectures in Computer Vision.

15. Next Chapter Recommendation

Training a CNN from scratch on a large dataset can demand far more data and compute than a laptop provides. What if you only have a laptop and 500 images of your specific dog? Can you still build a strong model? Yes! In Chapter 12: Transfer Learning in TensorFlow, we will learn how to "steal" the brains of models that were already trained on massive datasets.
