Building Your First Neural Network
# CHAPTER 7
1. Introduction
We have covered the theory and the basic data structures. Now, it is time for the main event. We are going to build a functional Artificial Neural Network. Using TensorFlow's high-level Keras API, building a network is as simple as snapping Lego bricks together. In this chapter, we will build a model capable of looking at an image of a handwritten number (from 0 to 9) and predicting what number it is. This is the "Hello World" of Computer Vision.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Use the `Sequential` model API in Keras.
- Understand and implement `Dense` layers.
- Define Input and Output layer shapes.
- Build a multi-layer Neural Network architecture.
- Compile a model for training.
3. The Sequential Model
In Keras, the simplest way to build a neural network is using a `Sequential` model. This tells TensorFlow: "I am going to give you a list of layers. The data will flow sequentially from the first layer, to the second, and so on."
4. Adding Dense Layers
A `Dense` layer (also called a Fully Connected layer) is the standard neural network layer we learned about in Chapter 5. In a Dense layer, every single neuron is connected to every single neuron in the previous layer.
5. Compiling the Model
Before we can train the model, we must "compile" it. Compiling configures the learning process by defining three things:
- 1. Optimizer: The algorithm that updates the Weights during Backpropagation (e.g., `adam` or `sgd`).
- 2. Loss Function: The mathematical formula that calculates how wrong the model's guesses are (e.g., `sparse_categorical_crossentropy`).
- 3. Metrics: What we want to monitor during training (e.g., `accuracy`).
*(We will dive deeply into Activations, Optimizers, and Loss functions in Chapter 8).*
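Sections 3 to 5 can be sketched together in a few lines. This is a minimal illustration, not the chapter's original listing: the 4 input features, the layer widths, the 3 output classes, and the `relu`/`softmax` activations are assumptions chosen to keep the example small.

```python
import tensorflow as tf

# A Sequential model is an ordered list of layers; data flows top to bottom.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                      # assumed: 4 input features
    tf.keras.layers.Dense(8, activation="relu"),     # hidden Dense layer (assumed width)
    tf.keras.layers.Dense(3, activation="softmax"),  # output layer: 3 classes (assumed)
])

# Compiling sets the optimizer, the loss function, and the metrics to monitor.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Each Dense layer holds (inputs * units) weights plus one bias per unit,
# so this model has (4*8 + 8) + (8*3 + 3) parameters.
print(model.count_params())
```

Because every neuron in a Dense layer connects to every output of the previous layer, the parameter count grows with the product of the two layer sizes.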
6. Mini Project: Handwritten Digit Classifier
Let's put it all together using the famous MNIST dataset (70,000 images of handwritten digits). TensorFlow provides this dataset built-in!
7. Analyzing Model Architecture
You can view a summary of your newly created "brain" using a simple command: `model.summary()`. This prints a table showing every layer, the shape of the data passing through it, and exactly how many "Parameters" (Weights and Biases) the model needs to learn. For this simple model, it is over 100,000 parameters!
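The Mini Project and the summary call can be sketched as follows. This is an illustrative reconstruction rather than the chapter's original listing: the hidden layer sizes (128 and 64) and the `softmax` output activation are assumptions, picked so that the parameter count lands above 100,000 as described. `build_digit_classifier` and `train_and_evaluate` are hypothetical helper names; the second is defined but not called here because it downloads MNIST on first use.

```python
import tensorflow as tf


def build_digit_classifier():
    """Flatten 28x28 images, then two hidden Dense layers and a 10-way output."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),                   # one grayscale image
        tf.keras.layers.Flatten(),                        # 28 * 28 = 784 values
        tf.keras.layers.Dense(128, activation="relu"),    # assumed hidden size
        tf.keras.layers.Dense(64, activation="relu"),     # assumed hidden size
        tf.keras.layers.Dense(10, activation="softmax"),  # one unit per digit 0-9
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def train_and_evaluate(epochs=5):
    """Load MNIST, scale pixels to [0, 1], train, and return test accuracy."""
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    model = build_digit_classifier()
    model.fit(x_train, y_train, epochs=epochs)
    _, test_accuracy = model.evaluate(x_test, y_test)
    return test_accuracy


model = build_digit_classifier()
model.summary()  # per-layer output shapes and parameter counts
```

With these assumed sizes, the layers contribute 784*128 + 128, 128*64 + 64, and 64*10 + 10 parameters respectively. Calling `train_and_evaluate()` trains the model and returns its accuracy on the 10,000 MNIST test images.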
8. Common Mistakes
- Forgetting the Input Shape: The very first layer of your model should define the `input_shape`. If you don't tell the model what size the incoming data is (e.g., a 28x28 image), Keras cannot create the weights until it first sees data, so calls that need them earlier, such as `model.summary()`, may fail or show no shapes and parameter counts.
- Wrong Output Layer Size: If you are predicting 10 different categories (digits 0-9), your final output layer must have exactly 10 units: `Dense(10)`. If you are predicting Cat vs. Dog, it must be `Dense(2)` (or `Dense(1)` with a Sigmoid activation).
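Both points can be checked quickly by inspecting `output_shape`. The sketch below is illustrative only: the 64-feature input used for the Cat vs. Dog models and the variable names are assumptions, not part of the chapter's project.

```python
import tensorflow as tf

# Input shape declared up front, so Keras can create the weights immediately.
digit_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),                   # one 28x28 image per sample
    tf.keras.layers.Flatten(),                        # 28 * 28 = 784 values
    tf.keras.layers.Dense(10, activation="softmax"),  # one unit per digit 0-9
])

# Two classes, option A: two softmax units, one per class.
# The 64-feature input is a placeholder assumption.
cat_dog_two_units = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Two classes, option B: one sigmoid unit giving the probability of one class.
cat_dog_one_unit = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

for name, m in [
    ("digits", digit_model),
    ("cat/dog, 2 units", cat_dog_two_units),
    ("cat/dog, 1 unit", cat_dog_one_unit),
]:
    # The last entry of output_shape is the number of output units.
    print(name, m.output_shape, m.count_params())
```

The single-unit variant is usually paired with a binary cross-entropy loss rather than `sparse_categorical_crossentropy`.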
9. Best Practices
- Scaling Data: Neural networks train poorly on large, unscaled input values. In the Mini Project, the raw image pixels (0-255) are divided by 255.0. Squashing inputs between 0 and 1 typically allows the network to learn significantly faster and achieve higher accuracy.
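As a small illustration of the scaling step, the sketch below applies it to a made-up 2x2 "image". The pixel values are invented for the example, and the conversion to `float32` is a common choice rather than a requirement.

```python
import numpy as np

# Hypothetical 2x2 image with raw 8-bit pixel intensities (0-255).
raw = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Convert to floating point and divide by the maximum intensity,
# so every value ends up between 0 and 1.
scaled = raw.astype("float32") / 255.0

print(scaled.min(), scaled.max())
```

The same expression works unchanged on the full array of MNIST images, because NumPy applies the division element by element.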
10. Exercises
- 1. Modify the Mini Project code to add a third hidden layer with 32 neurons. Re-run the code. Does the accuracy improve?
- 2. Call `model.summary()` on your new model and look at the "Total params" count.
11. MCQ Quiz with Answers
What does the Flatten layer do in a Keras model?
When compiling a model, what does the "Optimizer" do?
12. Interview Questions
- Q: Explain the purpose of the `Sequential` class in Keras.
- Q: Why do we divide image pixel arrays by 255.0 before feeding them into a neural network?
13. FAQs
Q: Can I build models that aren't sequential (e.g., data splits into two paths)?
A: Yes! While `Sequential` is great for 90% of beginner tasks, Keras offers the Functional API for building highly complex, branching architectures. We will cover this in Chapter 10.
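As a brief preview only, a model whose data splits into two paths might look like the sketch below. The input size, layer widths, and activations are arbitrary assumptions, and the full treatment is left to Chapter 10.

```python
import tensorflow as tf

# In the Functional API, layers are called on tensors, so paths can branch.
inputs = tf.keras.Input(shape=(16,))                         # assumed: 16 features

left = tf.keras.layers.Dense(8, activation="relu")(inputs)   # first path
right = tf.keras.layers.Dense(8, activation="tanh")(inputs)  # second path

# Merge the two paths back into one tensor of 8 + 8 = 16 values.
merged = tf.keras.layers.Concatenate()([left, right])
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model(inputs=inputs, outputs=outputs)
print(model.output_shape)
```

A `Sequential` model cannot express this shape, because both `left` and `right` read from the same input rather than from each other.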
14. Summary
You just built your first Artificial Intelligence! By using the Keras `Sequential` API, we neatly stacked a `Flatten` layer to handle the image format, `Dense` layers to extract mathematical patterns, and an Output layer to make the final prediction. We compiled the model with an optimizer and loss function, allowing it to successfully learn how to read handwriting.
15. Next Chapter Recommendation
In our code, we mysteriously used keywords like `activation='relu'` and `loss='sparse_categorical_crossentropy'`. What do these actually mean? If you pick the wrong ones, your model will fail completely. In Chapter 8: Activation Functions and Loss Functions, we will demystify the math that makes learning possible.