TensorFlow Introduction
CHAPTER 07 Intermediate

Building Your First Neural Network

Updated: May 16, 2026
6 min read

1. Introduction

We have covered the theory and the basic data structures. Now, it is time for the main event. We are going to build a functional Artificial Neural Network. Using TensorFlow's high-level Keras API, building a network is as simple as snapping Lego bricks together. In this chapter, we will build a model capable of looking at an image of a handwritten number (from 0 to 9) and predicting what number it is. This is the "Hello World" of Computer Vision.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Use the Sequential model API in Keras.
  • Understand and implement Dense layers.
  • Define Input and Output layer shapes.
  • Build a multi-layer Neural Network architecture.
  • Compile a model for training.

3. The Sequential Model

In Keras, the simplest way to build a neural network is using a Sequential model. This tells TensorFlow: "I am going to give you a list of layers. The data will flow sequentially from the first layer, to the second, and so on."
python
import tensorflow as tf
from tensorflow.keras.models import Sequential

# Initialize an empty Sequential model
model = Sequential()
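
To make the idea of "a list of layers applied in order" concrete, here is a toy sketch in plain Python. `TinySequential`, `double`, and `add_one` are hypothetical names made up for illustration. This is not how Keras is implemented, only the same control flow in miniature.

```python
# Toy illustration only: TinySequential is a hypothetical stand-in, not a Keras class.
class TinySequential:
    def __init__(self, layers=None):
        # Each "layer" is just a callable that transforms its input.
        self.layers = list(layers or [])

    def add(self, layer):
        # Append a layer to the end of the stack.
        self.layers.append(layer)

    def __call__(self, x):
        # Feed the output of each layer into the next one, in order.
        for layer in self.layers:
            x = layer(x)
        return x


def double(x):
    return x * 2


def add_one(x):
    return x + 1


toy_model = TinySequential()
toy_model.add(double)    # first "layer": multiply by 2
toy_model.add(add_one)   # second "layer": add 1

result = toy_model(5)    # (5 * 2) + 1
print(result)            # 11
```

The real Sequential class also has an add() method, so the empty model created above can be filled one layer at a time in the same way.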

4. Adding Dense Layers

A Dense layer (also called a Fully Connected layer) is the standard neural network layer we learned about in Chapter 5. In a Dense layer, every single neuron is connected to every single neuron in the previous layer.
python
from tensorflow.keras.layers import Dense, Flatten

# Building the architecture
model = Sequential([
    # Input Layer (Flatten converts a 2D image into a 1D vector of pixels)
    Flatten(input_shape=(28, 28)),
    
    # Hidden Layer 1 (128 neurons, ReLU activation)
    Dense(128, activation='relu'),
    
    # Hidden Layer 2 (64 neurons, ReLU activation)
    Dense(64, activation='relu'),
    
    # Output Layer (10 neurons because we have 10 digits to predict: 0-9)
    Dense(10, activation='softmax')
])
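
To see what "fully connected" means in numbers, the following NumPy sketch reproduces by hand what the Flatten layer and the first Dense layer compute for a single image. The image and the weights are random stand-ins (an assumption for illustration), not real MNIST data or trained values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A stand-in for one 28x28 grayscale image (random values, not real MNIST data).
image = rng.random((28, 28))

# What Flatten does: unroll the 2D grid into a 1D vector of 784 values.
flat = image.reshape(-1)

# What Dense(128, activation='relu') computes: relu(x @ W + b).
# W holds one weight per (input, neuron) pair, which is why the layer
# is called "fully connected".
W = rng.normal(scale=0.05, size=(784, 128))  # hypothetical random weights
b = np.zeros(128)                            # one bias per neuron
hidden = np.maximum(0.0, flat @ W + b)       # ReLU clips negatives to zero

print(flat.shape)    # (784,)
print(hidden.shape)  # (128,)
```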

5. Compiling the Model

Before we can train the model, we must "compile" it. Compiling configures the learning process by defining three things:
  1. Optimizer: The algorithm that updates the weights using the gradients computed by backpropagation (e.g., adam or sgd).
  2. Loss Function: The mathematical formula that measures how wrong the model's guesses are (e.g., sparse_categorical_crossentropy).
  3. Metrics: What we want to monitor during training (e.g., accuracy).

*(We will dive deeply into Activations, Optimizers, and Loss functions in Chapter 8).*

python
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
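
As a rough sketch of what the loss and the metric configured above measure, the NumPy snippet below computes sparse categorical cross-entropy and accuracy by hand. The probabilities and labels are made-up values for illustration, and Keras's own implementation handles extra details (such as numerical clipping) that are omitted here.

```python
import numpy as np

# Hypothetical softmax outputs for 3 images over 4 classes (each row sums to 1).
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
])
# "Sparse" labels are plain integers, not one-hot vectors.
labels = np.array([0, 1, 2])

# Sparse categorical cross-entropy: -log of the probability assigned
# to the true class, averaged over the batch.
true_class_probs = probs[np.arange(len(labels)), labels]  # [0.7, 0.5, 0.3]
loss = -np.log(true_class_probs).mean()

# Accuracy: fraction of rows whose most probable class matches the label.
accuracy = (probs.argmax(axis=1) == labels).mean()

print(f"loss={loss:.4f}, accuracy={accuracy:.2f}")
```

The third image is misclassified (the model prefers class 3 over the true class 2), which both raises the loss and lowers the accuracy.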

6. Mini Project: Handwritten Digit Classifier

Let's put it all together using the famous MNIST dataset: 70,000 images of handwritten digits, split into 60,000 for training and 10,000 for testing. Keras includes a loader for it that downloads the data the first time you call it.
python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# 1. Load Data
mnist = tf.keras.datasets.mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# 2. Preprocess Data (Scale pixels to be between 0 and 1)
X_train = X_train / 255.0
X_test = X_test / 255.0

# 3. Build Architecture
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# 4. Compile
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# 5. Train! (epochs=5 means the model sees the entire dataset 5 times)
print("Training started...")
model.fit(X_train, y_train, epochs=5)

# 6. Evaluate on unseen Test data
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print(f"Test Accuracy: {test_acc*100:.2f}%")
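
Once trained, the model's softmax output for one image is a vector of 10 probabilities, one per digit. Here is a minimal sketch of turning such a vector into a predicted digit, using a hand-written probability vector as a stand-in for real model output:

```python
import numpy as np

# Hypothetical softmax output for one image: index i holds P(digit == i).
probabilities = np.array(
    [0.01, 0.02, 0.05, 0.02, 0.03, 0.02, 0.01, 0.80, 0.02, 0.02]
)

# The predicted digit is the index of the largest probability.
predicted_digit = int(np.argmax(probabilities))
confidence = float(probabilities[predicted_digit])

print(f"Predicted digit: {predicted_digit} (confidence {confidence:.0%})")
```

With the trained model, the same argmax step would be applied to each row returned by model.predict(X_test).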

7. Analyzing Model Architecture

You can view a summary of your newly created "brain" using a simple command:
python
model.summary()

This prints a table listing every layer, the shape of the data leaving it, and how many parameters (weights and biases) it has to learn. For the Mini Project model, the total is 101,770 parameters.
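
The parameter count can be checked by hand. The sketch below assumes the Mini Project architecture (Flatten, Dense(128), Dense(10)); `dense_params` is a hypothetical helper written for this example, not a Keras function.

```python
def dense_params(n_inputs, n_neurons):
    # One weight per (input, neuron) pair plus one bias per neuron.
    return n_inputs * n_neurons + n_neurons


flatten_outputs = 28 * 28                     # Flatten has no parameters of its own
hidden = dense_params(flatten_outputs, 128)   # Dense(128)
output = dense_params(128, 10)                # Dense(10)
total = hidden + output

print(hidden, output, total)  # 100480 1290 101770
```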

8. Common Mistakes

  • Forgetting the Input Shape: The first layer of your model should declare the input shape (here, input_shape=(28, 28)). Without it, Keras cannot know how many weights to create until it sees the first batch of data, so the model stays unbuilt and model.summary() cannot report parameter counts (some versions raise an error instead).
  • Wrong Output Layer Size: If you are predicting 10 different categories (digits 0-9), your final layer must have exactly 10 neurons: Dense(10). If you are predicting Cat vs. Dog, use Dense(2) with softmax, or Dense(1) with a sigmoid activation.
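
The two options for a two-class problem give the same probabilities. The sketch below checks this with plain Python: a softmax over two raw scores equals a sigmoid applied to their difference. The scores are made-up values, and `softmax2` and `sigmoid` are small helpers written for this example.

```python
import math


def softmax2(a, b):
    # Two-class softmax: returns (P(class 0), P(class 1)).
    ea, eb = math.exp(a), math.exp(b)
    return ea / (ea + eb), eb / (ea + eb)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


cat_score, dog_score = 1.2, 0.4   # hypothetical raw outputs (logits)

p_cat, p_dog = softmax2(cat_score, dog_score)    # Dense(2) + softmax view
p_cat_single = sigmoid(cat_score - dog_score)    # Dense(1) + sigmoid view

print(f"{p_cat:.4f} {p_cat_single:.4f}")  # the two values match
```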

9. Best Practices

  • Scaling Data: Neural networks train poorly on large, unscaled input values. Notice that in the code above we divided the raw image pixels (0-255) by 255.0. Squashing inputs into the range 0 to 1 usually helps the network learn faster and more reliably.
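
A small sketch of what that division does, using a made-up 2x3 patch of pixel values rather than a real MNIST image:

```python
import numpy as np

# Hypothetical patch of raw pixels, stored as unsigned 8-bit integers like MNIST.
raw = np.array([[0, 64, 128],
                [191, 255, 32]], dtype=np.uint8)

# Dividing by a float converts the integers to floating point in [0, 1].
scaled = raw / 255.0

print(raw.dtype, scaled.dtype)     # uint8 float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```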

10. Exercises

  1. Modify the Mini Project code to add a third hidden layer with 32 neurons. Re-run the code. Does the accuracy improve?
  2. Call model.summary() on your new model and look at the "Total params" count.

11. MCQ Quiz with Answers

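Question 1

What does the Flatten layer do in a Keras model?

Answer: It converts each 2D image (28x28 pixels) into a 1D vector of 784 values so that it can be fed into a Dense layer.

Question 2

When compiling a model, what does the "Optimizer" do?

Answer: It is the algorithm that updates the model's weights during training (e.g., adam or sgd).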

12. Interview Questions

  • Q: Explain the purpose of the Sequential class in Keras.
  • Q: Why do we divide image pixel arrays by 255.0 before feeding them into a neural network?

13. FAQs

Q: Can I build models that aren't sequential (e.g., data splits into two paths)?
A: Yes! While Sequential is enough for most beginner tasks, Keras offers the Functional API for building more complex, branching architectures. We will cover this in Chapter 10.

14. Summary

You just built your first Artificial Intelligence! By using the Keras Sequential API, we neatly stacked a Flatten layer to handle the image format, Dense layers to extract mathematical patterns, and an Output layer to make the final prediction. We compiled the model with an optimizer and loss function, allowing it to successfully learn how to read handwriting.

15. Next Chapter Recommendation

In our code, we used keywords like activation='relu' and loss='sparse_categorical_crossentropy' without explaining them. What do these actually mean? If you pick the wrong ones, your model may fail to learn at all. In Chapter 8: Activation Functions and Loss Functions, we will demystify the math that makes learning possible.
