CHAPTER 14
Intermediate
Recurrent Neural Networks (RNN)
Updated: May 16, 2026
1. Introduction
A standard feedforward network (a stack of Linear layers) has amnesia. When it looks at Word #2 in a sentence, it has completely forgotten Word #1. This is fine for classifying a single static image, but it is disastrous for processing language, where context and word order dictate the meaning of the sentence. To solve this, researchers created the Recurrent Neural Network (RNN). An RNN has a built-in "memory" loop that lets it carry information from past inputs forward to interpret the present one. In this chapter, we will learn how RNNs model time and sequence in PyTorch.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Define Sequential Data.
- Explain the architecture of a Recurrent Neural Network (RNN).
- Understand how the Hidden State acts as memory.
- Implement an nn.RNN layer in PyTorch.
- Identify the limitations of basic RNNs (Vanishing Gradients).
3. What is Sequential Data?
Sequential data is any data where the *order* of the data points matters.
- Text: "I am happy" makes sense. "Happy am I" changes the context.
- Time-Series: Stock market prices over a week. The price on Tuesday is heavily dependent on the price on Monday.
- Audio: A spoken sentence is just a sequence of soundwaves over time.
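To make "order matters" concrete, here is a small sketch in plain Python. The helper name bag_of_words and the two example sentences are illustrative assumptions, not part of the chapter. It shows that a summary which ignores order cannot tell two differently ordered sentences apart.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Count the words in a sentence while ignoring their order."""
    return Counter(sentence.lower().split())


a = "dog bites man"
b = "man bites dog"

# Same words, different order: the order-free summary is identical...
same_bag = bag_of_words(a) == bag_of_words(b)
# ...but the ordered sequences differ, and so does the meaning.
same_sequence = a.split() == b.split()

print(same_bag, same_sequence)  # True False
```

Any model that only sees the word counts would treat both sentences as the same input, which is exactly the gap a sequence model is meant to close.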
4. How an RNN Works
Imagine reading a book. As you read the current word, your brain holds the context of the previous words to understand the sentence. An RNN does the exact same thing.
- 1. At Time Step 1, the RNN reads Word 1 ("The"). It processes it and generates a Hidden State (a mathematical memory summary).
- 2. At Time Step 2, the RNN reads Word 2 ("Dog"). *Crucially, it also reads the Hidden State from Time Step 1!* It combines the new word with the old memory to generate a new, updated Hidden State.
- 3. This loop continues until the end of the sentence. By the final word, the Hidden State contains a mathematical summary of the entire sentence's context.
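The three steps above can be sketched by hand as a loop. This is a simplified illustration, assuming a single tanh layer with one bias vector (nn.RNN itself keeps two bias vectors); the names W_xh, W_hh, b_h and rnn_step, and all sizes, are made up for the example.

```python
import torch

torch.manual_seed(0)

input_size, hidden_size = 4, 3  # illustrative sizes, not from the chapter

# Hypothetical parameters of a single-layer tanh RNN.
W_xh = torch.randn(hidden_size, input_size) * 0.1   # input -> hidden
W_hh = torch.randn(hidden_size, hidden_size) * 0.1  # hidden -> hidden (the "loop")
b_h = torch.zeros(hidden_size)


def rnn_step(x_t, h_prev):
    """Combine the current word vector with the previous hidden state."""
    return torch.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)


# A toy "sentence" of 3 words, each already encoded as a 4-dim vector.
sentence = torch.randn(3, input_size)

h = torch.zeros(hidden_size)  # empty memory before the first word
states = []
for x_t in sentence:
    h = rnn_step(x_t, h)      # new memory depends on the word AND the old memory
    states.append(h)

print(len(states), h.shape)
```

After the loop, h is the Hidden State from the final time step, and states holds one Hidden State per word.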
5. Implementing an RNN in PyTorch
Let's build an RNN for Sentiment Analysis (predicting if a movie review is Positive or Negative).
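One way such a classifier could be sketched is shown below. Everything here is an assumption made for illustration: the class name SentimentRNN, the vocabulary size, the layer sizes, and the random "reviews" used as input. A real pipeline would also need tokenization and a training loop.

```python
import torch
import torch.nn as nn


class SentimentRNN(nn.Module):
    """Hypothetical sketch: embedding -> RNN -> linear classifier."""

    def __init__(self, vocab_size, embed_dim, hidden_size, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):
        # token_ids: [batch, seq_len] integer word indices
        embedded = self.embedding(token_ids)   # [batch, seq_len, embed_dim]
        output, hidden = self.rnn(embedded)    # hidden: [1, batch, hidden_size]
        last_hidden = hidden[-1]               # [batch, hidden_size]
        return self.fc(last_hidden)            # [batch, num_classes] logits


torch.manual_seed(0)
model = SentimentRNN(vocab_size=1000, embed_dim=32, hidden_size=64)
fake_reviews = torch.randint(0, 1000, (8, 20))  # 8 made-up reviews, 20 tokens each
logits = model(fake_reviews)
print(logits.shape)  # torch.Size([8, 2])
```

The two logits per review would typically be passed to nn.CrossEntropyLoss during training, with class 0 and class 1 standing for Negative and Positive.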
6. The Outputs of nn.RNN
PyTorch's nn.RNN returns a tuple of two items: (output, hidden). Understanding this is critical:
- output: Contains the hidden state (of the last layer) produced after reading *every single word* in the sequence. You use this for per-word tasks, for example word-by-word translation.
- hidden: Contains *only* the final hidden state, the one produced at the end of the sequence. There is one per layer, so its shape is [Number of Layers, Batch, Hidden Size] even when batch_first=True. You use this for document classification (like Sentiment Analysis) because it summarizes the entire sentence.
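A quick shape check makes the difference between the two return values visible. The sizes below are arbitrary, and the layer is assumed to be a single-layer, one-directional nn.RNN with batch_first=True.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=5, hidden_size=8, batch_first=True)

x = torch.randn(2, 6, 5)  # [batch=2, seq_len=6, features=5]
output, hidden = rnn(x)

print(output.shape)  # one state per time step: torch.Size([2, 6, 8])
print(hidden.shape)  # final state per layer:   torch.Size([1, 2, 8])

# For a single-layer, one-directional RNN, the last time step of `output`
# holds the same values as the final hidden state.
matches = torch.allclose(output[:, -1, :], hidden[-1])
print(matches)  # True
```

With stacked or bidirectional RNNs this equality no longer holds in the same simple form, so it is worth printing the shapes before wiring the result into a classifier.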
7. The Vanishing Gradient Problem
nn.RNN is brilliant in theory, but terrible in practice for long sequences.
If a paragraph is 100 words long, the error signal from Word 100 must travel back through 100 time steps during backpropagation to reach Word 1. At every step it is multiplied by the recurrent weights and by the slope of the activation function, so it typically shrinks exponentially until Word 1 has effectively no influence on learning (the Vanishing Gradient Problem).
*Result:* A standard RNN tends to learn only short-range dependencies (as a rough rule of thumb, about 10-15 words). For long documents, it suffers from severe amnesia.
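The effect can be observed directly by measuring how strongly each input word influences a loss computed at the end of the sequence. This is a rough sketch on an untrained nn.RNN with PyTorch's default initialization; the sizes, the sequence length of 50, and the stand-in loss are assumptions chosen for illustration, and the exact numbers will vary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=16, batch_first=True)

seq_len = 50
x = torch.randn(1, seq_len, 4, requires_grad=True)

output, _ = rnn(x)
# Pretend the loss depends only on the state after the last word.
loss = output[:, -1, :].sum()
loss.backward()

# Gradient magnitude of the loss with respect to each input time step.
grad_per_step = x.grad[0].norm(dim=1)  # shape: [seq_len]
first, last = grad_per_step[0].item(), grad_per_step[-1].item()
print(f"grad norm at step 1:  {first:.3e}")
print(f"grad norm at step {seq_len}: {last:.3e}")
```

The gradient reaching the first word should come out many orders of magnitude smaller than the gradient at the last word, which is why early words barely affect the weight updates.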
8. Common Mistakes
- Forgetting batch_first=True: By default, PyTorch's RNN expects data to be shaped [Sequence Length, Batch, Features]. This is the reverse of standard Convolutional or Linear layers, which expect [Batch, ...]. Set batch_first=True to keep your code sane and uniform.
- Using RNNs for non-sequential data: If you use an RNN to process tabular data (like predicting house prices based on Square Footage and Beds), it will usually perform poorly. Tabular features have no logical time-step order, so the RNN is forced to treat an arbitrary column order as if it were a sequence.
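The batch_first mistake is easy to miss because PyTorch does not raise an error. The sketch below feeds the same [Batch, Sequence Length, Features] tensor to a default layer and to a batch_first=True layer; the sizes are arbitrary values picked for the example.

```python
import torch
import torch.nn as nn

batch, seq_len, features, hidden_size = 4, 7, 3, 5
x = torch.randn(batch, seq_len, features)  # laid out as [Batch, Seq, Features]

# Default layout: the layer expects [Seq, Batch, Features].
default_rnn = nn.RNN(features, hidden_size)
# batch_first=True: the layer expects [Batch, Seq, Features].
batch_first_rnn = nn.RNN(features, hidden_size, batch_first=True)

_, h_default = default_rnn(x)
_, h_batch_first = batch_first_rnn(x)

# No error is raised, but the default layer treated 7 as the batch size.
print(h_default.shape)      # torch.Size([1, 7, 5])
print(h_batch_first.shape)  # torch.Size([1, 4, 5])
```

The default layer silently read the data as 7 sequences of length 4, so the model would train on scrambled inputs without any warning.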
9. Best Practices
- Use RNNs for Baselines: While standard nn.RNN is rarely used in modern production (due to the Vanishing Gradient problem), building one is an excellent way to baseline your NLP pipeline before moving to more complex models.
10. Exercises
- 1. What is the "Hidden State" in an RNN, and what biological function does it mimic?
- 2. If PyTorch's nn.RNN returns (output, hidden), which variable should you pass to the final nn.Linear layer if your goal is to classify the overall sentiment of the entire sentence?
11. MCQ Quiz with Answers
Question 1
What specific capability makes an RNN superior to a Linear network for processing text?
Question 2
Why is a standard nn.RNN ineffective at processing very long documents?
12. Interview Questions
- Q: Explain the flow of data and the updating of the Hidden State through an nn.RNN layer during one time step.
- Q: Explain the difference between the two outputs (output and hidden) provided by PyTorch's nn.RNN.