PyTorch Essentials
CHAPTER 14 Intermediate

Recurrent Neural Networks (RNN)

Updated: May 16, 2026

1. Introduction

A standard Linear network has amnesia. It treats every input independently, so when it looks at Word #2 in a sentence, it retains nothing about Word #1. This is fine for classifying a single static image, but it is disastrous for processing language, where context and word order dictate the meaning of the sentence. To solve this, researchers created the Recurrent Neural Network (RNN). An RNN has a built-in "memory" loop that carries information from past inputs forward, so it can use them to interpret the present one. In this chapter, we will learn how RNNs model time and sequence in PyTorch.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define Sequential Data.
  • Explain the architecture of a Recurrent Neural Network (RNN).
  • Understand how the Hidden State acts as memory.
  • Implement an nn.RNN layer in PyTorch.
  • Identify the limitations of basic RNNs (Vanishing Gradients).

3. What is Sequential Data?

Sequential data is any data where the *order* of the data points matters.
  • Text: "I am happy" is a statement. "Am I happy" uses the same words but asks a question.
  • Time-Series: Stock market prices over a week. The price on Tuesday is heavily dependent on the price on Monday.
  • Audio: A spoken sentence is just a sequence of soundwaves over time.
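Standard Linear networks, and CNNs built for fixed-size images, have no mechanism for carrying context from one time step to the next, so they handle variable-length sequential data poorly.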
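
To make the point about order concrete, here is a minimal sketch. It assumes one common way of feeding a sentence to a Linear layer, averaging the word embeddings, and compares it with the final state of an untrained nn.RNN. The token ids are hypothetical and the weights are random; only the comparison matters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embedding = nn.Embedding(num_embeddings=10, embedding_dim=8)
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

# Hypothetical token ids: the same three words in two different orders
sentence_a = torch.tensor([[1, 2, 3]])
sentence_b = torch.tensor([[3, 2, 1]])

with torch.no_grad():
    emb_a, emb_b = embedding(sentence_a), embedding(sentence_b)

    # Averaging discards order: both sentences collapse to the same vector
    avg_a, avg_b = emb_a.mean(dim=1), emb_b.mean(dim=1)

    # The RNN reads the words one at a time, so order changes its final state
    _, h_a = rnn(emb_a)
    _, h_b = rnn(emb_b)

order_blind = torch.allclose(avg_a, avg_b)
order_aware = not torch.allclose(h_a, h_b)
print("Averaged embeddings identical:", order_blind)
print("RNN final states differ:", order_aware)
```

Any Linear layer stacked on the averaged vector would give both sentences the same prediction, while a layer stacked on the RNN state can tell them apart.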

4. How an RNN Works

Imagine reading a book. As you read the current word, your brain holds the context of the previous words to understand the sentence. An RNN works on the same principle.
  1. At Time Step 1, the RNN reads Word 1 ("The"). It processes it and generates a Hidden State (a mathematical memory summary).
  2. At Time Step 2, the RNN reads Word 2 ("Dog"). *Crucially, it also reads the Hidden State from Time Step 1!* It combines the new word with the old memory to generate a new, updated Hidden State.
  3. This loop continues until the end of the sentence. By the final word, the Hidden State contains a mathematical summary of the entire sentence's context.
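
The loop in the steps above can be written out by hand. This sketch assumes the default tanh activation of nn.RNN and reads the layer's own parameters (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0), then checks that the manual loop reproduces what the layer computes. The tensor sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)
x = torch.randn(1, 5, 4)  # [Batch=1, Sequence=5, Features=4]

# Parameters of the single recurrent layer
W_ih, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0  # shapes [3, 4] and [3, 3]
b_ih, b_hh = rnn.bias_ih_l0, rnn.bias_hh_l0      # shapes [3] and [3]

with torch.no_grad():
    h = torch.zeros(1, 3)  # before the first word there is no memory
    manual_states = []
    for t in range(x.size(1)):
        x_t = x[:, t, :]  # the "word" at time step t
        # Combine the new input with the previous hidden state
        h = torch.tanh(x_t @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
        manual_states.append(h)
    manual_out = torch.stack(manual_states, dim=1)  # [1, 5, 3]

    output, hidden = rnn(x)

matches_output = torch.allclose(manual_out, output, atol=1e-5)
matches_hidden = torch.allclose(h, hidden[0], atol=1e-5)
print(matches_output, matches_hidden)
```

The same weights are reused at every time step; only the hidden state h changes as the sequence is read.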

5. Implementing an RNN in PyTorch

Let's build an RNN for Sentiment Analysis (predicting if a movie review is Positive or Negative).
python
import torch
import torch.nn as nn

class SimpleRNNModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_size):
        super().__init__()
        
        # 1. Word Embeddings (Covered in Chapter 13)
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        
        # 2. The Recurrent Layer!
        # batch_first=True means we expect our tensors to be [Batch, Sequence, Features]
        self.rnn = nn.RNN(input_size=embed_dim, hidden_size=hidden_size, batch_first=True)
        
        # 3. Dense Output Layer
        # It takes the FINAL hidden state memory and makes a Yes/No prediction
        self.fc = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x shape: [Batch, Sequence Length]
        embedded = self.embedding(x)
        
        # Pass through RNN. 
        # rnn_out contains the output for EVERY time step.
        # hidden contains the FINAL memory state after reading the whole sentence.
        rnn_out, hidden = self.rnn(embedded)
        
        # We only care about the final memory state for text classification.
        # hidden has shape [1, Batch, Hidden] for a single-layer RNN, so
        # squeezing dim 0 leaves [Batch, Hidden] for the Linear layer.
        final_memory = hidden.squeeze(0)
        
        # Make the prediction
        prediction = self.sigmoid(self.fc(final_memory))
        return prediction

# Instantiate the model
model = SimpleRNNModel(vocab_size=1000, embed_dim=32, hidden_size=64)
print(model)

6. The Outputs of nn.RNN

PyTorch's nn.RNN returns a tuple of two items: (output, hidden). Understanding this is critical:
  • output: Contains the hidden state produced after reading *every single word* in the sequence, with shape [Batch, Sequence, Hidden] when batch_first=True. You use this when you need a prediction per word, such as tagging each word in a sentence.
  • hidden: Contains *only* the very last memory state generated at the end of the sequence, with shape [Layers, Batch, Hidden]. You use this for document classification (like Sentiment Analysis) because it summarizes the entire sentence.
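
A quick shape check makes the difference visible. The sketch below uses a random tensor as a stand-in for a batch of embedded reviews (4 reviews, 12 words each, 32 features per word; all sizes are illustrative) and confirms that, for a single-layer, one-directional RNN, the last time step of output is the same tensor as hidden.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)
embedded = torch.randn(4, 12, 32)  # stand-in for [Batch, Sequence, Features]

with torch.no_grad():
    output, hidden = rnn(embedded)

print(output.shape)  # torch.Size([4, 12, 64]): one state per word
print(hidden.shape)  # torch.Size([1, 4, 64]): one final state per review

# The last slice of output and the final hidden state hold the same values
last_step_equals_hidden = torch.allclose(output[:, -1, :], hidden[-1])
print(last_step_equals_hidden)
```

This equality no longer holds in the same simple form once padding, multiple layers, or bidirectional RNNs are involved, which is one reason to index hidden rather than output for classification.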

7. The Vanishing Gradient Problem

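nn.RNN is brilliant in theory, but it struggles in practice with long sequences. During backpropagation, the error signal has to travel backward through every time step, and at each step it is multiplied by the recurrent weights and the derivative of the activation. If a paragraph is 100 words long, the gradient that finally reaches Word 1 has been multiplied so many times that it often shrinks toward zero (The Vanishing Gradient Problem), so the network never learns how early words should influence the final prediction. *Result:* A standard RNN tends to learn only short-range context (about 10-15 words, as a rough rule of thumb). For long documents, it suffers from severe amnesia.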
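
The effect can be observed directly. This sketch assumes an untrained nn.RNN with PyTorch's default initialization and a random 50-step input, then measures how strongly the final hidden state depends on each time step by inspecting the gradient with respect to the input. Exact numbers depend on the seed and sizes, which are arbitrary here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len = 50
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

# requires_grad=True lets us ask how much each input step affects the result
x = torch.randn(1, seq_len, 8, requires_grad=True)

_, hidden = rnn(x)
hidden.sum().backward()  # backpropagate from the final memory state

# One gradient norm per time step, shape [seq_len]
grad_norms = x.grad[0].norm(dim=1)

print(f"Gradient reaching word 1:  {grad_norms[0].item():.2e}")
print(f"Gradient reaching word {seq_len}: {grad_norms[-1].item():.2e}")
```

The gradient at the first word is typically many orders of magnitude smaller than at the last word, which is why weight updates barely reflect early inputs.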

8. Common Mistakes

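  • Forgetting batch_first=True: By default, PyTorch's RNN expects data to be shaped [Sequence Length, Batch, Features]. This is the reverse of standard Convolutional or Linear layers, which expect [Batch, ...]. PyTorch raises no error when the two are swapped, as long as the feature dimension matches, so set batch_first=True to keep your code sane and uniform. Note that the returned hidden state stays [Layers, Batch, Hidden] either way.
  • Using RNNs for non-sequential data: If you use an RNN to process tabular data (like predicting house prices based on Square Footage and Beds), the recurrence adds nothing useful and usually hurts. Tabular features have no logical time-step order, so any order you impose on them is arbitrary.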
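
The batch_first mistake is easy to miss because nothing crashes. In this sketch, the same tensor, intended as 4 sequences of 10 steps, is passed to a default nn.RNN and to one built with batch_first=True. The sizes are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Intended layout: [Batch=4, Sequence=10, Features=8]
x = torch.randn(4, 10, 8)

rnn_default = nn.RNN(input_size=8, hidden_size=16)                  # expects [Seq, Batch, Feat]
rnn_batch_first = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

with torch.no_grad():
    _, hidden_default = rnn_default(x)
    _, hidden_batch_first = rnn_batch_first(x)

# The default layer silently read x as 10 sequences of length 4
print(hidden_default.shape)      # torch.Size([1, 10, 16])
# With batch_first=True there is one final state per intended sequence
print(hidden_batch_first.shape)  # torch.Size([1, 4, 16])
```

Checking the batch dimension of hidden against the real batch size is a cheap way to catch the swap.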

9. Best Practices

  • Use RNNs for Baselines: While standard nn.RNN is rarely used in modern production (due to the Vanishing Gradient problem), building one is an excellent way to baseline your NLP pipeline before moving to more complex models.

10. Exercises

  1. What is the "Hidden State" in an RNN, and what biological function does it mimic?
  2. If PyTorch's nn.RNN returns (output, hidden), which variable should you pass to the final nn.Linear layer if your goal is to classify the overall sentiment of the entire sentence?

11. MCQ Quiz with Answers

Question 1

What specific capability makes an RNN superior to a Linear network for processing text?

Question 2

Why is a standard nn.RNN ineffective at processing very long documents?

12. Interview Questions

  • Q: Explain the flow of data and the updating of the Hidden State through an nn.RNN layer during one time step.
  • Q: Explain the difference between the two outputs (output and hidden) provided by PyTorch's nn.RNN.

13. FAQs

Q: Can RNNs predict the future? A: To a degree. By feeding an RNN historical stock prices (e.g., Days 1-10), you can train it to predict the price on Day 11. This is called Time-Series Forecasting. How accurate the forecast is depends on how much the past actually determines the future.
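
A minimal forecasting sketch follows. It is built on assumptions: the class name NextValueRNN is hypothetical, the data is a synthetic sine wave rather than real prices, and the window of 10 steps mirrors the Days 1-10 example. It shows the shape of the problem, not a usable trading model.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

class NextValueRNN(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, hidden_size=16):
        super().__init__()
        # One feature per time step: the value of the series on that day
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        _, hidden = self.rnn(x)      # hidden: [1, Batch, Hidden]
        return self.fc(hidden[-1])   # regression output, so no sigmoid

# Synthetic series standing in for prices
series = torch.sin(torch.linspace(0, 8 * math.pi, 200))

# Slide a window over the series: 10 inputs followed by 1 target
window = 10
windows = series.unfold(0, window + 1, 1)            # [190, 11]
inputs = windows[:, :window].unsqueeze(-1).clone()   # [190, 10, 1]
targets = windows[:, window:].clone()                # [190, 1]

model = NextValueRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    day_11 = model(inputs[:1])  # forecast for the step after the first window
print(day_11.shape)  # torch.Size([1, 1])
```

A real forecasting setup would also hold out the most recent data for evaluation instead of training on every window.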

14. Summary

Recurrent Neural Networks introduced the concept of memory into Deep Learning. By utilizing a hidden state that loops and updates with every new time step, RNNs can process sequential data like language and time-series. However, their struggle to retain long-term memory requires a more advanced solution.

15. Next Chapter Recommendation

How do we fix the Vanishing Gradient problem? How do we build a model that can remember the first word of a 500-word essay? In Chapter 15: LSTM and Sequence Models, we will introduce the heavy machinery of NLP: The Long Short-Term Memory network.
