Building Caching Systems with Redis | Strategies and Invalidation
# CHAPTER 17
Building Caching Systems with Redis
1. Introduction
Connecting Node.js or PHP to Redis is the easy part. The true challenge is architecture. When exactly do you read from the cache? When do you update the cache? What happens when the underlying MySQL data changes, but the Redis cache still holds the old data? In this chapter, we will transition from writing code to designing enterprise-grade caching systems. We will explore the standard architectural patterns used by companies like Netflix and Twitter to serve data to billions of users.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Implement the "Cache-Aside" (Lazy Loading) pattern.
- Implement the "Write-Through" cache pattern.
- Understand Page Caching vs. API Caching vs. Query Caching.
- Master Cache Invalidation (The hardest problem in computer science).
- Architect a high-speed blog caching system.
3. The Cache-Aside Pattern (Lazy Loading)
This is one of the most common caching patterns. Redis sits *aside* from the main data path: the application talks to the cache and to the database separately.
- 1. The application receives a request for User #55.
- 2. The application asks Redis: "Do you have User #55?"
- 3. If Hit: Application returns the data to the user instantly.
- 4. If Miss: Application queries MySQL. MySQL returns the data. The application saves the data into Redis with an Expiration timer (TTL), and *then* returns it to the user.
Pros: Data that nobody ever requests never gets put into Redis, saving valuable RAM.
Cons: The very first person to ask for a piece of data suffers a slow load time (the "Cold Start"), because that request has to go all the way to MySQL.
4. The Write-Through Pattern
In this pattern, every write updates Redis together with the database, so the cache is filled on the write path instead of lazily on reads.
- 1. A user updates their profile.
- 2. The application writes the new data into MySQL AND simultaneously writes the exact same data into Redis.
Pros: There is no "Cold Start." The cache stays in step with the database as long as both writes succeed.
Cons: If you have 1 million users, and they update their profiles but never log in again, you just wasted 1 million slots in your RAM for data that is never being read!
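A minimal sketch of the write path, under stated assumptions: `InMemoryCache` is a hypothetical stand-in for a Redis client, `users_table` stands in for MySQL, and `update_profile` / `get_profile` are illustrative names. The database is written first here so that it remains the source of truth if the cache write fails.

```python
import json


class InMemoryCache:
    """Hypothetical stand-in for a Redis client (get / set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


# Hypothetical "MySQL" table.
users_table = {55: {"id": 55, "name": "Ada", "city": "London"}}
cache = InMemoryCache()


def update_profile(user_id, changes):
    # 1. Write the new data to the database.
    row = {**users_table.get(user_id, {"id": user_id}), **changes}
    users_table[user_id] = row
    # 2. Write the exact same data to the cache in the same code path.
    cache.set(f"user:{user_id}", json.dumps(row))
    return row


def get_profile(user_id):
    cached = cache.get(f"user:{user_id}")
    return json.loads(cached) if cached is not None else None


update_profile(55, {"city": "Paris"})
profile = get_profile(55)  # already cached, no cold start
```

Note that `update_profile` caches the row whether or not anyone reads it later, which is exactly the memory cost described above.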
5. Types of Web Caching
You can cache data at different levels of your application stack:
- Query Caching: Caching the direct array result of a `SELECT * FROM users` command.
- API Caching: Caching the massive JSON string output of a REST API endpoint (e.g., `/api/v1/dashboard`).
- Page Caching (Full HTML): The most extreme method. Your backend PHP/Node.js generates a massive 500-line HTML webpage, and you literally store the entire HTML string in Redis! When the next user connects, the server doesn't even boot up the HTML templating engine; it just blasts the raw HTML string directly to the browser.
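Each level needs its own key scheme so that entries do not collide. The helpers below are one possible convention, not a standard: the `query:`, `api:`, and `html_page:` prefixes, the function names, and the choice to hash the SQL text are all assumptions made for illustration.

```python
import hashlib


def query_cache_key(sql, params=()):
    """Key for a cached query result: hash the SQL text plus its parameters."""
    raw = repr((sql, tuple(params))).encode("utf-8")
    return "query:" + hashlib.sha256(raw).hexdigest()[:16]


def api_cache_key(path, user_id=None):
    """Key for a cached API response, scoped per user when the output is personal."""
    scope = f"user:{user_id}" if user_id is not None else "public"
    return f"api:{scope}:{path}"


def page_cache_key(slug):
    """Key for a fully rendered HTML page."""
    return f"html_page:{slug}"


dashboard_key = api_cache_key("/api/v1/dashboard", user_id=55)
page_key = page_cache_key("my-story")
users_key = query_cache_key("SELECT * FROM users")
```

Scoping the API key by user matters for endpoints like a dashboard, where serving one user's cached JSON to another would be a data leak.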
6. Cache Invalidation (The Hardest Problem)
Phil Karlton famously said: *"There are only two hard things in Computer Science: cache invalidation and naming things."*
If an E-Commerce site caches a Laptop's price as $1000 for 24 hours, and the manager changes the price in MySQL to $1200, users will keep seeing $1000 until the cached entry expires, which can be up to 24 hours later. The company loses money.
How to Invalidate (Delete) the Cache:
- 1. Time-Based (TTL): Just let the `EXPIRE` timer run out. (Only good for data that doesn't matter if it's stale, like "Recent Tweets").
- 2. Event-Based (Active Invalidation): The backend code that executes the `UPDATE products` query must immediately execute `$redis->del("product101")`. The next time a user visits, it forces a Cache Miss, and fetches the fresh $1200 price!
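The event-based approach for the laptop price can be sketched as below. This is an illustration under assumptions: `InMemoryCache` is a hypothetical stand-in for a Redis client, `products_table` stands in for MySQL, and `get_price` / `update_price` are made-up names. The key format follows the `product101` key used in the text.

```python
class InMemoryCache:
    """Hypothetical stand-in for a Redis client (get / set / delete only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


# Hypothetical "MySQL" table.
products_table = {101: {"name": "Laptop", "price": 1000}}
cache = InMemoryCache()


def get_price(product_id):
    key = f"product{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return int(cached)
    price = products_table[product_id]["price"]  # cache miss: read the database
    cache.set(key, str(price))
    return price


def update_price(product_id, new_price):
    products_table[product_id]["price"] = new_price  # the UPDATE products step
    cache.delete(f"product{product_id}")             # invalidate right away


before = get_price(101)   # reads 1000 and caches it
update_price(101, 1200)   # the manager changes the price
after = get_price(101)    # forced miss, so the fresh price is fetched
```

The delete happens in the same function as the database update, so there is no code path that changes the price without clearing the cached copy.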
7. Mini Project: High-Speed Blog Caching System
Scenario: A WordPress-style blog is crashing under heavy traffic. We will implement Full HTML Page Caching with Event-Based Invalidation.
- 1. The Viewer: When a user visits `blog.com/post/my-story`, the backend checks Redis for the key `html_page:my-story`. If it exists, it serves the HTML instantly. If not, it builds the page, saves it to Redis indefinitely (`SET html_page:my-story "<html>..."`), and serves it.
- 2. The Author: The blog author logs into the admin panel, edits a typo in the article, and clicks "Save".
- 3. The Invalidation: The backend `savepost.php` script updates MySQL, and then explicitly runs `$redis->del("html_page:my-story")`.
- 4. Result: The cache is cleared precisely when the data changes. The next visitor experiences a Cache Miss, triggering the server to generate and cache the newly corrected HTML!
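The whole flow can be sketched in a few functions. As before, this is only an illustration: `InMemoryCache` is a hypothetical stand-in for a Redis client, `posts_table` stands in for MySQL, and `render_post`, `view_post`, and `save_post` are assumed names for the viewer and admin code paths.

```python
from html import escape


class InMemoryCache:
    """Hypothetical stand-in for a Redis client (get / set / delete only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


# Hypothetical "MySQL" table holding one post with a typo in it.
posts_table = {"my-story": {"title": "My Story", "body": "Once upon a tiem."}}
cache = InMemoryCache()
render_count = 0  # how many times the expensive template step ran


def render_post(slug):
    global render_count
    render_count += 1
    post = posts_table[slug]
    return (
        f"<html><body><h1>{escape(post['title'])}</h1>"
        f"<p>{escape(post['body'])}</p></body></html>"
    )


def view_post(slug):
    key = f"html_page:{slug}"
    page = cache.get(key)
    if page is None:           # cache miss: build the page once
        page = render_post(slug)
        cache.set(key, page)   # stored without a TTL, as in step 1
    return page


def save_post(slug, new_body):
    posts_table[slug]["body"] = new_body  # update the database
    cache.delete(f"html_page:{slug}")     # step 3: explicit invalidation


view_post("my-story")
view_post("my-story")                        # served from the cache
save_post("my-story", "Once upon a time.")   # the author fixes the typo
page = view_post("my-story")                 # rebuilt with the corrected text
```

Three views and one edit cause only two renders. Because the page is stored without a TTL, the delete in `save_post` is the only thing that ever refreshes it, so every code path that changes a post has to call it.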
8. Common Mistakes
- The Thundering Herd Problem: Imagine your website homepage cache is set to expire every 60 seconds. When the 60 seconds are up, the key vanishes. At that exact moment, 5,000 users refresh the page. All 5,000 requests experience a Cache Miss simultaneously, and all 5,000 queries hit your MySQL database at the same time, which can overwhelm the server. Common mitigations are adding random jitter to TTLs so keys do not expire together, and using a lock so that only one request rebuilds the expired key.
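Two common mitigations are TTL jitter and a mutex around the rebuild. The sketch below simulates 50 concurrent requests hitting an empty cache. It makes several assumptions: a plain dict stands in for Redis, `threading.Lock` stands in for a lock that would have to be shared across servers in a real deployment (for example a Redis key written with `SET ... NX` and an expiry), and `jittered_ttl`, `load_homepage_from_db`, and `get_homepage` are illustrative names.

```python
import random
import threading
import time

cache = {}                          # hypothetical stand-in for Redis
rebuild_lock = threading.Lock()     # only one request may rebuild the key
db_queries = 0


def jittered_ttl(base_seconds, spread=0.1):
    """Return a TTL within +/- spread of base_seconds so keys expire at different times."""
    return base_seconds * random.uniform(1 - spread, 1 + spread)


def load_homepage_from_db():
    global db_queries
    db_queries += 1                 # only ever called while holding rebuild_lock
    time.sleep(0.05)                # simulate a slow MySQL query
    return "<html>home</html>"


def get_homepage():
    page = cache.get("homepage")
    if page is not None:
        return page
    with rebuild_lock:
        page = cache.get("homepage")    # re-check: another request may have rebuilt it
        if page is None:
            page = load_homepage_from_db()
            # A real client would store this with an expiry of jittered_ttl(60).
            cache["homepage"] = page
        return page


threads = [threading.Thread(target=get_homepage) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All 50 requests miss at first, but only the one holding the lock queries the database; the rest wait and then read the freshly cached page, so `db_queries` ends at 1.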
9. Best Practices
- Versioned Keys: Instead of trying to aggressively `DEL` a cache key, just change the name of the key! E.g., `user:55:v1`. When the user updates their profile, your application starts looking for `user:55:v2`. The old `v1` key will eventually just expire naturally (provided it was written with a TTL), and you never had to worry about explicit invalidation.
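One way to sketch versioned keys is to keep the current version number in its own key. The text does not say where the version should live, so the `user:<id>:version` key, the `InMemoryCache` stand-in (which records TTLs but does not enforce them), and the function names are all assumptions.

```python
import json


class InMemoryCache:
    """Hypothetical stand-in for a Redis client. TTLs are recorded, not enforced."""

    def __init__(self):
        self._data = {}
        self.ttls = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ex=None):
        self._data[key] = value
        self.ttls[key] = ex


cache = InMemoryCache()
users_table = {55: {"id": 55, "name": "Ada"}}  # hypothetical "MySQL" table


def current_version(user_id):
    raw = cache.get(f"user:{user_id}:version")
    return int(raw) if raw is not None else 1   # no version stored yet means v1


def get_user(user_id, ttl_seconds=3600):
    key = f"user:{user_id}:v{current_version(user_id)}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    user = users_table[user_id]
    cache.set(key, json.dumps(user), ex=ttl_seconds)  # the TTL lets old versions age out
    return user


def update_user(user_id, changes):
    users_table[user_id] = {**users_table[user_id], **changes}
    # Bump the version instead of deleting the old key.
    cache.set(f"user:{user_id}:version", str(current_version(user_id) + 1))


get_user(55)                                  # cached under user:55:v1
update_user(55, {"name": "Ada Lovelace"})     # readers now look for user:55:v2
fresh = get_user(55)                          # miss on v2, so the new data is loaded
```

The trade-off is one extra cache lookup per read to find the version, and stale `v1` entries occupy memory until their TTL runs out. The read-then-write bump in `update_user` is also not atomic, so concurrent updates would need more care than this sketch shows.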
10. Exercises
- 1. What caching pattern involves checking Redis first, and if the data is missing, querying the primary database and saving the result back into Redis?
- 2. What is the technical term for the architectural problem of deleting stale data from Redis when the underlying MySQL database is updated?
11. Redis Challenges
You are the architect for a weather application. The temperature in New York changes constantly. You cache the API response: `SETEX weather:NYC 3600 "{temp: 72}"`. Users complain that when it rains, the app still shows sunny weather for an hour. Why is this happening, and how would you fix the architecture?
*(Answer: The application is suffering from stale data because the Time-To-Live (TTL) is set to 3600 seconds (1 hour). For highly dynamic data like live weather, the TTL must be drastically reduced (e.g., 60 seconds), or the architecture must be shifted to a Write-Through pattern where the weather forecasting system actively updates the Redis key every time a meteorological change is detected).*
12. MCQ Quiz with Answers
When implementing a strict "Cache-Aside" architectural pattern, which component is explicitly responsible for writing new data into the Redis memory engine?
An E-Commerce platform caches the HTML of a product page. The marketing team updates the product description in the admin dashboard, but the live website continues to display the old text. What is the most robust architectural solution to ensure this "Stale Cache" problem never occurs again?
14. Interview Questions
- Q: Explain the "Thundering Herd" (Cache Stampede) problem in high-concurrency architectures. Describe two strategies (e.g., TTL Jitter, Mutex Locks) to prevent an expiring Redis key from crashing the primary relational database.
- Q: Compare and contrast the "Cache-Aside" and "Write-Through" caching patterns. In what specific business scenario would you explicitly choose Write-Through despite its higher memory overhead?