CHAPTER 12
Intermediate
File Storage and Content Delivery
Updated: May 16, 2026
25 min read
1. Introduction
Relational databases excel at storing structured data such as text strings, numbers, and boolean values. They are a poor fit for massive binary files. If you attempt to store a 4GB 4K video file as a BLOB (Binary Large Object) inside a PostgreSQL database, you bloat the tables and backups, slow down replication and queries, and put heavy read/write pressure on a server that should be answering small transactional queries. Modern applications (like Instagram, Netflix, and TikTok) require architectures built around the ingestion, storage, and global delivery of massive media files. In this chapter, we will cover File Storage and Content Delivery: the practically unlimited scaling of Object Storage (AWS S3), global Content Delivery Networks (CDNs) that cut latency by serving content from locations physically close to users, and asynchronous media optimization pipelines.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Explain why storing large media files inside an SQL database is an architectural anti-pattern.
- Compare traditional Block Storage against modern Object Storage (AWS S3).
- Architect a highly scalable, secure File Upload pipeline using Pre-Signed URLs.
- Understand how Content Delivery Networks (CDNs) cache assets globally at the Edge.
- Design an asynchronous worker pipeline for heavy image/video transcoding.
3. Block Storage vs. Object Storage
To store files, you must choose the right storage architecture.
- Block Storage (The Hard Drive): This is the traditional disk volume attached directly to your server (e.g., AWS EBS). It is very fast, but it costs more per gigabyte and each volume has a fixed, provisioned size. You cannot easily share a block volume across 1,000 servers.
- Object Storage (The Data Lake): This is the modern cloud standard (e.g., AWS S3, Google Cloud Storage). Files are stored as "Objects" in a flat bucket, and each object is addressed by a unique key, which maps to a URL.
- *The Magic:* Object storage scales to practically unlimited capacity, so you never have to provision disks or worry about "running out of disk space." It is also much cheaper per gigabyte than Block Storage.
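To make the "flat bucket" idea concrete, here is a toy in-memory model. It is only a sketch: `ToyBucket` and its methods are hypothetical names invented for illustration, not the S3 API. It shows that keys containing slashes are still plain strings in one flat namespace, and that "folders" are simply a prefix filter over those keys.

```python
class ToyBucket:
    """Toy in-memory stand-in for an object storage bucket (not a real S3 client)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects = {}  # flat map: object key (str) -> object bytes

    def put(self, key: str, data: bytes) -> None:
        # No directories are created; the key is stored as one opaque string.
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Folders" are emulated by filtering keys on a shared prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket("mybucket")
bucket.put("users/42/profile.jpg", b"fake jpeg bytes")
bucket.put("users/42/banner.jpg", b"more fake jpeg bytes")
bucket.put("videos/intro.mp4", b"fake video bytes")

user_keys = bucket.list_keys(prefix="users/42/")
```

Because the namespace is flat, adding capacity never requires resizing a volume or rebalancing a directory tree, which is part of why this model scales so well.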
4. The Database Pointer Architecture
How do you link a user to their profile picture?
- The Anti-Pattern: Storing the raw binary contents of the .jpg file inside the SQL database.
- The Standard Architecture: The profile.jpg file is uploaded directly to an Object Storage bucket (AWS S3), where it becomes addressable by a URL such as https://s3.aws.com/mybucket/profile.jpg. You then save *only that string URL* into your PostgreSQL database.
- *Result:* Your database stays small and fast, because each row holds a short string instead of megabytes of image data.
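The pointer pattern can be sketched in a few lines. This example uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL so that it runs anywhere; the `users` table, its columns, and the username are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        avatar_url TEXT  -- pointer to the object, never the image bytes
    )
    """
)

# Assume the upload to object storage already succeeded and yielded this URL.
avatar_url = "https://s3.aws.com/mybucket/profile.jpg"
conn.execute(
    "INSERT INTO users (id, username, avatar_url) VALUES (?, ?, ?)",
    (1, "alice", avatar_url),
)
conn.commit()

# Reading the profile returns a short string; the client fetches the image itself.
row = conn.execute("SELECT avatar_url FROM users WHERE id = ?", (1,)).fetchone()
stored_url = row[0]
```

The row costs a few dozen bytes regardless of whether the image behind the URL is 50KB or 50MB.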
5. Content Delivery Networks (CDNs)
Storing a file in an AWS bucket in Virginia is great. But if a user in Australia tries to download a 5GB video from it, every byte has to travel roughly 10,000 miles across undersea cables, causing high latency and slow downloads.
- The CDN (Edge Locations): A CDN (Cloudflare, AWS CloudFront) is a large network of caching proxy servers located in major cities worldwide (Tokyo, London, Sydney).
- The Workflow: When the Australian user requests the video, the request hits the Sydney CDN server first. If the CDN doesn't have it (Cache Miss), it fetches it once from Virginia, stores a copy in Sydney, and gives it to the user. The next 1 million Australian users download it directly from Sydney (Cache Hit) for as long as the copy stays cached, with far lower latency, and your Virginia bucket is shielded from almost all of that traffic.
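The miss-then-hit workflow can be simulated with a small cache in front of an origin. This is a simplified sketch: the `Origin` and `EdgeCache` classes and the video path are hypothetical, and real CDNs also expire entries (for example, based on cache headers), which is omitted here.

```python
class Origin:
    """Hypothetical origin bucket that counts how often it is actually hit."""

    def __init__(self, objects: dict) -> None:
        self.objects = objects
        self.fetch_count = 0

    def fetch(self, path: str) -> bytes:
        self.fetch_count += 1
        return self.objects[path]


class EdgeCache:
    """Hypothetical edge server: serves from its local cache, else asks the origin."""

    def __init__(self, city: str, origin: Origin) -> None:
        self.city = city
        self.origin = origin
        self.cache = {}

    def get(self, path: str) -> tuple:
        if path in self.cache:
            return self.cache[path], "HIT"
        body = self.origin.fetch(path)  # the only trip to the distant origin
        self.cache[path] = body         # keep a local copy for later requests
        return body, "MISS"


origin = Origin({"/videos/launch.mp4": b"video bytes"})
sydney = EdgeCache("Sydney", origin)

# Five users in the same region request the same video.
statuses = [sydney.get("/videos/launch.mp4")[1] for _ in range(5)]
```

Only the first request reaches the origin; the remaining four are answered from the edge, which is the effect that protects the bucket from regional traffic spikes.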
6. File Upload Pipelines (Pre-Signed URLs)
Uploading large files through your primary web servers is dangerous.
- The Danger: If a user uploads a 2GB video through your Node.js API server, that connection can stay open for 10 minutes or more. If 1,000 users do this at once, your API servers can exhaust their connections, memory, and bandwidth, locking out all other traffic.
- The Pre-Signed URL Solution:
- 1. Client asks API server: "I want to upload a video."
- 2. API server uses its AWS credentials to generate a temporary, cryptographically signed "Pre-Signed URL" for one specific S3 object, valid for a short window (e.g., 15 minutes).
- 3. API server sends URL to Client.
- 4. Client uploads the 2GB video *directly* to AWS S3 using the URL, completely bypassing your API servers.
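The idea behind a pre-signed URL can be illustrated with a simplified HMAC scheme. This is not AWS's actual signing algorithm (S3 uses Signature Version 4, normally produced through an SDK call such as boto3's `generate_presigned_url`); the secret, host name, and function names below are assumptions chosen to keep the sketch self-contained. It shows the two properties that matter: the URL is bound to one method and path, and it stops working after it expires.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"server-side-secret"  # hypothetical; known only to server and storage


def _sign(method: str, path: str, expires: int) -> str:
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_presigned_url(method: str, path: str, ttl_seconds: int, now: int) -> str:
    """API server side: issue a URL valid for ttl_seconds from `now`."""
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "signature": _sign(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify_presigned_url(method: str, url: str, now: int) -> bool:
    """Storage side: accept only unexpired URLs whose signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    expected = _sign(method, parsed.path, expires)
    return hmac.compare_digest(signature.encode(), expected.encode())


issued_at = int(time.time())
url = make_presigned_url("PUT", "/mybucket/uploads/video.mp4", ttl_seconds=900, now=issued_at)
```

A client holding `url` can upload for 15 minutes without ever seeing the secret. Changing the path, the method, or the expiry invalidates the signature, so the URL cannot be reused to write to a different object.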
7. Diagrams/Visual Suggestions
*Architecture Diagram: Pre-Signed URL Upload Pipeline*
8. Best Practices
- Media Optimization Workers: Never serve an 8MB 4K image directly to a mobile user on a 3G network. Architect an asynchronous event queue (as learned in Chapter 10). When an image lands in S3, S3 emits an event notification onto the queue, and a worker server picks it up. The worker compresses the image, creates 3 sizes (Thumbnail, Mobile, Desktop), saves them back to S3, and updates the database.
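The worker flow can be sketched with Python's standard `queue` and `threading` modules standing in for the real event queue and worker fleet. The target widths, key naming scheme, and the `database` dictionary are assumptions for illustration, and the actual pixel resizing is replaced by a dimension calculation so the example has no third-party dependencies.

```python
import queue
import threading

# Hypothetical target widths (in pixels) for the three sizes named above.
VARIANT_WIDTHS = {"thumbnail": 150, "mobile": 640, "desktop": 1920}

events = queue.Queue()  # stand-in for the real event queue
database = {}           # stand-in for the real database


def scaled_size(width: int, height: int, target_width: int) -> tuple:
    """Scale down to target_width, keeping aspect ratio and never upscaling."""
    if width <= target_width:
        return width, height
    return target_width, round(height * target_width / width)


def worker() -> None:
    while True:
        event = events.get()
        if event is None:  # sentinel value: shut the worker down
            events.task_done()
            break
        key = event["key"]
        variants = {}
        for name, target in VARIANT_WIDTHS.items():
            w, h = scaled_size(event["width"], event["height"], target)
            # A real worker would resize and compress the bytes, then upload them.
            variants[name] = f"{key}.{name}.{w}x{h}.jpg"
        database[key] = variants  # record where each variant lives
        events.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# Simulate the "object created" notification that storage would emit.
events.put({"key": "uploads/photo-1", "width": 3840, "height": 2160})
events.put(None)
events.join()
thread.join()
```

The upload request returns as soon as the event is queued; the heavy work happens later on the worker, so the user interface is never blocked by compression.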
9. Common Mistakes
- Public Buckets (The Data Leak): One of the most common security failures in the cloud is an engineer accidentally setting an AWS S3 bucket permission to "Public Read/Write." *The Failure:* Anyone on the internet can download everything in the bucket, including any database backups stored there, or worse, upload malware into your application. *The Fix:* Buckets MUST be private by default, accessed only via strict IAM policies, Pre-Signed URLs, or CDN distributions.
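One way to catch this mistake early is to scan bucket policies for statements that allow any principal. The checker below is a deliberately simplified sketch: it only inspects the policy JSON for unconditional `Allow` statements with a wildcard principal, and it ignores ACLs, `NotPrincipal`, and account-level settings. The function names, bucket name, and role ARN are hypothetical.

```python
import json


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_wildcard_principal(principal) -> bool:
    """True for "*" or for a mapping such as {"AWS": "*"}."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        return any("*" in _as_list(v) for v in principal.values())
    return False


def public_statements(policy_json: str) -> list:
    """Return Allow statements open to any principal with no Condition attached."""
    policy = json.loads(policy_json)
    return [
        stmt
        for stmt in _as_list(policy.get("Statement", []))
        if stmt.get("Effect") == "Allow"
        and is_wildcard_principal(stmt.get("Principal"))
        and "Condition" not in stmt
    ]


# Hypothetical policy: one public statement, one scoped to a single role.
risky_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::mybucket/*"},
        {"Sid": "AppOnly", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:role/app-server"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::mybucket/*"},
    ],
})

flagged = [stmt["Sid"] for stmt in public_statements(risky_policy)]
```

A check like this fits naturally into a deployment pipeline as a guard rail, alongside the provider's own public-access controls rather than instead of them.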
10. Mini Project: Architect a Netflix-Clone Storage System
Let's design global video delivery.
- 1. The Ingestion: Studios upload raw 50GB video masters directly into AWS S3 (US-East) via secure Pre-Signed URLs. Because a single S3 PUT request is capped at 5GB, files this large are sent as multipart uploads, with each part going to its own pre-signed URL.
- 2. The Transcoding: Each completed upload triggers an S3 event notification, which is relayed into a Kafka queue. Hundreds of Worker servers pull the video, slice it into 10-second chunks, compress it into multiple resolutions (480p, 1080p, 4K), and save the thousands of new files back to S3.
- 3. The Global Distribution: We connect AWS CloudFront (CDN) to the S3 bucket as its origin, keeping the bucket itself private.
- 4. The User Experience: A user in Berlin clicks "Play." Their TV connects to the nearest CDN edge server in Berlin and streams the 1080p video chunks cached there, with round-trip latency on the order of 10ms instead of a transatlantic round trip for every request.
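A quick calculation shows why the transcoding step produces "thousands of new files." The sketch below plans the output objects for one title, using the 10-second chunks and three resolutions from the steps above. The key layout, the `.ts` extension, and the title ID are assumptions made for illustration.

```python
import math

SEGMENT_SECONDS = 10                   # chunk length from the design above
RENDITIONS = ["480p", "1080p", "4k"]   # resolutions from the design above


def segment_count(duration_seconds: float) -> int:
    # A trailing partial chunk still needs its own file, hence the ceiling.
    return math.ceil(duration_seconds / SEGMENT_SECONDS)


def output_keys(title_id: str, duration_seconds: float) -> list:
    """Plan one object key per (rendition, segment) pair."""
    n = segment_count(duration_seconds)
    return [
        f"titles/{title_id}/{rendition}/segment_{i:05d}.ts"
        for rendition in RENDITIONS
        for i in range(n)
    ]


# A hypothetical 2-hour master: 7200 s / 10 s = 720 segments per rendition.
keys = output_keys("movie-123", duration_seconds=2 * 60 * 60)
```

For this two-hour title the plan contains 720 segments in each of 3 renditions, or 2,160 objects. Small, independently cacheable chunks are what allow the CDN to serve playback from the edge and let the player switch resolution between segments.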
11. Practice Exercises
- 1. Compare "Block Storage" with "Object Storage." Why do massive modern applications like Instagram rely exclusively on Object Storage for saving user media?
- 2. Define the architectural danger of routing massive file uploads (e.g., 5GB videos) through your primary Node.js or Python API servers. How do Pre-Signed URLs solve this bottleneck?
12. MCQs with Answers
Question 1
A system architect is designing the database schema for a new social media application. They decide to store the binary code of the users' uploaded images directly inside the rows of a PostgreSQL database to keep all data in one place. Why is this considered an architectural anti-pattern?
Answer: Large binary files bloat the database and add heavy read/write pressure, slowing it down. The images belong in Object Storage, with only their URL saved in the database.
Question 2
When a global user requests a massive 4K video file, the request is intercepted by a proxy server located physically close to the user's geographic location (e.g., a server in Tokyo serving a Japanese user) rather than hitting the primary origin server in New York. What is this global caching architecture called?
Answer: A Content Delivery Network (CDN).
13. Interview Questions
- Q: Explain the mechanical flow of a "Pre-Signed URL" architecture for file uploads. Why is it structurally superior to having the client upload the file directly to your web server?
- Q: A client in London complains that your website's heavy header images take 10 seconds to load, while users in New York report the site is instantaneous. Walk me through exactly how you would configure a CDN to solve this geographic latency.
- Q: Walk me through the asynchronous architecture required to handle user video uploads. Once the raw video hits your S3 bucket, how do you queue, transcode, and compress the video without blocking the user's interface?