CHAPTER 15
Intermediate
Deploying Databases in Kubernetes
Updated: May 15, 2026
30 min read
1. Introduction
Historically, software engineers vehemently argued that databases should *never* be run inside Kubernetes. The ephemeral nature of Pods terrified database administrators. However, as Kubernetes matured and introduced robust controllers specifically designed for stateful workloads, the industry shifted. In this chapter, we will overcome the fear of ephemeral storage by deploying resilient, persistent database clusters utilizing the powerful StatefulSet controller.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Understand the unique challenges of running databases in Kubernetes.
- Contrast a standard Deployment with a StatefulSet.
- Understand stable network identities (Headless Services).
- Deploy a MySQL database using a StatefulSet and PVC.
- Discuss the merits of Managed Cloud Databases (RDS) vs. Kubernetes databases.
3. Beginner-Friendly Explanation
Imagine organizing a fleet of delivery trucks.
- Deployment (Stateless): You manage a fleet of generic white vans. If Van #4 breaks down, you buy a new white van. You do not care about the license plate or the name of the van, as long as you have 10 vans driving. (This is perfect for web servers.)
- StatefulSet (Stateful): You manage a fleet of armored bank trucks. Truck #1 strictly carries gold. Truck #2 strictly carries diamonds. If Truck #1 breaks down, you cannot just replace it with a generic van. You need a highly specific replacement truck that instantly inherits the identity, the security clearance, and the specific cargo (the PVC) of the original Truck #1.
A StatefulSet guarantees strict identity and strict storage mapping.
4. The StatefulSet Controller
A StatefulSet is the sibling of the Deployment controller, designed for stateful workloads such as databases (MySQL, MongoDB, Cassandra). Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods.
- Ordered Naming: If you deploy a StatefulSet named `mysql` with 3 replicas, it doesn't create random names like `mysql-8f7b5`. It creates exactly: `mysql-0`, `mysql-1`, and `mysql-2`.
- Ordered Creation/Deletion: It starts them in order. It waits for `mysql-0` to be fully healthy before starting `mysql-1`, and it removes them in reverse order when scaling down.
- Sticky Storage: This is the most critical feature. It dynamically provisions a unique Persistent Volume Claim (PVC) for *each* specific Pod. If `mysql-1` crashes, Kubernetes creates a new `mysql-1` and reattaches the *exact same hard drive* to it.
5. Headless Services
When deploying a StatefulSet, you do not use a standard ClusterIP service to load balance traffic randomly. (You rarely want to send a WRITE request randomly to a Read-Only database replica!) You create a Headless Service (by setting `clusterIP: None` in the Service YAML). This allows you to communicate directly with specific Pods using DNS (e.g., `mysql-0.mysql-service.default.svc.cluster.local`).
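As a minimal sketch, a Headless Service matching the DNS example above might look like the following. The Service name `mysql-service` and the `default` namespace come from that DNS name; the `app: mysql` label and the port name are assumptions chosen for illustration and must match whatever labels your Pods actually carry.

```yaml
# Sketch of a Headless Service (labels and port name are assumptions).
apiVersion: v1
kind: Service
metadata:
  name: mysql-service        # appears in the per-Pod DNS name
  namespace: default
spec:
  clusterIP: None            # this is what makes the Service "headless"
  selector:
    app: mysql               # assumed Pod label; must match the StatefulSet's Pod template
  ports:
    - name: mysql
      port: 3306             # default MySQL port
```

Because no cluster IP is allocated, a DNS lookup of the Service returns the individual Pod addresses, and each Pod of a StatefulSet that names this Service gets its own record such as `mysql-0.mysql-service`. A manifest like this could also serve as `mysql-svc.yaml` in the mini project.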
6. Anatomy of a StatefulSet YAML
The key part of a StatefulSet manifest is the `volumeClaimTemplates` block. With a 10Gi storage request, it tells Kubernetes: "Every time you create a Pod in this set, automatically generate a new 10Gi PVC specifically for it."
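The chapter's original manifest is not reproduced here, so the following is a reconstruction under stated assumptions rather than the author's exact file. The StatefulSet name `mysql`, the claim template name `data` (which yields the PVC `data-mysql-0`), and the 10Gi size follow from the text; the image tag, the `app: mysql` label, the placeholder root password, and the `subPath` setting are illustrative choices.

```yaml
# Sketch of a single-replica MySQL StatefulSet (several values are assumptions).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-service   # name of the Headless Service that provides Pod DNS
  replicas: 1
  selector:
    matchLabels:
      app: mysql               # assumed label
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0     # assumed image tag
          ports:
            - name: mysql
              containerPort: 3306
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "change-me"   # placeholder only; prefer a Secret outside of a demo
          volumeMounts:
            - name: data           # must match the volumeClaimTemplates name below
              mountPath: /var/lib/mysql
              subPath: mysql       # keeps MySQL files in a subdirectory of the volume
  volumeClaimTemplates:
    - metadata:
        name: data                 # PVCs are named <template>-<pod>, e.g. data-mysql-0
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

No `storageClassName` is set, so this sketch relies on the cluster having a default StorageClass that can provision volumes dynamically.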
7. Mini Project: Deploy a Persistent MySQL DB
Let's launch an armored bank truck.
Step-by-Step Tutorial:
- 1. Save the YAML from Section 6 into `stateful-mysql.yaml`.
- 2. First, we must create the Headless Service required by the StatefulSet. Create `mysql-svc.yaml` containing a Service with `clusterIP: None`, as described in Section 5.
- 3. Apply both files: `kubectl apply -f mysql-svc.yaml -f stateful-mysql.yaml`.
- 4. Watch the Pod creation: `kubectl get pods -w`. You will notice it boots with the strict name `mysql-0`.
- 5. Verify the dynamic storage creation: `kubectl get pvc`. You will see that Kubernetes automatically generated a PVC named `data-mysql-0`.
- 6. The Test: Delete the Pod! `kubectl delete pod mysql-0`.
- 7. Watch `kubectl get pods`. Kubernetes will promptly recreate the Pod, name it `mysql-0` again, and remount the `data-mysql-0` volume to it. Total data persistence achieved!
8. Real-World Scenarios
The Great Debate: Should you run databases in Kubernetes in production? If you are a startup, running PostgreSQL inside your AWS EKS cluster can save money because you don't have to pay for a separate Amazon RDS instance. However, managing database backups, disaster recovery, and master-slave replication inside Kubernetes is extremely difficult. Many enterprise companies prefer to use Managed Services (like AWS RDS or Google Cloud SQL) for their databases, reserving Kubernetes for their stateless Node.js/PHP web applications.
9. Best Practices
- The Operator Pattern: Manually managing a 3-node PostgreSQL cluster (handling leader election, failovers, and backups) using raw StatefulSets is agonizing. In production, you should strongly prefer the Operator Pattern. You install a third-party operator (like the Zalando Postgres Operator) into your cluster. The Operator acts as a robot Database Administrator, automating much of the complex lifecycle of the database cluster.
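To give a feel for what working with an operator looks like, here is a rough sketch of a cluster definition for the Zalando Postgres Operator, modeled on that project's minimal example. It assumes the operator and its `postgresql` custom resource are already installed; the cluster name, team ID, user, database, size, and PostgreSQL version are hypothetical values, and the supported versions depend on the operator release you run.

```yaml
# Sketch of a Zalando Postgres Operator custom resource (values are illustrative).
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-demo-cluster      # hypothetical; the operator expects the name to start with the teamId
spec:
  teamId: "acid"
  numberOfInstances: 3         # one leader plus two replicas, managed by the operator
  volume:
    size: 10Gi                 # storage requested for each instance
  users:
    app_user: []               # hypothetical application role with no extra privileges
  databases:
    app_db: app_user           # database name mapped to its owner
  postgresql:
    version: "16"              # assumed; check which versions your operator release supports
```

Instead of hand-writing StatefulSets, Services, and failover logic, you declare the desired cluster and the operator creates and maintains the underlying objects.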
10. Common Mistakes
- Scaling Down a StatefulSet: If you scale a StatefulSet down from 3 replicas to 2, Kubernetes deletes the `mysql-2` Pod. Crucially, by default it DOES NOT delete the PVC (`data-mysql-2`)! This is a safety mechanism to prevent accidental data loss. You must manually run `kubectl delete pvc data-mysql-2` to stop paying your cloud provider for that physical hard drive.
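If you would rather have Kubernetes clean up those claims for you, recent Kubernetes versions offer a `persistentVolumeClaimRetentionPolicy` field on the StatefulSet. The fragment below is a sketch that assumes your cluster version supports the field; it shows only the relevant part of the spec, and the rest of the StatefulSet stays as it was.

```yaml
# Fragment of a StatefulSet spec (assumes a cluster that supports this field).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete     # remove a Pod's PVC when the replica count is reduced
    whenDeleted: Retain    # keep all PVCs if the StatefulSet itself is deleted
  # serviceName, selector, template, and volumeClaimTemplates are unchanged
```

Choosing `Delete` trades the safety net described above for lower cost, so it fits best where the data on the removed replicas can be rebuilt from the remaining members.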
11. Exercises
- 1. Contrast the naming conventions and creation ordering of Pods managed by a Deployment versus Pods managed by a StatefulSet.
- 2. Explain the function of a Headless Service. Why is random load balancing detrimental to a Master-Replica database architecture?
12. FAQs
Q: Can I use a Deployment for a database if I only want 1 replica? A: Technically, yes. If you only ever have `replicas: 1`, a Deployment with a PVC will work. However, using a StatefulSet is the strict industry standard, as it guarantees ordering, sticky identity, and future-proofs the architecture in case you ever need to scale to a multi-node cluster.
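For comparison, a single-replica Deployment backed by a manually created PVC could be sketched as follows. The PVC name `mysql-data`, the Secret `mysql-secret`, the label, and the image tag are all hypothetical. The `Recreate` strategy is an added assumption worth noting: it stops the old Pod before starting a new one, so two Pods never compete for the same ReadWriteOnce volume during an update.

```yaml
# Sketch: static PVC plus a single-replica Deployment (names are hypothetical).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data             # created once, by hand, and shared by name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  strategy:
    type: Recreate             # avoid two Pods mounting the volume at the same time
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret     # hypothetical Secret holding the password
                  key: root-password
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: mysql-data        # every replica would reference this same claim
```

The limitation shows up in the last lines: the claim is referenced by a fixed name, so raising `replicas` would point every Pod at the same volume, whereas `volumeClaimTemplates` gives each Pod its own.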
13. Interview Questions
- Q: Detail the architectural necessity of the `volumeClaimTemplates` specification within a StatefulSet manifest. How does this differ from manually defining a static PVC in a Deployment?
- Q: A CTO asks for your architectural recommendation: Deploy the primary transactional PostgreSQL database inside the company's existing Kubernetes cluster, or utilize a managed cloud service like AWS RDS. Present a balanced argument highlighting the operational overhead of both approaches.
14. Summary
In Chapter 15, we bridged the gap between ephemeral compute and permanent data. We recognized that standard Deployments destroy the strict identity required by clustered databases. We introduced the StatefulSet controller, mastering its ability to provide ordered execution, sticky network identities via Headless Services, and guaranteed data persistence through dynamic `volumeClaimTemplates`. While running databases in Kubernetes presents operational overhead, we have proven it is architecturally sound and highly resilient.