Scikit-learn Basics
CHAPTER 14 Intermediate

Clustering with K-Means

Updated: May 16, 2026

1. Introduction

Every algorithm we have covered so far (Regression, SVM, Trees) has been a form of Supervised Learning: we provided the data (X) and the correct answers (y). But what happens when you don't have the answers? Imagine you have 10,000 customer records (Age and Income), and the marketing team asks you to "group them into distinct buyer profiles." You don't know what the profiles are yet. This is Unsupervised Learning. In this chapter, we will use one of the most popular unsupervised algorithms, K-Means, to automatically discover hidden structure in data.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define Unsupervised Learning and Clustering.
  • Explain how the K-Means algorithm mathematically finds groups.
  • Implement KMeans clustering in Scikit-learn.
  • Determine the optimal number of clusters using the "Elbow Method".
  • Build a customer segmentation model.

3. What is Clustering?

Clustering is the task of dividing the dataset into groups, such that data points in the same group (a cluster) are more similar to each other than to data points in other groups.

4. How K-Means Works

The "K" in K-Means stands for the number of clusters you want the algorithm to find. If you set K=3:
  1. Initialization: The algorithm places 3 starting "Centroids" (center points) in the feature space, for example at random positions.
  2. Assignment: It calculates the distance from every data point to each of the 3 centroids. Each data point is assigned to the cluster of its closest centroid.
  3. Update: For each cluster, it computes the mean (the average position) of all the points assigned to it and moves that cluster's centroid to the mean.
  4. Repeat: It repeats steps 2 and 3 until the centroids stop moving (or an iteration limit is reached). At that point the cluster assignments no longer change.
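
The four steps above can be sketched in plain NumPy. This is an illustrative sketch only, not how Scikit-learn implements K-Means internally: the names `assign_to_nearest` and `kmeans_sketch` are made up for this example, the toy `points` array is invented, and the initialization simply picks K existing data points at random.

```python
import numpy as np


def assign_to_nearest(X, centroids):
    """Return, for each row of X, the index of its closest centroid."""
    # distances[i, j] = Euclidean distance from point i to centroid j
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans_sketch(X, k, max_iters=100, seed=0):
    """Illustrative K-Means loop (hypothetical helper, not sklearn's KMeans)."""
    rng = np.random.default_rng(seed)
    # Step 1 (Initialization): use k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(max_iters):
        # Step 2 (Assignment): each point joins the cluster of its closest centroid.
        labels = assign_to_nearest(X, centroids)

        # Step 3 (Update): move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)

        # Step 4 (Repeat): stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment so the labels match the returned centroids.
    return assign_to_nearest(X, centroids), centroids


# Invented toy data: two tight groups far apart from each other.
points = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],
])
labels, centroids = kmeans_sketch(points, k=2)
print("Labels:", labels)
print("Centroids:\n", centroids)
```

The label numbers themselves (0 or 1) are arbitrary; what matters is which points share a label.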

5. Mini Project: Customer Segmentation

Let's cluster a mock dataset of customers based on their Annual Income and Spending Score. Notice that there is no y (target) variable here!
python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# 1. Mock Data (Income in $k, Spending Score 1-100)
# Notice: No Labels!
X = np.array([
    [15, 39], [15, 81], [16, 6], [16, 77], [17, 40], # Low income group
    [60, 40], [62, 41], [63, 42], [64, 43], [65, 44], # Mid income group
    [90, 80], [92, 85], [95, 90], [98, 92], [100, 95] # High income, High spend
])

# 2. Scale the data (K-Means is distance-based, so features on different scales should be standardized)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Initialize K-Means (Let's ask it to find 3 groups)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)

# 4. Fit and Predict the clusters
# This returns an array like [0, 0, 1, 1, 2, 2] indicating which cluster each point belongs to.
clusters = kmeans.fit_predict(X_scaled)

print("Cluster assignments:", clusters)
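
The raw label array is hard to read on its own. One way to interpret the result, sketched below under the same settings as the mini project, is to map the centroids back to the original units with `scaler.inverse_transform` and count the customers in each cluster. The variable names `centers` and `sizes` are introduced here for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same mock data as the mini project (Income in $k, Spending Score 1-100)
X = np.array([
    [15, 39], [15, 81], [16, 6], [16, 77], [17, 40],
    [60, 40], [62, 41], [63, 42], [64, 43], [65, 44],
    [90, 80], [92, 85], [95, 90], [98, 92], [100, 95],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
clusters = kmeans.fit_predict(X_scaled)

# The centroids live in scaled space; convert them back to $k and score units.
centers = scaler.inverse_transform(kmeans.cluster_centers_)
# Number of customers assigned to each cluster label.
sizes = np.bincount(clusters, minlength=3)

for label, (income, score) in enumerate(centers):
    print(f"Cluster {label}: {sizes[label]} customers, "
          f"avg income ~${income:.0f}k, avg spending score ~{score:.0f}")
```

Reading the centroids in original units is a quick check on whether the clusters K-Means found match the groups you expected.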

6. The Elbow Method (Finding the right K)

In the example above, I chose K=3 because I knew the mock data had 3 groups. In reality, you don't know how many groups exist. How do you find the best K? We use the Elbow Method.
  1. We run K-Means multiple times, from K=1 to K=10.
  2. For each run, we record the Inertia: the sum of squared distances from each point to its closest centroid. Lower inertia means tighter clusters, but inertia keeps falling as K grows, so the lowest value alone does not identify the best K.
  3. We plot these inertia values against K. The line will look like an arm. The "Elbow" (where the line stops dropping sharply and flattens out) is a good candidate for K.
python
# Code to calculate Inertias for the Elbow plot
inertias = []
for k in range(1, 11):  # K = 1 through 10 (the end of range() is exclusive)
    model = KMeans(n_clusters=k, random_state=42, n_init=10)
    model.fit(X_scaled)
    inertias.append(model.inertia_)

# In a Jupyter Notebook, you would plot this list using Matplotlib to find the 'elbow'.
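
A minimal plotting sketch for the elbow chart follows, assuming Matplotlib is installed and reusing the mock customer data from the mini project. The output file name `elbow.png` is arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same mock data as the mini project
X = np.array([
    [15, 39], [15, 81], [16, 6], [16, 77], [17, 40],
    [60, 40], [62, 41], [63, 42], [64, 43], [65, 44],
    [90, 80], [92, 85], [95, 90], [98, 92], [100, 95],
])
X_scaled = StandardScaler().fit_transform(X)

# Fit one model per K and record its inertia.
k_values = list(range(1, 11))  # K = 1 through 10
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, random_state=42, n_init=10)
    model.fit(X_scaled)
    inertias.append(model.inertia_)

# Plot inertia against K; look for the point where the curve flattens.
fig, ax = plt.subplots()
ax.plot(k_values, inertias, marker="o")
ax.set_xlabel("Number of clusters (K)")
ax.set_ylabel("Inertia")
ax.set_title("Elbow plot")
fig.savefig("elbow.png")  # in a notebook, plt.show() displays the figure inline
```

The elbow is judged by eye, so on real data it is often ambiguous; treat it as a starting point rather than a definitive answer.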

7. Visualizing Clusters

If your data has only 2 or 3 features, you can plot the results directly with Matplotlib: draw the features of X as a scatter plot and color each point by its entry in the clusters array returned by K-Means. This is an effective way to present segments to business stakeholders.
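
As a rough sketch of such a plot, the code below colors the mock customers from the mini project by cluster and marks the centroids. It assumes Matplotlib is installed; the colormap, marker styles, and the file name `clusters.png` are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same mock data as the mini project
X = np.array([
    [15, 39], [15, 81], [16, 6], [16, 77], [17, 40],
    [60, 40], [62, 41], [63, 42], [64, 43], [65, 44],
    [90, 80], [92, 85], [95, 90], [98, 92], [100, 95],
])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
clusters = kmeans.fit_predict(X_scaled)

# Plot in the original units so the axes are meaningful to stakeholders.
centers = scaler.inverse_transform(kmeans.cluster_centers_)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=clusters, cmap="viridis", s=60)
ax.scatter(centers[:, 0], centers[:, 1], marker="X", c="red", s=200,
           label="Centroids")
ax.set_xlabel("Annual Income ($k)")
ax.set_ylabel("Spending Score (1-100)")
ax.set_title("Customer segments found by K-Means")
ax.legend()
fig.savefig("clusters.png")  # in a notebook, plt.show() displays the figure inline
```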

8. Common Mistakes

  • Forgetting to Scale: K-Means calculates Euclidean distance. If Income is in the $100,000s and Age is in the 20s, Income will dominate the distance and Age will have almost no effect on the clusters. Standardize the features first, for example with StandardScaler.
  • Assuming K-Means always finds the best clusters: K-Means works best on compact, roughly spherical clusters. If your data forms complex shapes (like a crescent moon), K-Means will usually split them incorrectly. A density-based algorithm such as DBSCAN is a better fit for such data.
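
To illustrate the second mistake, the sketch below compares K-Means and DBSCAN on Scikit-learn's synthetic two-moons dataset, scoring each against the known moon membership with the adjusted Rand index (1.0 means perfect agreement). The `eps` and `min_samples` values are hand-picked assumptions for this particular dataset, not general recommendations.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving crescent shapes; true_labels says which moon each point is on.
X_moons, true_labels = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-Means splits the plane with a straight boundary between two centroids.
kmeans_labels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X_moons)

# DBSCAN grows clusters through dense regions, so it can follow curved shapes.
# eps=0.2 and min_samples=5 are assumptions tuned by hand for this dataset.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X_moons)

kmeans_ari = adjusted_rand_score(true_labels, kmeans_labels)
dbscan_ari = adjusted_rand_score(true_labels, dbscan_labels)
print(f"K-Means adjusted Rand index: {kmeans_ari:.2f}")
print(f"DBSCAN  adjusted Rand index: {dbscan_ari:.2f}")
```

True labels are only available here because the data is synthetic; on real unlabeled data you would compare the clusterings visually or with an internal metric instead.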

9. Best Practices

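  • Random Initialization Trap: Sometimes, the initial placement of centroids leads to poor clusters. Scikit-learn addresses this with the n_init parameter: with n_init=10, the algorithm runs 10 times from different starting centroids and keeps the run with the lowest inertia. In older releases 10 was the default; since scikit-learn 1.4 the default is n_init='auto', which performs a single run when the default k-means++ initialization is used. Set n_init explicitly, as the examples in this chapter do, when you want multiple restarts.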
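
The effect of restarts can be seen with a small experiment. The sketch below uses synthetic blobs from `make_blobs` (an invented dataset, not the customer data) and purely random initialization: it runs K-Means ten times with a single start each, then once with `n_init=10`. Exact inertia values depend on your Scikit-learn version, so none are shown here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 6 groups, invented for this demonstration.
X_blobs, _ = make_blobs(n_samples=300, centers=6, cluster_std=0.8, random_state=0)

# Ten independent runs, each with a single random start.
single_inertias = []
for seed in range(10):
    model = KMeans(n_clusters=6, init="random", n_init=1, random_state=seed)
    model.fit(X_blobs)
    single_inertias.append(model.inertia_)

# One model that tries 10 random starts itself and keeps the best one.
restarts = KMeans(n_clusters=6, init="random", n_init=10, random_state=0)
restarts.fit(X_blobs)

print("Single-start inertias:", [round(v, 1) for v in single_inertias])
print("Best single start:    ", round(min(single_inertias), 1))
print("n_init=10 inertia:    ", round(restarts.inertia_, 1))
```

If the single-start inertias differ from one another, some runs got stuck in a worse local optimum, which is the situation multiple restarts are meant to guard against.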

10. Exercises

  1. Run K-Means on a dataset with K=1. What will the algorithm do? Where will the single centroid end up?
  2. Explain the difference between K-Means (Unsupervised) and K-Nearest Neighbors (Supervised).

11. MCQ Quiz with Answers

Question 1

In the K-Means algorithm, what does "K" represent?

Answer: The number of clusters you ask the algorithm to find.

Question 2

Which technique is commonly used to determine the optimal value of K when you do not know how many groups exist in your data?

Answer: The Elbow Method.

12. Interview Questions

  • Q: Explain how the K-Means algorithm iteratively updates its centroids until it converges.
  • Q: What is Unsupervised Learning, and give a real-world business example of where it would be used over Supervised Learning.

13. FAQs

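Q: Can I use K-Means to predict outcomes for new data? A: Yes, in the sense of assigning new data to an existing cluster. Once K-Means has fixed its centroids (using .fit()), you can pass new data to .predict(). It measures the distance from each new point to the established centroids and assigns the point to the closest cluster. If you scaled the training data, transform the new data with the same fitted scaler first.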
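
A short sketch of that workflow, reusing the mock customer data from the mini project. The two rows in `new_customers` are hypothetical values made up for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same mock data as the mini project (Income in $k, Spending Score 1-100)
X = np.array([
    [15, 39], [15, 81], [16, 6], [16, 77], [17, 40],
    [60, 40], [62, 41], [63, 42], [64, 43], [65, 44],
    [90, 80], [92, 85], [95, 90], [98, 92], [100, 95],
])

# Fit the scaler and the model once, on the training data only.
scaler = StandardScaler().fit(X)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10).fit(scaler.transform(X))

# Hypothetical new customers, in the same units as the training data.
new_customers = np.array([[96, 91], [61, 41]])

# Reuse the fitted scaler (transform, not fit_transform) before predicting.
new_labels = kmeans.predict(scaler.transform(new_customers))
print("Assigned clusters:", new_labels)
```

Calling `fit_transform` on the new rows instead would rescale them using their own mean and spread, which would place them at the wrong distance from the stored centroids.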

14. Summary

K-Means allows us to find order in chaos. With unsupervised learning, we can uncover hidden patterns, group similar items, and segment users without needing labeled historical data. It is a solid first tool for exploratory data analysis.

15. Next Chapter Recommendation

Clusters are easy to visualize with 2 or 3 features, but what if your dataset has 100 features? Plotting that directly is impossible, and distances become less informative as the number of dimensions grows. In Chapter 15: Dimensionality Reduction with PCA, we will learn how to compress 100 features down to 2 while keeping as much of the important information as possible.
