CHAPTER 13
Intermediate
K-Nearest Neighbors (KNN)
Updated: May 16, 2026
6 min read
# CHAPTER 13
K-Nearest Neighbors (KNN)
1. Introduction
"Tell me who your friends are, and I will tell you who you are." This age-old proverb is the exact logic behind the K-Nearest Neighbors (KNN) algorithm. KNN is arguably the simplest machine learning algorithm to understand. It doesn't learn complex math equations or build flowcharts. It simply plots all the training data on a graph and looks at the points closest to a new data point to make a prediction. In this chapter, we will learn how KNN works and implement it to solve a classic classification problem.2. Learning Objectives
By the end of this chapter, you will be able to:- Explain the logic behind the K-Nearest Neighbors algorithm.
- Understand distance metrics (Euclidean vs. Manhattan).
- Choose the optimal value for 'K'.
-
Implement
KNeighborsClassifierin Scikit-learn.
- Classify the famous Iris flower dataset.
3. How KNN Works
Imagine a graph with red apples on the left and yellow bananas on the right. You drop a new, unknown yellow fruit onto the right side of the graph.- 1. The algorithm asks: "Who are my K closest neighbors?"
-
2.
If
K=3, it draws a circle around the unknown fruit until it captures the 3 closest data points.
- 3. If 2 neighbors are Bananas and 1 is an Apple, the algorithm takes a majority vote.
- 4. It predicts the unknown fruit is a Banana!
*KNN is considered a "Lazy Learner." It doesn't actually 'train' a model; it just memorizes the training data and does all the heavy calculating only when a prediction is requested.*
4. Distance Metrics
How does it know which points are "closest"? It calculates the physical distance between them using math formulas.- Euclidean Distance: The standard "straight line" distance between two points (like a crow flies). This is the default in Scikit-learn.
- Manhattan Distance: The distance if you had to walk along a grid (like walking city blocks in Manhattan). Useful for high-dimensional data.
5. Choosing the K Value
The value ofK (how many neighbors to look at) is a hyperparameter you must set.
- If K=1: The model looks at only the single closest point. This is highly sensitive to noise (outliers) and causes massive overfitting.
- If K=100: The model looks at 100 points. If you have 60 Bananas and 40 Apples total, it will just predict Banana every time. This causes underfitting.
- Best Practice: Choose an odd number (3, 5, 7) to prevent tie-votes. You typically find the best K by trying multiple values using Cross-Validation (Chapter 8).
6. Mini Project: Iris Flower Classification
The Iris dataset is the "Hello World" of Machine Learning. It contains measurements (petal length, sepal width) of 3 different species of Iris flowers. Let's build a KNN model to classify them.
python
7. Why Scaling is Mandatory for KNN
Just like SVM, KNN calculates geometric distances. If Feature A is measured in millimeters (1-10) and Feature B is measured in kilometers (1000-5000), the massive numbers in Feature B will mathematically overpower Feature A. The algorithm will completely ignore Feature A. Always useStandardScaler with KNN.
8. Common Mistakes
- Using an Even K Value: If K=4, and 2 neighbors are Class A and 2 neighbors are Class B, the vote is tied. Scikit-learn has to guess based on internal tie-breaking logic. Use odd numbers!
- Using KNN on massive datasets: Because KNN has to calculate the distance to *every single point in the training data* for every prediction, it becomes incredibly slow during the prediction phase if you have millions of rows.
9. Best Practices
- Use KNN as a baseline: Because it is so simple and requires almost no tuning, it is a great algorithm to run first to establish a baseline accuracy before trying complex models like Random Forests.
10. Exercises
-
1.
Change the
n_neighborsparameter in the Iris code above to1and then to100. How does the accuracy change?
- 2. Explain why KNN is referred to as a "Lazy Learner" compared to Logistic Regression.
11. MCQ Quiz with Answers
Question 1
What does the "K" in KNN stand for?
Question 2
Why is it recommended to choose an odd number for K (e.g., K=3, 5, 7) in binary classification?
12. Interview Questions
- Q: Explain why Feature Scaling is absolutely critical before using the K-Nearest Neighbors algorithm.
- Q: How do you determine the optimal value for K in a real-world project?
13. FAQs
Q: Can KNN be used for Regression (predicting continuous numbers)? A: Yes! UsingKNeighborsRegressor. Instead of taking a majority vote of the neighbors' categories, it calculates the *average* of the neighbors' numerical values.