Scikit-learn Basics
CHAPTER 13 Intermediate

K-Nearest Neighbors (KNN)

Updated: May 16, 2026
6 min read

1. Introduction

"Tell me who your friends are, and I will tell you who you are." This age-old proverb is the exact logic behind the K-Nearest Neighbors (KNN) algorithm. KNN is arguably the simplest machine learning algorithm to understand. It doesn't learn complex math equations or build flowcharts. It simply plots all the training data on a graph and looks at the points closest to a new data point to make a prediction. In this chapter, we will learn how KNN works and implement it to solve a classic classification problem.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Explain the logic behind the K-Nearest Neighbors algorithm.
  • Understand distance metrics (Euclidean vs. Manhattan).
  • Choose the optimal value for 'K'.
  • Implement KNeighborsClassifier in Scikit-learn.
  • Classify the famous Iris flower dataset.

3. How KNN Works

Imagine a graph with red apples on the left and yellow bananas on the right. You drop a new, unknown yellow fruit onto the right side of the graph.
  1. The algorithm asks: "Who are my K closest neighbors?"
  2. If K=3, it draws a circle around the unknown fruit until it captures the 3 closest data points.
  3. If 2 neighbors are Bananas and 1 is an Apple, the algorithm takes a majority vote.
  4. It predicts the unknown fruit is a Banana!

*KNN is considered a "Lazy Learner." It doesn't actually 'train' a model; it just memorizes the training data and does all the heavy calculating only when a prediction is requested.*
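
The four steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not how Scikit-learn implements KNN: the fruit coordinates, the `knn_predict` helper, and the `unknown_fruit` point are all made up for this example.

```python
from collections import Counter

import numpy as np

# Hypothetical toy data: each row is the (x, y) position of one fruit on the graph.
X_train = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # apples (left side)
    [6.0, 5.0], [7.0, 6.0], [6.5, 5.5],   # bananas (right side)
])
y_train = np.array(["Apple", "Apple", "Apple", "Banana", "Banana", "Banana"])


def knn_predict(X_train, y_train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Step 1: straight-line (Euclidean) distance to every stored training point.
    distances = np.sqrt(((X_train - new_point) ** 2).sum(axis=1))
    # Step 2: indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Steps 3 and 4: count the neighbors' labels and return the most common one.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


unknown_fruit = np.array([6.2, 5.2])
prediction = knn_predict(X_train, y_train, unknown_fruit, k=3)
print(prediction)
```

Notice that there is no training step at all: the function keeps the raw data and does all of its work at prediction time, which is what "lazy" means here.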

4. Distance Metrics

How does it know which points are "closest"? It computes a distance between the feature values of two points using a formula.
  • Euclidean Distance: The standard "straight line" distance between two points (as the crow flies). This is the default in Scikit-learn (Minkowski distance with p=2).
  • Manhattan Distance: The distance if you had to walk along a grid (like walking city blocks in Manhattan). It is sometimes preferred for high-dimensional data.
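
To make the two metrics concrete, here is a small sketch that computes both by hand for two made-up points, `a` and `b`, and then shows how the metric could be selected in Scikit-learn through the `metric` parameter. The points are arbitrary and chosen only so the arithmetic is easy to check.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two hypothetical points that differ by 3 on one axis and 4 on the other.
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))  # sqrt(3^2 + 4^2) = 5.0
manhattan = np.sum(np.abs(a - b))          # |3| + |4| = 7.0
print(euclidean, manhattan)

# Default settings use Minkowski distance with p=2, which is Euclidean distance.
knn_euclidean = KNeighborsClassifier(n_neighbors=3)
# Passing metric="manhattan" switches to city-block distance.
knn_manhattan = KNeighborsClassifier(n_neighbors=3, metric="manhattan")
```

Which metric works better depends on the data, so it is reasonable to treat it as one more hyperparameter to compare.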

5. Choosing the K Value

The value of K (how many neighbors to look at) is a hyperparameter you must set.
  • If K=1: The model looks at only the single closest point. This is highly sensitive to noise (outliers) and tends to overfit.
  • If K is very large (say K=100 on a training set of only 100 fruits, 60 Bananas and 40 Apples): Every prediction uses every training point, so the model will just predict Banana every time. This causes underfitting.
  • Best Practice: For two-class problems, choose an odd number (3, 5, 7) to prevent tie votes. With three or more classes an odd K reduces ties but cannot rule them out. You typically find the best K by trying multiple values using Cross-Validation (Chapter 8).
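
One way to try several K values is sketched below, assuming 5-fold cross-validation on the Iris dataset and a hand-picked list of candidates (`candidate_ks` is an arbitrary choice for illustration). The scaler is placed inside a pipeline so that it is re-fitted on each training fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hypothetical candidate list; widen or narrow it for your own data.
candidate_ks = [1, 3, 5, 7, 9, 11]
mean_scores = {}

for k in candidate_ks:
    # Scaling lives inside the pipeline so each fold is scaled independently.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 folds
    mean_scores[k] = scores.mean()
    print(f"K={k:>2}: mean accuracy = {mean_scores[k]:.3f}")

best_k = max(mean_scores, key=mean_scores.get)
print(f"Best K among candidates: {best_k}")
```

On a dataset this small the differences between neighboring K values are often tiny, so treat the winner as a reasonable choice rather than a definitive answer.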

6. Mini Project: Iris Flower Classification

The Iris dataset is the "Hello World" of Machine Learning. It contains four measurements (sepal length, sepal width, petal length, petal width) for 150 flowers from 3 different species of Iris. Let's build a KNN model to classify them.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# 1. Load Data
iris = load_iris()
X = iris.data   # Features: Petal/Sepal measurements
y = iris.target # Labels: 0 (Setosa), 1 (Versicolor), 2 (Virginica)

# 2. Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. CRITICAL: Scale the Features!
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 4. Initialize KNN (Let's use K=5)
knn = KNeighborsClassifier(n_neighbors=5)

# 5. "Train" (Memorize the data)
knn.fit(X_train_scaled, y_train)

# 6. Predict and Evaluate
predictions = knn.predict(X_test_scaled)
print(f"KNN Accuracy: {accuracy_score(y_test, predictions)*100:.2f}%")
```
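
Once a model like this is fitted, it can classify a flower it has never seen. The sketch below is one possible way to do that, assuming the same split and K=5 as the project code; it bundles the scaler and classifier with `make_pipeline` so new measurements are scaled automatically. The `new_flower` values are hypothetical measurements chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# The pipeline scales the data and then runs KNN, for both fit and predict.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm).
new_flower = [[5.1, 3.5, 1.4, 0.2]]
predicted_index = model.predict(new_flower)[0]
predicted_species = iris.target_names[predicted_index]
print(predicted_species)
```

The short, narrow petals in this example are typical of Setosa, which sits far from the other two species in feature space.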

7. Why Scaling is Mandatory for KNN

Just like SVM, KNN relies on geometric distances. If Feature A is measured in millimeters (1-10) and Feature B is measured in kilometers (1000-5000), the large numbers in Feature B dominate the distance calculation, and the algorithm effectively ignores Feature A. Scale your features (for example with StandardScaler) before using KNN, and fit the scaler on the training data only.
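
The effect is easy to see on a tiny made-up dataset. In this sketch the four rows and the `euclidean` helper are hypothetical: row 1 differs from row 0 across the full range of Feature A, while row 2 differs from row 0 across the full range of Feature B.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: column 0 in millimeters (1-10), column 1 in kilometers (1000-5000).
X = np.array([
    [1.0, 1000.0],
    [10.0, 1010.0],   # far from row 0 in Feature A, close in Feature B
    [1.0, 5000.0],    # identical to row 0 in Feature A, far in Feature B
    [5.0, 3000.0],
])


def euclidean(p, q):
    """Straight-line distance between two feature vectors."""
    return float(np.sqrt(np.sum((p - q) ** 2)))


# Raw distances: Feature B swamps Feature A.
raw_d01 = euclidean(X[0], X[1])   # sqrt(81 + 100), about 13.45
raw_d02 = euclidean(X[0], X[2])   # exactly 4000.0
print(f"raw:    d(0,1) = {raw_d01:.2f}, d(0,2) = {raw_d02:.2f}")

# After standardization both features contribute on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)
scaled_d01 = euclidean(X_scaled[0], X_scaled[1])
scaled_d02 = euclidean(X_scaled[0], X_scaled[2])
print(f"scaled: d(0,1) = {scaled_d01:.2f}, d(0,2) = {scaled_d02:.2f}")
```

Before scaling, a full-range change in Feature A barely registers next to Feature B. After scaling, the two distances are of similar size.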

8. Common Mistakes

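  • Using an Even K Value in binary classification: If K=4, and 2 neighbors are Class A and 2 neighbors are Class B, the vote is tied. Scikit-learn still returns a label, but the choice comes from its internal tie-breaking logic rather than from the data. Prefer odd numbers for two-class problems, and remember that ties remain possible with three or more classes.
  • Using KNN on massive datasets: A brute-force KNN search calculates the distance to *every single point in the training data* for every prediction, so the prediction phase becomes very slow if you have millions of rows. Scikit-learn can speed up the search with tree-based indexes (the algorithm parameter), but prediction cost still grows with the size of the training set.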
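
The tie problem can be reproduced with a deliberately symmetric toy dataset. In this sketch the one-dimensional points and the `query` are hypothetical, arranged so that K=4 captures exactly two neighbors from each class.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D data: two points of class 0 on the left, two of class 1 on the right.
X_train = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_train = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=4)
knn.fit(X_train, y_train)

query = np.array([[0.0]])
probabilities = knn.predict_proba(query)[0]  # share of neighbor votes per class
print(probabilities)

# A label is still returned even though the vote is split evenly.
print(knn.predict(query)[0])
```

Checking `predict_proba` like this is a simple way to spot predictions that rest on a split vote.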

9. Best Practices

  • Use KNN as a baseline: Because it is so simple and requires almost no tuning, it is a great algorithm to run first to establish a baseline accuracy before trying complex models like Random Forests.

10. Exercises

  1. Change the n_neighbors parameter in the Iris code above to 1 and then to 100. How does the accuracy change?
  2. Explain why KNN is referred to as a "Lazy Learner" compared to Logistic Regression.

11. MCQ Quiz with Answers

Question 1

What does the "K" in KNN stand for?

Question 2

Why is it recommended to choose an odd number for K (e.g., K=3, 5, 7) in binary classification?

12. Interview Questions

  • Q: Explain why Feature Scaling is absolutely critical before using the K-Nearest Neighbors algorithm.
  • Q: How do you determine the optimal value for K in a real-world project?

13. FAQs

Q: Can KNN be used for Regression (predicting continuous numbers)?
A: Yes! Using KNeighborsRegressor. Instead of taking a majority vote of the neighbors' categories, it calculates the *average* of the neighbors' numerical values.
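
A minimal sketch of the regression case, using made-up one-dimensional data so the averaging is easy to verify by hand (the training values and the query point 1.4 are hypothetical):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical training data: one feature and a numeric target.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([10.0, 20.0, 30.0, 40.0])

reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X_train, y_train)

# The two nearest neighbors of 1.4 are x=1.0 (target 20.0) and x=2.0 (target 30.0).
prediction = reg.predict(np.array([[1.4]]))[0]
print(prediction)  # 25.0, the mean of 20.0 and 30.0
```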

14. Summary

K-Nearest Neighbors proves that sometimes the simplest logic is highly effective. By classifying new data points based on the majority vote of their closest scaled neighbors, KNN offers an intuitive, easy-to-implement solution. However, its speed limitations on large datasets remind us to choose our tools carefully.

15. Next Chapter Recommendation

Up until now, every algorithm we've learned requires the answers (Labels) to train. What if we have a massive dataset of customers, but no labels, and we just want to group similar people together? In Chapter 14: Clustering with K-Means, we will enter the world of Unsupervised Learning.
