CHAPTER 12
Intermediate
Support Vector Machines (SVM)
Updated: May 16, 2026
6 min read
1. Introduction
When classifying data, there are infinitely many ways to draw a line separating "Class 0" from "Class 1". Logistic regression chooses its line by modeling class probabilities. A Support Vector Machine (SVM) takes a different, geometric approach: it looks for the line that not only separates the classes but also maximizes the empty space (the margin) between them. In this chapter, we will explore the math and geometry behind SVMs and how to implement them in Scikit-learn.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Understand the geometric concept of Hyperplanes and Margins.
- Explain what Support Vectors are.
- Understand how the "Kernel Trick" solves non-linear problems.
- Implement `SVC` (Support Vector Classification) in Scikit-learn.
- Tune the `C` and `gamma` hyperparameters.
3. Hyperplanes and Margins
Imagine red dots and blue dots plotted on a 2D graph.
- Hyperplane: The line we draw to separate the red dots from the blue dots. (In 3D space, this boundary becomes a flat plane, like a sheet of paper. In N-dimensional space, it is called a Hyperplane.)
- The Margin: SVM doesn't just want any line; it wants the "widest street" possible. It places the line so that the distance to the nearest red dot and the distance to the nearest blue dot are equal and as large as possible.
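As a rough illustration of the "widest street" idea, the sketch below fits a linear SVM to a tiny, made-up set of 2D points and reads off the street's width. It assumes the standard formulation in which the hyperplane is w·x + b = 0 and the edges of the street are w·x + b = ±1, which makes the width 2/‖w‖. The data points and the very large `C` (used here to approximate a margin that no point may enter) are illustrative choices, not part of the chapter's original material.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],   # class 0 ("red")
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])  # class 1 ("blue")
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin (assumption for this sketch).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

# Perpendicular distance from every point to the hyperplane.
distances = np.abs(X @ w + b) / np.linalg.norm(w)

print("w =", w, "b =", b)
print("margin width:", margin_width)
print("distance of the closest point:", distances.min())
```

The closest points sit about half a street-width from the line, one half on each side, which is exactly the quantity the SVM is maximizing.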
4. What are Support Vectors?
The algorithm doesn't care about the dots far away from the boundary line. It only cares about the dots that are closest to the line, the ones sitting right on the edge of the "street." These critical data points are called Support Vectors. If you removed all the other data points and retrained, the SVM line would not change.
5. The Kernel Trick (Non-Linear Data)
What if the red dots are clustered in the middle, and the blue dots form a ring completely surrounding them? You cannot draw a straight line between them. SVM solves this using the Kernel Trick. Conceptually, the 2D data is lifted into a higher-dimensional space (for example, 3D), where a flat plane can slide between the red and blue dots. The kernel function lets the SVM compute everything it needs in that space without ever constructing the extra coordinates explicitly. Viewed back in 2D, the flat plane appears as a curved boundary, here a circle, separating the two classes.
6. Implementing SVM in Scikit-learn
We use `SVC` (Support Vector Classifier). (For regression, there is `SVR`.)
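A minimal sketch of that workflow might look like the following. The synthetic dataset from `make_classification`, the split sizes, and the hyperparameter values are all assumptions chosen for illustration; substitute your own data and settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class dataset (a stand-in for real data).
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Scale the features, then fit the classifier.
# Inside a pipeline, the scaler is fitted on the training data only.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print("test accuracy:", test_accuracy)

# The fitted SVC exposes the support vectors it selected.
svc = model.named_steps["svc"]
print("support vectors per class:", svc.n_support_)
print("support vector array shape:", svc.support_vectors_.shape)
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling step from leaking information from the test set into training.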
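The ring-shaped example from the Kernel Trick section can be checked in the same way. The sketch below is an illustration under assumptions: it uses `make_circles` to generate an inner cluster surrounded by a ring, and a hand-picked extra coordinate z = x² + y² to stand in for the "lift to 3D". The RBF kernel does not literally use that coordinate; it reaches a comparable result implicitly.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Inner cluster (class 1) surrounded by an outer ring (class 0).
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# 1) A straight line in the original 2D space.
linear_2d = SVC(kernel="linear").fit(X, y)
acc_linear_2d = linear_2d.score(X, y)

# 2) Explicit lift: append z = x^2 + y^2 as a third coordinate.
#    Inner points get a small z, ring points a large z,
#    so a flat plane can pass between them.
X_3d = np.column_stack([X, (X ** 2).sum(axis=1)])
linear_3d = SVC(kernel="linear").fit(X_3d, y)
acc_linear_3d = linear_3d.score(X_3d, y)

# 3) RBF kernel on the original 2D data: no explicit extra coordinates.
rbf_2d = SVC(kernel="rbf", gamma="scale").fit(X, y)
acc_rbf_2d = rbf_2d.score(X, y)

print("linear kernel, 2D data:  ", acc_linear_2d)
print("linear kernel, lifted 3D:", acc_linear_3d)
print("RBF kernel, 2D data:     ", acc_rbf_2d)
```

The scores here are training accuracies, used only to show whether a separating boundary exists; they are not an estimate of how the model would generalize.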
7. Tuning Hyperparameters (C and Gamma)
SVMs are notoriously sensitive to their settings.
- C (Regularization): Controls the strictness of the margin. A low `C` tolerates some misclassified training points in exchange for a wider, more general margin (which helps prevent overfitting). A high `C` penalizes every misclassified training point heavily, resulting in a narrow, wiggly boundary (high risk of overfitting).
- Gamma: Defines how far the influence of a single training example reaches. Low `gamma` means "far reach" (smoother boundaries). High `gamma` means "close reach" (tight, island-like boundaries around individual points). It is used by the RBF, polynomial, and sigmoid kernels and has no effect on a linear kernel.
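Because the two settings interact, they are usually tuned together. One common approach is a cross-validated grid search, sketched below. The `make_moons` dataset and the grid values are assumptions for illustration; sensible ranges depend on your data, and a logarithmic grid is only a typical starting point.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-moons: a small non-linear toy problem.
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Illustrative log-spaced grid; the "svc__" prefix targets the pipeline step.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.01, 0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print("best parameters:", best_params)
print("best cross-validated accuracy:", best_score)
```

Scoring each combination by cross-validation, rather than by training accuracy, is what keeps the search from simply rewarding the highest `C` and `gamma`.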
8. Common Mistakes
- Failing to Scale Features: SVM works with geometric distances between data points, so a feature measured on a large scale dominates those distances and can ruin the model. Scale your features first with `StandardScaler` or `MinMaxScaler`.
- Using SVM on massive datasets: Training a kernel SVM is expensive (roughly `O(n^2)` to `O(n^3)` time in the number of training rows). Training an SVM on 1 million rows can take days. For massive datasets, Random Forests or Neural Networks are preferred.
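The effect of scaling is easy to check on a dataset whose features live on very different scales. The sketch below assumes Scikit-learn's bundled wine dataset as an example and compares the same `SVC` with and without a `StandardScaler` in front of it; exact scores will vary with the dataset and library version.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Wine features range from fractions to values in the hundreds or more.
X, y = load_wine(return_X_y=True)

unscaled = SVC(kernel="rbf")
scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# 5-fold cross-validated accuracy for each variant.
unscaled_acc = cross_val_score(unscaled, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled, X, y, cv=5).mean()

print("without scaling:", unscaled_acc)
print("with scaling:   ", scaled_acc)
```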
9. Best Practices
- Start with RBF: Scikit-learn offers `kernel='linear'`, `kernel='poly'`, and `kernel='rbf'`, among others. RBF (Radial Basis Function) is the default and a sensible first choice. It implicitly maps data to an infinite-dimensional space and handles many non-linear relationships gracefully.
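Starting with RBF does not rule out checking the alternatives. A quick cross-validated comparison, sketched here on Scikit-learn's bundled breast cancer dataset (an assumption for illustration, with every kernel left at its default settings), shows whether a simpler kernel does as well on your data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Mean 5-fold accuracy for each kernel, with scaling applied inside the pipeline.
kernel_scores = {}
for kernel in ["linear", "poly", "rbf"]:
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    kernel_scores[kernel] = cross_val_score(model, X, y, cv=5).mean()

for kernel, score in kernel_scores.items():
    print(f"{kernel:>6}: {score:.3f}")
```

If the linear kernel scores about as well as RBF, it is often the better pick, since it trains faster and is easier to interpret.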
10. Exercises
- 1. In your own words, describe what the "Kernel Trick" does when data cannot be separated by a straight line.
- 2. If your SVM model is overfitting the training data perfectly, should you increase or decrease the `C` parameter?
11. MCQ Quiz with Answers
Question 1
In an SVM model, what are the "Support Vectors"?
Question 2
Why is Feature Scaling mandatory before training an SVM?
12. Interview Questions
- Q: Explain the difference between `kernel='linear'` and `kernel='rbf'` in an SVM.
- Q: How does the `C` parameter balance the trade-off between margin width and classification errors in the training set?