Classification Algorithms
# CHAPTER 24
1. Chapter Introduction
While Regression predicts numbers (e.g., $450,000), Classification predicts discrete categories. Will this customer Churn (Yes/No)? Is this email Spam or Not Spam? Is this image a Cat, Dog, or Bird? This chapter introduces the three most common classification algorithms in Scikit-Learn: Logistic Regression, K-Nearest Neighbors (KNN), and Decision Trees.
2. Logistic Regression (Binary Classification)
Despite the word "Regression" in its name, Logistic Regression is used for *Classification*. It predicts the probability (from 0 to 1) that an item belongs to a specific class. If that probability is greater than 0.5 (the default cutoff), it predicts "Yes" (1). Otherwise, it predicts "No" (0).
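The probability-then-threshold behavior can be seen in a minimal sketch. The churn data below is made up for illustration (months as a customer versus whether the customer churned), and the variable names are assumptions, not part of the chapter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up toy data: months as a customer -> churned (1) or stayed (0)
X = np.array([[1], [2], [3], [4], [10], [12], [15], [20]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns one column per class: [P(class 0), P(class 1)]
new_customers = np.array([[2], [18]])
churn_probability = model.predict_proba(new_customers)[:, 1]

# predict() returns the more probable class, which for two classes
# matches thresholding P(class 1) at 0.5
predictions = model.predict(new_customers)
manual = (churn_probability > 0.5).astype(int)

print("P(churn):", churn_probability)
print("predict():", predictions)
print("manual threshold:", manual)
```

Here `predict_proba` exposes the probability and `predict` applies the cutoff, so the last two printed arrays should agree.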
3. K-Nearest Neighbors (KNN)
KNN is intuitive. It plots the new data point, looks at the 'K' closest historical data points, and takes a vote. If K=5, and 4 of the 5 closest points are "Cats", the model classifies the new point as a "Cat".
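To make the vote concrete, here is a small from-scratch sketch using NumPy rather than Scikit-Learn. The animal measurements and the `knn_predict` helper are hypothetical, chosen so that 4 of the 5 nearest neighbors are cats.

```python
from collections import Counter

import numpy as np

# Made-up historical points: [weight_kg, height_cm] with known labels
X_train = np.array([[4.0, 25.0], [4.5, 23.0], [5.0, 24.0], [3.8, 22.0],
                    [20.0, 55.0], [25.0, 60.0], [30.0, 65.0]])
y_train = np.array(["Cat", "Cat", "Cat", "Cat", "Dog", "Dog", "Dog"])


def knn_predict(new_point, k=5):
    """Hypothetical helper: classify new_point by majority vote of its k nearest neighbors."""
    # Euclidean distance from the new point to every historical point
    distances = np.sqrt(((X_train - new_point) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count the labels of those k neighbors and return the most common one
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0], dict(votes)


label, votes = knn_predict(np.array([5.5, 26.0]), k=5)
print(label, votes)  # Cat {'Cat': 4, 'Dog': 1}
```

In practice `KNeighborsClassifier` performs the same neighbor search and vote behind `.fit()` and `.predict()`.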
*Crucial:* KNN relies on distance. You MUST scale your data (StandardScaler) before using KNN, or features with large numbers will dominate the distance calculation.
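The effect of scaling can be sketched by fitting the same KNN model with and without `StandardScaler`. The data is made up: the first column is a rating between 0 and 1 that fully determines the label, and the second is an annual income that is irrelevant but numerically huge. Wrapping the scaler and the classifier in `make_pipeline` is one common way to do this, not the only one.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data: [rating_0_to_1, annual_income]; only the rating decides the label
X = np.array([[0.10, 50_000], [0.20, 90_000], [0.15, 70_000],
              [0.90, 52_000], [0.80, 88_000], [0.85, 71_000]])
y = np.array([0, 0, 0, 1, 1, 1])

# High rating (looks like class 1), income closest to a class-0 row
new_point = np.array([[0.90, 89_500]])

# Without scaling, the income column dominates the distance
raw_knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# With scaling, both columns contribute on a comparable footing
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1)).fit(X, y)

print("unscaled:", raw_knn.predict(new_point))    # [0], driven by income alone
print("scaled:  ", scaled_knn.predict(new_point)) # [1], the rating now counts
```

The pipeline also applies the fitted scaler to new data inside `.predict()`, which avoids forgetting to scale at prediction time.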
4. Decision Trees
A Decision Tree works like a flowchart. It asks a series of True/False questions about the features to split the data until it reaches a conclusion. (e.g., *Is Income > 50k?* -> *Is Age > 30?* -> *Classify as 'Will Buy'*).
Decision Trees do *not* require feature scaling!
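The flowchart idea, and the claim about scaling, can both be checked with a small sketch. The purchase data below is made up, and the feature names `income_k` and `age` are illustrative assumptions. `export_text` prints the learned questions, and the second tree is trained on income in raw dollars instead of thousands to show the predictions do not change.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data: [income_in_thousands, age] -> will buy (1) or will not buy (0)
X = np.array([[30, 25], [40, 45], [45, 35], [60, 22], [70, 28],
              [65, 35], [80, 40], [90, 50]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1])

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# Print the learned True/False questions as an indented flowchart
print(export_text(tree, feature_names=["income_k", "age"]))

new_people = np.array([[75, 38], [75, 24]])
predictions = tree.predict(new_people)
print(predictions)  # [1 0]

# Same data with income expressed in dollars: the thresholds move,
# but the splits separate the same rows, so predictions are unchanged
rescale = np.array([1000, 1])
tree_dollars = DecisionTreeClassifier(random_state=0).fit(X * rescale, y)
predictions_dollars = tree_dollars.predict(new_people * rescale)
print(predictions_dollars)  # [1 0]
```

Because each question compares a single feature to a threshold, changing the units of a feature only changes the threshold value, not which side of the split each row lands on.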
5. Multi-Class Classification
Classification isn't limited to Yes/No. If your y_train column contains 3 categories (0=Cat, 1=Dog, 2=Bird), the Scikit-Learn classifiers covered in this chapter handle it automatically. You write the exact same .fit() and .predict() code.
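As a quick illustration, the sketch below fits all three classifiers on a made-up three-class dataset (0=Cat, 1=Dog, 2=Bird) without any multi-class-specific code. The measurements are invented, and `max_iter=1000` is only an assumption to give Logistic Regression room to converge on unscaled features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Made-up data: [weight_kg, height_cm]; labels 0=Cat, 1=Dog, 2=Bird
X_train = np.array([[4.0, 25], [5.0, 24], [4.5, 23],
                    [25.0, 60], [30.0, 65], [20.0, 55],
                    [0.10, 10], [0.20, 12], [0.05, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

X_new = np.array([[28.0, 62], [0.1, 9]])  # one dog-like, one bird-like animal

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # Identical calls to the binary case: fit, then predict
    model.fit(X_train, y_train)
    results[name] = model.predict(X_new)
    print(name, "classes:", model.classes_, "predictions:", results[name])

# predict_proba now returns one column per class
proba_shape = models["Logistic Regression"].predict_proba(X_new).shape
print(proba_shape)  # (2, 3)
```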
6. Mini Project: Spam Email Classifier
Let's simulate classifying emails based on two features: Word Count and Number of Links.
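One possible version of this simulation is sketched below. Everything about the data is an assumption made for illustration: spam is simulated as short emails with many links, normal email as longer with few links. The distribution parameters, the `random_state` values, and `max_depth=3` are arbitrary choices. Accuracy (the share of correct predictions) is used to compare the three models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 200  # simulated emails per class

# Columns: [word_count, number_of_links]; word counts clipped to stay positive
not_spam = np.column_stack([np.clip(rng.normal(300, 80, n), 1, None), rng.poisson(1, n)])
spam = np.column_stack([np.clip(rng.normal(80, 30, n), 1, None), rng.poisson(6, n)])

X = np.vstack([not_spam, spam])
y = np.array([0] * n + [1] * n)  # 0 = Not Spam, 1 = Spam

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "KNN (K=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.1%} test accuracy")

# Classify one new email: 60 words, 8 links
new_email = np.array([[60, 8]])
verdicts = {name: int(model.predict(new_email)[0]) for name, model in models.items()}
for name, verdict in verdicts.items():
    print(name, "->", "Spam" if verdict == 1 else "Not Spam")
```

The KNN model is wrapped in a pipeline with `StandardScaler` because word counts are far larger than link counts. The Decision Tree is left unscaled and given a `max_depth` to limit overfitting.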
7. Common Mistakes
- Not scaling data for KNN: If Feature A ranges from 0-1 and Feature B ranges from 0-1,000,000, KNN will effectively ignore Feature A because the distance calculation is overwhelmed by B. Always use StandardScaler.
- Overfitting Decision Trees: If you don't set a max_depth, a Decision Tree will keep branching until every single training point is perfectly classified. It will score 100% on training data and then perform much worse on test data because it memorized the noise.
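The overfitting mistake can be observed with a short experiment. This is a sketch on synthetic data from `make_classification`. The dataset sizes, the `flip_y=0.2` noise level (a random class is assigned to about 20% of samples), and `max_depth=3` are arbitrary assumptions, so the exact scores will vary with those choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise: flip_y assigns a random class to ~20% of samples
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# No depth limit: the tree keeps splitting until training points are separated
unlimited = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Depth limit: the tree is forced to stop after 3 levels of questions
limited = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, tree in [("no max_depth", unlimited), ("max_depth=3", limited)]:
    print(
        f"{name}: depth={tree.get_depth()}, "
        f"train accuracy={tree.score(X_train, y_train):.2f}, "
        f"test accuracy={tree.score(X_test, y_test):.2f}"
    )
```

The unlimited tree reaches perfect training accuracy by memorizing the noisy labels. The gap between its training and test accuracy is the symptom to watch for, and the depth-limited tree typically shows a much smaller gap.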
8. MCQs
Classification algorithms are used to predict what?
What does Logistic Regression predict under the hood?
Which algorithm classifies new data by taking a "vote" among its closest historical neighbors?
Which algorithm relies heavily on distance mathematics and MUST have its features scaled?
Which algorithm acts like a flowchart of True/False questions?
What metric is used to evaluate the percentage of correct guesses made by a classification model?
What is "Binary Classification"?
How do you prevent a Decision Tree from memorizing the training data (overfitting)?
Is Logistic Regression used for Regression or Classification?
If the target y column contains 5 different categories (e.g., Car brands), which algorithms can handle it?
9. Interview Questions
- Q: Explain how K-Nearest Neighbors (KNN) makes a prediction. Why is feature scaling absolutely mandatory for this algorithm?
- Q: Despite its name, what task is Logistic Regression actually used for?
11. Summary
Classification predicts categories. Use Logistic Regression for fast, probability-based binary predictions. Use K-Nearest Neighbors for distance-based voting (remembering to scale the data first). Use Decision Trees to build interpretable flowcharts (remembering to limit max_depth to prevent overfitting). No matter the algorithm, the Scikit-Learn .fit() and .predict() syntax remains identical.