# CHAPTER 10
Logistic Regression for Classification
1. Introduction
In the previous chapter, we used Linear Regression to predict a continuous number (House Price). But what if our target is a category? What if we want to predict whether an email is "Spam" or "Not Spam", or whether a patient has a disease ("Yes" or "No")? This task is known as Classification. Despite its confusing name, Logistic Regression is the foundational algorithm for classification in Machine Learning. In this chapter, we will build models that can make decisions.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Differentiate between Regression and Classification tasks.
- Understand the concept of Binary Classification.
- Explain how the Sigmoid function outputs probabilities.
- Implement LogisticRegression in Scikit-learn.
- Use predict_proba to analyze model confidence.
3. Classification Basics
Classification aims to draw a boundary (a decision boundary) between different classes of data.
- Binary Classification: Predicting exactly two classes (0 or 1, Spam or Not Spam, Yes or No).
- Multi-class Classification: Predicting three or more classes (Apple, Banana, or Orange).
4. Why Not Linear Regression?
If you try to fit a straight line (Linear Regression) to binary data (values of 0 and 1), the line is unbounded: it keeps rising or falling as the input grows. The model might predict a value of 3.5 or -1.2, which makes no sense if the only valid answers are 0 and 1.
Logistic Regression solves this by wrapping the straight line inside an S-shaped curve called the Sigmoid Function.
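The problem with a straight line can be seen in a small sketch. The data below is hypothetical (one feature, "number of links", with a 0/1 spam label), and the two extreme inputs are chosen only to show how the line leaves the 0 to 1 range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature (number of links) and a 0/1 spam label.
X = np.array([[0], [1], [2], [3], [4], [5], [6], [7]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit an ordinary straight line to the binary labels.
line = LinearRegression().fit(X, y)

# Ask for predictions far outside the range seen during training.
extreme_inputs = np.array([[-10], [30]])
line_predictions = line.predict(extreme_inputs)

# The first value falls below 0 and the second rises above 1,
# so neither can be read as a probability.
print(line_predictions)
```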
5. The Sigmoid Function
The Sigmoid function takes any real-valued output (from negative infinity to positive infinity) and squashes it into a number strictly between 0 and 1. Because the output lies between 0 and 1, we can interpret it as a Probability.
- If the output is 0.85, the model is 85% confident the email is Spam.
- If the output is 0.10, the model is 10% confident the email is Spam (meaning it is likely Not Spam).
*Threshold:* By default, Scikit-learn sets a threshold of 0.5. Anything above 0.5 is classified as Class 1. Anything below is Class 0.
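A minimal sketch of the Sigmoid function and the 0.5 threshold, using the standard definition sigmoid(z) = 1 / (1 + e^(-z)). The input scores are hypothetical stand-ins for the straight-line output, and the helper name sigmoid is chosen for illustration; it is not a Scikit-learn function.

```python
import numpy as np

def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw outputs of the straight line (can be any real number).
scores = np.array([-4.0, -1.0, 0.5, 3.0])

# Squash each score into a probability of belonging to Class 1.
probs = sigmoid(scores)

# Apply the 0.5 threshold: above 0.5 becomes Class 1, otherwise Class 0.
labels = (probs > 0.5).astype(int)

print(probs)
print(labels)
```

Negative scores map to probabilities below 0.5 and positive scores to probabilities above 0.5, so thresholding the probability at 0.5 is the same as checking the sign of the raw score.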
6. Mini Project: Spam Detection Model
Let's implement Logistic Regression to predict if an email is Spam based on two engineered features: "Number of Links" and "Number of Spelling Mistakes".
7. Probability Predictions
The .predict() method forces the output into a strict 0 or 1. However, in business applications, knowing the *confidence* of the prediction is vital. We use .predict_proba() for this.
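A minimal sketch of the spam model with both methods side by side. The training emails, their feature values, and the labels are all hypothetical and exist only to make the example runnable; a real project would load a proper dataset and hold out a test set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data, one row per email:
# [number_of_links, number_of_spelling_mistakes]
X_train = np.array([
    [0, 0], [1, 0], [0, 1], [1, 1], [2, 1],    # labelled Not Spam (0)
    [6, 4], [7, 5], [8, 3], [9, 6], [10, 7],   # labelled Spam (1)
])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X_train, y_train)

# Two new emails: one clean-looking, one with many links and mistakes.
new_emails = np.array([[0, 1], [9, 5]])

# .predict() returns only the final class (0 or 1).
classes = model.predict(new_emails)

# .predict_proba() returns one row per email; the columns follow
# model.classes_, so here column 0 is P(Not Spam) and column 1 is P(Spam).
probabilities = model.predict_proba(new_emails)

print(classes)
print(probabilities)
```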
*Business Use Case:* If the model says an email is 51% likely to be spam, you might send it to the Inbox anyway because false positives annoy users. You might manually set a threshold so it only goes to the Spam folder if the probability is > 0.90.
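The business rule above could be sketched as follows. The dataset is the same hypothetical one used for illustration, and the constant name SPAM_THRESHOLD is an assumption of this sketch, not a Scikit-learn parameter: the threshold is applied manually to the output of .predict_proba().

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: [number_of_links, number_of_spelling_mistakes]
X_train = np.array([
    [0, 0], [1, 0], [0, 1], [1, 1], [2, 1],
    [6, 4], [7, 5], [8, 3], [9, 6], [10, 7],
])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

# Stricter than the default 0.5: only very confident predictions count as spam.
SPAM_THRESHOLD = 0.90

incoming = np.array([[4, 2], [10, 7]])

# Column 1 of predict_proba holds the probability of Class 1 (Spam).
spam_probability = model.predict_proba(incoming)[:, 1]

# True means "move to the Spam folder", False means "deliver to the Inbox".
send_to_spam = spam_probability > SPAM_THRESHOLD

for prob, flagged in zip(spam_probability, send_to_spam):
    print(f"P(spam) = {prob:.2f} -> {'Spam folder' if flagged else 'Inbox'}")
```

Raising the threshold trades fewer false positives for more spam reaching the Inbox, so the right value depends on which mistake is more costly.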
8. Common Mistakes
- Forgetting to Scale: Logistic Regression is fitted by an iterative optimizer that follows the gradients of a loss function. Unscaled data (like a column with values in the millions next to a column with 0s and 1s) can make the optimizer converge slowly or stop before it has converged. Standardize the features first, for example with StandardScaler.
- Class Imbalance: If your dataset has 99,000 normal transactions and 1,000 fraudulent ones, the model can achieve 99% accuracy by simply predicting "Normal" every time. We will address evaluating imbalanced data in Chapter 16.
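Both pitfalls can be illustrated in a short sketch. The first part assumes a hypothetical dataset with one column in the millions and one 0/1 column, and chains StandardScaler with LogisticRegression in a pipeline so scaling is never forgotten. The second part reproduces the 99,000 versus 1,000 arithmetic with a "model" that always answers Normal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- Scaling ---------------------------------------------------------
# Hypothetical data: [transaction_amount, is_foreign_card]
X = np.array([
    [2_500_000, 0], [1_800_000, 0], [3_200_000, 0], [2_900_000, 0],
    [400_000, 1], [650_000, 1], [300_000, 1], [900_000, 0],
], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# The pipeline standardizes each column before the classifier sees it.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X, y)
train_predictions = scaled_model.predict(X)
print(train_predictions)

# --- Class imbalance -------------------------------------------------
# 99,000 normal (0) and 1,000 fraudulent (1) transactions.
y_true = np.array([0] * 99_000 + [1] * 1_000)

# A useless "model" that predicts Normal for every transaction.
always_normal = np.zeros_like(y_true)

accuracy = (always_normal == y_true).mean()
frauds_caught = int(((always_normal == 1) & (y_true == 1)).sum())

print(accuracy)        # 0.99
print(frauds_caught)   # 0
```

The 99% accuracy hides the fact that not a single fraudulent transaction is detected, which is why accuracy alone is a poor metric on imbalanced data.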
9. Best Practices
- Interpretability: Just like Linear Regression, you can check model.coef_ in Logistic Regression. This tells you which features have the strongest influence on classifying the data into Class 1. Compare coefficient magnitudes only when the features are on the same scale.
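A minimal sketch of inspecting the coefficients, assuming the same hypothetical spam features as before. The feature names are illustrative labels, and the features are standardized first so the coefficient sizes are comparable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names and data for illustration only.
feature_names = ["num_links", "num_spelling_mistakes"]
X = np.array([
    [0, 0], [1, 0], [0, 1], [1, 1], [2, 1],
    [6, 4], [7, 5], [8, 3], [9, 6], [10, 7],
], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X, y)

# make_pipeline names each step after its lowercased class name.
clf = pipeline.named_steps["logisticregression"]

# For binary classification coef_ has shape (1, n_features).
# A positive weight pushes predictions toward Class 1 (Spam).
for name, weight in zip(feature_names, clf.coef_[0]):
    print(f"{name}: {weight:+.3f}")

print("intercept:", clf.intercept_[0])
```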
10. Exercises
- 1. If the Sigmoid function outputs 0.30, and the default threshold is 0.5, which class (0 or 1) will Scikit-learn predict?
- 2. Write the Scikit-learn method used to view the exact probabilities of a prediction rather than just the final class.
11. MCQ Quiz with Answers
- 1. Despite its name, Logistic Regression is used for what type of Machine Learning task?
Answer: Classification.
- 2. What is the mathematical function that squashes the output of Logistic Regression into a probability between 0 and 1?
Answer: The Sigmoid function.
12. Interview Questions
- Q: Explain why Linear Regression cannot be used effectively for binary classification problems.
- Q: In a real-world scenario like medical diagnosis, why might you use .predict_proba() to check the raw probability instead of relying on .predict()?