CHAPTER 06 Intermediate

Logistic Regression for Classification

Updated: May 16, 2026


1. Introduction

Despite its confusing name, Logistic Regression is not used to predict continuous values; it is one of the most widely used and foundational Classification algorithms. (The "regression" in its name refers to the linear model it fits under the hood.) It is a standard starting point for Binary Classification (Yes/No problems). In this chapter, we will open the black box of scikit-learn, understand the mathematical Sigmoid curve that powers it, and build a model to detect Spam emails.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Explain why standard Linear Regression fails at classification.
  • Understand the mathematics of the Sigmoid Function.
  • Train a LogisticRegression model using scikit-learn.
  • Output probability scores using .predict_proba().
  • Understand Decision Thresholds.

3. Why Linear Regression Fails

Imagine plotting tumor sizes. Small tumors are Benign (0), large tumors are Malignant (1). If you fit a straight Linear Regression line through these points, the line continues forever in both directions. It might predict a value of 3.5 for a massive tumor, or a negative value for a tiny one. But the classes can only be 0 or 1, so a prediction of 3.5 makes no sense! Furthermore, a straight line fitted this way is sensitive to extreme outliers: a single very large tumor can tilt the line, shifting the point where it crosses 0.5 and causing other tumors to be misclassified.
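To see the problem concretely, here is a minimal sketch that fits scikit-learn's LinearRegression directly to 0/1 labels. The tumor sizes below are made-up toy values chosen for illustration, not real medical data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical tumor sizes (cm) and labels: 0 = Benign, 1 = Malignant
sizes = np.array([[1.0], [1.5], [2.0], [2.5], [6.0], [7.0], [8.0], [9.0]])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit an ordinary least-squares straight line to the 0/1 labels
line = LinearRegression().fit(sizes, labels)

# Predict for a tiny tumor and a very large tumor
new_sizes = np.array([[0.1], [30.0]])
raw_outputs = line.predict(new_sizes)

# The tiny tumor gets a negative value and the huge tumor a value above 1:
# neither is a valid class label or a valid probability
print(raw_outputs)
```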

4. The Math: The Sigmoid Function

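To fix this, Logistic Regression takes the output of the straight line, $z = mx + b$, and passes it through a mathematical filter called the Sigmoid Function, $\sigma(z) = \frac{1}{1 + e^{-z}}$. The Sigmoid function squashes any number (from negative infinity to positive infinity) into the range between 0.0 and 1.0: large positive values of $z$ come out close to 1, large negative values come out close to 0, and $z = 0$ comes out as exactly 0.5.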

*The Result:* The algorithm no longer outputs nonsense like 3.5. It outputs a Probability. If the model outputs 0.85, it estimates an 85% probability that the item belongs to Class 1.
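The Sigmoid formula is short enough to write by hand. This minimal sketch uses NumPy; the raw scores in z are arbitrary illustrative values, not outputs of a trained model.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array of numbers) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores z = m*x + b, from very negative to very positive
z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
probs = sigmoid(z)

# Increasing z always increases the output, and z = 0 maps to exactly 0.5
print(np.round(probs, 4))
```

Rounding makes the extremes display as 0.0 and 1.0, but the unrounded values stay strictly inside (0, 1). For very large negative inputs, np.exp(-z) can overflow and emit a warning; scipy.special.expit computes the same function in a numerically safer way.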

5. Decision Thresholds

Once the model calculates a probability (e.g., 0.85), how does it make a final hard decision? It uses a Threshold. For binary problems, scikit-learn's .predict() effectively uses a threshold of 0.50 (50%):
  • If Probability $> 0.50 \rightarrow$ Predict Class 1.
  • If Probability $\le 0.50 \rightarrow$ Predict Class 0.
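Applying a threshold is a single comparison. In this sketch the probabilities are hypothetical values; in practice they would come from model.predict_proba(X)[:, 1].

```python
import numpy as np

# Hypothetical Class-1 probabilities produced by a model
prob_class_1 = np.array([0.10, 0.49, 0.51, 0.85])

# Default cutoff: probabilities above 0.50 become Class 1, the rest Class 0
hard_predictions = (prob_class_1 > 0.50).astype(int)
print(hard_predictions)  # [0 0 1 1]
```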

6. Mini Project: Email Spam Detection

Let's build a Logistic Regression model to predict if an email is Spam (1) or Safe (0) based on the number of links and the length of the email.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# 1. Provide the Data
# Features (X): [Number of Links, Email Length in words]
X_train = np.array([
    [1, 50],   # Safe
    [0, 200],  # Safe
    [5, 30],   # Spam (Lots of links, short text)
    [8, 20]    # Spam
])

# Labels (y): 0 = Safe, 1 = Spam
y_train = np.array([0, 0, 1, 1])

# 2. Initialize the Model
model = LogisticRegression()

# 3. Train the Model
model.fit(X_train, y_train)

# 4. Make a Prediction!
# New email arrives: 6 links, 25 words long
X_test = np.array([[6, 25]])

# Get the hard class prediction (0 or 1)
prediction = model.predict(X_test)
print(f"Predicted Class: {prediction[0]}") # Output: 1 (Spam!)
```

7. Extracting Probabilities (predict_proba)

In business, you rarely want just a hard "1" or "0". You want to know *how confident* the model is before acting. We use .predict_proba() to see the probabilities produced by the Sigmoid.
```python
# Continues from the previous block: uses the trained `model` and `X_test`
# Extract the raw probabilities
probabilities = model.predict_proba(X_test)

# predict_proba returns an array with probabilities for BOTH classes: [Prob of 0, Prob of 1]
print(f"Probability of Safe (Class 0): {probabilities[0][0]*100:.1f}%")
print(f"Probability of Spam (Class 1): {probabilities[0][1]*100:.1f}%")

# Illustrative output (exact values depend on the trained model):
# Probability of Safe (Class 0): 12.5%
# Probability of Spam (Class 1): 87.5%
```

*Because 87.5% is greater than the 50% threshold, the model returned Class 1!*
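To open the black box a little further, the sketch below recomputes the Spam probability by hand: it takes the learned coefficients (coef_) and intercept (intercept_), computes the straight-line score z, and passes it through the Sigmoid. It refits the same toy data so it runs on its own. This check assumes a binary problem, as in the mini project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy spam data as the mini project: [Number of Links, Email Length in words]
X_train = np.array([[1, 50], [0, 200], [5, 30], [8, 20]])
y_train = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

X_test = np.array([[6, 25]])

# The straight-line part: z = w1*links + w2*length + b
z = X_test @ model.coef_.ravel() + model.intercept_[0]

# Push z through the Sigmoid to get P(Spam)
manual_prob_spam = 1.0 / (1.0 + np.exp(-z))

# Compare with scikit-learn's own Class-1 probability
sk_prob_spam = model.predict_proba(X_test)[:, 1]
print(manual_prob_spam, sk_prob_spam)  # the two values should match
```

The score z is the same quantity that model.decision_function(X_test) returns.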

8. Common Mistakes

  • Assuming Logistic Regression draws curved boundaries: Although the Sigmoid function produces a curved probability output, the Decision Boundary that Logistic Regression draws through the data is linear: a straight line with two features (a flat plane or hyperplane with more). If your classes cannot be separated by a linear boundary, plain Logistic Regression on the raw features will underfit.
  • Ignoring the Threshold: In a medical scenario (detecting cancer), you may not want to wait until the model is 50% sure. You might want to flag the patient if the model is even 15% sure! You can extract the probabilities with .predict_proba() and write your own rule, such as if prob >= 0.15, to override the default 50% threshold.
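Here is a minimal sketch of a custom threshold, assuming a toy tumor-size dataset (made-up values) and a 0.15 cutoff chosen only for illustration. Lowering the threshold can flag the same patients or more, never fewer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical tumor sizes (cm): 0 = Benign, 1 = Malignant
sizes = np.array([[1.0], [1.5], [2.0], [2.5], [3.0], [6.0], [7.0], [8.0]])
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(sizes, labels)

new_patients = np.array([[2.0], [4.0], [4.5], [7.5]])
prob_malignant = model.predict_proba(new_patients)[:, 1]

default_flags = model.predict(new_patients)             # built-in 0.50 cutoff
cautious_flags = (prob_malignant >= 0.15).astype(int)   # custom 0.15 cutoff

print(np.round(prob_malignant, 3))
print("Default flags: ", default_flags)
print("Cautious flags:", cautious_flags)
```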

9. Best Practices

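  • Feature Scaling: Logistic Regression uses an iterative optimizer to find the best boundary. If your features are on vastly different scales (e.g., Links: 1-10, Words: 100-5000), the optimizer can converge slowly (scikit-learn may raise a ConvergenceWarning). Also, because the default L2 penalty treats all coefficients equally, features on small scales (which need large coefficients) are penalized more heavily. Standardize your features, for example with a StandardScaler (covered in Chapter 13).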
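One common way to apply scaling is to chain StandardScaler and LogisticRegression in a scikit-learn Pipeline, so the scaler's statistics are learned from the training data and reused automatically at prediction time. A minimal sketch on the toy spam data from the mini project:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy spam data: [Number of Links, Email Length in words]
X_train = np.array([[1, 50], [0, 200], [5, 30], [8, 20]])
y_train = np.array([0, 0, 1, 1])

# The scaler learns each feature's mean and standard deviation on the training
# data, then the model is trained on the standardized features
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

# New data is scaled with the SAME training statistics before predicting
new_email = np.array([[6, 25]])
pred = clf.predict(new_email)
proba = clf.predict_proba(new_email)
print(pred, proba)
```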

10. Exercises

  1. If .predict_proba() outputs [0.30, 0.70], what hard class will .predict() output, assuming the default threshold?
  2. Why is a straight line ($y = mx+b$) mathematically inappropriate for predicting binary classes like 0 or 1?

11. MCQ Quiz with Answers

Question 1

Despite its name, Logistic Regression is used for what type of Machine Learning task?

Question 2

What is the mathematical purpose of the Sigmoid function in Logistic Regression?

12. Interview Questions

  • Q: Explain how a Logistic Regression model utilizes a Threshold to convert its raw mathematical output into a final class prediction.
  • Q: In what specific business scenario would you manually lower the decision threshold of a Logistic Regression model from 0.50 to 0.10?

13. FAQs

Q: Can Logistic Regression handle Multiclass problems (e.g., Cat, Dog, Horse)? A: Yes! Scikit-learn's LogisticRegression accepts more than two classes directly. With the default lbfgs solver it fits a "Multinomial" (softmax) model; a "One-vs-Rest" strategy, which trains one binary model per class, is also available (for example via OneVsRestClassifier). Either way, .predict_proba() returns one probability per class, and each row sums to 100%.
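A minimal sketch using scikit-learn's built-in Iris dataset (three flower species) shows the same class handling three classes without extra code. Setting max_iter=1000 is an assumption made here so the solver converges on the unscaled features.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris: 150 flowers, 4 measurements each, 3 species (classes 0, 1, 2)
X, y = load_iris(return_X_y=True)

# Raise max_iter so the default lbfgs solver converges on unscaled features
model = LogisticRegression(max_iter=1000).fit(X, y)

probs = model.predict_proba(X[:3])
print(model.classes_)        # [0 1 2]
print(np.round(probs, 3))    # one column per class
print(probs.sum(axis=1))     # each row sums to 1
```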

14. Summary

Logistic Regression is one of the most widely used baselines for classification. By passing a linear combination of the features through the Sigmoid function, it generates interpretable probability scores. Because it provides not just a hard prediction but an estimated probability, it allows businesses to set custom risk thresholds, which makes it valuable in production.

15. Next Chapter Recommendation

Logistic Regression is powerful, but what if our data cannot be separated by a straight line? What if we want an algorithm that makes decisions purely based on "who its neighbors are"? In Chapter 7: K-Nearest Neighbors (KNN), we will explore an incredibly intuitive, non-linear classification algorithm.
