Scikit-learn Basics
CHAPTER 11 Intermediate

Decision Trees and Random Forests

Updated: May 16, 2026

1. Introduction

While Logistic Regression draws mathematical lines to separate data, people often solve problems with rules: "If the sky is dark AND I don't have an umbrella, THEN I will stay inside." Machine learning algorithms can mimic this logic. A Decision Tree builds a flowchart of such rules from the data. However, a single tree can easily latch onto quirks of its training data and generalize poorly. To fix this, we plant a "Forest" of trees. In this chapter, we will work through tree-based algorithms and introduce the concept of Ensemble Learning.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Explain how a Decision Tree splits data.
  • Understand the danger of tree depth and overfitting.
  • Implement DecisionTreeClassifier in Scikit-learn.
  • Explain the concept of Ensemble Learning.
  • Build a highly accurate RandomForestClassifier.

3. How a Decision Tree Works

A Decision Tree asks a series of True/False questions to split the data.
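  1. Root Node: The algorithm scans all features and candidate thresholds and picks the one rule that splits the data into the purest groups (e.g., "Is Age > 30?"). A group is pure when all of its samples belong to the same class.
  2. Branches: Each sample flows down the True or False path.
  3. Internal Nodes: Each one asks another question about the samples that reached it (e.g., "Is Salary > $50k?").
  4. Leaf Nodes: The end of a path. A leaf outputs the final prediction, which is the majority class of the training samples that landed in it.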
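
To make "pure groups" concrete, here is a minimal sketch of how one candidate question could be scored. It assumes Gini impurity, which is the default `criterion` of scikit-learn's `DecisionTreeClassifier`. The helper names `gini` and `split_impurity` and the tiny age dataset are made up for illustration and are not part of any library.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a group: 0.0 means every sample has the same class."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)

def split_impurity(feature, labels, threshold):
    """Size-weighted impurity of the two groups produced by 'feature > threshold'."""
    mask = feature > threshold
    n = len(labels)
    right, left = labels[mask], labels[~mask]
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Hypothetical toy data: age of six customers and whether they bought (1) or not (0)
age = np.array([22, 25, 28, 35, 41, 50])
bought = np.array([0, 0, 0, 1, 1, 1])

print(f"Impurity before splitting: {gini(bought):.3f}")
for threshold in (24, 30, 45):
    score = split_impurity(age, bought, threshold)
    print(f"Is Age > {threshold}? -> weighted impurity {score:.3f}")
```

On this toy data the question "Is Age > 30?" separates the two classes completely, so its weighted impurity is 0 and a tree would choose it over the other two candidates.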

4. The Overfitting Problem (Tree Depth)

Decision Trees have one major flaw: they overfit easily. If you do not stop the tree from growing, it keeps asking questions until every leaf is pure, which in the extreme means a separate leaf for almost every data point. The tree then memorizes the training data, noise included, and generalizes poorly to new test data. The Fix: Limit how far the tree can grow, most simply by setting max_depth.
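
The effect of depth can be checked directly. The sketch below is one possible experiment, assuming the breast cancer dataset that ships with scikit-learn: it trains one tree with no depth limit and one with `max_depth=3`, then compares training and test accuracy. The `results` dictionary is just a convenience for this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for depth in (None, 3):  # None lets the tree grow until its leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    test_acc = tree.score(X_test, y_test)
    results[depth] = (train_acc, test_acc, tree.get_depth())
    print(f"max_depth={depth}: grew to depth {tree.get_depth()}, "
          f"train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

A large gap between training and test accuracy for the unlimited tree is the usual sign of memorization. How big that gap is depends on the dataset and the split.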

5. Implementing a Decision Tree

python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load sample dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Tree (Limiting depth to prevent overfitting)
tree_model = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_model.fit(X_train, y_train)

# Predict
predictions = tree_model.predict(X_test)
print(f"Decision Tree Accuracy: {accuracy_score(y_test, predictions):.2f}")
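
Because a shallow tree is just a handful of rules, it can be printed and read. The sketch below repeats the training step so it runs on its own, then uses `export_text` from `sklearn.tree` to show the learned flowchart. Loading the dataset as an object (`data`) instead of with `return_X_y=True` is a choice made here so the real feature names are available.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

tree_model = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_model.fit(X_train, y_train)

# Render the fitted tree as indented if/else rules, using the real column names
rules = export_text(tree_model, feature_names=list(data.feature_names))
print(rules)
```

Each indented line is one question, and each `class:` line is a leaf with its prediction.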

6. The Solution: Random Forests and Ensemble Learning

Instead of relying on one deep, overfitted tree, what if we grew 100 different trees and combined their answers? Ensemble Learning is the strategy of combining many individual models so that their errors partly cancel out, producing a model that is stronger than any single member.

A Random Forest works by:

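  1. Creating 100 different Decision Trees.
  2. Giving each tree its own random sample of the training rows, drawn with replacement (a bootstrap sample).
  3. Restricting each tree to a random subset of the features. In scikit-learn the subset is redrawn at every split, so one split might choose between Age and Income while the next chooses between Location and Credit Score.
  4. Voting: When new data arrives, all 100 trees make a prediction. The forest counts the votes and outputs the majority winner. (scikit-learn averages the trees' predicted class probabilities rather than counting hard votes, which usually gives the same winner.)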
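
The four steps above can be sketched by hand with plain Decision Trees. This is an illustration of the idea, not how `RandomForestClassifier` is implemented internally. It assumes the breast cancer dataset, bootstrap sampling for the row subsets, and `max_features="sqrt"` for the feature randomness, which redraws the candidate features at every split. The variable names are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
n_trees = 100
trees = []

for i in range(n_trees):
    # Steps 1-2: draw a bootstrap sample (same size as the training set, with replacement)
    rows = rng.integers(0, len(X_train), size=len(X_train))
    # Step 3: max_features="sqrt" limits each split to a random subset of the features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[rows], y_train[rows])
    trees.append(tree)

# Step 4: every tree votes; labels are 0 or 1, so the mean vote is the share voting for class 1
all_votes = np.array([tree.predict(X_test) for tree in trees])  # shape: (n_trees, n_test_samples)
majority = (all_votes.mean(axis=0) > 0.5).astype(int)  # an exact 50/50 tie falls to class 0

print(f"Hand-rolled forest accuracy: {accuracy_score(y_test, majority):.2f}")
```

In practice you would use `RandomForestClassifier`, which handles the sampling, the parallelism, and the combination of predictions for you.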

7. Mini Project: Customer Churn Prediction

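Let's implement a Random Forest. The code below reuses the training and test split from Section 5, so the breast cancer dataset stands in for real churn data. Notice how similar the Scikit-learn code is to everything else we've done!
python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# X_train, X_test, y_train, y_test come from the split in Section 5
# Initialize the Forest with 100 trees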
forest_model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)

# Train the Forest
forest_model.fit(X_train, y_train)

# Make Predictions
forest_preds = forest_model.predict(X_test)
print(f"Random Forest Accuracy: {accuracy_score(y_test, forest_preds):.2f}")

*In most real-world scenarios, a Random Forest generalizes better than a single Decision Tree, although the size of the gain depends on the dataset.*
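
A single train/test split can be lucky or unlucky, so one way to check this comparison on your own data is cross-validation. The sketch below assumes the breast cancer dataset and 5-fold cross-validation, and reuses the hyperparameters from the examples above. The `models` and `mean_scores` dictionaries are illustrative names.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42),
}

mean_scores = {}
for name, model in models.items():
    # 5-fold cross-validation: five different train/test splits instead of one
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Comparing the means together with their standard deviations gives a fairer picture than a single accuracy number.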

8. Feature Importance

One of the best features of Random Forests is that they automatically calculate which inputs were the most useful across all the trees.
python
# Get the importance score of each feature
importances = forest_model.feature_importances_

# Print the top score
print(f"Most important feature score: {max(importances):.3f}")
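
A score without a name is hard to act on. The sketch below pairs each importance with its column name and lists the top five. It assumes the breast cancer dataset, loaded as an object so `feature_names` is available, and for brevity it fits the forest on the full dataset rather than on a training split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest_model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
forest_model.fit(data.data, data.target)

importances = forest_model.feature_importances_  # one score per feature, summing to 1

# Sort feature indices from most to least important and show the top five
top_five = np.argsort(importances)[::-1][:5]
for idx in top_five:
    print(f"{data.feature_names[idx]:<25} {importances[idx]:.3f}")
```

These are impurity-based importances computed from the training data. They can be skewed toward features with many distinct values, so `sklearn.inspection.permutation_importance` on held-out data is worth considering as a cross-check.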

*Business Use Case:* You can tell the marketing team exactly which user behaviors most strongly indicate that a customer is about to cancel their subscription (Churn).

9. Common Mistakes

  • Scaling Features for Trees: Unlike Logistic Regression, SVM, or KNN, Tree-based models do not care about the scale of the features. You do not need to use StandardScaler. They just look for split points (e.g., > 50000).
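  • Ignoring n_estimators: The default is 100 trees (since scikit-learn 0.22; earlier versions used 10). Increasing this to 500 may improve accuracy slightly, with diminishing returns, but training takes roughly 5x longer. Setting n_jobs=-1 trains the trees in parallel on all CPU cores.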
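
The point about scaling can be tested rather than taken on faith. The sketch below, assuming the breast cancer dataset, trains the same forest once on raw features and once on standardized features and compares test accuracy. The variable names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same model settings, trained once on raw features and once on standardized features
raw_forest = RandomForestClassifier(n_estimators=100, random_state=42)
raw_forest.fit(X_train, y_train)
raw_acc = raw_forest.score(X_test, y_test)

scaler = StandardScaler().fit(X_train)  # fit the scaler on training data only
scaled_forest = RandomForestClassifier(n_estimators=100, random_state=42)
scaled_forest.fit(scaler.transform(X_train), y_train)
scaled_acc = scaled_forest.score(scaler.transform(X_test), y_test)

print(f"Raw features:    {raw_acc:.3f}")
print(f"Scaled features: {scaled_acc:.3f}")
```

The two accuracies should be the same or very close, because standardization preserves the ordering of values within each feature and a split only depends on that ordering.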

10. Best Practices

  • Default to Random Forest: For tabular (Excel-style) data, a Random Forest (or a gradient-boosted cousin such as XGBoost) is a strong first algorithm to try. It is usually accurate out of the box and requires very little tuning.

11. Exercises

  1. What parameter in DecisionTreeClassifier is used to prevent the tree from memorizing the data (overfitting)?
  2. Why is a Random Forest called "Random"? (Hint: Think about what data and features each tree receives.)

12. MCQ Quiz with Answers

Question 1

What is the primary weakness of a single Decision Tree with no constraints?

Question 2

How does a Random Forest Classifier make its final prediction?

13. Interview Questions

  • Q: Explain the concept of Ensemble Learning using a real-world analogy.
  • Q: Do you need to apply Feature Scaling (Standardization) to your data before feeding it into a Random Forest? Why or why not?

14. FAQs

Q: Can Trees be used for Regression (predicting numbers) too? A: Yes! Scikit-learn provides DecisionTreeRegressor and RandomForestRegressor. Instead of voting on a category, the forest averages the numeric predictions of all the trees.
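
As a minimal sketch of the regression case, the example below fits a `RandomForestRegressor` and checks that the forest's prediction matches the average of its individual trees. It assumes the diabetes dataset bundled with scikit-learn, and the hyperparameters are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

reg = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)
reg.fit(X_train, y_train)

# Average the individual trees by hand and compare with the forest's own prediction
per_tree = np.array([tree.predict(X_test) for tree in reg.estimators_])
manual_average = per_tree.mean(axis=0)

print(f"Forest prediction for first test row: {reg.predict(X_test)[0]:.1f}")
print(f"Mean of the 100 tree predictions:     {manual_average[0]:.1f}")
print(f"R^2 on the test set: {reg.score(X_test, y_test):.2f}")
```

For regressors, `score` returns R^2 rather than accuracy.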

15. Summary

Decision Trees offer highly interpretable, rule-based logic that requires no data scaling. By addressing their tendency to overfit through the power of Ensemble Learning, Random Forests emerge as one of the most robust, accurate, and popular algorithms in the modern Data Scientist's toolkit.

16. Next Chapter Recommendation

While Forests draw boxes around data, our next algorithm looks for the optimal "street" separating categories. In Chapter 12: Support Vector Machines (SVM), we will explore an incredibly powerful algorithm built on complex geometry.
