Decision Trees and Random Forests
# CHAPTER 11
1. Introduction
While Logistic Regression draws mathematical lines to separate data, people often solve problems using rules: "If the sky is dark, AND I don't have an umbrella, THEN I will stay inside." Machine learning algorithms can mimic this exact logic. A Decision Tree creates a flowchart of rules based on the data. However, a single tree easily overfits: it can learn rules so specific that they only describe the training data. To fix this, we plant a "Forest" of trees. In this chapter, we will master tree-based algorithms and introduce the concept of Ensemble Learning.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Explain how a Decision Tree splits data.
- Understand the danger of tree depth and overfitting.
- Implement `DecisionTreeClassifier` in Scikit-learn.
- Explain the concept of Ensemble Learning.
- Build a highly accurate `RandomForestClassifier`.
3. How a Decision Tree Works
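A Decision Tree asks a series of True/False questions to split the data.
- 1. Root Node: The algorithm scans all features and finds the one rule that best splits the data into pure groups, meaning groups where nearly every sample belongs to the same class (e.g., "Is Age > 30?").
- 2. Branches: The data flows down the True or False path.
- 3. Internal Nodes: Each branch asks another question (e.g., "Is Salary > $50k?") to split its group further.
- 4. Leaf Nodes: The tree stops asking questions and makes the final prediction: the majority class of the training samples that reached that leaf.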
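The idea of a "pure group" can be made concrete. Scikit-learn's default splitting criterion for classification is Gini impurity, which is 0 for a group containing a single class. Below is a minimal from-scratch sketch of how the root node could be chosen, assuming a single numeric feature and made-up data; `gini` and `best_threshold` are hypothetical helper names, not Scikit-learn functions, and the real implementation also searches across every feature.

```python
import numpy as np

def gini(labels):
    """Gini impurity = 1 - sum(p_k^2); 0.0 means the group is pure."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature, labels):
    """Try the midpoint between each pair of neighboring unique values and
    return the threshold whose two child groups have the lowest weighted Gini."""
    values = np.unique(feature)  # sorted unique values
    best_t, best_score = None, np.inf
    for t in (values[:-1] + values[1:]) / 2:
        left, right = labels[feature <= t], labels[feature > t]  # True / False branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical toy data: customer ages and whether each one bought (1) or not (0)
ages = np.array([22, 25, 28, 33, 41, 52])
bought = np.array([0, 0, 0, 1, 1, 1])

t, score = best_threshold(ages, bought)
print(f"Best root question: Is Age <= {t}? (weighted Gini = {score:.2f})")
# Expected: Best root question: Is Age <= 30.5? (weighted Gini = 0.00)
```

A threshold of 30.5 sends all three non-buyers left and all three buyers right, so both children are pure and the weighted impurity drops to zero.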
4. The Overfitting Problem (Tree Depth)
Decision Trees have one massive flaw: they love to overfit. If you do not stop the tree from growing, it keeps asking questions until every leaf is pure, often leaving a single training point in each leaf. It essentially memorizes the training data perfectly but performs poorly on new test data. The Fix: We must limit the `max_depth` of the tree (related parameters such as `min_samples_leaf` also help).
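To see the problem, the sketch below fits trees of different depths and compares training and test accuracy. It assumes a synthetic dataset from `make_classification` in which `flip_y=0.1` randomly reassigns the labels of about 10% of samples, so there is noise to memorize. The unconstrained tree (`max_depth=None`) reaches 100% training accuracy because it splits until every leaf is pure; exact test scores will vary with the data, but the gap between its train and test scores is the memorization described above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.1 randomizes ~10% of labels so there is noise to memorize
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

results = {}
for depth in [None, 3, 5]:  # None = no limit, grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    test_acc = tree.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.2f}  test={test_acc:.2f}  leaves={tree.get_n_leaves()}")
```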
5. Implementing a Decision Tree
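A minimal sketch of the standard Scikit-learn workflow follows, assuming the bundled Iris dataset as a stand-in for your own data. The `max_depth=3` setting applies the fix from the previous section, and `export_text` prints the learned rules so you can read the tree as the True/False flowchart described earlier.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# The bundled Iris dataset stands in for your own data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)

# max_depth caps the number of questions the tree can ask on any path
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Print the learned if/else rules as a text flowchart
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```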
6. The Solution: Random Forests and Ensemble Learning
Instead of relying on one deep, overfitted tree, what if we grew 100 different trees and let them vote? Ensemble Learning is the strategy of combining many models, whose individual mistakes partly cancel out, into one stronger model. A Random Forest works by:
- 1. Creating 100 different Decision Trees.
- 2. Giving each tree a random sample of the training rows, drawn with replacement (a "bootstrap" sample), so every tree sees a slightly different dataset.
- 3. Letting each split consider only a random subset of the features (e.g., one split may only choose between Age and Income, another between Location and Credit Score). This stops all the trees from asking the same questions.
- 4. Voting: When new data arrives, all 100 trees make a prediction, and the forest outputs the majority winner. (Scikit-learn's implementation averages the trees' predicted class probabilities, a "soft" version of this vote.)
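The four steps can be sketched by hand on top of `DecisionTreeClassifier`. This is a simplified illustration, not Scikit-learn's `RandomForestClassifier` source: `fit_forest` and `predict_forest` are hypothetical helpers, the data is synthetic, and the vote is a plain majority count. Passing `max_features="sqrt"` makes each split draw its own random feature subset, which is how step 3 is applied.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=100, seed=0):
    """Steps 1-3: grow n_trees trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    trees = []
    for i in range(n_trees):
        # Step 2: sample row indices WITH replacement (a bootstrap sample)
        rows = rng.integers(0, n_samples, size=n_samples)
        # Step 3: max_features="sqrt" makes every split consider only a
        # random subset of about sqrt(n_features) features
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        trees.append(tree.fit(X[rows], y[rows]))
    return trees

def predict_forest(trees, X):
    """Step 4: every tree votes; return the majority class for each row.
    Assumes class labels are non-negative integers (required by np.bincount)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_rows)
    return np.array([np.bincount(column).argmax() for column in votes.T])

# Usage on synthetic data: train on the first 400 rows, test on the last 100
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
trees = fit_forest(X[:400], y[:400], n_trees=50)
preds = predict_forest(trees, X[400:])
print(f"Hand-rolled forest accuracy: {np.mean(preds == y[400:]):.2f}")
```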
7. Mini Project: Customer Churn Prediction
Let's implement a Random Forest. Notice how similar the Scikit-learn code is to everything else we've done!
*In most real-world scenarios, a Random Forest will outperform a single Decision Tree.*
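A minimal sketch of the mini project follows, assuming a synthetic stand-in for real churn data: `make_classification` generates six numeric customer features, and label 1 is treated as "churned" for illustration. Apart from the class name, the fit/predict calls are the same as for the single tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a churn table: 6 numeric customer features,
# label 1 = churned, about 20% of customers churn (weights=[0.8, 0.2])
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           n_redundant=1, weights=[0.8, 0.2], random_state=7)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y)

# Same fit/predict pattern as every other Scikit-learn model
forest = RandomForestClassifier(n_estimators=100, random_state=7)
forest.fit(X_train, y_train)

y_pred = forest.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred, target_names=["stayed", "churned"]))
```

Because only about 20% of the synthetic customers churn, check the per-class recall in the report rather than relying on accuracy alone.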
8. Feature Importance
One of the best features of Random Forests is that they automatically calculate which inputs were the most useful for splitting the data, averaged across all the trees (the `feature_importances_` attribute).
*Business Use Case:* You can tell the marketing team which user behaviors are most strongly associated with a customer canceling their subscription (Churn).
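The sketch below reads `feature_importances_` from a fitted forest. The column names are hypothetical labels attached to synthetic data; with `shuffle=False`, `make_classification` places the three informative columns first and pure noise last, so you would expect the first three names near the top of the ranking. These importances are impurity-based and can overstate features with many distinct values; `sklearn.inspection.permutation_importance` is a common cross-check.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names attached to synthetic data, for illustration only.
# With shuffle=False, the first 3 columns are informative and the last 3 are noise.
feature_names = ["support_calls", "tenure_months", "monthly_charges",
                 "noise_a", "noise_b", "noise_c"]
X, y = make_classification(n_samples=2000, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=7)

forest = RandomForestClassifier(n_estimators=200, random_state=7).fit(X, y)

# Impurity-based importances: each feature's total impurity decrease,
# averaged over all trees and normalized so the values sum to 1
importances = forest.feature_importances_
for idx in np.argsort(importances)[::-1]:
    print(f"{feature_names[idx]:<16} {importances[idx]:.3f}")
```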
9. Common Mistakes
- Scaling Features for Trees: Unlike Logistic Regression, SVM, or KNN, tree-based models do not care about the scale of the features, so you do not need to use `StandardScaler`. They only look for split points (e.g., > 50000), and rescaling a feature does not change which side of a split each sample falls on.
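- Ignoring `n_estimators`: This sets the number of trees; Scikit-learn's default is 100. Increasing it to 500 might improve accuracy slightly (the gains shrink as more trees are added), but training will take roughly 5x longer, since the cost grows about linearly with the number of trees.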
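A quick way to check the scaling claim is to train the same tree on raw and on standardized features and compare predictions. The sketch assumes synthetic data with one column multiplied by 50,000 to mimic a salary-sized feature; the two trees should agree on essentially every test point, with any disagreement coming only from floating-point edge cases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=5, random_state=3)
X[:, 0] *= 50_000  # make one column "salary-sized" to exaggerate the scale gap
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# Tree trained on raw features
raw_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Tree trained on standardized features (scaler fitted on training data only)
scaler = StandardScaler().fit(X_train)
scaled_tree = DecisionTreeClassifier(random_state=0).fit(scaler.transform(X_train), y_train)

raw_pred = raw_tree.predict(X_test)
scaled_pred = scaled_tree.predict(scaler.transform(X_test))
agreement = np.mean(raw_pred == scaled_pred)
print(f"Predictions that agree: {agreement:.1%}")
```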
10. Best Practices
- Default to Random Forest: For tabular (Excel-style) data, a Random Forest (or its cousin XGBoost, a gradient-boosted tree library) is a strong algorithm to try first. Random Forests in particular are usually accurate out of the box and need relatively little tuning.
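One way to follow this advice is to cross-validate a default forest as a baseline before tuning anything. The sketch assumes the Breast Cancer dataset bundled with Scikit-learn as a stand-in for your own tabular data and compares it with a single default tree; XGBoost is a separate library and is not shown.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Bundled tabular dataset standing in for your own spreadsheet-style data
X, y = load_breast_cancer(return_X_y=True)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),  # all defaults, no tuning
}

scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5)  # accuracy on each of 5 folds
    print(f"{name:>13}: mean accuracy {scores[name].mean():.3f} (std {scores[name].std():.3f})")
```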
11. Exercises
- 1. What parameter in `DecisionTreeClassifier` is used to prevent the tree from memorizing the data (overfitting)?
- 2. Why is a Random Forest called "Random"? (Hint: Think about which rows each tree receives and which features each split may use.)
12. MCQ Quiz with Answers
What is the primary weakness of a single Decision Tree with no constraints?
How does a Random Forest Classifier make its final prediction?
13. Interview Questions
- Q: Explain the concept of Ensemble Learning using a real-world analogy.
- Q: Do you need to apply Feature Scaling (Standardization) to your data before feeding it into a Random Forest? Why or why not?
14. FAQs
Q: Can Trees be used for Regression (predicting numbers) too? A: Yes! Scikit-learn provides `DecisionTreeRegressor` and `RandomForestRegressor`. Instead of voting on a category, the forest averages the numeric predictions of all the trees.
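The averaging can be verified directly. This sketch, assuming synthetic regression data from `make_regression`, fits a `RandomForestRegressor` and then recomputes its output by averaging the predictions of the individual trees stored in `estimators_`; the two results should match up to floating-point rounding.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

X_new = X[:3]  # a few rows to predict
forest_pred = forest.predict(X_new)

# Reproduce the forest's output by averaging each tree's numeric prediction
manual_avg = np.mean([tree.predict(X_new) for tree in forest.estimators_], axis=0)

print("Forest prediction:", np.round(forest_pred, 2))
print("Mean of the trees:", np.round(manual_avg, 2))
```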