Scikit-learn Basics
CHAPTER 17 Intermediate

Hyperparameter Tuning and GridSearchCV

Updated: May 16, 2026

1. Introduction

When you instantiate a model like RandomForestClassifier(), Scikit-learn uses default settings (e.g., 100 trees, no maximum depth). While these defaults are good, they are rarely optimal for your specific dataset. These internal settings are called Hyperparameters. Adjusting them is like a DJ turning the knobs on a mixing board to get the perfect sound. In this chapter, we will learn how to systematically test hundreds of different knob combinations to squeeze every ounce of accuracy out of our models.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define Hyperparameters vs. Parameters.
  • Understand the concept of Grid Search.
  • Implement GridSearchCV in Scikit-learn.
  • Implement RandomizedSearchCV for faster tuning.
  • Optimize a model to prevent overfitting.

3. Parameters vs. Hyperparameters

  • Parameters: These are the numbers the model *learns* during training (e.g., the slope and intercept in Linear Regression). You do not set these yourself; fit() computes them from the data.
  • Hyperparameters: These are the settings *you* provide before training begins (e.g., K=5 in KNN, max_depth=3 in Decision Trees).
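
The distinction above can be sketched in a few lines. This is a minimal illustration, assuming a made-up toy dataset (`X_toy`, `y_toy`) that follows y = 2x + 1; the variable names are not from the chapter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Toy data that follows y = 2x + 1 exactly (made up for illustration)
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = 2.0 * X_toy.ravel() + 1.0

# Parameters: learned by fit() and exposed as attributes ending in "_"
lin = LinearRegression().fit(X_toy, y_toy)
slope = lin.coef_[0]
intercept = lin.intercept_
print(f"Learned slope: {slope:.2f}, intercept: {intercept:.2f}")

# Hyperparameters: chosen by you in the constructor, before any training
knn = KNeighborsClassifier(n_neighbors=5)
knn_settings = knn.get_params()
print(f"n_neighbors set by us: {knn_settings['n_neighbors']}")
```

Learned values live on the fitted object (`coef_`, `intercept_`), while `get_params()` returns only what was passed to the constructor.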

4. Grid Search

Suppose you want to tune two hyperparameters of a Random Forest:
  • n_estimators (Number of trees): [50, 100, 200]
  • max_depth (Depth of trees): [5, 10, None]

Grid Search creates a literal "grid" of all possible combinations (50 & 5, 50 & 10, 50 & None, 100 & 5, etc.; nine in total here). It trains a separate model for *every single combination* using Cross-Validation, evaluates them, and tells you which specific combination produced the highest average cross-validated score (accuracy, in this chapter).
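
To see the grid concretely, the combinations can be enumerated with Scikit-learn's `ParameterGrid` helper. This is a small sketch using the two lists above; `cv_folds = 5` is an assumed fold count used only for the arithmetic.

```python
from sklearn.model_selection import ParameterGrid

grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, None],
}

# ParameterGrid yields one dict per combination of values
combinations = list(ParameterGrid(grid))
for combo in combinations:
    print(combo)

n_combinations = len(combinations)
cv_folds = 5  # assumed number of cross-validation folds
total_fits = n_combinations * cv_folds
print(f"{n_combinations} combinations x {cv_folds} folds = {total_fits} fits")
```

The number of fits grows multiplicatively with every list you add, which is why the size of the grid matters so much.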

5. Implementing GridSearchCV

Let's optimize a Random Forest model.
python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set so the search never sees it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 1. Define the model
rf = RandomForestClassifier(random_state=42)

# 2. Define the grid of hyperparameters to test
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 10],
    'min_samples_split': [2, 5]
}

# 3. Initialize GridSearchCV
# cv=5 means 5-fold cross-validation.
# Total fits: 3 * 3 * 2 combinations * 5 folds = 90, plus one final refit of the best combination.
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring='accuracy', n_jobs=-1)

# 4. Run the search on the training data only
grid_search.fit(X_train, y_train)

# 5. View the best results!
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best CV Accuracy: {grid_search.best_score_ * 100:.2f}%")

*Note: n_jobs=-1 tells Scikit-learn to use all the cores on your CPU to run the grid search in parallel, making it much faster!*
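
Beyond the single best combination, the fitted search object keeps a score for every candidate in its `cv_results_` dictionary. The sketch below is self-contained and uses a deliberately tiny grid (`small_grid`, an assumption chosen for speed rather than a recommendation) to print all candidates ordered by rank.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Deliberately tiny grid so the example runs quickly
small_grid = {'n_estimators': [10, 30], 'max_depth': [3, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    cv=3,
    scoring='accuracy',
)
search.fit(X_train, y_train)

results = search.cv_results_
# Order candidate indices by rank (rank 1 = best mean cross-validated score)
order = sorted(range(len(results['params'])),
               key=lambda i: results['rank_test_score'][i])
for i in order:
    print(
        f"rank {results['rank_test_score'][i]}: {results['params'][i]} "
        f"mean={results['mean_test_score'][i]:.4f} "
        f"std={results['std_test_score'][i]:.4f}"
    )
```

Comparing the mean and standard deviation across candidates shows whether the winner is clearly better or only marginally ahead of the runner-up.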

6. Using the Best Model

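You don't need to manually re-create the model with the best parameters. Because refit=True by default, GridSearchCV retrains a model with the best combination on all the data you passed to fit() and stores it as best_estimator_.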
python
best_rf_model = grid_search.best_estimator_
# You can immediately use best_rf_model.predict(X_test)
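
As a fuller illustration of that last comment, here is a self-contained sketch that tunes on a training split and then scores the refitted best model on held-out data. The small grid and the 80/20 split are assumptions made to keep the example quick.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={'n_estimators': [10, 30], 'max_depth': [3, 5]},  # small, assumed grid
    cv=3,
)
search.fit(X_train, y_train)

# The refitted winner is an ordinary fitted estimator
best_model = search.best_estimator_
test_predictions = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, test_predictions)
print(f"Held-out test accuracy: {test_accuracy:.4f}")

# Calling predict on the search object delegates to best_estimator_
same_predictions = search.predict(X_test)
```

Either object can be used for prediction; keeping a reference to `best_estimator_` is convenient when you want to save or inspect the model on its own.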

7. RandomizedSearchCV (The Faster Alternative)

If you have 10 hyperparameters with 10 values each, Grid Search will try 10^10 = 10 billion combinations, and each one is trained once per cross-validation fold. Your computer will melt. RandomizedSearchCV solves this. Instead of trying *every* combination, it tries a fixed number of randomly sampled combinations, set by n_iter (e.g., n_iter=50). In practice it often finds hyperparameters that score close to the Grid Search optimum in a fraction of the time, although it is not guaranteed to find the best combination.
python
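from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Search on training data only; keep X_test, y_test for the final evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# n_iter=10 samples 10 of the 18 combinations in param_grid
random_search = RandomizedSearchCV(estimator=rf, param_distributions=param_grid, n_iter=10, cv=5, random_state=42)
random_search.fit(X_train, y_train)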
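
RandomizedSearchCV also accepts distributions instead of fixed lists, so it can sample values a grid would never contain. The following is a hedged sketch using `scipy.stats.randint`; the ranges, `n_iter=8`, and `cv=3` are arbitrary assumptions picked for a fast run.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# randint(low, high) draws integers from low up to high - 1
param_distributions = {
    'n_estimators': randint(10, 60),
    'max_depth': randint(2, 11),
    'min_samples_split': [2, 5],  # plain lists are sampled uniformly
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=8,          # number of sampled combinations
    cv=3,
    random_state=42,   # makes the sampled combinations reproducible
)
random_search.fit(X_train, y_train)

n_candidates = len(random_search.cv_results_['params'])
print(f"Tried {n_candidates} combinations")
print(f"Best Parameters: {random_search.best_params_}")
```

The cost is fixed by `n_iter` regardless of how wide the ranges are, which is the main reason to prefer this approach for large search spaces.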

8. Common Mistakes

  • Leaking Data into the Grid Search: You should only pass your training data (X_train, y_train) into the Grid Search. If you pass the entire dataset, the test rows help choose the hyperparameters, so the test score is no longer a blind, unbiased estimate of performance on new data.
  • Grids that are too large: For example, trying n_estimators: [1, 2, 3, ... up to 1000]. Be strategic. Try large steps first, such as [10, 100, 1000]. If 100 wins, search a smaller grid around it, such as [80, 100, 120].
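
The coarse-then-fine strategy from the second point might look like the sketch below. The helper `search_n_estimators`, the coarse values, and the 20% step are all hypothetical choices for illustration (smaller than the values in the text so the example runs quickly).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

def search_n_estimators(values):
    """Hypothetical helper: grid-search only n_estimators over the given values."""
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid={'n_estimators': values},
        cv=3,
    )
    search.fit(X_train, y_train)  # training data only
    return search.best_params_['n_estimators'], search.best_score_

# Pass 1: coarse grid with large steps
coarse_values = [5, 25, 125]
coarse_best, coarse_score = search_n_estimators(coarse_values)

# Pass 2: finer grid centred on the coarse winner (+/- 20%, an arbitrary choice)
step = max(1, coarse_best // 5)
fine_values = [coarse_best - step, coarse_best, coarse_best + step]
fine_best, fine_score = search_n_estimators(fine_values)

print(f"Coarse winner: {coarse_best} (CV accuracy {coarse_score:.4f})")
print(f"Fine winner:   {fine_best} (CV accuracy {fine_score:.4f})")
```

Two small searches like this cover a wide range with far fewer fits than one dense grid over the same interval.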

9. Best Practices

  • Optimize for the right metric: When scoring is not set, GridSearchCV uses the estimator's own score method, which is accuracy for classifiers. If you are working with imbalanced data, set scoring='f1' or scoring='roc_auc' instead.
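
If you want to compare several metrics in one run, `scoring` can also take a list, with `refit` naming the metric that picks the winner. This is a minimal sketch under assumed settings (a two-value `max_depth` grid, 30 trees, `cv=3`), not a tuned configuration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=30, random_state=42),
    param_grid={'max_depth': [3, 5]},
    cv=3,
    scoring=['accuracy', 'f1', 'roc_auc'],  # every metric is recorded
    refit='f1',                             # F1 decides the best combination
)
search.fit(X_train, y_train)

results = search.cv_results_
best = search.best_index_
print(f"Best Parameters (by F1): {search.best_params_}")
for metric in ['accuracy', 'f1', 'roc_auc']:
    print(f"{metric}: {results[f'mean_test_{metric}'][best]:.4f}")
```

With multiple metrics, `best_score_` reports the mean cross-validated value of the metric named in `refit`.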

10. Exercises

  1. What is the difference between a Parameter and a Hyperparameter in Scikit-learn?
  2. If param_grid has 4 values for C, 3 values for gamma, and cv=5, exactly how many models will GridSearchCV train?

11. MCQ Quiz with Answers

Question 1

What is the primary purpose of GridSearchCV?

Question 2

If GridSearchCV is taking too long to run because the parameter grid is massive, which Scikit-learn function is the best alternative?

12. Interview Questions

  • Q: Explain the difference between GridSearchCV and RandomizedSearchCV. When would you use one over the other?
  • Q: Explain why tuning hyperparameters on the Test dataset is considered a bad practice.

13. FAQs

Q: How do I know which hyperparameters exist for a specific algorithm? A: Check the Scikit-learn documentation! Every algorithm's page lists all its hyperparameters, what they do, and their default values.

14. Summary

Hyperparameter tuning is the final polish on a machine learning model. By using GridSearchCV and RandomizedSearchCV, we replace guesswork with a systematic, cross-validated search and pick the best-performing configuration among the candidates we tried before we deploy.

15. Next Chapter Recommendation

Currently, if we want to predict a new data point, we have to manually Impute, then Scale, then Encode, and *then* Predict. That is messy and prone to errors. In Chapter 18: Building ML Pipelines in Scikit-learn, we will learn how to bundle all these steps into a single, elegant object.
