Regression Models
CHAPTER 02 Intermediate

Setting Up Python and Machine Learning Environment

Updated: May 16, 2026
6 min read

1. Introduction

Machine Learning requires a specific set of tools. You cannot build a predictive model using just Microsoft Word or an empty terminal. To analyze data, train models, and plot graphs, your computer needs a reliable software stack. "It works on my machine" is a familiar failure in AI development: code that runs on one computer breaks on another because the installed versions differ. In this chapter, we will walk through a step-by-step setup process to make your Python environment stable, reproducible, and ready for regression modeling.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Install a compatible version of Python.
  • Create and manage Python Virtual Environments.
  • Install the core ML libraries (scikit-learn, numpy, pandas, matplotlib).
  • Set up Visual Studio Code (VS Code).
  • Configure and run Jupyter Notebooks.

3. Installing Python

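Install a 64-bit build of Python. A 32-bit interpreter limits how much memory a process can address, which quickly becomes a problem with larger datasets. *Warning: Stick to a stable release such as Python 3.10 or 3.11. Recent scikit-learn releases no longer support 3.9, so check each library's installation page for the versions it currently supports.*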

Windows:

  1. Go to python.org/downloads and download the Windows installer.
  2. CRITICAL STEP: On the very first screen of the installer, check the box "Add Python to PATH" (the exact label varies slightly between installer versions). If you skip this, the python and pip commands will not be recognized in your terminal.

macOS:

  1. Open Terminal and install Homebrew: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Run: brew install python

Linux (Ubuntu):

  1. Run: sudo apt update
  2. Run: sudo apt install python3 python3-pip python3-venv
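
Once Python is installed, it can help to confirm which interpreter you actually got. The sketch below is one possible check, not an official tool: it reads the version from sys.version_info and infers whether the build is 32-bit or 64-bit from the pointer size. The function name describe_interpreter is hypothetical.

```python
import struct
import sys


def describe_interpreter():
    """Return (major, minor, bits) for the running Python interpreter."""
    # A pointer is 8 bytes on a 64-bit build and 4 bytes on a 32-bit build.
    bits = struct.calcsize("P") * 8
    return sys.version_info.major, sys.version_info.minor, bits


major, minor, bits = describe_interpreter()
print(f"Python {major}.{minor}, {bits}-bit build")
```

Save this as a .py file and run it with python (Windows) or python3 (Mac/Linux). If it reports a 32-bit build, reinstall with the 64-bit installer before continuing.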

4. Virtual Environments

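Avoid installing libraries globally on your main computer system. Different projects often need different versions of the same library, and a global installation can hold only one version at a time, so projects end up conflicting. Instead, we use a Virtual Environment (an isolated sandbox folder).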
  1. Open your terminal/command prompt.
  2. Create a folder for this course and enter it: mkdir regressioncourse, then cd regressioncourse
  3. Create the virtual environment (named mlenv):
     • Windows: python -m venv mlenv
     • Mac/Linux: python3 -m venv mlenv
  4. Activate the environment:
     • Windows: mlenv\Scripts\activate
     • Mac/Linux: source mlenv/bin/activate
*(You should now see (mlenv) at the start of your terminal line).*
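
The (mlenv) prefix is only a visual cue. As a rough sketch, assuming the environment was created with python -m venv as above, you can also ask Python itself whether it is running inside a virtual environment. The helper name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
print("Environment folder:", sys.prefix)
```

With mlenv activated, the first line should report True and the second should end in mlenv.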

5. Installing the ML Stack (Scikit-Learn, Pandas)

With your virtual environment activated, we use pip (Python's package manager) to download the industry-standard libraries required for regression analysis.

Run the following command in your terminal:

```bash
pip install scikit-learn numpy pandas matplotlib jupyter
```

*What are these?*

  • scikit-learn: The standard Python library for traditional machine learning (contains Linear Regression, Random Forests, etc.).
  • numpy: For fast matrix math.
  • pandas: For loading and cleaning Excel/CSV files.
  • matplotlib: For drawing graphs and charts.
  • jupyter: For writing interactive code.
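
After the install finishes, one way to confirm that all five packages landed in the environment is to query their metadata with the standard library's importlib.metadata module. This is a minimal sketch; the PACKAGES list and the installed_versions helper are names made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# The five packages installed by the pip command in this section.
PACKAGES = ["scikit-learn", "numpy", "pandas", "matplotlib", "jupyter"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

Any line that prints NOT INSTALLED usually means pip ran outside the activated environment, so activate mlenv and run the install command again.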

6. VS Code and Jupyter Notebook Setup

For exploratory work, Data Scientists often prefer Jupyter Notebooks (.ipynb) over standard .py files. Notebooks let you run code in blocks (cells) and see graphs instantly inline.
  1. Download Visual Studio Code from code.visualstudio.com.
  2. Open VS Code, go to the Extensions tab (squares on the left menu).
  3. Search for and install the Python extension and the Jupyter extension.
  4. Open your regressioncourse folder in VS Code.
  5. Create a new file named helloml.ipynb.
  6. Open the file, click "Select Kernel" in the top right, and choose your mlenv environment.
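
If you are unsure whether the notebook picked up the right kernel, a small check like the following can be run in a cell. It is only a sketch: it assumes the environment folder is called mlenv, as created earlier, and the helper name kernel_environment_name is hypothetical.

```python
import sys
from pathlib import Path


def kernel_environment_name(prefix=None):
    """Return the folder name of the environment the interpreter runs from."""
    # sys.prefix is the root folder of the active environment,
    # for example .../regressioncourse/mlenv
    return Path(prefix if prefix is not None else sys.prefix).name


print("Interpreter:", sys.executable)
print("Environment folder name:", kernel_environment_name())
```

If the folder name is not mlenv, click "Select Kernel" again and pick the interpreter that lives inside your regressioncourse folder.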

7. Verify Installation

Let's ensure everything is installed correctly. Type this into the first cell of your Notebook and click Play (Shift+Enter):
```python
# Import the core libraries; an ImportError here means a package is missing
import sklearn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Print the installed versions
print(f"Scikit-Learn version: {sklearn.__version__}")
print(f"Pandas version: {pd.__version__}")

# Test plotting
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

plt.plot(x, y, marker='o', color='red')
plt.title("Environment Verification")
plt.show()
```

*If a graph with a red line appears, your computer is fully equipped to do Machine Learning!*
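
The plot confirms that the libraries import and draw, but it does not train anything. As an optional extra check, the sketch below fits scikit-learn's LinearRegression to the same four points. It is a smoke test under the assumption that scikit-learn and numpy installed correctly, not part of the original verification step.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same points as the plot: y is exactly 10 times x.
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# scikit-learn expects a 2-D feature matrix, so reshape x to 4 rows by 1 column.
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)

slope = float(model.coef_[0])
intercept = float(model.intercept_)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Because the data lie on a perfect straight line, the slope should come out very close to 10 and the intercept very close to 0, up to floating-point rounding.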

8. Common Mistakes

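  • Forgetting to activate the environment: If you open a new terminal tomorrow, start Python, and type import sklearn, it will likely fail with a ModuleNotFoundError unless you reactivate the environment first (Windows: mlenv\Scripts\activate, Mac/Linux: source mlenv/bin/activate).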
  • Using old tutorials: Machine learning libraries update frequently. If you copy-paste code from a 2015 blog post, it might fail. Always rely on the official scikit-learn documentation for the most up-to-date syntax.

9. Best Practices

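  • Requirements File: When sharing your code, others need to know exactly what libraries you used. In your terminal, run pip freeze > requirements.txt. This creates a text file listing every installed package with its exact version. Someone else can then recreate the environment by running pip install -r requirements.txt inside a fresh virtual environment.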
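
The file that pip freeze writes is plain text, typically one name==version pin per line. To make that format concrete, here is a minimal sketch that reads such text into a dictionary. The function name parse_pinned_requirements is hypothetical, and the sample contents use placeholder version numbers rather than real output.

```python
def parse_pinned_requirements(text):
    """Return {package: version} for every 'name==version' line in the text."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned_version = line.split("==", 1)
        pins[name.strip()] = pinned_version.strip()
    return pins


# Hypothetical file contents; your own pip freeze output will list
# different packages and versions.
sample = """numpy==1.0.0
pandas==2.0.0
"""
print(parse_pinned_requirements(sample))
```

Comparing two such dictionaries is a quick way to see how your environment differs from a colleague's when the same notebook behaves differently on each machine.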

10. Exercises

  1. Create a new virtual environment named testenv, activate it, and install only numpy.
  2. Open a Jupyter Notebook and write a block of code to print "Machine Learning Environment Ready".

11. MCQ Quiz with Answers

Question 1

Why must you use a Virtual Environment when setting up a Machine Learning project?

Question 2

Which library is considered the industry standard in Python for building traditional Machine Learning models like Linear Regression?

12. Interview Questions

  • Q: Explain the purpose of a requirements.txt file in a Python data science project.
  • Q: What is a Jupyter Notebook, and why is it preferred over a standard IDE for exploratory data analysis?

13. FAQs

Q: Do I need a powerful GPU to do Regression?
A: No. Unlike Deep Learning, where training usually depends on GPUs, the traditional Machine Learning algorithms in scikit-learn (like Linear Regression and Random Forests) run on the CPU, and a standard laptop handles them comfortably for small and medium-sized datasets.

14. Summary

Setting up your environment correctly is a rite of passage in Data Science. By using virtual environments, installing the core data science stack (numpy, pandas, scikit-learn, matplotlib), and working in Jupyter Notebooks, you have built a stable foundation for the rest of this course.

15. Next Chapter Recommendation

Before we load massive CSV files, we must ensure our foundational coding skills are sharp. In Chapter 3: Python Basics for Regression Analysis, we will review the exact Python syntax, data structures, and functions required to write clean analytic scripts.
