Scikit-learn Basics
CHAPTER 03 Intermediate

Python Basics for Machine Learning

Updated: May 16, 2026
6 min read

# CHAPTER 3

1. Introduction

Machine learning relies heavily on math, but you carry out that math in a programming language rather than by hand. Python is the most widely used language for machine learning. Its simple syntax lets you focus on the data and the algorithms rather than on manual memory management or compilation steps. In this chapter, we cover the essential Python concepts (variables, data structures, conditionals, loops, and functions) that you will use every day as an ML engineer.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Define variables and understand basic Python data types.
  • Use Lists and Dictionaries to store collections of data.
  • Write conditional statements (if/else) to control logic flow.
  • Use loops (for/while) to iterate over data.
  • Define custom functions to make code reusable.

3. Variables and Data Types

In Python, you do not need to declare a variable's type. You simply assign a value to a name.
python
# Integer (whole number)
age = 25

# Float (decimal number)
learning_rate = 0.01

# String (text)
model_name = "Random Forest"

# Boolean (True/False)
is_trained = False

# Checking types
print(type(learning_rate)) # Output: <class 'float'>
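
Because a type belongs to the value rather than to the name, you often need to convert values explicitly, for example when numbers arrive as text. The following is a minimal sketch of that idea; the names `epochs_text`, `epochs`, and `batch_fraction` are illustrative and not part of the chapter's examples.

```python
# Values read from files or user input usually arrive as strings.
epochs_text = "10"

# Convert explicitly before doing arithmetic.
epochs = int(epochs_text)
print(type(epochs))  # Output: <class 'int'>

# Dividing two integers with / always produces a float.
batch_fraction = 32 / 128
print(batch_fraction)  # Output: 0.25

# isinstance checks whether a value has a given type.
print(isinstance(batch_fraction, float))  # Output: True
```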

4. Data Structures: Lists

In Machine Learning, you rarely work with a single number. You work with arrays (lists) of numbers. A Python list holds an ordered collection of items.
python
# Creating a list of house prices
prices = [250000, 300000, 450000, 500000]

# Accessing elements (0-indexed)
print(prices[0]) # Output: 250000

# Slicing (getting a subset)
print(prices[1:3]) # Output: [300000, 450000]

# Adding an element
prices.append(600000)

# Finding the length
print(len(prices)) # Output: 5
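
Lists also support negative indexing and work with Python's built-in summary functions. The sketch below assumes the five prices from the example above and computes a simple average; `last_price` and `average_price` are illustrative names.

```python
prices = [250000, 300000, 450000, 500000, 600000]

# Negative indices count from the end of the list.
last_price = prices[-1]
print(last_price)  # Output: 600000

# Built-in functions summarize a list without an explicit loop.
average_price = sum(prices) / len(prices)
print(average_price)  # Output: 420000.0
print(min(prices), max(prices))  # Output: 250000 600000
```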

5. Data Structures: Dictionaries

Dictionaries store data in Key-Value pairs. They are incredibly useful for storing model parameters or mapping categories.
python
# Creating a dictionary for model configurations
hyperparameters = {
    "n_estimators": 100,
    "max_depth": 5,
    "criterion": "gini"
}

# Accessing a value by its key
print(hyperparameters["max_depth"]) # Output: 5

# Adding a new key-value pair
hyperparameters["learning_rate"] = 0.1

print(hyperparameters)
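
Two more dictionary operations come up often: reading a key that may be missing, and walking over every key-value pair. The sketch below also shows the "mapping categories" use mentioned above. The `min_samples_split` key, its default of 2, and the `label_to_code` mapping are illustrative assumptions, not values from the chapter.

```python
hyperparameters = {"n_estimators": 100, "max_depth": 5, "criterion": "gini"}

# .get returns a default instead of raising KeyError for a missing key.
min_samples = hyperparameters.get("min_samples_split", 2)
print(min_samples)  # Output: 2

# .items() yields each key together with its value.
for name, value in hyperparameters.items():
    print(f"{name} = {value}")

# Mapping text categories to integer codes (labels are illustrative).
label_to_code = {"cat": 0, "dog": 1}
print(label_to_code["dog"])  # Output: 1
```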

6. Conditions (If / Else)

Conditional statements let a program choose between actions. For example, if a model's accuracy drops we might stop training, and if it clears a threshold we might deploy it.
python
accuracy = 0.85

if accuracy >= 0.90:
    print("Model is excellent! Deploying...")
elif accuracy >= 0.80:
    print("Model is acceptable. Needs more tuning.")
else:
    print("Model is poor. Retrain with more data.")
    
# Output: Model is acceptable. Needs more tuning.
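
The "stop training when accuracy drops" idea from the start of this section can be sketched the same way. The accuracy values and the `tolerance` threshold below are hypothetical numbers chosen for illustration.

```python
previous_accuracy = 0.88
current_accuracy = 0.85
tolerance = 0.01  # hypothetical: ignore drops smaller than this

if current_accuracy < previous_accuracy - tolerance:
    should_stop = True
    print("Accuracy dropped. Stopping training.")
else:
    should_stop = False
    print("Accuracy is stable. Continuing training.")

# Output: Accuracy dropped. Stopping training.
```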

7. Loops (For and While)

Loops allow us to execute code multiple times. A for loop is used extensively to iterate through lists of data or epochs (training cycles). A while loop repeats for as long as a condition remains true.
python
# Iterating over a list
features = ["age", "salary", "location"]
for feature in features:
    print(f"Processing feature: {feature}")

# Output:
# Processing feature: age
# Processing feature: salary
# Processing feature: location

# Using range for numerical loops (e.g., training for 3 epochs)
for epoch in range(1, 4):
    print(f"Training epoch {epoch}...")
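
A while loop fits cases where the number of iterations is not known in advance. The sketch below keeps "training" until a loss value falls under a target; halving the loss is only a stand-in for a real training step, and the starting values are made up.

```python
loss = 1.0
target_loss = 0.2
epoch = 0

# A while loop repeats until its condition becomes False.
while loss > target_loss:
    epoch += 1
    loss = loss / 2  # stand-in for a real training step
    print(f"Epoch {epoch}: loss = {loss}")

# Output:
# Epoch 1: loss = 0.5
# Epoch 2: loss = 0.25
# Epoch 3: loss = 0.125
```

Make sure the condition can eventually become False; otherwise the loop never ends.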

8. List Comprehensions (Pythonic Data Transformation)

A list comprehension is a popular Python feature that creates a new list from an existing one in a single line of code. Data scientists use it constantly to clean data.
python
# Let's say we have raw string data and want to convert to integers
raw_data = ["10", "20", "30"]

# Standard way:
clean_data = []
for item in raw_data:
    clean_data.append(int(item))

# Pythonic List Comprehension way:
clean_data = [int(item) for item in raw_data]

print(clean_data) # Output: [10, 20, 30]
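
A comprehension can also filter while it transforms, which is handy when some raw entries are unusable. This sketch assumes a made-up input list with an empty string and an "n/a" marker, and keeps only entries made entirely of digits.

```python
raw_data = ["10", "", "30", "n/a", "45"]

# Keep only entries made of digits, then convert them to integers.
clean_data = [int(item) for item in raw_data if item.isdigit()]

print(clean_data)  # Output: [10, 30, 45]
```

Note that `str.isdigit` rejects negative numbers and decimals such as "-5" or "2.5", so real cleaning code may need a different check.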

9. Functions for Data Science

Functions wrap code into reusable blocks. Instead of writing the math for a metric such as accuracy or Mean Squared Error 10 times, write the function once and call it wherever you need it.
python
def calculate_accuracy(correct_predictions, total_predictions):
    """
    Calculates the accuracy percentage.
    """
    if total_predictions == 0:
        return 0.0
    
    accuracy = (correct_predictions / total_predictions) * 100
    return accuracy

# Using the function
result = calculate_accuracy(85, 100)
print(f"Accuracy: {result}%") # Output: Accuracy: 85.0%
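
The same pattern applies to Mean Squared Error. The following is one possible sketch in plain Python, not a library implementation; the function name, the sample values, and the choice to return 0.0 for empty input (mirroring `calculate_accuracy` above) are assumptions.

```python
def mean_squared_error(actual, predicted):
    """
    Returns the average of the squared differences between two
    equal-length lists of numbers.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        return 0.0

    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)

# Using the function
actual = [3.0, 5.0, 2.0]
predicted = [2.0, 5.0, 4.0]
mse = mean_squared_error(actual, predicted)
print(mse)  # (1 + 0 + 4) / 3, roughly 1.667
```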

10. Common Mistakes

  • Indentation Errors: Unlike C++ or Java, Python uses whitespace (indentation) to define blocks of code. If the body of an if statement isn't indented consistently, Python raises an IndentationError before the program runs, or runs a statement outside the block you intended.
  • Modifying a list while iterating over it: Removing items from a list inside a for item in items: loop makes the loop skip elements. Build a new list with a list comprehension instead.
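
The second mistake is easy to reproduce. In this sketch, with made-up scores, the goal is to drop every value below 0.5; the in-place version misses one because each removal shifts the remaining items.

```python
scores = [0.9, 0.4, 0.3, 0.8]

# Buggy: removing inside the loop shifts the remaining items,
# so the loop skips the element right after each removal.
buggy = list(scores)
for score in buggy:
    if score < 0.5:
        buggy.remove(score)
print(buggy)  # Output: [0.9, 0.3, 0.8]  (0.3 was never checked)

# Safer: build a new list that keeps only the items you want.
filtered = [score for score in scores if score >= 0.5]
print(filtered)  # Output: [0.9, 0.8]
```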

11. Best Practices

  • Use meaningful variable names: Don't use x1 and y2. Use house_area and selling_price. Code is read more often than it is written.
  • Docstrings: Always use """ triple quotes """ right under your function definition to explain what the function does, what arguments it takes, and what it returns.
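
Both practices can be combined in one small function. The sketch below is a hypothetical example: the function name and the Args/Returns layout are one common convention, not something the chapter prescribes.

```python
def price_per_square_meter(selling_price, house_area):
    """
    Computes the price per square meter of a house.

    Args:
        selling_price: Sale price of the house.
        house_area: Floor area in square meters; must be positive.

    Returns:
        The selling price divided by the area, as a float.
    """
    return selling_price / house_area

unit_price = price_per_square_meter(300000, 120)
print(unit_price)  # Output: 2500.0

# The docstring is stored on the function and can be read back.
print(price_per_square_meter.__doc__)
```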

12. Exercises

  1. Create a function called celsius_to_fahrenheit that takes a temperature in Celsius and returns it in Fahrenheit. Formula: (C * 9/5) + 32.
  2. Write a list comprehension that takes the list of numbers [1, 2, 3, 4, 5] and generates a new list with their squares: [1, 4, 9, 16, 25].

13. MCQ Quiz with Answers

Question 1

How do you extract the value "apple" from the dictionary: data = {"fruit": "apple", "color": "red"}?

Question 2

Which of the following is a correctly formatted Python list comprehension?

14. Interview Questions

  • Q: Explain the difference between a Python List and a Dictionary. When would you use one over the other?
  • Q: What happens if you try to access an index in a list that doesn't exist (e.g., index 10 in a list of 5 items)?
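
When preparing for the second question, it helps to see the behavior directly. This sketch uses a made-up five-item list and shows that indexing past the end raises an IndexError, while slicing past the end quietly returns an empty list.

```python
items = [10, 20, 30, 40, 50]

# Indexing past the end raises IndexError.
try:
    items[10]
except IndexError as error:
    message = str(error)
    print(f"IndexError: {message}")  # Output: IndexError: list index out of range

# Slicing past the end does not raise; it returns an empty list.
tail = items[10:]
print(tail)  # Output: []
```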

15. FAQs

Q: I know Object-Oriented Programming (Classes). Do I need to use it in ML?
A: Yes and no. Scikit-learn itself is built using Classes, and you will instantiate objects from it (e.g., model = LinearRegression()). However, as a beginner writing ML scripts, you will mostly write functional code and rarely need to define your own complex custom Classes.

16. Summary

Python's readability makes it a strong fit for data science. By mastering variables, lists, dictionaries, conditionals, loops, and functions, you now have the core syntax needed to manipulate data and drive machine learning algorithms.

17. Next Chapter Recommendation

Standard Python lists are great, but they are too slow for processing millions of rows of data. In Chapter 4: NumPy and Pandas Essentials, we will introduce the two libraries that turbo-charge Python for heavy data manipulation.
