CHAPTER 04
Intermediate
NumPy, Pandas, and Data Handling
Updated: May 16, 2026
1. Introduction
A neural network is essentially a giant mathematical blender: it takes numbers in, multiplies them by other numbers, and spits numbers out. It cannot read Excel files, and it cannot work efficiently with standard Python lists. To prepare data for PyTorch, we use specialized scientific libraries. NumPy provides blazing-fast multidimensional arrays for matrix math, while Pandas acts as a programmable spreadsheet for cleaning and organizing real-world data. In this chapter, we will master data handling.
2. Learning Objectives
By the end of this chapter, you will be able to:
- Create and manipulate NumPy `ndarray`s.
- Understand the shape and dimensions of data matrices.
- Load datasets (like CSVs) using Pandas DataFrames.
- Preprocess, filter, and clean missing data.
- Convert data from Pandas into a form PyTorch can use.
3. NumPy Basics and NDArrays
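NumPy (Numerical Python) is implemented largely in C. It introduces the `ndarray` (N-dimensional array), which is often orders of magnitude faster than a standard Python list for numerical work, because it stores elements of a single type in a compact memory buffer and processes them in compiled code.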
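A minimal sketch of creating `ndarray`s and inspecting their dimensions. The array values are arbitrary placeholders. The attributes used (`.shape`, `.ndim`, `.dtype`) and the functions `np.zeros`, `np.arange`, and `reshape` are standard NumPy.

```python
import numpy as np

# A 1-D array (vector) built from a Python list
vector = np.array([1.0, 2.0, 3.0])

# A 2-D array (matrix): 2 rows, 3 columns
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Arrays pre-filled with zeros or evenly spaced values
zeros = np.zeros((3, 4))      # 3 rows, 4 columns of 0.0
steps = np.arange(0, 10, 2)   # 0, 2, 4, 6, 8

print(vector.shape)   # (3,)
print(matrix.shape)   # (2, 3)
print(matrix.ndim)    # 2
print(matrix.dtype)   # platform-dependent integer type, e.g. int64

# reshape gives the same data a new shape; the total element count must not change
reshaped = matrix.reshape(3, 2)
print(reshaped.shape)  # (3, 2)
```

Reading a shape as (rows, columns) for 2-D data is the habit the rest of this chapter relies on.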
*Understanding `.shape` is critical: shape mismatches are among the most common errors beginners face in PyTorch!*
4. Vectorized Operations
In standard Python, if you want to multiply every item in a list by 5, you have to write a `for` loop. In NumPy, you use vectorization: a single expression applies the operation to the entire array, and the element-by-element loop runs in compiled C code instead of the Python interpreter.
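A small side-by-side sketch of the loop version and the vectorized version of "multiply everything by 5". The `prices` values are made up for illustration.

```python
import numpy as np

prices = [10, 20, 30, 40]

# Plain Python: an explicit loop visits each element one at a time
scaled_list = []
for p in prices:
    scaled_list.append(p * 5)

# NumPy: one expression; the loop runs inside compiled code
arr = np.array(prices)
scaled_arr = arr * 5

print(scaled_list)            # [50, 100, 150, 200]
print(scaled_arr.tolist())    # [50, 100, 150, 200]

# Element-wise operations between arrays of the same shape work the same way
other = np.array([1, 2, 3, 4])
print((arr + other).tolist())  # [11, 22, 33, 44]
print((arr ** 2).tolist())     # [100, 400, 900, 1600]
print(arr.mean())              # 25.0
```

Both versions produce the same numbers. The vectorized form is shorter, and on large arrays it avoids the per-element overhead of the Python interpreter.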
5. Pandas Basics and DataFrames
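While NumPy is great for pure math, its arrays have no column names and hold one data type per array, so they handle labeled, mixed, or incomplete data poorly. Pandas is built for exactly this kind of data wrangling. Its core object is the DataFrame, a 2D table with named columns and labeled rows.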
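A minimal sketch of building a DataFrame by hand and doing the most common lookups. The column names and values are hypothetical example data.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: each dict key becomes a named column
data = {
    "age":    [25, 32, 47, 51],
    "income": [40000, 52000, np.nan, 61000],   # np.nan marks a missing value
    "city":   ["Paris", "Lyon", "Paris", "Nice"],
}
df = pd.DataFrame(data)

print(df.shape)            # (4, 3) -> 4 rows, 3 columns
print(list(df.columns))    # ['age', 'income', 'city']
print(df.dtypes)           # each column keeps its own type
print(df["age"].mean())    # select one column by name -> 38.75

# Boolean filtering: keep only the rows where the condition is True
older = df[df["age"] > 30]
print(len(older))          # 3
```

Unlike a single NumPy array, this table mixes numbers and text, tracks a missing value, and lets you address columns by name.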
6. Reading CSV Files
In practice, you will usually load data from external files, most often CSVs downloaded from Kaggle or exported from your company's database.
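A hedged sketch of loading a CSV with `pd.read_csv`. So the example runs without a file on disk, it reads from an in-memory string via `io.StringIO`. With real data you would pass a file path instead; `"data.csv"` in the comment is a placeholder, and the column names are hypothetical.

```python
import io
import pandas as pd

# Stand-in for a real file: in practice you would call
# pd.read_csv("data.csv") with your own path ("data.csv" is a placeholder)
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,3.0,setosa
6.3,3.3,virginica
5.8,,virginica
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())        # first rows (up to 5 by default)
print(df.shape)         # (4, 3)
df.info()               # column names, non-null counts, and dtypes
print(df.describe())    # summary statistics for numeric columns
```

The empty field in the last row is read as `NaN`. `df.info()` makes such gaps visible through its non-null counts.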
7. Data Preprocessing with Pandas
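Neural networks cannot cope with missing data (NaN values): a single NaN propagates through every calculation it touches and can turn the whole loss into NaN. We must use Pandas to clean the data before feeding it to PyTorch.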
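A minimal cleaning sketch on hypothetical data. It first shows NaN propagation in NumPy, then two common strategies: dropping incomplete rows with `dropna()`, or filling gaps with each column's mean via `fillna()`. Mean imputation is one choice among several and is used here only for illustration.

```python
import numpy as np
import pandas as pd

# NumPy's sum propagates NaN (pandas' sum skips NaN by default, which can hide the problem)
print(np.array([1.0, np.nan, 3.0]).sum())   # nan

# Hypothetical example data with gaps
df = pd.DataFrame({
    "age":    [25, np.nan, 47, 51],
    "income": [40000, 52000, np.nan, 61000],
    "label":  [0, 1, 1, 0],
})

# 1. Inspect: count missing values per column
print(df.isna().sum())

# 2a. Option A: drop every row that contains at least one NaN
dropped = df.dropna()
print(dropped.shape)               # (2, 3)

# 2b. Option B: fill gaps with each column's mean (imputation)
filled = df.fillna(df.mean())
print(filled.isna().sum().sum())   # 0

# 3. Remove exact duplicate rows, if any
clean = filled.drop_duplicates()
```

Dropping rows is simple but throws data away (half of this tiny table). Imputation keeps every row at the cost of inventing plausible values.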
8. Mini Project: Dataset Exploration
Let's extract exactly what a neural network needs: the features (inputs, `X`) and the label (output, `y`).
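A sketch of splitting a cleaned DataFrame into `X` and `y`. It assumes a hypothetical dataset whose label column is named `"target"`. Casting to `float32` is a choice made here because it matches PyTorch's default floating-point dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned dataset: three feature columns and one label column ("target")
df = pd.DataFrame({
    "sqft":     [850, 1200, 1500, 2100],
    "bedrooms": [2, 3, 3, 4],
    "age":      [30, 12, 8, 2],
    "target":   [0, 1, 1, 1],
})

# Features (X): every column except the label
X = df.drop(columns=["target"]).to_numpy(dtype=np.float32)

# Label (y): only the label column
y = df["target"].to_numpy(dtype=np.float32)

print(X.shape)   # (4, 3) -> 4 samples, 3 features
print(y.shape)   # (4,)   -> one label per sample
print(X.dtype)   # float32
```

The second number in `X.shape` (here 3) is the input size the first layer of the network will need.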
*In Chapter 6, we will learn how to turn this `X` array directly into a PyTorch Tensor!*
9. Common Mistakes
- Confusing Pandas indexing: trying to select rows with `df[0]`. On a DataFrame, `df[0]` looks up a column labeled `0` (usually raising a `KeyError`), so use `df.iloc[0]` for integer-location based row indexing.
- Feeding Pandas DataFrames directly into PyTorch: PyTorch layers do not understand Pandas DataFrames. Convert your DataFrame to a NumPy array (`.to_numpy()`) and then to a PyTorch Tensor before training.
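A short sketch of both mistakes and their fixes, on a hypothetical two-column table. PyTorch is deliberately not imported so the snippet runs without it. The final conversion step (`torch.from_numpy`) appears only as a comment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"height": [1.62, 1.75, 1.80], "weight": [55.0, 72.5, 80.1]})

# Mistake: df[0] looks for a COLUMN labeled 0, not the first row
try:
    df[0]
except KeyError:
    print("df[0] raised KeyError: no column named 0")

# Correct: .iloc selects rows by integer position (.loc selects by index label)
first_row = df.iloc[0]       # a Series holding row 0
first_two = df.iloc[0:2]     # rows 0 and 1 (end position excluded)

# Correct: convert to a NumPy array before handing the data to PyTorch
features = df.to_numpy(dtype=np.float32)
# Next step (not run here, requires PyTorch): torch.from_numpy(features)
```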
10. Best Practices
- Always check shapes: get into the habit of printing `X.shape` and `y.shape` before you build your neural network. The input layer of your network *must* match the number of columns in `X`.
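To see why the column count matters, here is a NumPy-only sketch. At its core, a dense layer multiplies the inputs by a weight matrix, `X @ W`. That product only works when the number of columns in `X` equals the number of rows in `W`. In PyTorch, this number corresponds to the `in_features` argument of `nn.Linear`. The random data and the hidden width of 8 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 4))      # 100 samples, 4 features
y = rng.integers(0, 2, size=100)   # one 0/1 label per sample

# Check shapes before building anything
print(X.shape, y.shape)            # (100, 4) (100,)
assert X.shape[0] == y.shape[0], "every sample needs exactly one label"

n_features = X.shape[1]            # the input size a first layer must expect

# A weight matrix sized from the data works...
W_good = rng.normal(size=(n_features, 8))
hidden = X @ W_good
print(hidden.shape)                # (100, 8)

# ...while a hard-coded wrong input size fails with a shape error
W_bad = rng.normal(size=(5, 8))
try:
    X @ W_bad
except ValueError as err:
    print("Shape mismatch:", err)
```

Deriving the input size from `X.shape[1]` instead of typing it by hand removes one common source of shape-mismatch errors.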
11. Exercises
- 1. Create a NumPy array containing the numbers 1 through 5. Square every number in the array using a single mathematical operation (Vectorization).
- 2. If you load a Pandas DataFrame and `df.shape` returns `(1000, 15)`, what does that mean in terms of rows and columns?
12. MCQ Quiz with Answers
Question 1
Why are NumPy arrays preferred over standard Python lists for Deep Learning?
Question 2
In Pandas, what method is used to drop rows that contain missing (NaN) values?
13. Interview Questions
- Q: Explain the difference in purpose between NumPy and Pandas in a Data Science workflow.
- Q: What does the `.shape` attribute of a NumPy array tell you, and why is it critical when designing a neural network?