TensorFlow Introduction
CHAPTER 04 Intermediate

NumPy, Pandas, and Data Handling

Updated: May 16, 2026

# CHAPTER 4

1. Introduction

A neural network is essentially a giant mathematical blender. It takes numbers in, multiplies them by other numbers, and spits a number out. It cannot read Excel files, and it cannot process standard Python lists efficiently. To prepare data for TensorFlow, we must use specialized scientific libraries. NumPy provides blazing-fast multidimensional arrays for matrix math, while Pandas acts as a programmable spreadsheet to clean and organize real-world data. In this chapter, we will master data handling.

2. Learning Objectives

By the end of this chapter, you will be able to:
  • Create and manipulate NumPy ndarrays.
  • Understand the shape and dimensions of data matrices.
  • Load datasets (like CSVs) using Pandas DataFrames.
  • Preprocess and filter data.
  • Transition data from Pandas to TensorFlow.

3. NumPy Basics and NDArrays

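NumPy (Numerical Python) has its core written in C. It introduces the ndarray (N-dimensional array), which stores values of a single type in a compact block of memory and is often orders of magnitude faster than a Python list for numerical work.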
python
import numpy as np

# 1D Array (Vector) - e.g., a single row of data
vector = np.array([1, 2, 3, 4])

# 2D Array (Matrix) - e.g., a grayscale image or a spreadsheet
matrix = np.array([
    [10, 20],
    [30, 40],
    [50, 60]
])

print(f"Matrix Shape: {matrix.shape}")
# Output: Matrix Shape: (3, 2) -> 3 rows, 2 columns

*Understanding .shape is critical. Many of the errors beginners face in TensorFlow are "Shape Mismatch" errors!*
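
To make the shape idea concrete, here is a minimal sketch that inspects the same matrix, reshapes it, and then deliberately triggers a shape mismatch. The `weights` array is a made-up stand-in for a layer's weights and is not taken from TensorFlow.

```python
import numpy as np

matrix = np.array([
    [10, 20],
    [30, 40],
    [50, 60]
])

# ndim is the number of axes; size is the total number of elements
print(matrix.ndim)   # 2
print(matrix.size)   # 6

# reshape rearranges the same 6 values into a new layout
flat = matrix.reshape(6)        # shape (6,)
wide = matrix.reshape(2, 3)     # shape (2, 3)
print(flat.shape, wide.shape)   # (6,) (2, 3)

# Matrix multiplication needs the inner dimensions to agree:
# (3, 2) @ (2, 1) works, but (3, 2) @ (3, 1) does not.
weights = np.ones((2, 1))       # hypothetical weights, for illustration only
print((matrix @ weights).shape) # (3, 1)

try:
    matrix @ np.ones((3, 1))
except ValueError as err:
    print("Shape mismatch:", err)
```

Reading the error message next to the two shapes is usually the quickest way to see which dimension is off.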

4. Vectorized Operations

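In standard Python, if you want to multiply every item in a list by 2, you have to write a for loop (or a list comprehension). In NumPy, you use vectorization: a single expression applies the operation to the entire array at once, with the loop running in compiled C code rather than in Python.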
python
prices = np.array([10, 20, 30])
# Multiply every element by 2 in a single operation
new_prices = prices * 2

print(new_prices) # Output: [20 40 60]
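
For comparison, the sketch below puts the plain-Python version next to the vectorized one. The `quantities` array is an invented example value, used only to show that vectorized operations also work between two arrays.

```python
import numpy as np

prices_list = [10, 20, 30]

# Plain Python: visit each item one at a time
doubled_list = [p * 2 for p in prices_list]

# NumPy: one expression, with the loop running inside compiled code
prices = np.array(prices_list)
doubled_array = prices * 2

print(doubled_list)     # [20, 40, 60]
print(doubled_array)    # [20 40 60]

# Note: `*` on a plain list repeats the list instead of multiplying
print(prices_list * 2)  # [10, 20, 30, 10, 20, 30]

# Vectorized operations also work element by element between arrays
quantities = np.array([3, 1, 2])   # hypothetical quantities
totals = prices * quantities
print(totals)           # [30 20 60]
```

The list-repetition behavior is worth remembering: forgetting to convert a list to an array gives a longer list, not an error.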

5. Pandas Basics and DataFrames

While NumPy is great for pure math, it doesn't handle column names or missing text values well. Pandas is built for exactly this kind of data wrangling. Its core object is the DataFrame (a 2D table with labeled columns).
python
import pandas as pd

# Creating a DataFrame manually
data = {
    "Age": [25, 30, 35],
    "Salary": [50000, 65000, 80000],
    "Purchased": [0, 1, 1]
}

df = pd.DataFrame(data)
print(df)
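
Once a DataFrame exists, a few attributes describe its structure. The following sketch rebuilds the same table and inspects it; the variable names `ages` and `subset` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 35],
    "Salary": [50000, 65000, 80000],
    "Purchased": [0, 1, 1]
})

print(df.shape)            # (3, 3) -> 3 rows, 3 columns
print(list(df.columns))    # ['Age', 'Salary', 'Purchased']
print(df.dtypes)           # data type of each column

# A single column is a Series; a list of names gives a smaller DataFrame
ages = df["Age"]
subset = df[["Age", "Salary"]]
print(ages.mean())         # 30.0
print(subset.shape)        # (3, 2)
```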

6. Reading Datasets

In reality, you will load data from external files, usually CSVs.
python
# Load a CSV file
df = pd.read_csv("customer_data.csv")

# View the first 5 rows
print(df.head())

# View a statistical summary (mean, min, max)
print(df.describe())
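
The snippet above assumes a file named customer_data.csv exists on disk. To try `read_csv` without one, here is a self-contained sketch that reads from an in-memory string; the CSV contents are invented for the example.

```python
import io
import pandas as pd

# Hypothetical file contents; a real project would pass a file path instead
csv_text = """Age,Salary,Purchased
25,50000,0
30,65000,1
35,80000,1
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print(df.shape)              # (3, 3)
print(df["Salary"].mean())   # 65000.0
```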

7. Data Preprocessing with Pandas

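Neural networks hate missing data (NaN values): a single NaN spreads through every calculation it touches, so predictions and the loss can end up as NaN too. We must use Pandas to clean the data before feeding it to TensorFlow.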
python
# Check for missing values
print(df.isnull().sum())

# Drop any row that is missing data
df_clean = df.dropna()

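# Alternatively, fill missing ages with the average age
# (assign the result back; calling fillna(..., inplace=True) on a selected
# column is deprecated in recent pandas and may not modify df)
mean_age = df['Age'].mean()
df['Age'] = df['Age'].fillna(mean_age)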
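
Here is a small runnable sketch of both cleaning options, followed by a basic row filter. The data and the 40000 salary threshold are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age
df = pd.DataFrame({
    "Age": [25, np.nan, 35, 45],
    "Salary": [50000, 65000, 80000, 30000]
})

print(df.isnull().sum()["Age"])   # 1

# Option 1: drop incomplete rows (returns a new DataFrame)
dropped = df.dropna()
print(dropped.shape)              # (3, 2)

# Option 2: fill the gap with the mean of the known ages (25, 35, 45 -> 35)
filled = df.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())
print(filled["Age"].tolist())     # [25.0, 35.0, 35.0, 45.0]

# Filtering: keep only the rows where a condition is True
high_earners = filled[filled["Salary"] > 40000]
print(len(high_earners))          # 3
```

Dropping rows is simplest but discards data; filling keeps every row at the cost of introducing an estimated value. Which one fits depends on how much data is missing.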

8. Mini Project: Analyze Dataset with Pandas

Let's extract exactly what a neural network needs: The Features (Inputs/X) and the Label (Output/y).
python
import pandas as pd

# Mock Data
df = pd.DataFrame({
    "Age": [22, 25, 47, 52, 46],
    "Credit_Score": [600, 650, 800, 750, 710],
    "Approved_Loan": [0, 0, 1, 1, 1]
})

# 1. Isolate the Features (X)
# Drop the target column to keep only the inputs
X = df.drop("Approved_Loan", axis=1)

# 2. Isolate the Target Label (y)
y = df["Approved_Loan"]

# 3. Convert Pandas to NumPy array (TensorFlow prefers NumPy arrays)
X_array = X.to_numpy()
y_array = y.to_numpy()

print("Features Shape:", X_array.shape) # Output: Features Shape: (5, 2)
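
One detail worth checking at this step is the data type. TensorFlow's default floating-point type is float32, while integer columns convert to int64 arrays. The sketch below, which reuses the mock data, requests float32 during the conversion. Treat it as one reasonable approach and not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [22, 25, 47, 52, 46],
    "Credit_Score": [600, 650, 800, 750, 710],
    "Approved_Loan": [0, 0, 1, 1, 1]
})

X = df.drop("Approved_Loan", axis=1)
y = df["Approved_Loan"]

# Integer columns keep an integer dtype by default
print(X.to_numpy().dtype)

# Request float32 during the conversion
X_array = X.to_numpy(dtype="float32")
y_array = y.to_numpy(dtype="float32")

print(X_array.dtype, X_array.shape)  # float32 (5, 2)
print(y_array.dtype, y_array.shape)  # float32 (5,)
```

If you prefer to hand TensorFlow a tensor directly, `tf.convert_to_tensor(X_array)` is the usual call; it is left out here so the sketch runs without TensorFlow installed.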

9. Common Mistakes

  • Confusing Pandas indexing: Trying to select rows using df[0]. In Pandas, df[0] looks for a column labeled 0 and usually raises a KeyError. Use df.iloc[0] for integer-position indexing and df.loc[label] for label-based indexing.
  • Feeding Pandas DataFrames directly into low-level TensorFlow: While Keras (high-level TensorFlow) can sometimes accept Pandas DataFrames directly, it is generally safer to convert your data to NumPy arrays (.to_numpy()) or TensorFlow Tensors first, so that shapes and data types are explicit.
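
The indexing mistake is easy to reproduce. This sketch uses a small DataFrame with made-up string index labels ("a", "b", "c") so that position and label are clearly different things.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 35],
    "Salary": [50000, 65000, 80000]
}, index=["a", "b", "c"])   # hypothetical labels, for illustration

# df[0] looks for a COLUMN labeled 0, which does not exist here
try:
    df[0]
except KeyError:
    print("KeyError: no column named 0")

first_by_position = df.iloc[0]     # first row, by integer position
first_by_label = df.loc["a"]       # same row, by index label
print(first_by_position["Age"])    # 25
print(first_by_label["Salary"])    # 50000
```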

10. Best Practices

  • Always check shapes: Get into the habit of printing X.shape and y.shape before you build your neural network. The input layer of your network *must* match the number of columns in X.
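
This habit can be wrapped in a tiny helper. The function below, `check_shapes`, is a hypothetical utility written for this chapter and is not part of NumPy or TensorFlow. It is a sketch of the kind of check you might run before building a model.

```python
import numpy as np

def check_shapes(X, y):
    """Return the number of features after basic sanity checks."""
    if X.ndim != 2:
        raise ValueError(f"X should be 2D (rows, features), got shape {X.shape}")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} rows but y has {y.shape[0]} labels")
    return X.shape[1]

# Placeholder arrays with the same shapes as the mini project data
X = np.zeros((5, 2), dtype="float32")
y = np.zeros(5, dtype="float32")

n_features = check_shapes(X, y)
print(n_features)   # 2 -> the input layer should expect 2 values per sample
```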

11. Exercises

  1. Create a NumPy array containing the numbers 1 through 5. Square every number in the array using a single mathematical operation.
  2. If you load a Pandas DataFrame and df.shape returns (1000, 15), what does that mean?

12. MCQ Quiz with Answers

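Question 1

Why are NumPy arrays preferred over standard Python lists for Deep Learning?

Answer: Their core is implemented in C and they support vectorized operations, so math on large arrays runs far faster than looping over a Python list.

Question 2

In Pandas, what method is used to remove rows that contain missing (NaN) values?

Answer: dropna()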

13. Interview Questions

  • Q: Explain the difference in purpose between NumPy and Pandas in a Data Science workflow.
  • Q: What does the .shape attribute of a NumPy array tell you, and why is it critical when designing a Neural Network?

14. FAQs

Q: Does TensorFlow have its own data structures?
A: Yes! TensorFlow uses Tensors. Tensors behave much like NumPy arrays, but they have special properties that allow them to run on GPUs and take part in gradient calculations for deep learning. (We cover Tensors in Chapter 6.)

15. Summary

NumPy and Pandas are the unsung heroes of Artificial Intelligence. While TensorFlow gets all the glory for building the "brain", it is NumPy that provides the fast mathematical infrastructure, and Pandas that organizes the chaotic real-world data into clean matrices ready for consumption.

16. Next Chapter Recommendation

We have the tools. We have the data. Now, we need to understand the architecture of the brain we are trying to build. In Chapter 5: Understanding Neural Networks, we will dive into the theory of Artificial Neurons, Layers, and Backpropagation.
