Python for Data Science
CHAPTER 11 Beginner

NumPy Arrays and Operations

Updated: May 18, 2026
5 min read


1. Chapter Introduction

Creating arrays is only the first step. In data science, you must extract specific slices of data from large matrices, reshape them to fit machine learning algorithms, and calculate statistical summaries. This chapter covers indexing and slicing 1D and 2D arrays, filtering with boolean conditions, reshaping, and applying aggregation functions to a whole array or along specific rows and columns.

2. Array Indexing and Slicing (1D)

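Indexing a 1D NumPy array works much like indexing a standard Python list: it is zero-indexed, negative indices count from the end, and slicing uses the [start:stop:step] syntax, where stop is excluded. One important difference is that a NumPy slice is a view of the original array rather than a copy, so changing the slice changes the original.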

python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Indexing
print(arr[0])   # 10
print(arr[-1])  # 50 (Last item)

# Slicing
print(arr[1:4]) # [20 30 40]
print(arr[:3])  # [10 20 30]
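
The example above does not use the step value, and it does not show the view behavior mentioned earlier. The following is a minimal sketch of both; the names view and safe are just illustrative variables.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# The step value in [start:stop:step]
print(arr[::2])   # every second element: [10 30 50]
print(arr[::-1])  # a negative step reverses: [50 40 30 20 10]

# Unlike a list slice, a NumPy slice is a view onto the same data
view = arr[1:3]
view[0] = 999
print(arr)        # the original changed: [ 10 999  30  40  50]

# Call .copy() when you need independent data
safe = arr[1:3].copy()
safe[0] = -1
print(arr)        # unchanged: [ 10 999  30  40  50]
```

Views avoid copying large arrays. However, they are a common source of surprising bugs when a slice is modified later.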

3. Matrix Indexing and Slicing (2D)

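Machine learning relies heavily on 2D arrays (rows and columns). To index one, put the row index and the column index inside a single pair of brackets: matrix[row_index, col_index]. Each position also accepts a slice, and a bare : means "everything along this axis".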

python
# Create a 3x3 matrix
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# 1. Accessing a single element (Row 0, Col 2)
print(matrix[0, 2]) # 3

# 2. Accessing a whole row (Row 1, All Columns)
print(matrix[1, :]) # [4 5 6]

# 3. Accessing a whole column (All Rows, Col 2)
print(matrix[:, 2]) # [3 6 9]

# 4. Slicing a sub-matrix (Rows 0-1, Cols 1-2)
print(matrix[0:2, 1:3])
# [[2 3]
#  [5 6]]
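
A detail that often matters when feeding data to models is that an integer index removes a dimension, while a slice keeps it. The sketch below reuses the same 3x3 matrix. The variable names col_1d and col_2d are only illustrative.

```python
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# An integer index drops that dimension; a slice keeps it
col_1d = matrix[:, 2]      # 1D result
col_2d = matrix[:, 2:3]    # 2D result with a single column
print(col_1d.shape, col_2d.shape)  # (3,) (3, 1)

# Omitting the column index selects the whole row
print(matrix[1])           # [4 5 6], same as matrix[1, :]

# Negative indices count from the end on each axis
print(matrix[-1, -1])      # 9
```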

4. Boolean Indexing (Filtering)

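You can filter arrays using conditions. Comparing an array with a value (for example, ages >= 18) produces a boolean array of the same shape, often called a mask. Passing that mask back into the brackets keeps only the elements where it is True. This is one of the most commonly used techniques in data science.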

python
ages = np.array([15, 22, 17, 30, 45, 12])

# Creates a boolean array: [False, True, False, True, True, False]
is_adult = ages >= 18

# Pass the boolean array back into the brackets to filter!
adults = ages[is_adult]

print(adults) # [22 30 45]

# Shorthand (very common)
print(ages[ages >= 18]) # [22 30 45]
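
Real filters often combine several conditions. Here is a minimal sketch using the same ages array. The names working_age, minors_or_seniors, and capped, as well as the cut-off values 40 and 45, are arbitrary choices made for illustration. NumPy masks are combined with &, |, and ~ rather than Python's and/or/not.

```python
import numpy as np

ages = np.array([15, 22, 17, 30, 45, 12])

# Combine conditions with & (and), | (or), ~ (not).
# Each comparison needs parentheses because & and | bind more tightly than >=.
working_age = ages[(ages >= 18) & (ages < 40)]
print(working_age)             # [22 30]

minors_or_seniors = ages[(ages < 18) | (ages >= 45)]
print(minors_or_seniors)       # [15 17 45 12]

# A mask can also count matches (each True counts as 1)...
print(np.sum(ages >= 18))      # 3

# ...or select elements to overwrite (here on a copy, so ages is untouched)
capped = ages.copy()
capped[capped > 40] = 40
print(capped)                  # [15 22 17 30 40 12]
```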

5. Reshaping Arrays

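Machine learning models often require data in a specific shape. For example, scikit-learn expects features as a 2D array (rows are samples, columns are features), so a single feature stored as a 1D array must be converted into a 2D column. The .reshape() method returns the same elements arranged in a new shape, and the total number of elements must stay the same.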

python
# A 1D array of 6 items
arr_1d = np.array([1, 2, 3, 4, 5, 6])

# Reshape into 2 Rows, 3 Columns
arr_2d = arr_1d.reshape(2, 3)
print(arr_2d)

# Reshape into 6 Rows, 1 Column (-1 tells NumPy to calculate the rows automatically)
column_vector = arr_1d.reshape(-1, 1)
print(column_vector)
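
The -1 placeholder can appear in either position, but only once per call. The sketch below prints the resulting shapes. It also shows .ravel(), one standard way to flatten back to 1D.

```python
import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5, 6])

# Only one dimension may be -1; NumPy infers it from the total size (6)
print(arr_1d.reshape(2, 3).shape)   # (2, 3)
print(arr_1d.reshape(3, -1).shape)  # (3, 2)
print(arr_1d.reshape(1, -1).shape)  # (1, 6): a single row (one sample)
print(arr_1d.reshape(-1, 1).shape)  # (6, 1): a single column (one feature)

# By default, reshape fills the new shape row by row
print(arr_1d.reshape(2, 3))
# [[1 2 3]
#  [4 5 6]]

# Flatten back to 1D with .ravel() (.flatten() does the same but always copies)
print(arr_1d.reshape(2, 3).ravel())  # [1 2 3 4 5 6]
```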

6. Mathematical and Aggregation Functions

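Instead of writing Python loops, you can call NumPy's aggregation functions (np.sum(), np.mean(), np.max(), and others), which run in optimized compiled code. Each one accepts an optional axis argument. Without it, the function reduces the entire array to a single value; with it, the function reduces along one dimension.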

python
data = np.array([
    [10, 20],
    [30, 40]
])

# Aggregate the ENTIRE matrix
print("Total Sum:", np.sum(data)) # 100
print("Mean:", np.mean(data))     # 25.0
print("Max:", np.max(data))       # 40

# Aggregate by Axis
# axis=0 means operate down the COLUMNS
print("Sum of Columns:", np.sum(data, axis=0)) # [40 60]

# axis=1 means operate across the ROWS
print("Sum of Rows:", np.sum(data, axis=1))    # [30 70]
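
The same axis rule applies to the other aggregations, and most of them are also available as array methods. Here is a short sketch on the same data matrix. The variable names are only illustrative.

```python
import numpy as np

data = np.array([
    [10, 20],
    [30, 40],
])

# Method form: data.sum(axis=0) is equivalent to np.sum(data, axis=0)
col_sums = data.sum(axis=0)
col_means = data.mean(axis=0)
row_means = data.mean(axis=1)
row_maxes = data.max(axis=1)
print(col_sums)    # [40 60]
print(col_means)   # [20. 30.]
print(row_means)   # [15. 35.]
print(row_maxes)   # [20 40]
print(data.min())  # 10 (no axis: the whole matrix)

# argmax also takes an axis: the row index of the largest value in each column
best_rows = np.argmax(data, axis=0)
print(best_rows)   # [1 1]
```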

7. Mini Project: Matrix Calculator

Suppose we have a 2D array of daily sales revenue (in dollars) over 3 days: row 0 holds Product A, row 1 holds Product B, and each column is one day.

python
sales = np.array([
    [100, 150, 120], # Product A sales
    [200, 180, 210]  # Product B sales
])

# 1. Total revenue overall
print(f"Total Revenue: ${np.sum(sales)}")  # Total Revenue: $960

# 2. Total revenue per product (sum across each row)
prod_totals = np.sum(sales, axis=1)  # [370 590]
print(f"Prod A Total: ${prod_totals[0]} | Prod B Total: ${prod_totals[1]}")
# Prod A Total: $370 | Prod B Total: $590

# 3. Best day overall (sum down the columns, then find the max)
daily_totals = np.sum(sales, axis=0)  # [300 330 330]
# Days 2 and 3 tie at 330; np.argmax returns the FIRST index of the maximum
best_day_idx = np.argmax(daily_totals)
print(f"Best Day was Day {best_day_idx + 1} with ${daily_totals[best_day_idx]}")
# Best Day was Day 2 with $330
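
The mini project can be pushed a little further with the other tools from this chapter. The sketch below is one possible extension. The 190 threshold and the names avg_per_product, best_day_per_product, and strong_b_days are hypothetical and chosen only for illustration.

```python
import numpy as np

sales = np.array([
    [100, 150, 120],  # Product A sales
    [200, 180, 210],  # Product B sales
])

# Average daily sales per product (mean across each row), rounded to 1 decimal
avg_per_product = np.round(np.mean(sales, axis=1), 1)
print(avg_per_product)          # [123.3 196.7]

# Best day for each product: index of the max in each row, shown as 1-based days
best_day_per_product = np.argmax(sales, axis=1) + 1
print(best_day_per_product)     # [2 3]

# Boolean indexing: which days did Product B sell more than a (hypothetical) 190?
days = np.arange(1, sales.shape[1] + 1)   # [1 2 3]
strong_b_days = days[sales[1] > 190]
print(strong_b_days)            # [1 3]
```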

8. Common Mistakes

  • Confusing axis=0 and axis=1: axis=0 collapses the rows, computing down each column (one result per column). axis=1 collapses the columns, computing across each row (one result per row).
  • Reshaping to impossible dimensions: if you have an array of 5 items, you cannot call .reshape(2, 3), because 2 × 3 = 6. The total number of elements must remain identical; otherwise NumPy raises a ValueError.
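
Both mistakes can be checked quickly in code. In this minimal sketch, arr, m, and the reshape_failed flag are illustrative names only. It catches the ValueError from an impossible reshape and reads each result's shape to see which axis was collapsed.

```python
import numpy as np

# Mistake 2: the new shape must hold exactly arr.size elements
arr = np.arange(5)              # [0 1 2 3 4], size 5
reshape_failed = False
try:
    arr.reshape(2, 3)           # 2 * 3 = 6, but arr has only 5 elements
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)

print(arr.reshape(5, 1).shape)  # (5, 1): 5 * 1 = 5, so this works

# Mistake 1: look at the result's shape to see which axis disappeared
m = np.ones((2, 3))             # 2 rows, 3 columns
print(m.sum(axis=0).shape)      # (3,): rows collapsed, one value per column
print(m.sum(axis=1).shape)      # (2,): columns collapsed, one value per row
```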

9. MCQs

Question 1

How do you access the item in the 1st row and 2nd column of a 2D array named matrix?

Question 2

What does matrix[:, 0] extract?

Question 3

If arr = np.array([1, 2, 3, 4]), what is arr[arr > 2]?

Question 4

If you have a 1D array of 12 items, which of these is a valid .reshape() argument?

Question 5

What does the -1 do in arr.reshape(-1, 1)?

Question 6

Which function calculates the average of an array?

Question 7

When summing a 2D matrix, what does axis=0 do?

Question 8

Which function returns the *index* of the maximum value in an array?

Question 9

What is Boolean Indexing?

Question 10

Can you change the shape of an array in place using arr.shape = (2, 2)?

a) Yes, but using .reshape() is much more common and safer
b) No, shape is read-only

Answer: a

10. Interview Questions

  • Q: Explain how Boolean Indexing works in NumPy. Write a line of code to extract all negative numbers from an array.
  • Q: You have a 2D matrix. Explain the difference between np.sum(matrix, axis=0) and np.sum(matrix, axis=1).

11. Summary

Manipulating multi-dimensional arrays is the core of NumPy. Remember the [rows, columns] syntax for 2D indexing and slicing. Boolean indexing lets you filter data without writing for loops. The .reshape() method is critical for preparing data for scikit-learn. Finally, master aggregation functions like np.mean() and np.sum(), and remember that axis=0 operates vertically (down the columns) while axis=1 operates horizontally (across the rows).

12. Next Chapter Recommendation

In Chapter 12: NumPy Broadcasting and Vectorization, we will explore the mathematical magic that makes NumPy so fast, learning how it applies arithmetic across arrays of different shapes without using loops.
