NumPy Arrays and Operations
# CHAPTER 11
1. Chapter Introduction
Creating arrays is only the first step. In data science, you must extract specific slices of data from massive matrices, reshape them to fit machine learning algorithms, and calculate statistical summaries. This chapter covers how to navigate 1D and 2D arrays, slice them, and apply mathematical operations across specific rows and columns.
2. Array Indexing and Slicing (1D)
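Indexing a 1D NumPy array works like indexing a standard Python list. It is zero-indexed, negative indices count from the end, and slicing uses the [start:stop:step] syntax, where the stop index is excluded. One important difference: a basic slice of a NumPy array is a view that shares memory with the original array, not a copy.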
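A minimal sketch of 1D indexing and slicing, assuming a made-up five-element array. The second half shows the view behavior: writing into a slice changes the original array unless you call `.copy()`.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(arr[0])      # 10  (zero-indexed: first element)
print(arr[-1])     # 50  (negative indices count from the end)
print(arr[1:4])    # [20 30 40]  (stop index 4 is excluded)
print(arr[::2])    # [10 30 50]  (step of 2)
print(arr[::-1])   # [50 40 30 20 10]  (a negative step reverses)

# A basic slice is a view that shares memory with arr
view = arr[1:3]
view[0] = 99       # writes through to arr
print(arr)         # [10 99 30 40 50]

# .copy() gives an independent array
independent = arr[1:3].copy()
independent[0] = -1
print(arr)         # still [10 99 30 40 50]
```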
3. Matrix Indexing and Slicing (2D)
Machine learning relies heavily on 2D arrays (rows and columns). The syntax is matrix[row_index, column_index]; a colon (:) in either position selects every row or every column.
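The sketch below uses a small hypothetical 3x4 matrix to show single-element access, whole rows, whole columns, and a rectangular sub-block.

```python
import numpy as np

# A hypothetical 3x4 matrix: 3 rows, 4 columns
matrix = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
])

print(matrix.shape)      # (3, 4)
print(matrix[0, 1])      # 2   -> row 0, column 1
print(matrix[2, -1])     # 12  -> last column of row 2
print(matrix[1])         # [5 6 7 8]  -> the whole of row 1
print(matrix[:, 0])      # [1 5 9]    -> the whole of column 0
print(matrix[0:2, 1:3])  # rows 0-1, columns 1-2
# [[2 3]
#  [6 7]]
```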
4. Boolean Indexing (Filtering)
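You can filter arrays using conditions. Comparing an array to a value produces a Boolean array (a mask) of True/False values, and indexing with that mask keeps only the elements where the mask is True. This is one of the most powerful features in data science.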
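A minimal sketch of Boolean indexing, assuming a made-up array of exam scores. Multiple conditions are combined with `&`, `|` and `~` rather than Python's `and`/`or`/`not`, and each condition needs its own parentheses.

```python
import numpy as np

scores = np.array([45, 82, 67, 90, 38, 75])

mask = scores > 70          # Boolean array, one True/False per element
print(scores[mask])         # [82 90 75]

# Combine conditions with & (and), | (or), ~ (not); parentheses are required
between = scores[(scores >= 50) & (scores <= 80)]
print(between)              # [67 75]

# A mask can also drive assignment: raise every score below 50 to 50
capped = scores.copy()
capped[capped < 50] = 50
print(capped)               # [50 82 67 90 50 75]
```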
5. Reshaping Arrays
Machine learning models often require data in a very specific shape (e.g., a 1D array must be converted into a 2D column). We use .reshape() to change the shape without changing the data.
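For example, scikit-learn estimators generally expect a 2D feature matrix of shape (n_samples, n_features), which is why a single feature is often reshaped with `reshape(-1, 1)`. This sketch uses `np.arange(12)` as stand-in data; passing `-1` for one dimension lets NumPy infer it from the total element count.

```python
import numpy as np

arr = np.arange(12)           # 0, 1, ..., 11 with shape (12,)

grid = arr.reshape(3, 4)      # 3 rows x 4 columns
print(grid.shape)             # (3, 4)

column = arr.reshape(-1, 1)   # -1 is inferred as 12: one column
print(column.shape)           # (12, 1)

row = arr.reshape(1, -1)      # a single row
print(row.shape)              # (1, 12)

flat = grid.reshape(-1)       # back to 1D (grid.ravel() is similar)
print(flat.shape)             # (12,)
```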
6. Mathematical and Aggregation Functions
NumPy removes the need for explicit Python loops by providing highly optimized, vectorized mathematical and statistical functions such as np.sum(), np.mean(), np.max() and np.argmax().
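A short sketch of common aggregation functions on a made-up 1D array. Note that `np.argmax()` returns the position of the largest value, not the value itself, and that `np.std()` defaults to the population standard deviation (`ddof=0`).

```python
import numpy as np

data = np.array([4, 8, 15, 16, 23, 42])

total = np.sum(data)          # 108
average = np.mean(data)       # 18.0
smallest = np.min(data)       # 4
largest = np.max(data)        # 42
where_max = np.argmax(data)   # 5 -> index of the maximum
spread = np.std(data)         # population standard deviation (ddof=0)

print(total, average, smallest, largest, where_max)

# Arithmetic is applied element by element, with no Python loop
doubled = data * 2
print(doubled.tolist())       # [8, 16, 30, 32, 46, 84]
```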
7. Mini Project: Matrix Calculator
Let's say we have a 2D array of daily sales over 3 days, where row 1 is Product A and row 2 is Product B.
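One possible version of the calculator, as a sketch: the sales figures below are invented for illustration. `axis=1` gives one total per product (row), `axis=0` gives one total per day (column), and a row comparison plus Boolean indexing picks out the days on which Product A outsold Product B.

```python
import numpy as np

# Hypothetical sales figures: rows are products, columns are days 1-3
sales = np.array([
    [120, 150, 130],   # Product A
    [ 80,  95, 140],   # Product B
])

total_per_product = sales.sum(axis=1)   # across each row   -> [400, 315]
total_per_day = sales.sum(axis=0)       # down each column  -> [200, 245, 270]
grand_total = sales.sum()               # every cell        -> 715
avg_per_product = sales.mean(axis=1)    # mean daily sales per product
best_day = np.argmax(total_per_day)     # 0-based index of the busiest day

print("Total per product:", total_per_product.tolist())
print("Total per day:", total_per_day.tolist())
print("Grand total:", grand_total)
print("Average per product:", avg_per_product.round(2).tolist())
print("Busiest day: Day", best_day + 1)

# Days on which Product A sold more than Product B
days = np.array([1, 2, 3])
a_wins = days[sales[0] > sales[1]]
print("Days A > B:", a_wins.tolist())   # [1, 2]
```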
8. Common Mistakes
- Confusing axis=0 and axis=1: Remember: axis=0 crushes the rows (calculating down each column, giving one result per column). axis=1 crushes the columns (calculating across each row, giving one result per row).
- Reshaping to impossible dimensions: If you have an array of 5 items, you cannot .reshape(2, 3) because 2 * 3 = 6. The total number of elements must remain identical.
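Both mistakes can be checked directly, as in this small sketch with made-up arrays: the axis sums show which dimension is collapsed, and the invalid reshape raises a `ValueError` instead of silently padding or truncating.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = m.sum(axis=0)   # rows collapsed: one result per column
row_sums = m.sum(axis=1)   # columns collapsed: one result per row
print(col_sums.tolist())   # [5, 7, 9]
print(row_sums.tolist())   # [6, 15]

arr = np.arange(5)         # 5 elements
reshape_failed = False
try:
    arr.reshape(2, 3)      # needs 2 * 3 = 6 elements
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```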
9. MCQs
How do you access the item in the 1st row and 2nd column of a 2D array named matrix?
What does matrix[:, 0] extract?
If arr = np.array([1, 2, 3, 4]), what is arr[arr > 2]?
If you have a 1D array of 12 items, which of these is a valid .reshape() argument?
What does the -1 do in arr.reshape(-1, 1)?
Which function calculates the average of an array?
When summing a 2D matrix, what does axis=0 do?
Which function returns the *index* of the maximum value in an array?
What is Boolean Indexing?
Can you reshape an array by assigning to its shape directly, e.g. arr.shape = (2, 2)? a) Yes, but using .reshape() is much more common and safer b) No, shape is read-only. Answer: a
10. Interview Questions
- Q: Explain how Boolean Indexing works in NumPy. Write a line of code to extract all negative numbers from an array.
- Q: You have a 2D matrix. Explain the difference between np.sum(matrix, axis=0) and np.sum(matrix, axis=1).
11. Summary
Manipulating multi-dimensional arrays is the core of NumPy. Remember the [rows, columns] syntax for 2D indexing and slicing. Boolean indexing lets you filter data instantly without for loops. The .reshape() function is critical for preparing data for scikit-learn. Finally, master aggregation functions like np.mean() and np.sum(), and remember that axis=0 acts vertically (down each column) while axis=1 acts horizontally (across each row).