Pandas & NumPy
CHAPTER 20 Beginner

Advanced NumPy Concepts

Updated: May 18, 2026
5 min read

1. Chapter Introduction

Beyond basic arrays, NumPy offers structured arrays for mixed-type data, memory-mapped arrays for datasets larger than RAM, advanced indexing tricks, and performance tools that underpin production data science workflows.

2. Advanced Indexing

python
import numpy as np

# ix_ — open mesh for cross-indexing
matrix = np.arange(1, 26).reshape(5, 5)
rows = np.array([0, 2, 4])
cols = np.array([1, 3])
print(matrix[np.ix_(rows, cols)])    # 3x2 submatrix

# Boolean indexing with 2D mask
mask = matrix > 15
print(matrix[mask])    # 1D array of elements > 15
matrix[mask] = 0       # Set all > 15 to 0

# where — complex conditional selection
data = np.array([10, -5, 8, -3, 12, -1, 7])
result = np.where(data > 0, data, data * -1)   # abs() equivalent
print(result)    # [10 5 8 3 12 1 7]

# select — multiple conditions
conditions = [data < 0, data == 0, data > 0]
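choices = ['negative', 'zero', 'positive']
# Pass a string default: the built-in default of 0 does not share a dtype with string choices
print(np.select(conditions, choices, default='unknown'))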
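The difference between np.ix_ and plain integer-array indexing is easy to miss, so here is a small sketch that contrasts the two. The names block, same_block, pair_rows, and paired are illustrative only and do not appear in the chapter's code.

```python
import numpy as np

matrix = np.arange(1, 26).reshape(5, 5)
rows = np.array([0, 2, 4])
cols = np.array([1, 3])

# np.ix_ builds an open mesh: every row index is combined with every column index
block = matrix[np.ix_(rows, cols)]
print(block.shape)    # (3, 2)

# The same selection written with explicit broadcasting:
# a (3, 1) column of row indices against a (2,) array of column indices
same_block = matrix[rows[:, None], cols]
print(np.array_equal(block, same_block))    # True

# Plain integer-array indexing pairs the indices element by element instead
pair_rows = np.array([0, 2])
paired = matrix[pair_rows, cols]    # picks elements (0, 1) and (2, 3)
print(paired)    # [ 2 14]
```

Passing rows and cols directly as matrix[rows, cols] would raise an IndexError here, because a length-3 and a length-2 index array cannot be broadcast together.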

3. Structured Arrays

python
# Structured array: like a mini database table
employee_dtype = np.dtype([
    ('name', 'U20'),       # Unicode string max 20 chars
    ('age', 'i4'),         # 32-bit integer
    ('salary', 'f8'),      # 64-bit float
    ('active', 'bool')     # Boolean
])

employees = np.array([
    ('Alice', 28, 85000.0, True),
    ('Bob',   35, 72000.0, False),
    ('Carol', 31, 91000.0, True),
    ('David', 42, 55000.0, True)
], dtype=employee_dtype)

print(employees)
print(employees['name'])       # All names
print(employees['salary'])     # All salaries
print(employees[employees['active']])    # Active employees
print(employees[employees['salary'] > 75000]['name'])  # High earners
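Beyond field access and filtering, structured arrays can also be sorted and updated by field. The sketch below rebuilds the same employee table so it runs on its own; by_salary, top_names, and active_mean are illustrative names, and the 10% raise is made-up example data.

```python
import numpy as np

employee_dtype = np.dtype([
    ('name', 'U20'), ('age', 'i4'), ('salary', 'f8'), ('active', 'bool')
])
employees = np.array([
    ('Alice', 28, 85000.0, True),
    ('Bob',   35, 72000.0, False),
    ('Carol', 31, 91000.0, True),
    ('David', 42, 55000.0, True)
], dtype=employee_dtype)

# Sort by a named field (ascending), then reverse for highest salary first
by_salary = np.sort(employees, order='salary')[::-1]
top_names = by_salary['name'].tolist()
print(top_names)    # ['Carol', 'Alice', 'Bob', 'David']

# Aggregate one field over a boolean mask built from another field
active_mean = employees['salary'][employees['active']].mean()
print(active_mean)    # 77000.0

# Field access returns a view, so this assignment updates the original array
employees['salary'][employees['name'] == 'Bob'] *= 1.10
print(employees[1])
```

For heavier tabular work (joins, group-by, missing values) a pandas DataFrame is usually more convenient; structured arrays fit best when a fixed binary record layout matters.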

4. Memory Layout and Strides

python
# C-order (row-major) vs F-order (column-major)
# dtype is fixed to int64 so the stride values below hold on every platform
arr_c = np.array([[1,2,3],[4,5,6]], dtype=np.int64, order='C')  # Row-major (default)
arr_f = np.array([[1,2,3],[4,5,6]], dtype=np.int64, order='F')  # Column-major (Fortran)

print(arr_c.strides)   # (24, 8): 24 bytes to the next row, 8 to the next element
print(arr_f.strides)   # (8, 16): 8 bytes to the next row, 16 to the next column

# Check memory ownership
arr = np.arange(12)
view = arr[::2]          # View: shares memory
copy = arr[::2].copy()   # Copy: independent

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False

# Contiguous check (affects performance)
print(arr_c.flags['C_CONTIGUOUS'])   # True
print(arr_f.flags['F_CONTIGUOUS'])   # True
arr_f_as_c = np.ascontiguousarray(arr_f)   # C-order copy of arr_f
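To make the stride numbers less abstract, the following sketch shows how strides locate an element in the underlying buffer and how transposing or slicing changes only the strides, not the data. The array grid and the indices i, j are arbitrary choices for illustration, and the dtype is fixed to int64 so the byte counts are predictable.

```python
import numpy as np

grid = np.arange(12, dtype=np.int64).reshape(3, 4)
print(grid.strides)    # (32, 8): 4 elements * 8 bytes per row step, 8 bytes per column step

# Byte offset of element [i, j] = i * strides[0] + j * strides[1]
i, j = 2, 1
offset = i * grid.strides[0] + j * grid.strides[1]
flat_index = offset // grid.itemsize
print(offset, flat_index)                      # 72 9
print(grid.ravel()[flat_index] == grid[i, j])  # True

# Transposing swaps the strides and copies nothing
transposed = grid.T
print(transposed.strides)                      # (8, 32)
print(np.shares_memory(grid, transposed))      # True
print(transposed.flags['C_CONTIGUOUS'])        # False
print(transposed.flags['F_CONTIGUOUS'])        # True

# Slicing with a step doubles the column stride; the result is a non-contiguous view
every_other = grid[:, ::2]
print(every_other.strides)                     # (32, 16)
print(every_other.flags['C_CONTIGUOUS'])       # False
```

Operations that need contiguous memory may copy such views internally, which is one reason the contiguity flags matter for performance.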

5. Memory-Efficient Techniques

python
# Use appropriate dtypes to save memory
arr_float64 = np.random.rand(1_000_000)
arr_float32 = arr_float64.astype(np.float32)
arr_int64   = np.random.randint(0, 30_000, size=1_000_000, dtype=np.int64)
arr_int16   = arr_int64.astype(np.int16)  # Safe only because every value fits in -32768 to 32767

# .nbytes reports the size of the data buffer itself
print(f"float64: {arr_float64.nbytes:,} bytes")   # 8,000,000 bytes
print(f"float32: {arr_float32.nbytes:,} bytes")   # 4,000,000 bytes (50% saving)
print(f"int64:   {arr_int64.nbytes:,} bytes")     # 8,000,000 bytes
print(f"int16:   {arr_int16.nbytes:,} bytes")     # 2,000,000 bytes (75% saving)

# Memory-mapped arrays: work with files larger than RAM
# np.memmap stores raw binary with no .npy header, so dtype and shape must be given again on reopen
fp = np.memmap('large_data.dat', dtype='float32', mode='w+', shape=(10000, 1000))
fp[:] = np.random.rand(10000, 1000)
fp.flush()   # Write to disk
del fp

# Reopen and read without loading into RAM
fp_read = np.memmap('large_data.dat', dtype='float32', mode='r', shape=(10000, 1000))
print(f"First row mean: {fp_read[0].mean():.4f}")
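When the data really is larger than RAM, the file has to be written and read in pieces rather than in one assignment. A minimal sketch of that pattern follows; the temporary file path, the array size, and chunk_rows are assumptions chosen to keep the demo small.

```python
import os
import tempfile

import numpy as np

n_rows, n_cols = 10_000, 100
chunk_rows = 1_000    # rows processed per step (illustrative value)
path = os.path.join(tempfile.mkdtemp(), 'chunked_demo.dat')

# Write the file one chunk at a time so only one chunk lives in RAM
writer = np.memmap(path, dtype='float32', mode='w+', shape=(n_rows, n_cols))
for start in range(0, n_rows, chunk_rows):
    stop = min(start + chunk_rows, n_rows)
    writer[start:stop] = np.random.rand(stop - start, n_cols)
writer.flush()
del writer

# Read it back chunk by chunk, accumulating the sum in float64
reader = np.memmap(path, dtype='float32', mode='r', shape=(n_rows, n_cols))
total = 0.0
for start in range(0, n_rows, chunk_rows):
    chunk = reader[start:start + chunk_rows]
    total += float(chunk.sum(dtype=np.float64))

mean = total / (n_rows * n_cols)
print(f"Mean computed in chunks: {mean:.4f}")
```

Accumulating in float64 keeps the running total accurate even though the stored values are float32.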

6. Common Mistakes

  • Using float64 when float32 suffices: For many ML workloads, float32 halves memory use, and its roughly 7 significant decimal digits are usually enough. Keep float64 where rounding error accumulates, such as long running sums.
  • Unintended views: arr[::2] returns a view, so modifying it modifies the original. Use .copy() when independence is needed.
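Both mistakes can be reproduced in a few lines. This sketch uses small illustrative arrays (base, view, safe) and the value 2**24, which is the point where float32 can no longer represent every integer.

```python
import numpy as np

# Unintended view: writing through the slice changes the original
base = np.arange(10)
view = base[::2]
view[0] = 99
print(base[0])    # 99

# An explicit copy is independent of the original
safe = base[::2].copy()
safe[1] = -1
print(base[2])    # 2

# float32 precision limit: above 2**24, adding 1 is lost to rounding
big = np.float32(16_777_216.0)    # 2**24
print(big + np.float32(1.0) == big)                    # True
print(np.float64(16_777_216.0) + 1.0 == 16_777_216.0)  # False

# Machine epsilon shows the gap in relative precision
print(np.finfo(np.float32).eps > np.finfo(np.float64).eps)    # True
```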

7. MCQs

Question 1

np.ix_([0,2], [1,3]) creates?

Question 2

np.select(conditions, choices) selects?

Question 3

Structured array dtype 'U20' means?

Question 4

Strides tell NumPy?

Question 5

np.memmap is for?

Question 6

float32 vs float64 memory?

Question 7

np.shares_memory(a, b) returns?

Question 8

C-order array stores data?

Question 9

.flags['C_CONTIGUOUS'] True means?

Question 10

Best dtype for age data (0-120)?

8. Interview Questions

  • Q: What is the difference between a NumPy view and a copy?
  • Q: How do you reduce memory usage when working with large NumPy arrays?
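One possible angle on the memory question is to downcast integer data to the narrowest dtype that can hold it. The helper smallest_int_dtype below is hypothetical, written for this sketch and not part of NumPy; it relies only on np.iinfo to read each dtype's range.

```python
import numpy as np

def smallest_int_dtype(values):
    """Hypothetical helper: narrowest integer dtype that holds every value."""
    lo, hi = int(values.min()), int(values.max())
    if lo >= 0:
        candidates = (np.uint8, np.uint16, np.uint32, np.uint64)
    else:
        candidates = (np.int8, np.int16, np.int32, np.int64)
    for candidate in candidates:
        info = np.iinfo(candidate)
        if info.min <= lo and hi <= info.max:
            return np.dtype(candidate)
    raise ValueError("values do not fit in a 64-bit integer dtype")

ages = np.array([0, 35, 120], dtype=np.int64)
age_dtype = smallest_int_dtype(ages)
compact_ages = ages.astype(age_dtype)
print(age_dtype, ages.nbytes, compact_ages.nbytes)    # uint8 24 3

mixed = np.array([-5, 40_000], dtype=np.int64)
print(smallest_int_dtype(mixed))    # int32
```

The check should run on the full value range the data can ever take, not just on a sample, since casting out-of-range values wraps around silently.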

9. Summary

Advanced NumPy: np.ix_ for cross-indexing, structured arrays for mixed-type tabular data, strides for understanding memory layout, memmap for out-of-core computation, and dtype selection for memory savings from 50% (float64 to float32) up to 87.5% (int64 to int8). These tools scale NumPy from exploration to production.

10. Next Chapter Recommendation

In Chapter 21: Advanced Pandas Operations, we master MultiIndex, window functions, categorical data, and advanced aggregation patterns.
