Python for Data Science
CHAPTER 14 Beginner

Pandas Series and DataFrames

Updated: May 18, 2026

1. Chapter Introduction

Loading data is just the beginning. The real work is extracting exactly what you need. How do you select just the "Revenue" column? How do you filter for customers who live in "New York" and spent more than $100? How do you create a new column based on a calculation? This chapter covers the mechanics of slicing, dicing, and filtering Pandas DataFrames.

2. Selecting Columns

When you select a single column from a DataFrame, Pandas returns a 1D Series. If you select multiple columns, it returns a smaller 2D DataFrame.

python
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "Age": [25, 30, 35, 40],
    "City": ["NY", "LA", "NY", "SF"]
}
df = pd.DataFrame(data)

# 1. Select a SINGLE column (Returns a Series)
# Syntax: df['ColumnName']
ages = df['Age']
print(type(ages)) # <class 'pandas.core.series.Series'>

# 2. Select MULTIPLE columns (Returns a DataFrame)
# Syntax: df[['Col1', 'Col2']]  <-- Notice the double brackets!
subset = df[['Name', 'City']]
print(subset.head(2))
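
To make the Series-versus-DataFrame distinction concrete, the sketch below rebuilds the same sample table and compares the type and shape of each selection. The single-item list `df[['Age']]` is an extra case not covered above: it should return a one-column DataFrame rather than a Series.

```python
import pandas as pd

# Same sample table as in the example above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "Age": [25, 30, 35, 40],
    "City": ["NY", "LA", "NY", "SF"],
})

single = df["Age"]            # one label -> 1D Series
multi = df[["Name", "City"]]  # list of labels -> 2D DataFrame
one_col = df[["Age"]]         # list with one label -> still a DataFrame

print(type(single).__name__, single.shape)    # Series (4,)
print(type(multi).__name__, multi.shape)      # DataFrame (4, 2)
print(type(one_col).__name__, one_col.shape)  # DataFrame (4, 1)
```

The brackets decide the result type: a bare label gives a Series, while a list of labels (even a list of one) gives a DataFrame.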

3. Filtering Rows (Boolean Indexing)

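Filtering in Pandas works much like Boolean Indexing in NumPy. A condition such as df['Age'] > 28 produces a Series of True/False values, one per row. When you pass that Series inside the brackets, Pandas keeps only the rows where it is True.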

python
# 1. Find everyone older than 28
over_28 = df[df['Age'] > 28]
print(over_28)

# 2. Find everyone who lives in NY
ny_residents = df[df['City'] == 'NY']

# 3. Complex Filtering (AND = &, OR = |)
# Parentheses are REQUIRED around each condition!
target = df[(df['Age'] > 28) & (df['City'] == 'NY')]
print("Older than 28 AND in NY:\n", target)
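
The filters above use only `&`. As a minimal sketch of the other combinations, the block below shows the boolean mask on its own, an OR filter with `|`, a negation with `~`, and a membership test with `isin`. The `~` operator and `isin` are not introduced in this chapter; they are included here as standard Pandas operations that follow the same pattern.

```python
import pandas as pd

# Same sample table as in the chapter
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "Age": [25, 30, 35, 40],
    "City": ["NY", "LA", "NY", "SF"],
})

# The condition alone is a boolean Series: one True/False per row
mask = df["Age"] > 28
print(mask.tolist())  # [False, True, True, True]

# OR: lives in NY, or is older than 35
ny_or_older = df[(df["City"] == "NY") | (df["Age"] > 35)]
print(ny_or_older["Name"].tolist())  # ['Alice', 'Charlie', 'Dave']

# NOT: everyone who does not live in NY
not_ny = df[~(df["City"] == "NY")]
print(not_ny["Name"].tolist())  # ['Bob', 'Dave']

# isin: match any value from a list
coastal = df[df["City"].isin(["LA", "SF"])]
print(coastal["Name"].tolist())  # ['Bob', 'Dave']
```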

4. Row Selection using .loc and .iloc

How do you get the 3rd row of the table? Pandas provides two special indexers.

  • .iloc (Integer Location): Selects by absolute numerical position (0, 1, 2...). Slices exclude the end position, as in regular Python.
  • .loc (Label Location): Selects by the index *name* (useful if your rows are named 'RowA', 'RowB' instead of 0, 1). Slices include both endpoints.

python
# Select the 1st row (Index 0) using integer location
first_row = df.iloc[0]

# Select rows 0 and 1, and columns 0 and 1 (the slice end is excluded)
slice_df = df.iloc[0:2, 0:2]
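
The example above only exercises `.iloc`. The following sketch shows the `.loc` counterpart, assuming a small table whose rows carry the hypothetical labels 'RowA', 'RowB', and 'RowC' (these labels are illustrative and not part of the chapter's dataset).

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [25, 30, 35], "City": ["NY", "LA", "NY"]},
    index=["RowA", "RowB", "RowC"],  # hypothetical row labels
)

# One row by label -> returns a Series
row_b = df.loc["RowB"]
print(row_b["Age"])  # 30

# Label slices include BOTH endpoints
first_two = df.loc["RowA":"RowB", ["Age"]]
print(first_two.shape)  # (2, 1)

# The equivalent positional slice excludes its end
same_rows = df.iloc[0:2, 0:1]
print(first_two.equals(same_rows))  # True
```

Note the asymmetry: `"RowA":"RowB"` with `.loc` covers two rows, while the matching `.iloc` slice has to be written `0:2`.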

5. Creating and Modifying Columns

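Creating a new column is as easy as declaring a new dictionary key, and assigning to an existing column name overwrites that column. Because Pandas is built on NumPy, calculations are vectorized: the operation is applied to every row at once, so no loops are needed.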

python
# Create a new column by applying math to an existing one
df['Age_In_10_Years'] = df['Age'] + 10

# Create a boolean flag column
df['Is_Senior'] = df['Age'] >= 40

print(df)
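
The code above creates columns; the sketch below covers the "modifying" half of this section. It overwrites an existing column and then updates only the rows matching a condition. The `.loc[row_mask, column]` assignment pattern and the `.str.lower()` accessor are not introduced in this chapter, so treat this as one reasonable approach rather than the only one.

```python
import pandas as pd

# Same sample table as in the chapter
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "Age": [25, 30, 35, 40],
    "City": ["NY", "LA", "NY", "SF"],
})

# Overwrite an existing column with a vectorized transformation
df["City"] = df["City"].str.lower()

# Update only the rows that match a condition: .loc[row_mask, column]
df.loc[df["Age"] >= 35, "Age"] += 1

print(df["City"].tolist())  # ['ny', 'la', 'ny', 'sf']
print(df["Age"].tolist())   # [25, 30, 36, 41]
```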

6. Sorting Data

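You can quickly sort the DataFrame by any column. sort_values() returns a new, sorted DataFrame and leaves the original unchanged, so assign the result to a variable.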

python
# Sort by Age descending (oldest first)
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
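
Sorting by one column extends naturally to several. The sketch below sorts by City and then by Age within each city, and confirms that the original DataFrame keeps its order. The `reset_index(drop=True)` step is an optional extra, not covered above, for renumbering rows after a sort.

```python
import pandas as pd

# Same sample table as in the chapter
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "Age": [25, 30, 35, 40],
    "City": ["NY", "LA", "NY", "SF"],
})

# Sort by City A-Z, then by Age oldest-first within each city
by_city_age = df.sort_values(by=["City", "Age"], ascending=[True, False])
print(by_city_age["Name"].tolist())  # ['Bob', 'Charlie', 'Alice', 'Dave']

# The original DataFrame is left untouched
print(df["Name"].tolist())  # ['Alice', 'Bob', 'Charlie', 'Dave']

# Row labels travel with their rows; reset_index(drop=True) renumbers them
renumbered = by_city_age.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```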

7. Mini Project: Sales Filter

python
# Simulating sales data
sales_data = {
    "OrderID": [1, 2, 3, 4],
    "Product": ["Laptop", "Mouse", "Monitor", "Keyboard"],
    "Price": [1200, 25, 300, 50],
    "Quantity": [2, 5, 1, 3]
}
sales = pd.DataFrame(sales_data)

# 1. Create a Total_Revenue column
sales['Total_Revenue'] = sales['Price'] * sales['Quantity']

# 2. Filter for orders bringing in more than $100
big_orders = sales[sales['Total_Revenue'] > 100]

# 3. Sort by biggest revenue
final_report = big_orders.sort_values(by='Total_Revenue', ascending=False)
print(final_report)

8. Common Mistakes

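  • Forgetting Double Brackets: df['Name', 'Age'] raises a KeyError, because Pandas looks for a single column labeled with the tuple ('Name', 'Age'). To select multiple columns, pass a list inside the brackets: df[['Name', 'Age']].
  • Using and instead of &: In standard Python, you use the word and. In Pandas filters, you MUST use the element-wise operator & (and | for or); using and raises "ValueError: The truth value of a Series is ambiguous". You MUST also wrap each condition in parentheses, because & binds more tightly than comparisons such as > and <: (df['A'] > 1) & (df['B'] < 5).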
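
Both mistakes are easy to reproduce. The sketch below triggers each one inside a try/except so that the script keeps running, then shows the corrected form. The exact error message text may vary between Pandas versions, so it is printed rather than hard-coded.

```python
import pandas as pd

# Same sample table as in the chapter
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dave"],
    "Age": [25, 30, 35, 40],
    "City": ["NY", "LA", "NY", "SF"],
})

# Mistake 1: single brackets with two names looks up the tuple ('Name', 'Age')
try:
    df["Name", "Age"]
except KeyError as err:
    print("KeyError:", err)

# Mistake 2: `and` asks Python for one True/False from a whole Series
try:
    df[(df["Age"] > 28) and (df["City"] == "NY")]
except ValueError as err:
    print("ValueError:", err)

# Corrected versions
print(df[["Name", "Age"]].shape)                         # (4, 2)
print(len(df[(df["Age"] > 28) & (df["City"] == "NY")]))  # 1
```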

9. MCQs

Question 1

What is the syntax to select a single column named "Salary"?

Question 2

When you select a single column df['Salary'], what data structure is returned?

Question 3

What is the correct syntax to select two columns, "Name" and "Age"?

Question 4

Which Pandas indexer selects rows based on their absolute integer position (0, 1, 2)?

Question 5

How do you filter a DataFrame to show only rows where Age is greater than 30?

Question 6

When combining multiple filter conditions, which operator must you use for "AND"?

Question 7

When combining multiple filter conditions, what syntax rule is STRICTLY required?

Question 8

How do you create a new column called "Tax" that is 10% of the "Price" column?

Question 9

Which method sorts a DataFrame based on a specific column?

Question 10

How do you sort a column in descending order (highest to lowest)?

10. Interview Questions

  • Q: Explain the difference between .loc and .iloc.
  • Q: I have a DataFrame df. Why does df['Age'] > 30 return a Series of True/False values, while df[df['Age'] > 30] returns a DataFrame?

11. Summary

Manipulating DataFrames comes down to a handful of syntax patterns. Select a single column with df['Col'] and multiple columns with a list: df[['Col1', 'Col2']]. Filter rows by passing a boolean condition back into the DataFrame: df[df['Col'] == Value]. Use .iloc[] to select rows by integer position and .loc[] to select them by label, and use .sort_values() to order your data. Vectorization lets you create calculated columns without writing loops.

12. Next Chapter Recommendation

In Chapter 15: Data Cleaning with Pandas, we tackle the most time-consuming part of a Data Scientist's job: dealing with missing data (NaNs), removing duplicates, and fixing incorrect data formats.
