Pandas Series and DataFrames
# CHAPTER 14
1. Chapter Introduction
Loading data is just the beginning. The real work is extracting exactly what you need. How do you select just the "Revenue" column? How do you filter for customers who live in "New York" and spent more than $100? How do you create a new column based on a calculation? This chapter covers the mechanics of slicing, dicing, and filtering Pandas DataFrames.
2. Selecting Columns
When you select a single column from a DataFrame, Pandas returns a 1D Series. If you select multiple columns, it returns a smaller 2D DataFrame.
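A minimal sketch of both cases, using a small made-up DataFrame. The column names and values are hypothetical, chosen only to mirror the examples in this chapter:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla"],
    "Age": [28, 35, 42],
    "City": ["New York", "Boston", "New York"],
    "Revenue": [120.0, 80.5, 200.0],
})

# One column name in single brackets -> a 1D Series
revenue = df["Revenue"]
print(type(revenue))   # <class 'pandas.core.series.Series'>

# A list of column names (double brackets) -> a 2D DataFrame
subset = df[["Name", "Age"]]
print(type(subset))    # <class 'pandas.core.frame.DataFrame'>
print(subset.shape)    # (3, 2): 3 rows, 2 columns
```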
3. Filtering Rows (Boolean Indexing)
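Filtering in Pandas works much like Boolean indexing in NumPy: you write a condition such as df['Age'] > 30, which produces a Series of True/False values, and pass it inside the brackets. Pandas keeps only the rows where the value is True.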
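The sketch below, on hypothetical data, builds the condition first so you can see the True/False values, then uses it to filter. It also answers the introduction's question about New York customers who spent more than $100 (the & syntax for combining conditions is explained under Common Mistakes):

```python
import pandas as pd

# Hypothetical customer data
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla", "Dev"],
    "City": ["New York", "Boston", "New York", "New York"],
    "Spent": [150.0, 300.0, 80.0, 120.0],
})

# The condition on its own is a boolean Series (a "mask"), one value per row
mask = df["Spent"] > 100
print(mask.tolist())   # [True, True, False, True]

# Passing the mask back into the brackets keeps only the True rows
big_spenders = df[mask]

# Two conditions combined: lives in New York AND spent more than $100
ny_big = df[(df["City"] == "New York") & (df["Spent"] > 100)]
print(ny_big["Name"].tolist())   # ['Alice', 'Dev']
```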
4. Row Selection using .loc and .iloc
How do you get the 3rd row of the table? Pandas provides two special indexers.
- .iloc (Integer Location): Selects by absolute numerical position (0, 1, 2...), counting from 0, so the 3rd row is df.iloc[2].
- .loc (Label Location): Selects by the index *name* (useful if your rows are named 'RowA', 'RowB' instead of 0, 1).
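Here is a small sketch with hypothetical row labels 'RowA' to 'RowD', so that positions and labels differ. It also shows one difference worth knowing: slicing with .iloc excludes the end position, while slicing with .loc includes the end label.

```python
import pandas as pd

# Hypothetical table whose rows are labelled instead of numbered 0, 1, 2...
df = pd.DataFrame(
    {"Product": ["Pen", "Book", "Lamp", "Mug"], "Price": [1.5, 12.0, 25.0, 8.0]},
    index=["RowA", "RowB", "RowC", "RowD"],
)

# .iloc: by position. The 3rd row is position 2.
third_row = df.iloc[2]
print(third_row["Product"])     # Lamp

# .loc: by label
row_b = df.loc["RowB"]
print(row_b["Price"])           # 12.0

# Slicing: .iloc stops BEFORE the end position, .loc INCLUDES the end label
print(df.iloc[0:2].index.tolist())            # ['RowA', 'RowB']
print(df.loc["RowA":"RowC"].index.tolist())   # ['RowA', 'RowB', 'RowC']

# .loc also accepts a column label: row 'RowD', column 'Price'
print(df.loc["RowD", "Price"])  # 8.0
```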
5. Creating and Modifying Columns
Creating a new column is as easy as assigning a value to a new dictionary key. Because Pandas is built on NumPy, arithmetic on a column is vectorized: it is applied to every row at once, with no Python loop needed.
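A sketch with a made-up price table (the Item, Price, and Qty columns are hypothetical). Each assignment operates on a whole column at once:

```python
import pandas as pd

# Hypothetical price table
df = pd.DataFrame({
    "Item": ["Pen", "Book", "Lamp"],
    "Price": [2.0, 10.0, 30.0],
    "Qty": [10, 3, 1],
})

# Create: assign to a new column name, like adding a dictionary key.
# The multiplication runs on the whole column at once (vectorized).
df["Tax"] = df["Price"] * 0.10

# Create from two existing columns, matched row by row
df["Subtotal"] = df["Price"] * df["Qty"]

# Modify: assigning to an existing column name replaces that column
df["Item"] = df["Item"].str.upper()

print(df.columns.tolist())  # ['Item', 'Price', 'Qty', 'Tax', 'Subtotal']
```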
6. Sorting Data
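You can sort the DataFrame by any column with .sort_values(). It sorts in ascending order by default; pass ascending=False to sort from highest to lowest.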
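A sketch on hypothetical sales figures showing ascending, descending, and multi-column sorts. Note that sort_values() returns a new DataFrame and leaves the original unchanged unless you assign the result back (or pass inplace=True):

```python
import pandas as pd

# Hypothetical sales figures
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla", "Dev"],
    "Sales": [250, 900, 400, 700],
    "Region": ["East", "West", "East", "East"],
})

# Ascending by default (lowest to highest)
low_to_high = df.sort_values("Sales")

# Descending (highest to lowest)
high_to_low = df.sort_values("Sales", ascending=False)

# Several keys: Region A to Z, then highest Sales first within each region
by_region = df.sort_values(["Region", "Sales"], ascending=[True, False])

print(high_to_low["Name"].tolist())  # ['Bob', 'Dev', 'Carla', 'Alice']
print(df["Name"].tolist())           # original order: ['Alice', 'Bob', 'Carla', 'Dev']
```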
7. Mini Project: Sales Filter
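The project code itself is not reproduced in this text, so the following is one possible version, offered as a sketch: the Customer, City, and Amount columns, the data, and the 8% tax rate are all made up. It chains the chapter's steps: filter rows, add a calculated column, sort, and select columns.

```python
import pandas as pd

# Hypothetical sales records (made-up data)
sales = pd.DataFrame({
    "Customer": ["Ana", "Ben", "Cho", "Dee", "Eli"],
    "City": ["New York", "Chicago", "New York", "New York", "Boston"],
    "Amount": [150.0, 220.0, 90.0, 310.0, 130.0],
})

# 1. Filter: customers in New York who spent more than $100
ny_big = sales[(sales["City"] == "New York") & (sales["Amount"] > 100)]

# 2. Work on an explicit copy, then add a calculated column
report = ny_big.copy()
report["Tax"] = report["Amount"] * 0.08   # hypothetical 8% tax rate

# 3. Sort highest spender first and keep only the columns we need
report = report.sort_values("Amount", ascending=False)[["Customer", "Amount", "Tax"]]

print(report)
```

Calling .copy() before adding the Tax column is a design choice: adding a column directly to a filtered slice can trigger Pandas' SettingWithCopyWarning, and an explicit copy makes it clear that report is independent of sales.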
8. Common Mistakes
- Forgetting Double Brackets: df['Name', 'Age'] raises a KeyError, because Pandas reads ('Name', 'Age') as a single column label. To pass a list of multiple columns, you must use double brackets: df[['Name', 'Age']].
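- Using and instead of &: In standard Python, you combine conditions with the keyword and. In Pandas filters, you MUST use the bitwise operator & (and | for or), because and tries to reduce each whole Series to a single True/False and raises a ValueError. You MUST also wrap each condition in parentheses, because & binds more tightly than comparison operators such as > and <: (df['A'] > 1) & (df['B'] < 5).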
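The sketch below (hypothetical data) triggers each mistake inside try/except so you can see which error it raises, then shows the corrected form:

```python
import pandas as pd

# Hypothetical data for demonstrating the errors
df = pd.DataFrame({"Name": ["Ana", "Ben"], "Age": [31, 25],
                   "A": [2, 0], "B": [3, 9]})
errors = []  # records which exception each mistake raises

# Mistake 1: single brackets. Pandas looks for ONE column labelled ('Name', 'Age').
try:
    df["Name", "Age"]
except KeyError:
    errors.append("KeyError")
cols = df[["Name", "Age"]]  # fix: a list of names inside the brackets

# Mistake 2: `and` calls bool() on a whole Series, which is ambiguous.
try:
    df[(df["A"] > 1) and (df["B"] < 5)]
except ValueError:
    errors.append("ValueError")

# Mistake 3: no parentheses. & binds tighter than > and <, so Python reads
# df["A"] > (1 & df["B"]) < 5, a chained comparison that also calls bool().
try:
    df[df["A"] > 1 & df["B"] < 5]
except ValueError:
    errors.append("ValueError")

rows = df[(df["A"] > 1) & (df["B"] < 5)]  # fix: & plus parentheses

print(errors)                 # ['KeyError', 'ValueError', 'ValueError']
print(rows["Name"].tolist())  # ['Ana']
```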
9. MCQs
What is the syntax to select a single column named "Salary"?
When you select a single column df['Salary'], what data structure is returned?
What is the correct syntax to select two columns, "Name" and "Age"?
Which Pandas indexer selects rows based on their absolute integer position (0, 1, 2)?
How do you filter a DataFrame to show only rows where Age is greater than 30?
When combining multiple filter conditions, which operator must you use for "AND"?
When combining multiple filter conditions, what syntax rule is STRICTLY required?
How do you create a new column called "Tax" that is 10% of the "Price" column?
Which method sorts a DataFrame based on a specific column?
How do you sort a column in descending order (highest to lowest)?
10. Interview Questions
- Q: Explain the difference between .loc and .iloc.
- Q: I have a DataFrame df. Why does df['Age'] > 30 return a Series of True/False values, while df[df['Age'] > 30] returns a DataFrame?
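One way to see the answer to the second question is to split the expression into its two steps. In this sketch on hypothetical data, the comparison produces a boolean mask that keeps df's row labels; indexing with that mask returns the matching rows as a DataFrame, and df.loc[mask] gives the same result:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"Name": ["Ana", "Ben", "Cho"], "Age": [25, 41, 33]})

# Step 1: the comparison runs element-wise and returns a boolean Series
# with the same index as df. Nothing is filtered yet.
mask = df["Age"] > 30
print(mask.dtype)      # bool
print(mask.tolist())   # [False, True, True]

# Step 2: a boolean Series inside df[...] acts as a row selector, so the
# result is a DataFrame holding only the rows where the mask is True.
older = df[mask]
print(older.index.tolist())   # [1, 2] (original row labels are kept)

# .loc accepts the same mask and selects the same rows
print(older.equals(df.loc[mask]))   # True
```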
11. Summary
Manipulating DataFrames is all about syntax. Select single columns using df['Col'], and multiple columns using a list: df[['Col1', 'Col2']]. Filter rows by passing a boolean condition back into the DataFrame: df[df['Col'] == Value]. Use .iloc[] to grab rows by numerical position, .loc[] to grab them by label, and .sort_values() to organize your data. Vectorization lets you create new calculated columns without writing loops.