CHAPTER 21 Beginner

Advanced Pandas Operations

Updated: May 18, 2026

5 min read

# CHAPTER 21

Advanced Pandas Operations

1. Chapter Introduction

Pandas' advanced features handle hierarchical data, time series windows, memory-efficient categoricals, and complex aggregation patterns used in production analytics pipelines.

2. MultiIndex — Hierarchical Indexing

python

1234567891011121314151617181920212223242526272829

import pandas as pd
import numpy as np

# Create MultiIndex DataFrame
data = {
    &#039;Revenue': [12000, 8000, 15000, 9500, 14000, 7800, 13200, 8900],
    &#039;Units':   [10, 8, 12, 9, 11, 7, 13, 8]
}
index = pd.MultiIndex.from_tuples([
    (&#039;2024 Q1', 'Laptop'), ('2024 Q1', 'Phone'),
    (&#039;2024 Q2', 'Laptop'), ('2024 Q2', 'Phone'),
    (&#039;2024 Q3', 'Laptop'), ('2024 Q3', 'Phone'),
    (&#039;2024 Q4', 'Laptop'), ('2024 Q4', 'Phone')
], names=[&#039;Quarter', 'Product'])

df = pd.DataFrame(data, index=index)
print(df)

# Access levels
print(df.loc[&#039;2024 Q1'])              # All Q1 data
print(df.loc[(&#039;2024 Q1', 'Laptop')]) # Specific: Q1 Laptop
print(df.loc[&#039;2024 Q1':'2024 Q2'])   # Q1 and Q2 slice

# Stack/Unstack — pivot between index and columns
print(df[&#039;Revenue'].unstack())       # Products as columns
print(df.unstack().stack())          # Round trip

# Reset to flat index
df_flat = df.reset_index()

3. Window Functions

python

123456789101112131415161718192021222324

np.random.seed(42)
df_ts = pd.DataFrame({
    &#039;Date': pd.date_range('2024-01', periods=60, freq='D'),
    &#039;Sales': np.random.randint(500, 2000, 60) + np.linspace(0, 500, 60)
}).set_index(&#039;Date')

# Rolling windows
df_ts[&#039;MA_7']   = df_ts['Sales'].rolling(7).mean()    # 7-day moving average
df_ts[&#039;MA_30']  = df_ts['Sales'].rolling(30).mean()   # 30-day moving average
df_ts[&#039;STD_7']  = df_ts['Sales'].rolling(7).std()     # 7-day volatility
df_ts[&#039;MAX_7']  = df_ts['Sales'].rolling(7).max()     # 7-day rolling max

# Bollinger Bands (financial analysis)
df_ts[&#039;Upper_Band'] = df_ts['MA_7'] + 2 * df_ts['STD_7']
df_ts[&#039;Lower_Band'] = df_ts['MA_7'] - 2 * df_ts['STD_7']

# Expanding windows (cumulative from start)
df_ts[&#039;Cum_Mean'] = df_ts['Sales'].expanding().mean()
df_ts[&#039;Cum_Max']  = df_ts['Sales'].expanding().max()

# EWM — Exponentially Weighted Moving Average (recent data weighted more)
df_ts[&#039;EWM_7'] = df_ts['Sales'].ewm(span=7, adjust=False).mean()

print(df_ts[[&#039;Sales', 'MA_7', 'MA_30', 'EWM_7']].head(15))

4. Categorical Data

python

12345678910111213141516171819202122232425

# Categorical dtype — significant memory and speed benefits for repeated strings
df = pd.DataFrame({
    &#039;Dept': ['Engineering', 'Marketing', 'Engineering', 'Sales', 'HR',
             &#039;Engineering', 'Marketing', 'Sales', 'HR', 'Engineering'] * 1000,
    &#039;Salary': np.random.randint(50000, 100000, 10000)
})

# Compare memory
print(f"String:   {df[&#039;Dept'].memory_usage(deep=True):,} bytes")
df[&#039;Dept'] = df['Dept'].astype('category')
print(f"Category: {df[&#039;Dept'].memory_usage(deep=True):,} bytes")  # Much smaller!

# Ordered categories (for comparison)
satisfaction = pd.Categorical(
    [&#039;High', 'Low', 'Medium', 'High', 'Medium'],
    categories=[&#039;Low', 'Medium', 'High'],
    ordered=True
)
s = pd.Series(satisfaction)
print(s > &#039;Low')   # [True, False, True, True, True]
print(s.max())     # 'High'

# Use with groupby — faster than string groupby
summary = df.groupby(&#039;Dept', observed=True)['Salary'].mean()
print(summary)

5. crosstab and cut/qcut

python

12345678910111213141516171819202122232425262728

df_survey = pd.DataFrame({
    &#039;Gender': np.random.choice(['Male', 'Female'], 200),
    &#039;Dept': np.random.choice(['Eng', 'Mkt', 'Sales'], 200),
    &#039;Salary': np.random.normal(70000, 20000, 200).astype(int),
    &#039;Rating': np.random.choice(['A', 'B', 'C'], 200)
})

# Cross tabulation
ct = pd.crosstab(df_survey[&#039;Dept'], df_survey['Gender'])
print("Count by Dept and Gender:")
print(ct)

ct_pct = pd.crosstab(df_survey[&#039;Dept'], df_survey['Gender'], normalize='index') * 100
print("\nPercentage:")
print(ct_pct.round(1))

# cut — bin continuous data
df_survey[&#039;Salary_Band'] = pd.cut(
    df_survey[&#039;Salary'],
    bins=[0, 50000, 70000, 90000, float(&#039;inf')],
    labels=[&#039;Low', 'Mid', 'High', 'Very High']
)

# qcut — equal-frequency bins (quantile-based)
df_survey[&#039;Salary_Quartile'] = pd.qcut(df_survey['Salary'], q=4,
                                        labels=[&#039;Q1', 'Q2', 'Q3', 'Q4'])
print("\nSalary Band distribution:")
print(df_survey[&#039;Salary_Band'].value_counts().sort_index())

6. Common Mistakes

Forgetting observed=True in groupby with categories: Pandas shows all category combinations by default (including zero-count groups). Pass observed=True to show only observed combinations.

Using string dtype for repeated values: Categorical dtype uses 95%+ less memory for columns with few unique values repeated many times.

7. MCQs

Question 1

MultiIndex allows?

Question 2

`df.unstack()` converts?

Question 3

`rolling(7).mean()` uses?

Question 4

Categorical dtype saves memory because?

Question 5

Ordered categories enable?

Question 6

`pd.crosstab()` creates?

Question 7

`pd.cut()` vs `pd.qcut()`?

Question 8

EWM gives?

Question 9

`expanding().mean()` computes?

Question 10

`df.loc['2024 Q1':'2024 Q2']` on MultiIndex selects?

8. Interview Questions

Q: When would you use categorical dtype in Pandas?

Q: What is the difference between rolling() and expanding()?

9. Summary

Advanced Pandas: MultiIndex for hierarchical data, rolling/ewm for time series smoothing, categorical dtype for 90%+ memory savings on repeated strings, crosstab for contingency tables, and cut/qcut for binning continuous variables. These tools separate professional from beginner Pandas users.

10. Next Chapter Recommendation

In Chapter 22: Exploratory Data Analysis (EDA), we systematically investigate datasets — profiling, correlations, outliers, and the Titanic dataset analysis project.

Browse All 21+ Subject Areas

Quick Links

Visual Algorithm Labs

Frontend Dev

The Future of Web Architecture in 2026

Practice Quizzes

Advanced Pandas Operations #

1. Chapter Introduction #

2. MultiIndex — Hierarchical Indexing #

3. Window Functions #

4. Categorical Data #

5. crosstab and cut/qcut #

6. Common Mistakes #

7. MCQs #

MultiIndex allows?

df.unstack() converts?

rolling(7).mean() uses?

Categorical dtype saves memory because?

Ordered categories enable?

pd.crosstab() creates?

pd.cut() vs pd.qcut()?

EWM gives?

expanding().mean() computes?

df.loc['2024 Q1':'2024 Q2'] on MultiIndex selects?

8. Interview Questions #

9. Summary #

10. Next Chapter Recommendation #

Finish this Chapter

Discussion

Explore More

📖 Related Tutorials 5

Send Feedback / Bug

Feedback Submitted!

Advanced Pandas Operations

1. Chapter Introduction

2. MultiIndex — Hierarchical Indexing

3. Window Functions

4. Categorical Data

5. crosstab and cut/qcut

6. Common Mistakes

7. MCQs

`df.unstack()` converts?

`rolling(7).mean()` uses?

`pd.crosstab()` creates?

`pd.cut()` vs `pd.qcut()`?

`expanding().mean()` computes?

`df.loc['2024 Q1':'2024 Q2']` on MultiIndex selects?

8. Interview Questions

9. Summary

10. Next Chapter Recommendation