CHAPTER 09 Beginner

Introduction to Pandas

Updated: May 18, 2026

1. Chapter Introduction

Pandas, whose name derives from "panel data" (an econometrics term for structured, multidimensional datasets), is one of the most widely used data analysis libraries for Python. It loads raw data files into structured DataFrames that you can query and manipulate, making data analysis as intuitive as working with SQL or Excel while keeping the full power of Python.
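
To make the "raw file to DataFrame" idea concrete, here is a minimal sketch. The CSV content, column names, and variable names are made up for illustration, and an in-memory string stands in for a real file path so the snippet runs anywhere.

```python
import io
import pandas as pd

# Hypothetical CSV content; normally you would pass a file path to pd.read_csv
raw_csv = io.StringIO(
    "name,dept,salary\n"
    "Alice,Eng,85000\n"
    "Bob,Mkt,62000\n"
    "Carol,Eng,91000\n"
)

# Load the raw text into a DataFrame
df_csv = pd.read_csv(raw_csv)
print(df_csv.shape)                      # (3, 3)

# Query it much like a SQL WHERE clause
eng = df_csv[df_csv["dept"] == "Eng"]
avg_eng = eng["salary"].mean()
print(avg_eng)                           # 88000.0

# Export the filtered rows back to CSV text (no index column)
csv_out = eng.to_csv(index=False)
print(csv_out)
```

With a real file, the only change would be replacing `raw_csv` with the file's path.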

2. What is Pandas?

python
import pandas as pd
import numpy as np

# Pandas provides two core data structures:
# 1. Series  — 1D labeled array
# 2. DataFrame — 2D labeled table (columns of different types)

# Key Pandas capabilities:
print("""
Pandas can:
✅ Read CSV, Excel, JSON, SQL, Parquet, HDF5
✅ Clean messy, incomplete data
✅ Filter, select, transform data
✅ Group and aggregate (like SQL GROUP BY)
✅ Merge datasets (like SQL JOIN)
✅ Handle dates and time series
✅ Visualize data
✅ Export to multiple formats
""")

3. Pandas Series

python
# Series: 1D array with a labeled index
s = pd.Series([10, 20, 30, 40, 50])
print(s)
# 0    10
# 1    20
# 2    30
# 3    40
# 4    50
# dtype: int64

# Custom index
s_named = pd.Series(
    [85, 92, 78, 96],
    index=['Math', 'Science', 'English', 'History'],
    name='Alice_Scores'
)
print(s_named)
print(f"\nMath score: {s_named['Math']}")      # 85
print(f"Mean score: {s_named.mean():.1f}")    # 87.8 (the exact mean is 87.75)

# Series from dict
city_pop = pd.Series({'NYC': 8.3, 'LA': 3.9, 'Chicago': 2.7, 'Houston': 2.3})
print(city_pop.sort_values(ascending=False))
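
Because a Series carries labels, arithmetic and comparisons apply element by element while keeping the index. The sketch below reuses the score values from above; the names `scores`, `curved`, and `high` are introduced here for illustration only.

```python
import pandas as pd

scores = pd.Series(
    [85, 92, 78, 96],
    index=["Math", "Science", "English", "History"],
)

# Arithmetic is applied to every element and the labels are preserved
curved = scores + 2
print(curved["Math"])        # 87

# A comparison yields a boolean Series that can be used as a filter
high = scores[scores > 90]
print(list(high.index))      # ['Science', 'History']
```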

4. Pandas DataFrame

python
# DataFrame: 2D table — the primary Pandas structure
data = {
    'Employee': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Department': ['Engineering', 'Marketing', 'Engineering', 'Sales', 'HR'],
    'Salary': [85000, 62000, 91000, 55000, 58000],
    'Experience': [5, 3, 8, 2, 4],
    'Rating': [4.5, 3.8, 4.9, 3.5, 4.1]
}

df = pd.DataFrame(data)
print(df)
print("\nShape:", df.shape)          # (5, 5)
print("Columns:", list(df.columns))
print("Index:", list(df.index))

# DataFrame attributes
print(df.dtypes)          # Data types per column
df.info()                 # Overview: dtypes, non-null counts, memory (prints directly and returns None)
print(df.describe())      # Statistical summary of numeric cols
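
A DataFrame can also be built row by row from a list of dictionaries, which is often how records arrive from JSON or an API. The sketch below is a small illustration with a shortened, hypothetical version of the employee data. It suggests that both construction styles produce the same table.

```python
import pandas as pd

# Row-oriented input: one dict per record
rows = [
    {"Employee": "Alice", "Salary": 85000, "Rating": 4.5},
    {"Employee": "Bob", "Salary": 62000, "Rating": 3.8},
]
df_rows = pd.DataFrame(rows)

# Column-oriented input: one list per column (the style used above)
df_cols = pd.DataFrame({
    "Employee": ["Alice", "Bob"],
    "Salary": [85000, 62000],
    "Rating": [4.5, 3.8],
})

print(df_rows.shape)             # (2, 3)
print(df_rows.equals(df_cols))   # True
```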

5. Accessing Data

python
# Column access
print(df['Salary'])                    # Single column → Series
print(df[['Employee', 'Salary']])      # Multiple columns → DataFrame

# Row access
print(df.head(3))                      # First 3 rows
print(df.tail(2))                      # Last 2 rows
print(df.iloc[0])                      # First row by position
print(df.loc[2])                       # Row with index label 2

# Quick statistics
print(df['Salary'].mean())             # 70200.0
print(df['Salary'].max())              # 91000
print(df['Department'].value_counts()) # Counts per department
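
With the default integer index, iloc and loc can look interchangeable. The difference shows up once the index holds custom labels. The sketch below uses a small hypothetical DataFrame (`df_idx`) indexed by name.

```python
import pandas as pd

df_idx = pd.DataFrame(
    {"Salary": [85000, 62000, 91000]},
    index=["alice", "bob", "carol"],
)

# iloc selects by integer position, loc selects by index label
first_by_position = df_idx.iloc[0]["Salary"]
bob_by_label = df_idx.loc["bob", "Salary"]
print(first_by_position)   # 85000
print(bob_by_label)        # 62000

# Slicing differs too: iloc excludes the end position, loc includes the end label
n_iloc = len(df_idx.iloc[0:1])            # 1 row  (alice)
n_loc = len(df_idx.loc["alice":"bob"])    # 2 rows (alice and bob)
print(n_iloc, n_loc)       # 1 2
```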

6. Mini Project: Employee Dataset Analysis

python
import pandas as pd

employees = pd.DataFrame({
    'Name': ['Alice','Bob','Carol','David','Eve','Frank','Grace','Henry'],
    'Dept': ['Eng','Mkt','Eng','Sales','HR','Eng','Mkt','Sales'],
    'Salary': [85000, 62000, 91000, 55000, 58000, 78000, 67000, 49000],
    'Experience': [5, 3, 8, 2, 4, 6, 4, 1],
    'Rating': [4.5, 3.8, 4.9, 3.5, 4.1, 4.3, 4.0, 3.2]
})

print("=" * 50)
print("EMPLOYEE ANALYSIS REPORT")
print("=" * 50)

# Basic info
print(f"\nTotal employees: {len(employees)}")
print(f"Departments: {employees['Dept'].unique()}")

# Salary analysis
print(f"\nSalary Statistics:")
print(f"  Average: ${employees['Salary'].mean():,.0f}")
print(f"  Highest: ${employees['Salary'].max():,} ({employees.loc[employees['Salary'].idxmax(), 'Name']})")
print(f"  Lowest:  ${employees['Salary'].min():,} ({employees.loc[employees['Salary'].idxmin(), 'Name']})")

# Department summary
print("\nDepartment Summary:")
dept_summary = employees.groupby('Dept').agg(
    Count=('Name', 'count'),
    Avg_Salary=('Salary', 'mean'),
    Avg_Rating=('Rating', 'mean')
).round(2)
print(dept_summary)

# Top performers
print("\nTop Performers (Rating ≥ 4.3):")
top = employees[employees['Rating'] >= 4.3][['Name', 'Dept', 'Salary', 'Rating']]
print(top.to_string(index=False))

# Salary vs experience correlation
correlation = employees['Salary'].corr(employees['Experience'])
print(f"\nSalary-Experience Correlation: {correlation:.3f}")
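
As one possible extension of the report, the departments can be ranked by average salary. This is a sketch rather than part of the original project. It repeats only the two columns it needs so that it runs on its own.

```python
import pandas as pd

employees = pd.DataFrame({
    "Dept": ["Eng", "Mkt", "Eng", "Sales", "HR", "Eng", "Mkt", "Sales"],
    "Salary": [85000, 62000, 91000, 55000, 58000, 78000, 67000, 49000],
})

# Average salary per department, highest first
avg_by_dept = (
    employees.groupby("Dept")["Salary"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_by_dept.round(2))

top_dept = avg_by_dept.index[0]
print(f"Highest-paying department: {top_dept}")   # Eng
```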

7. Common Mistakes

  • df['col'] vs df[['col']]: Single brackets return a Series. Double brackets return a DataFrame. Use [[]] when you need a DataFrame.
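  • Modifying a copy: after newdf = df[df['Salary'] > 60000], the assignment newdf['Salary'] = 0 may raise a SettingWithCopyWarning, because Pandas cannot tell whether you meant to change df or only the filtered subset. Call .copy() on the subset to make the intent explicit.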
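
Both pitfalls can be checked in a few lines. The sketch below uses a shortened, hypothetical employee table. The variable names are for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["Alice", "Bob", "David"],
    "Salary": [85000, 62000, 55000],
})

# Single brackets give a Series, double brackets give a DataFrame
as_series = df["Salary"]
as_frame = df[["Salary"]]
print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame

# An explicit .copy() makes the subset independent of the original
high_paid = df[df["Salary"] > 60000].copy()
high_paid["Salary"] = 0

print(high_paid["Salary"].tolist())   # [0, 0]
print(df["Salary"].tolist())          # [85000, 62000, 55000] (original unchanged)
```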

8. MCQs

Question 1

What is Pandas' primary 2D data structure?

Question 2

df.head(5) shows?

Question 3

df['col'] returns?

Question 4

df[['col1','col2']] returns?

Question 5

df.describe() shows?

Question 6

df.shape returns?

Question 7

df['col'].value_counts() shows?

Question 8

df.dtypes shows?

Question 9

df.info() shows?

Question 10

Pandas is built on?

9. Interview Questions

  • Q: What is the difference between a Pandas Series and a DataFrame?
  • Q: How do you get basic statistics of a DataFrame quickly?
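
One way to ground answers to both questions is a short demonstration: each column of a DataFrame is a Series that shares the DataFrame's index, and describe() returns summary statistics for the numeric columns. This is a sketch using a few values from this chapter, not a model answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Salary": [85000, 62000, 91000, 55000, 58000],
    "Rating": [4.5, 3.8, 4.9, 3.5, 4.1],
})

# A single column is a Series whose index matches the DataFrame's index
col = df["Salary"]
shares_index = col.index.equals(df.index)
print(type(col).__name__, shares_index)   # Series True

# describe() gives count, mean, std, min, quartiles, and max per numeric column
summary = df.describe()
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc["mean", "Salary"])      # 70200.0
```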

10. Summary

Pandas' two structures, Series (1D labeled array) and DataFrame (2D labeled table), are the workhorses of Python data science. describe(), info(), value_counts(), and groupby() give a quick overview of a dataset. Pandas bridges the gap between raw data files and analyzed insights.

11. Next Chapter Recommendation

In Chapter 10: Pandas Series and DataFrames Deep Dive, we master every aspect of creating, accessing, and manipulating these structures.
