CHAPTER 09 Beginner

Introduction to Pandas

Updated: May 18, 2026



1. Chapter Introduction

Pandas (the name is derived from "panel data", an econometrics term for multidimensional datasets, and is also a play on "Python data analysis") is one of the most widely used Python libraries for data analysis. It turns raw data files into structured DataFrames that you can query and manipulate much as you would a SQL table or an Excel sheet, but with the full power of Python.

2. What is Pandas?

```python
import pandas as pd
import numpy as np

# Pandas provides two core data structures:
# 1. Series: 1D labeled array
# 2. DataFrame: 2D labeled table (columns can have different types)

# Key Pandas capabilities:
print("""
Pandas can:
✅ Read CSV, Excel, JSON, SQL, Parquet, HDF5
✅ Clean messy, incomplete data
✅ Filter, select, transform data
✅ Group and aggregate (like SQL GROUP BY)
✅ Merge datasets (like SQL JOIN)
✅ Handle dates and time series
✅ Visualize data
✅ Export to multiple formats
""")
```

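To make a few of the capabilities listed above concrete, here is a minimal sketch that reads two small CSV tables, joins them the way SQL JOIN does, and aggregates the result the way GROUP BY does. The CSV text is held in memory with `io.StringIO` in place of real files, and the table contents and column names (`orders`, `customers`, `customer_id`, `amount`) are hypothetical sample data.

```python
import io

import pandas as pd

# Two small CSV "files" held in memory (hypothetical sample data)
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "1,101,250.0\n"
    "2,102,120.5\n"
    "3,101,75.0\n"
)
customers_csv = io.StringIO(
    "customer_id,name\n"
    "101,Alice\n"
    "102,Bob\n"
)

# Read CSV text into DataFrames (read_csv also accepts a file path)
orders = pd.read_csv(orders_csv)
customers = pd.read_csv(customers_csv)

# Merge on a shared key, similar to an SQL INNER JOIN
merged = orders.merge(customers, on="customer_id", how="inner")

# Group and aggregate, similar to SQL GROUP BY ... SUM(amount)
totals = merged.groupby("name")["amount"].sum()
print(totals)
```

Because `pd.read_csv` accepts a file path as well as a file-like object, pointing it at a real CSV file should leave the rest of the code unchanged.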
3. Pandas Series

```python
import pandas as pd

# Series: 1D array with a labeled index
s = pd.Series([10, 20, 30, 40, 50])
print(s)
# 0    10
# 1    20
# 2    30
# 3    40
# 4    50
# dtype: int64

# Custom index
s_named = pd.Series(
    [85, 92, 78, 96],
    index=['Math', 'Science', 'English', 'History'],
    name='Alice_Scores'
)
print(s_named)
print(f"\nMath score: {s_named['Math']}")      # 85
print(f"Mean score: {s_named.mean():.2f}")    # 87.75

# Series from dict: keys become the index labels (population in millions)
city_pop = pd.Series({'NYC': 8.3, 'LA': 3.9, 'Chicago': 2.7, 'Houston': 2.3})
print(city_pop.sort_values(ascending=False))
```
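Because a Series carries labels, arithmetic between two Series matches values by index label rather than by position. The sketch below illustrates this with two small, hypothetical score Series; labels that appear in only one of them produce `NaN` unless you supply a `fill_value`.

```python
import pandas as pd

# Two Series whose labels overlap only partially (hypothetical quiz scores)
quiz1 = pd.Series({"Math": 80, "Science": 90, "English": 70})
quiz2 = pd.Series({"English": 75, "Math": 85, "History": 88})

# "+" aligns on index labels, not on position;
# labels present in only one Series become NaN
total = quiz1 + quiz2
print(total)

# .add() with fill_value treats a missing label as 0 instead of NaN
total_filled = quiz1.add(quiz2, fill_value=0)
print(total_filled)
```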

4. Pandas DataFrame

```python
import pandas as pd

# DataFrame: 2D table, the primary Pandas structure
data = {
    'Employee': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'Department': ['Engineering', 'Marketing', 'Engineering', 'Sales', 'HR'],
    'Salary': [85000, 62000, 91000, 55000, 58000],
    'Experience': [5, 3, 8, 2, 4],
    'Rating': [4.5, 3.8, 4.9, 3.5, 4.1]
}

df = pd.DataFrame(data)
print(df)
print("\nShape:", df.shape)          # (5, 5)
print("Columns:", list(df.columns))
print("Index:", list(df.index))      # [0, 1, 2, 3, 4]

# DataFrame attributes and summary methods
print(df.dtypes)          # Data type of each column
df.info()                 # Prints shape, dtypes, non-null counts, memory (returns None)
print(df.describe())      # Statistical summary of numeric columns
```
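The DataFrame above is described as a table whose columns can have different types. One way to see what that means: each column is itself a Series sharing the DataFrame's index, and new columns can be computed element-wise from existing ones. This sketch uses a three-row subset of the employee data; the `Salary_per_Exp_Year` column name is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["Alice", "Bob", "Carol"],
    "Salary": [85000, 62000, 91000],
    "Experience": [5, 3, 8],
})

# Each column of a DataFrame is a Series that shares the DataFrame's index
salary = df["Salary"]
print(isinstance(salary, pd.Series))   # True
print(salary.index.equals(df.index))   # True

# A new column computed element-wise from two existing columns
df["Salary_per_Exp_Year"] = df["Salary"] / df["Experience"]
print(df)
```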

5. Accessing Data

```python
# Continues with the df built in the previous section

# Column access
print(df['Salary'])                    # Single column → Series
print(df[['Employee', 'Salary']])      # Multiple columns → DataFrame

# Row access
print(df.head(3))                      # First 3 rows
print(df.tail(2))                      # Last 2 rows
print(df.iloc[0])                      # First row by integer position
print(df.loc[2])                       # Row with index label 2

# Quick statistics
print(df['Salary'].mean())             # 70200.0
print(df['Salary'].max())              # 91000
print(df['Department'].value_counts()) # Counts per department
```
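With the default index (0, 1, 2, and so on) labels and positions coincide, so `loc` and `iloc` can look interchangeable. Once the index holds other labels, the difference matters. The sketch below sets employee names as the index (an illustrative choice) and also shows boolean filtering, which the mini project below relies on.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["Alice", "Bob", "Carol", "David", "Eve"],
    "Department": ["Engineering", "Marketing", "Engineering", "Sales", "HR"],
    "Salary": [85000, 62000, 91000, 55000, 58000],
})

# Use employee names as row labels instead of 0..4
by_name = df.set_index("Employee")

# .loc selects by label, .iloc by integer position
print(by_name.loc["Carol", "Salary"])   # 91000
print(by_name.iloc[2]["Salary"])        # 91000 (Carol is the third row)

# Boolean mask: keep only the rows where the condition is True
high_earners = df[df["Salary"] > 60000]
print(high_earners["Employee"].tolist())  # ['Alice', 'Bob', 'Carol']

# Combine conditions with & (and) or | (or), each wrapped in parentheses
eng_high = df[(df["Department"] == "Engineering") & (df["Salary"] > 86000)]
print(eng_high["Employee"].tolist())      # ['Carol']
```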

6. Mini Project: Employee Dataset Analysis

```python
import pandas as pd

employees = pd.DataFrame({
    'Name': ['Alice','Bob','Carol','David','Eve','Frank','Grace','Henry'],
    'Dept': ['Eng','Mkt','Eng','Sales','HR','Eng','Mkt','Sales'],
    'Salary': [85000, 62000, 91000, 55000, 58000, 78000, 67000, 49000],
    'Experience': [5, 3, 8, 2, 4, 6, 4, 1],
    'Rating': [4.5, 3.8, 4.9, 3.5, 4.1, 4.3, 4.0, 3.2]
})

print("=" * 50)
print("EMPLOYEE ANALYSIS REPORT")
print("=" * 50)

# Basic info
print(f"\nTotal employees: {len(employees)}")
print(f"Departments: {employees['Dept'].unique()}")

# Salary analysis
print("\nSalary Statistics:")
print(f"  Average: ${employees['Salary'].mean():,.0f}")
print(f"  Highest: ${employees['Salary'].max():,} ({employees.loc[employees['Salary'].idxmax(), 'Name']})")
print(f"  Lowest:  ${employees['Salary'].min():,} ({employees.loc[employees['Salary'].idxmin(), 'Name']})")

# Department summary
print("\nDepartment Summary:")
dept_summary = employees.groupby('Dept').agg(
    Count=('Name', 'count'),
    Avg_Salary=('Salary', 'mean'),
    Avg_Rating=('Rating', 'mean')
).round(2)
print(dept_summary)

# Top performers
print("\nTop Performers (Rating ≥ 4.3):")
top = employees[employees['Rating'] >= 4.3][['Name', 'Dept', 'Salary', 'Rating']]
print(top.to_string(index=False))

# Salary vs experience correlation
correlation = employees['Salary'].corr(employees['Experience'])
print(f"\nSalary-Experience Correlation: {correlation:.3f}")
```
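One possible extension of the report, sketched below, finds the highest-rated employee in each department by combining `groupby` with `idxmax` (both already used above) and lists the three highest salaries. It rebuilds the same `employees` table so it can run on its own; the variable names `best_idx`, `best_per_dept`, and `top3` are illustrative.

```python
import pandas as pd

employees = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "David", "Eve", "Frank", "Grace", "Henry"],
    "Dept": ["Eng", "Mkt", "Eng", "Sales", "HR", "Eng", "Mkt", "Sales"],
    "Salary": [85000, 62000, 91000, 55000, 58000, 78000, 67000, 49000],
    "Experience": [5, 3, 8, 2, 4, 6, 4, 1],
    "Rating": [4.5, 3.8, 4.9, 3.5, 4.1, 4.3, 4.0, 3.2],
})

# For each department, the index label of its highest-rated employee
best_idx = employees.groupby("Dept")["Rating"].idxmax()

# Use those labels to pull the matching rows
best_per_dept = employees.loc[best_idx, ["Dept", "Name", "Rating"]]
print(best_per_dept.to_string(index=False))

# Three highest salaries, highest first
top3 = employees.sort_values("Salary", ascending=False).head(3)
print(top3[["Name", "Salary"]].to_string(index=False))
```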

7. Common Mistakes

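df['col'] vs df[['col']]: Single brackets return a Series; double brackets return a DataFrame. Use [[]] when you need a DataFrame, for example to pass a one-column table to code that expects a DataFrame.

Modifying a copy: after newdf = df[df['Salary'] > 60000], an assignment such as newdf['Salary'] = 0 may trigger a SettingWithCopyWarning, because newdf may be a copy of df rather than a view, so pandas cannot tell whether you meant the change to reach df. If you intend to modify the subset on its own, take an explicit copy: newdf = df[df['Salary'] > 60000].copy().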
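Both mistakes can be reproduced in a few lines. This minimal sketch uses a small hypothetical table, and the filter threshold is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["Alice", "Bob", "Carol"],
    "Salary": [85000, 62000, 91000],
})

# Single brackets give a Series; double brackets give a DataFrame
col_series = df["Salary"]
col_frame = df[["Salary"]]
print(type(col_series).__name__, col_series.shape)  # Series (3,)
print(type(col_frame).__name__, col_frame.shape)    # DataFrame (3, 1)

# Take an explicit copy of the filtered rows before modifying them
high = df[df["Salary"] > 70000].copy()
high["Salary"] = 0                 # changes only the copy
print(high["Salary"].tolist())     # [0, 0]
print(df["Salary"].tolist())       # original unchanged: [85000, 62000, 91000]
```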

8. MCQs

Question 1

What is the primary 2D data structure in Pandas?

Question 2

`df.head(5)` shows?

Question 3

`df['col']` returns?

Question 4

`df[['col1','col2']]` returns?

Question 5

`df.describe()` shows?

Question 6

`df.shape` returns?

Question 7

`df['col'].value_counts()` shows?

Question 8

`df.dtypes` shows?

Question 9

`df.info()` shows?

Question 10

Which library is Pandas built on?

9. Interview Questions

Q: What is the difference between a Pandas Series and a DataFrame?

Q: How do you get basic statistics of a DataFrame quickly?

10. Summary
Pandas' two core structures, Series (a 1D labeled array) and DataFrame (a 2D labeled table), are the workhorses of Python data science. describe(), info(), value_counts(), and groupby() give a quick first look at almost any dataset, which makes Pandas the usual bridge between raw data files and analysis.

11. Next Chapter Recommendation

In Chapter 10: Pandas Series and DataFrames Deep Dive, we master every aspect of creating, accessing, and manipulating these structures.
