Pandas & NumPy
CHAPTER 10 Beginner

Pandas Series and DataFrames

Updated: May 18, 2026

1. Chapter Introduction

Series and DataFrame are the two core data structures in Pandas. This chapter covers the main ways to create them, their key attributes, and common access patterns, which together form the foundation for later Pandas operations.

2. Series — Deep Dive

python
import pandas as pd
import numpy as np

# Create from list
s1 = pd.Series([10, 20, 30, 40], name='Values')

# Create from dict (dict keys → index)
s2 = pd.Series({'Mon': 120, 'Tue': 145, 'Wed': 98, 'Thu': 167, 'Fri': 134}, name='Daily_Sales')

# Create from NumPy array
s3 = pd.Series(np.linspace(0, 1, 5), name='Probability')

# Create from scalar (repeats for each index)
s4 = pd.Series(100, index=['A', 'B', 'C', 'D'])

print(s2)
print(f"\nIndex:  {list(s2.index)}")
print(f"Values: {s2.values}")   # NumPy array
print(f"dtype:  {s2.dtype}")
print(f"Name:   {s2.name}")

# Arithmetic on Series (aligns by index)
s5 = pd.Series({'Mon': 10, 'Tue': 15, 'Wed': 8, 'Fri': 20})
print(s2 + s5)   # NaN for 'Thu' (missing in s5)
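When the NaN produced by index alignment is not wanted, one option is the method form of the operator. The sketch below reuses s2 and s5 from the example above and relies on Series.add with fill_value, which treats a label missing from one side as the given value before adding. Choosing 0 as the fill value is an assumption made for this illustration.

```python
import pandas as pd

s2 = pd.Series({'Mon': 120, 'Tue': 145, 'Wed': 98, 'Thu': 167, 'Fri': 134},
               name='Daily_Sales')
s5 = pd.Series({'Mon': 10, 'Tue': 15, 'Wed': 8, 'Fri': 20})

# Plain + aligns on the index; 'Thu' has no partner in s5, so it becomes NaN
total_nan = s2 + s5

# .add() with fill_value substitutes 0 for the missing 'Thu' entry in s5
total_filled = s2.add(s5, fill_value=0)

print(total_nan)
print(total_filled)
```

Whether 0 is a sensible default depends on the data; here it assumes that a missing day means no additional sales.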

3. DataFrame — Creation Methods

python
# Method 1: From dict of lists
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol'],
    'Age': [25, 30, 35],
    'Salary': [55000, 72000, 88000]
})

# Method 2: From list of dicts (common with JSON data)
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'NYC'},
    {'Name': 'Bob',   'Age': 30, 'City': 'LA'},
    {'Name': 'Carol', 'Age': 35}  # Missing City → NaN
]
df2 = pd.DataFrame(records)

# Method 3: From NumPy array
arr = np.random.randint(60, 100, size=(5, 4))
df3 = pd.DataFrame(arr, columns=['Math', 'Science', 'English', 'History'],
                   index=['Alice', 'Bob', 'Carol', 'David', 'Eve'])

# Method 4: From CSV/Excel (covered in Chapter 11)
# df4 = pd.read_csv('data.csv')

print(df3)
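Method 2 leaves a NaN wherever a record lacks a key. As a minimal sketch of how that gap might be detected and filled, the example below rebuilds df2 from the same records; the 'Unknown' placeholder is an assumption chosen for illustration, not a Pandas default.

```python
import pandas as pd

records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'NYC'},
    {'Name': 'Bob',   'Age': 30, 'City': 'LA'},
    {'Name': 'Carol', 'Age': 35},               # no 'City' key
]
df2 = pd.DataFrame(records)

# Boolean mask marking the rows where City is missing
missing_city = df2['City'].isna()
print(missing_city.tolist())

# Fill only the City column; 'Unknown' is a placeholder for this sketch
df2_filled = df2.fillna({'City': 'Unknown'})
print(df2_filled)
```

Passing a dict to fillna restricts the replacement to the named columns, so a genuinely missing Age would stay NaN.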

4. DataFrame Attributes

python
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Dept': ['Eng', 'Mkt', 'Eng', 'HR'],
    'Salary': [85000, 62000, 91000, 58000],
    'Rating': [4.5, 3.8, 4.9, 4.1]
})

print(f"shape:    {df.shape}")       # (4, 4)
print(f"ndim:     {df.ndim}")        # 2
print(f"size:     {df.size}")        # 16 (total elements)
print(f"dtypes:\n{df.dtypes}")       # dtype per column
print(f"columns:  {list(df.columns)}")
print(f"index:    {list(df.index)}")
print(f"values:\n{df.values}")       # NumPy array

# Memory usage
print(f"\nMemory: {df.memory_usage(deep=True).sum()} bytes")
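One detail the attribute list does not show is what dtype the NumPy array takes when columns have different types. The sketch below, using the same four-column frame, suggests that a mix of text and numbers falls back to a generic object array, while a numeric-only selection keeps a numeric dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Dept': ['Eng', 'Mkt', 'Eng', 'HR'],
    'Salary': [85000, 62000, 91000, 58000],
    'Rating': [4.5, 3.8, 4.9, 4.1]
})

# Text and numeric columns together: NumPy has to use the object dtype
mixed = df.to_numpy()
print(mixed.dtype)      # object

# int64 and float64 columns share a common numeric type
numeric = df[['Salary', 'Rating']].to_numpy()
print(numeric.dtype)    # float64

# size is always rows * columns
print(df.size == df.shape[0] * df.shape[1])
```

Selecting the numeric columns before converting is usually the safer route when the array is headed for numerical code.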

5. Column Operations

python
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Salary': [85000, 62000, 91000, 58000],
    'Experience': [5, 3, 8, 2]
})

# Add new column
df['Bonus'] = df['Salary'] * 0.10
df['Level'] = pd.cut(df['Experience'],
                     bins=[0, 3, 6, 10],
                     labels=['Junior', 'Mid', 'Senior'])

# Rename columns
df = df.rename(columns={'Name': 'Employee', 'Salary': 'Base_Salary'})

# Drop column
df_no_bonus = df.drop(columns=['Bonus'])

# Reorder columns
df = df[['Employee', 'Level', 'Base_Salary', 'Experience', 'Bonus']]

# Change dtype
df['Experience'] = df['Experience'].astype(float)

print(df)
print(f"\ndtypes:\n{df.dtypes}")
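The pd.cut call above depends on which side of each bin is closed. By default the right edge is included, so an Experience of 3 lands in 'Junior'. The sketch below compares that default with right=False on the same values; it is an illustration of the parameter, not a recommendation for either setting.

```python
import pandas as pd

experience = pd.Series([5, 3, 8, 2], name='Experience')
bins = [0, 3, 6, 10]
labels = ['Junior', 'Mid', 'Senior']

# Default: intervals are (0, 3], (3, 6], (6, 10]
right_closed = pd.cut(experience, bins=bins, labels=labels)

# right=False: intervals are [0, 3), [3, 6), [6, 10)
left_closed = pd.cut(experience, bins=bins, labels=labels, right=False)

print(right_closed.tolist())
print(left_closed.tolist())
```

With the default setting, a value equal to the lowest edge (0 here) falls outside every bin and becomes NaN unless include_lowest=True is passed.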

6. Index Operations

python
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Dept': ['Eng', 'Mkt', 'Eng', 'HR'],
    'Salary': [85000, 62000, 91000, 58000]
})

# Set a column as index
df_indexed = df.set_index('Name')
print(df_indexed)
print(df_indexed.loc['Alice'])   # Access by name

# Reset index
df_reset = df_indexed.reset_index()

# Custom index
df.index = ['E001', 'E002', 'E003', 'E004']
print(df)

# Sort by index
df_sorted = df.sort_index()

# Check if value in index
print('E002' in df.index)   # True
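Once a column becomes the index, rows can be reached either by label or by position. As a minimal sketch on the same data, the example below contrasts .loc with .iloc and shows what reset_index(drop=True) does to the old index.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David'],
    'Dept': ['Eng', 'Mkt', 'Eng', 'HR'],
    'Salary': [85000, 62000, 91000, 58000]
})
df_indexed = df.set_index('Name')

# Label-based: row 'Alice', column 'Salary'
salary_by_label = df_indexed.loc['Alice', 'Salary']

# Position-based: first row, second remaining column (Salary)
salary_by_position = df_indexed.iloc[0, 1]

# reset_index() turns Name back into a column; drop=True discards it
restored = df_indexed.reset_index()
dropped = df_indexed.reset_index(drop=True)

print(salary_by_label, salary_by_position)
print(list(restored.columns))
print(list(dropped.columns))
```

Note that set_index and reset_index return new DataFrames here; df itself still has its original integer index.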

7. Common Mistakes

  • df.values vs df.to_numpy(): Both return NumPy arrays, but to_numpy() is preferred because it lets you state the target dtype and a replacement for missing values explicitly (the dtype and na_value arguments).
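  • df.columns does not support item assignment: df.columns[0] = 'new' raises a TypeError because Index objects are immutable. Use df.rename(columns={'old': 'new'}) instead, or assign a complete new list of names to df.columns.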
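Both points can be checked on a small frame. The sketch below uses a hypothetical two-column DataFrame (columns 'A' and 'B' are made up for illustration) to show to_numpy with na_value and the error raised by item assignment on df.columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan], 'B': [3.0, 4.0]})

# to_numpy() can replace missing values while converting
arr = df.to_numpy(na_value=0.0)
print(arr)

# Item assignment on an Index is rejected
assignment_failed = False
try:
    df.columns[0] = 'new'
except TypeError as exc:
    assignment_failed = True
    print(f"TypeError: {exc}")

# Supported alternatives: rename one column, or replace all names at once
renamed = df.rename(columns={'A': 'new'})
df.columns = ['X', 'Y']

print(list(renamed.columns))
print(list(df.columns))
```

rename returns a new DataFrame, while assigning to df.columns changes the existing one in place.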

8. MCQs

Question 1

What kind of Series does pd.Series({'a': 1, 'b': 2}) create?

Question 2

df.size returns?

Question 3

df.values returns?

Question 4

What does df.set_index('Name') do?

Question 5

What does df['Tax'] = df['Salary'] * 0.2 create?

Question 6

df.rename(columns={'old': 'new'}) does?

Question 7

pd.cut(df['Age'], bins=[0,18,65,100], labels=['child','adult','senior']) creates?

Question 8

What does df.memory_usage(deep=True) return?

Question 9

df.reset_index() does?

Question 10

df.ndim for a DataFrame returns?

9. Interview Questions

  • Q: How do you create a DataFrame from a list of dictionaries?
  • Q: What is the difference between df.index and df.columns?

10. Summary

A Series is a 1D labeled array; a DataFrame is a 2D table with labeled rows and columns. Both can be created from dicts, lists, and NumPy arrays. Column operations (add, rename, drop, reorder) and index operations (set, reset, sort) are fundamental daily tasks.

11. Next Chapter Recommendation

In Chapter 11: Reading and Writing Data Files, we load real-world data from CSV, Excel, and JSON files.
