CHAPTER 17 Beginner

Merging and Joining DataFrames

Updated: May 18, 2026


1. Chapter Introduction

Real data rarely lives in one table. Customer records, orders, products, and locations often sit in separate datasets. Pandas covers the standard SQL join types (inner, left, right, outer) through merge and join, while concat stacks DataFrames along rows or columns. Together they make multi-dataset analysis practical.

2. pd.concat — Stacking DataFrames

python

import pandas as pd

# Vertical concat (append rows)
q1 = pd.DataFrame({'Product': ['Laptop','Phone'], 'Revenue': [12000, 4500], 'Quarter': 'Q1'})
q2 = pd.DataFrame({'Product': ['Laptop','Desk'],  'Revenue': [14000, 2100], 'Quarter': 'Q2'})
q3 = pd.DataFrame({'Product': ['Phone','Laptop'], 'Revenue': [5200, 13500], 'Quarter': 'Q3'})

all_quarters = pd.concat([q1, q2, q3], ignore_index=True)
print(all_quarters)

# Horizontal concat (add columns)
df1 = pd.DataFrame({'Name': ['Alice','Bob','Carol']})
df2 = pd.DataFrame({'Salary': [85000,72000,91000], 'Dept': ['Eng','Mkt','Eng']})
combined = pd.concat([df1, df2], axis=1)
print(combined)
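
The horizontal concat above works because both frames share the default index 0, 1, 2. As a minimal sketch of what happens otherwise, the example below uses two hypothetical frames (`names` and `salaries`, not part of the chapter's data) whose index labels only partly overlap. It suggests that `axis=1` pairs rows by index label rather than by position, and that resetting the index is one way to force positional pairing.

```python
import pandas as pd

# Hypothetical frames whose index labels only partly overlap
names = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol']}, index=[0, 1, 2])
salaries = pd.DataFrame({'Salary': [72000, 91000, 64000]}, index=[1, 2, 3])

# axis=1 aligns rows by index label, so unmatched labels get NaN
aligned = pd.concat([names, salaries], axis=1)
print(aligned)

# Resetting both indexes first pairs the rows purely by position
by_position = pd.concat(
    [names.reset_index(drop=True), salaries.reset_index(drop=True)],
    axis=1,
)
print(by_position)
```

If the rows of two frames are meant to line up one-to-one, checking that their indexes match before a horizontal concat avoids surprise NaN values.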

3. merge — SQL-Style Joins

python

employees = pd.DataFrame({
    'EmpID': [1, 2, 3, 4, 5],
    'Name':  ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'DeptID': [10, 20, 10, 30, 40]
})

departments = pd.DataFrame({
    'DeptID': [10, 20, 30],
    'DeptName': ['Engineering', 'Marketing', 'Sales'],
    'Location': ['NYC', 'LA', 'Chicago']
})

# INNER JOIN (only matching rows)
inner = pd.merge(employees, departments, on='DeptID', how='inner')
print("INNER JOIN:")
print(inner)  # 4 rows: Eve's dept (40) has no match, so she is dropped

# LEFT JOIN (all left rows, NaN for unmatched right)
left = pd.merge(employees, departments, on='DeptID', how='left')
print("\nLEFT JOIN:")
print(left)   # 5 rows: Eve has NaN for DeptName and Location

# RIGHT JOIN (all right rows, NaN for unmatched left)
right = pd.merge(employees, departments, on='DeptID', how='right')
print("\nRIGHT JOIN:")
print(right)  # 4 rows: every department here has at least one employee

# OUTER JOIN (all rows from both)
outer = pd.merge(employees, departments, on='DeptID', how='outer')
print("\nOUTER JOIN:")
print(outer)  # 5 rows: no department is unmatched, so only Eve gains NaN values
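
To see which side each row of a join came from, one option is the `indicator=True` argument of `pd.merge`, which adds a `_merge` column. The sketch below reuses the employees table and adds a hypothetical department 50 ("Legal") with no employees, which is an assumption made only so that a right-only row appears.

```python
import pandas as pd

employees = pd.DataFrame({
    'EmpID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eve'],
    'DeptID': [10, 20, 10, 30, 40],
})
departments = pd.DataFrame({
    'DeptID': [10, 20, 30, 50],  # 50 is a hypothetical department with no employees
    'DeptName': ['Engineering', 'Marketing', 'Sales', 'Legal'],
})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'
checked = pd.merge(employees, departments, on='DeptID', how='outer', indicator=True)
print(checked[['Name', 'DeptID', 'DeptName', '_merge']])

# Rows that failed to match on one side
left_only = checked[checked['_merge'] == 'left_only']
right_only = checked[checked['_merge'] == 'right_only']
print(left_only['Name'].tolist())     # employees without a department row
print(right_only['DeptID'].tolist())  # departments without employees
```

Filtering on `_merge` is a quick way to audit unmatched keys before deciding which join type to keep.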

4. Merge on Different Column Names

python

orders = pd.DataFrame({
    'OrderID': [101, 102, 103, 104],
    'CustomerID': ['C01', 'C02', 'C01', 'C03'],
    'Amount': [500, 1200, 800, 350]
})

customers = pd.DataFrame({
    'CustID': ['C01', 'C02', 'C04'],
    'Name': ['Alice', 'Bob', 'David'],
    'City': ['NYC', 'LA', 'Chicago']
})

# Different key column names
result = pd.merge(orders, customers,
                  left_on='CustomerID',
                  right_on='CustID',
                  how='left')
print(result)
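
One side effect of `left_on`/`right_on` is that both key columns remain in the result. The sketch below repeats the merge above and shows one possible cleanup, dropping the right-hand key afterwards. The name `tidy` is only illustrative.

```python
import pandas as pd

orders = pd.DataFrame({
    'OrderID': [101, 102, 103, 104],
    'CustomerID': ['C01', 'C02', 'C01', 'C03'],
    'Amount': [500, 1200, 800, 350],
})
customers = pd.DataFrame({
    'CustID': ['C01', 'C02', 'C04'],
    'Name': ['Alice', 'Bob', 'David'],
    'City': ['NYC', 'LA', 'Chicago'],
})

result = pd.merge(orders, customers,
                  left_on='CustomerID', right_on='CustID', how='left')

# Both key columns survive; CustID is NaN where no customer matched (order 104)
print(result.columns.tolist())

# Drop the redundant right-hand key once the join is done
tidy = result.drop(columns='CustID')
print(tidy)
```

Renaming `CustID` to `CustomerID` before merging and using `on='CustomerID'` would be an alternative that yields a single key column.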

5. Multi-Table Real-World Example

python

# E-commerce dataset
products = pd.DataFrame({
    'ProductID': [1, 2, 3, 4],
    'Name': ['Laptop', 'Phone', 'Desk', 'Monitor'],
    'Category': ['Electronics', 'Electronics', 'Furniture', 'Electronics'],
    'Price': [1200, 500, 450, 300]
})

orders = pd.DataFrame({
    'OrderID': [101, 102, 103, 104, 105, 106],
    'CustomerID': [1, 2, 1, 3, 2, 1],
    'ProductID': [1, 2, 3, 1, 4, 2],
    'Quantity': [1, 2, 1, 1, 3, 1],
    'Date': pd.date_range('2024-01', periods=6, freq='W')
})

customers = pd.DataFrame({
    'CustomerID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Carol'],
    'City': ['NYC', 'LA', 'Chicago']
})

# Multi-table merge (chain merges; each merge defaults to an inner join)
analysis = (orders
    .merge(products, on='ProductID')
    .merge(customers, on='CustomerID')
    .assign(Revenue=lambda df: df['Quantity'] * df['Price'])
)

# Both products and customers have a 'Name' column, so the second merge
# renames them: Name_x is the product name, Name_y is the customer name
print("Order Analysis:")
print(analysis[['OrderID', 'Name_y', 'Name_x', 'Quantity', 'Revenue', 'Date']])

# Revenue per customer
print("\nRevenue per Customer:")
print(analysis.groupby('Name_y')['Revenue'].sum())
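
The `Name_x` and `Name_y` columns in the analysis above are easy to mix up. As a sketch of one alternative, the example below renames each `Name` column before merging. It uses smaller hypothetical tables with the same structure, and the column names `ProductName` and `CustomerName` are assumptions chosen for illustration.

```python
import pandas as pd

# Small hypothetical tables mirroring the structure of the example above
products = pd.DataFrame({'ProductID': [1, 2], 'Name': ['Laptop', 'Phone'], 'Price': [1200, 500]})
customers = pd.DataFrame({'CustomerID': [1, 2], 'Name': ['Alice', 'Bob']})
orders = pd.DataFrame({
    'OrderID': [101, 102, 103],
    'CustomerID': [1, 2, 1],
    'ProductID': [1, 2, 2],
    'Quantity': [1, 2, 1],
})

# Rename the clashing columns first so no _x/_y suffixes are needed
analysis = (orders
    .merge(products.rename(columns={'Name': 'ProductName'}), on='ProductID')
    .merge(customers.rename(columns={'Name': 'CustomerName'}), on='CustomerID')
    .assign(Revenue=lambda df: df['Quantity'] * df['Price'])
)

revenue_per_customer = analysis.groupby('CustomerName')['Revenue'].sum()
print(revenue_per_customer)
```

With descriptive names in place, the later `groupby` reads as what it means, revenue per customer, without having to remember which suffix belongs to which table.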

6. Common Mistakes

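Duplicate column names after merge: When both DataFrames have a non-key column named 'Name', Pandas appends suffixes to tell them apart: Name_x (from the left DataFrame) and Name_y (from the right). Rename the columns before or after merging, or pass the suffixes argument to choose clearer labels.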
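
A minimal sketch of the suffix behaviour, using two hypothetical tables (`products` and `reviews`) that both carry a `Name` column. The suffix strings `_product` and `_reviewer` are arbitrary choices for illustration.

```python
import pandas as pd

# Hypothetical tables that share a non-key column called 'Name'
products = pd.DataFrame({'ProductID': [1, 2], 'Name': ['Laptop', 'Phone']})
reviews = pd.DataFrame({'ProductID': [1, 2], 'Name': ['Alice', 'Bob'], 'Stars': [5, 4]})

# Default suffixes are '_x' (left) and '_y' (right)
default = products.merge(reviews, on='ProductID')
print(default.columns.tolist())

# The suffixes argument replaces them with more descriptive labels
labelled = products.merge(reviews, on='ProductID', suffixes=('_product', '_reviewer'))
print(labelled.columns.tolist())
```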

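Many-to-many merges create row explosion: When a key value is repeated in BOTH DataFrames, the merge produces every pairing, so a key that appears m times on the left and n times on the right yields m × n rows. Always check key uniqueness before merging, or pass the validate argument so Pandas raises an error when the keys do not have the relationship you expect.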
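
The sketch below illustrates the row explosion with two hypothetical tables (`reps` and `stores`) in which `Region` repeats on both sides, then shows two possible guards: checking `is_unique` on the key and passing `validate='one_to_one'`, which raises `pandas.errors.MergeError` when the keys are not unique.

```python
import pandas as pd

# Hypothetical tables where 'Region' repeats on both sides
reps = pd.DataFrame({'Region': ['East', 'East', 'West'], 'Rep': ['Ann', 'Ben', 'Cy']})
stores = pd.DataFrame({'Region': ['East', 'East', 'West'], 'Store': ['S1', 'S2', 'S3']})

# East appears twice on each side, giving 2 x 2 = 4 rows, plus 1 row for West
exploded = reps.merge(stores, on='Region')
print(len(reps), len(stores), len(exploded))  # 3 3 5

# Guard 1: check key uniqueness before merging
print(reps['Region'].is_unique)  # False

# Guard 2: let merge enforce the expected relationship
try:
    reps.merge(stores, on='Region', validate='one_to_one')
    validation_failed = False
except pd.errors.MergeError as exc:
    validation_failed = True
    print('Merge rejected:', exc)
```

Comparing `len()` before and after a merge is another cheap sanity check when the expected row count is known.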

7. MCQs

Question 1

`pd.concat([df1,df2], axis=0)` does?

Question 2

INNER JOIN keeps?

Question 3

LEFT JOIN keeps?

Question 4

`left_on`/`right_on` is used when?

Question 5

After merge, `Name_x` and `Name_y` appear because?

Question 6

`how='outer'` keeps?

Question 7

`ignore_index=True` in concat?

Question 8

Many-to-many merge risk?

Question 9

`validate='one_to_one'` in merge?

Question 10

`pd.merge(df1, df2, on='ID')` default join type?

8. Interview Questions

Q: What is the difference between merge, join, and concat in Pandas?

Q: How do you perform a left join in Pandas?
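
As a starting point for both questions, the sketch below contrasts the three tools on two small hypothetical tables. It assumes the usual behaviour of each: `merge` matches on columns with an explicit `how`, `DataFrame.join` performs a left join by default and matches against the other frame's index, and `concat` stacks frames without matching key values.

```python
import pandas as pd

# Hypothetical tables for illustration
employees = pd.DataFrame({'Name': ['Alice', 'Bob', 'Eve'], 'DeptID': [10, 20, 40]})
departments = pd.DataFrame({'DeptID': [10, 20], 'DeptName': ['Engineering', 'Marketing']})

# merge: column-based join; how='left' keeps every employee
via_merge = employees.merge(departments, on='DeptID', how='left')
print(via_merge)

# join: left join by default, matching the caller's 'DeptID' column
# against the index of the other frame
via_join = employees.join(departments.set_index('DeptID'), on='DeptID')
print(via_join)

# concat: stacks frames along an axis, with no matching on key values
stacked = pd.concat([employees, employees], ignore_index=True)
print(stacked)
```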

9. Summary

concat stacks DataFrames along rows or columns. merge performs SQL-style joins (inner, left, right, outer). left_on/right_on handle key columns with different names. Chain merges for multi-table analysis. Watch for duplicate column names (which receive _x/_y suffixes) and many-to-many row explosions.

10. Next Chapter Recommendation

In Chapter 18: Working with Dates and Time Series, we master datetime parsing, date-based filtering, resampling, and time series analysis.
