CHAPTER 11 Beginner

Reading and Writing Data Files

Updated: May 18, 2026

5 min read

# CHAPTER 11

Reading and Writing Data Files

1. Chapter Introduction

Real data lives in files — CSVs, Excel sheets, JSON APIs, databases. Pandas' IO functions are the gateway from raw files to analysis-ready DataFrames, supporting 15+ formats with production-grade options.

2. Reading CSV Files

python

1234567891011121314151617181920212223

import pandas as pd

# Basic read
df = pd.read_csv(&#039;sales.csv')

# With options
df = pd.read_csv(
    &#039;sales.csv',
    sep=&#039;,',              # Delimiter (default ',')
    header=0,             # Row to use as column names (0=first row)
    index_col=&#039;ID',       # Column to use as index
    usecols=[&#039;Date', 'Product', 'Revenue'],  # Only these columns
    nrows=1000,           # Read only first 1000 rows
    skiprows=[1, 2],      # Skip specific rows
    na_values=[&#039;N/A', 'null', '-'],  # Treat as NaN
    parse_dates=[&#039;Date'], # Auto-parse date columns
    dtype={&#039;Revenue': float, 'Quantity': int}  # Force dtypes
)

# Quick look
print(df.shape)
print(df.head())
print(df.dtypes)

text

1234567

# Sample sales.csv
Date,Product,Category,Quantity,Revenue,Region
2024-01-15,Laptop,Electronics,2,1800.00,North
2024-01-16,Phone,Electronics,5,3500.00,South
2024-01-17,Desk,Furniture,1,450.00,North
2024-01-18,Chair,Furniture,3,900.00,East
2024-01-19,Monitor,Electronics,2,600.00,West

3. Reading Excel Files

python

1234567891011121314

# Read Excel
df = pd.read_excel(
    &#039;data.xlsx',
    sheet_name=&#039;Sales',    # Sheet name or index
    header=1,              # If header is on row 2
    usecols=&#039;A:F'          # Read columns A to F
)

# Read multiple sheets
all_sheets = pd.read_excel(&#039;data.xlsx', sheet_name=None)
# Returns dict: {'Sheet1': df1, 'Sheet2': df2}

for sheet_name, sheet_df in all_sheets.items():
    print(f"Sheet: {sheet_name}, Shape: {sheet_df.shape}")

4. Reading JSON

python

1234567891011121314151617181920

# From JSON file
df = pd.read_json(&#039;data.json')

# From JSON string (common with APIs)
import json
json_data = &#039;''[
    {"id": 1, "name": "Alice", "salary": 85000},
    {"id": 2, "name": "Bob", "salary": 72000}
]&#039;''
df = pd.read_json(json_data)

# Nested JSON → normalize
nested = {
    "employees": [
        {"name": "Alice", "contact": {"email": "alice@ex.com", "phone": "555-1234"}},
        {"name": "Bob",   "contact": {"email": "bob@ex.com",   "phone": "555-5678"}}
    ]
}
df = pd.json_normalize(nested[&#039;employees'])
print(df.columns)  # name, contact.email, contact.phone

5. Writing Data Files

python

1234567891011121314151617181920

df = pd.DataFrame({
    &#039;Name': ['Alice', 'Bob', 'Carol'],
    &#039;Sales': [45000, 62000, 38000],
    &#039;Region': ['North', 'South', 'East']
})

# Write to CSV
df.to_csv(&#039;output.csv', index=False)          # No row numbers

# Write to Excel
df.to_excel(&#039;output.xlsx', sheet_name='Sales', index=False)

# Write to JSON
df.to_json(&#039;output.json', orient='records', indent=2)

# Write to multiple Excel sheets
with pd.ExcelWriter(&#039;report.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name=&#039;Sales', index=False)
    df.describe().to_excel(writer, sheet_name=&#039;Statistics')
    print("Multi-sheet Excel saved!")

6. Mini Project: CSV Sales Report Analyzer

python

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849

import pandas as pd
import numpy as np

# Create sample CSV data
sales_data = """Date,Product,Category,Quantity,Revenue,Region,Salesperson
2024-01-05,Laptop,Electronics,2,2400,North,Alice
2024-01-06,Phone,Electronics,5,3750,South,Bob
2024-01-07,Desk,Furniture,3,1350,East,Carol
2024-01-08,Laptop,Electronics,1,1200,West,Alice
2024-01-09,Chair,Furniture,4,800,North,David
2024-01-10,Monitor,Electronics,3,900,South,Bob
2024-01-11,Phone,Electronics,2,1500,East,Carol
2024-01-12,Desk,Furniture,2,900,West,David
2024-01-13,Laptop,Electronics,3,3600,North,Alice
2024-01-14,Monitor,Electronics,5,1500,South,Bob"""

with open(&#039;sales_report.csv', 'w') as f:
    f.write(sales_data)

# Load and analyze
df = pd.read_csv(&#039;sales_report.csv', parse_dates=['Date'])

print("=" * 55)
print("SALES ANALYSIS REPORT")
print("=" * 55)

# Overview
print(f"\nData range: {df[&#039;Date'].min().date()} to {df['Date'].max().date()}")
print(f"Total transactions: {len(df)}")
print(f"Total revenue: ${df[&#039;Revenue'].sum():,.0f}")

# Revenue by category
print("\nRevenue by Category:")
cat_rev = df.groupby(&#039;Category')['Revenue'].agg(['sum', 'mean', 'count'])
cat_rev.columns = [&#039;Total', 'Average', 'Transactions']
print(cat_rev)

# Top salesperson
print("\nTop Salesperson:")
sales_rev = df.groupby(&#039;Salesperson')['Revenue'].sum().sort_values(ascending=False)
print(sales_rev)

# Best selling product
print("\nBest Products by Revenue:")
print(df.groupby(&#039;Product')['Revenue'].sum().sort_values(ascending=False))

# Save report
df.to_csv(&#039;analyzed_report.csv', index=False)
print("\n✅ Report saved to &#039;analyzed_report.csv'")

7. Common Mistakes

Forgetting index=False in tocsv(): Without it, Pandas adds a row number column — creating a duplicate index on re-read.

Date columns not auto-parsed: Use parsedates=['Date'] in readcsv() to get proper datetime dtype, not strings.

8. MCQs

Question 1

`read``csv('file.csv', indexcol='ID')` does?

Question 2

`to``csv(index=False)` prevents?

Question 3

`usecols=['A','B']` in readcsv?

Question 4

`nrows=500` in readcsv?

Question 5

`parsedates=['Date']` converts?

Question 6

`pd.json``normalize()` handles?

Question 7

`pd.readexcel(sheetname=None)` returns?

Question 8

`ExcelWriter` context manager is for?

Question 9

`navalues=['N/A', '-']` in readcsv?

Question 10

`dtype={'Revenue': float}` in readcsv?

9. Interview Questions

Q: How do you load only specific columns from a large CSV in Pandas?

Q: How do you write a DataFrame to multiple sheets in a single Excel file?

10. Summary
Pandas IO handles every common data format. Key params: indexcol, usecols, nrows, parsedates, dtype, navalues. Always use index=False when writing CSVs. Use ExcelWriter for multi-sheet Excel reports.

11. Next Chapter Recommendation

In Chapter 12: Data Selection and Filtering, we master loc[], iloc[], boolean filtering, and query syntax to extract exactly the data we need.

Browse All 21+ Subject Areas

Quick Links

Visual Algorithm Labs

Frontend Dev

The Future of Web Architecture in 2026

Practice Quizzes

Reading and Writing Data Files #

1. Chapter Introduction #

2. Reading CSV Files #

3. Reading Excel Files #

4. Reading JSON #

5. Writing Data Files #

6. Mini Project: CSV Sales Report Analyzer #

7. Common Mistakes #

8. MCQs #

readcsv('file.csv', indexcol='ID') does?

tocsv(index=False) prevents?

usecols=['A','B'] in readcsv?

nrows=500 in readcsv?

parsedates=['Date'] converts?

pd.jsonnormalize() handles?

pd.readexcel(sheetname=None) returns?

ExcelWriter context manager is for?

navalues=['N/A', '-'] in readcsv?

dtype={'Revenue': float} in readcsv?

9. Interview Questions #

10. Summary #

11. Next Chapter Recommendation #

Finish this Chapter

Discussion

Explore More

📖 Related Tutorials 5

Send Feedback / Bug

Feedback Submitted!

Reading and Writing Data Files

1. Chapter Introduction

2. Reading CSV Files

3. Reading Excel Files

4. Reading JSON

5. Writing Data Files

6. Mini Project: CSV Sales Report Analyzer

7. Common Mistakes

8. MCQs

`read``csv('file.csv', indexcol='ID')` does?

`to``csv(index=False)` prevents?

`usecols=['A','B']` in readcsv?

`nrows=500` in readcsv?

`parsedates=['Date']` converts?

`pd.json``normalize()` handles?

`pd.readexcel(sheetname=None)` returns?

`ExcelWriter` context manager is for?

`navalues=['N/A', '-']` in readcsv?

`dtype={'Revenue': float}` in readcsv?

9. Interview Questions

10. Summary

11. Next Chapter Recommendation