Working with CSV, Excel, and JSON Files
# CHAPTER 4
Working with CSV, Excel, and JSON Files
1. Chapter Introduction
Before you can clean data, you must be able to load it. Data rarely comes in perfectly formatted packages; it lives in messy CSVs with weird encodings, multi-sheet Excel files, and deeply nested JSON APIs. This chapter covers how to ingest and export these common formats using Pandas.2. Reading CSV Files
CSVs (Comma Separated Values) are the most common format for tabular data.
3. Reading Excel Files
Excel files (.xlsx, .xls) often contain multiple sheets, merged cells, and formatted headers.
4. Reading JSON Files
JSON (JavaScript Object Notation) is heavily used in web APIs. It can be flat or deeply nested.
5. Exporting Cleaned Data
After cleaning, you need to save the data.
6. Mini Project: CSV Cleanup Utility
7. Common Mistakes
-
Forgetting
index=Falsewhen saving CSVs: Pandas will write the row numbers (0, 1, 2...) as a new column every time you save. If you load and save the file 3 times, you'll have 3 useless "Unnamed: 0" columns.
-
Ignoring encoding errors: If a CSV contains accents (é, ñ) and won't load, beginners often delete the bad rows manually. Use
encoding='latin1'orencoding='utf-8-sig'instead.
8. MCQs
Which pandas function is used to read a CSV file?
If your CSV file uses tabs instead of commas, what parameter do you use?
How do you skip the first 2 rows when reading an Excel file?
What is jsonnormalize used for?
Why is index=False important in tocsv()?
What does onbadlines='skip' do in readcsv?
How do you read ALL sheets from an Excel file at once?
Which parameter allows you to treat strings like "Missing" as NaNs when loading?
What is a common fix for a UnicodeDecodeError when reading a CSV?
How do you output JSON as a clean array of objects?
9. Interview Questions
- Q: You are loading a 50GB CSV file that doesn't fit in RAM. How do you handle it using Pandas?
- Q: Explain how you would flatten a JSON API response where one column contains lists of dictionaries.
10. Summary
Pandas provides robust tools for loading data.readcsv() handles text files (use sep, encoding, onbadlines). readexcel() handles spreadsheets (use sheetname, skiprows). readjson() and jsonnormalize() handle nested web data. When exporting, always use index=False to keep files clean.