Python for Data Science
CHAPTER 09 Beginner

File Handling in Python

Updated: May 18, 2026

# CHAPTER 9

1. Chapter Introduction

As a Data Scientist, your data rarely lives directly inside your Python script. It lives in files on your hard drive or in the cloud. You must know how to ingest (read) this data and export (write) your results. This chapter covers standard Python file I/O (Input/Output), focusing on plain text, CSVs (Comma Separated Values), and JSON files.

2. Reading Text Files

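Python has a built-in open() function. We wrap it in a with statement because the file is then closed automatically when the block finishes, even if an error occurs inside it. This releases the file handle promptly, ensures buffered writes are flushed to disk, and avoids leaving the file locked for other programs.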

python
# 'r' stands for Read mode
with open('data.txt', 'r') as file:
    # Read the entire file into a single string
    content = file.read()
    print(content)

# Alternatively, read line by line (better for large files: only one line is held in memory at a time)
with open('data.txt', 'r') as file:
    for line in file:
        print(line.strip()) # strip() removes surrounding whitespace, including the trailing newline
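
To make the role of with more concrete, the sketch below compares it with a manual try/finally. The file name demo.txt and the temporary directory are assumptions made for illustration, so that the example does not depend on an existing data.txt.

```python
import os
import tempfile

# Hypothetical sample file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as file:
    file.write("first line\nsecond line\n")

# Manual version: we must remember to close the file ourselves,
# even if reading raises an exception.
file = open(path, "r", encoding="utf-8")
try:
    manual_lines = [line.rstrip("\n") for line in file]
finally:
    file.close()

# Version using with: the file is closed when the block ends.
with open(path, "r", encoding="utf-8") as file:
    with_lines = [line.rstrip("\n") for line in file]

print(file.closed)  # True
print(with_lines)   # ['first line', 'second line']
```

Both versions read the same lines; the with version is shorter and harder to get wrong.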

3. Writing Text Files

To save your results, you open a file in Write ('w') or Append ('a') mode.

python
report = "Analysis Complete: 500 rows processed."

# 'w' mode (Write): OVERWRITES the file completely
with open('report.txt', 'w') as file:
    file.write(report)
    
# 'a' mode (Append): Adds to the end of the file
with open('report.txt', 'a') as file:
    file.write("\nNo errors detected.")
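
The difference between the two modes is easiest to see by reading the file back after each step. This is a minimal sketch that assumes a throwaway report.txt in a temporary directory; the "Run 1", "Run 2", and "Run 3" strings are made up for the example.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "report.txt")

# 'w' creates the file (or empties it if it already exists).
with open(path, "w", encoding="utf-8") as file:
    file.write("Run 1\n")

# 'a' keeps the existing content and adds to the end.
with open(path, "a", encoding="utf-8") as file:
    file.write("Run 2\n")

with open(path, "r", encoding="utf-8") as file:
    after_append = file.read()

# Opening in 'w' again discards everything written so far.
with open(path, "w", encoding="utf-8") as file:
    file.write("Run 3\n")

with open(path, "r", encoding="utf-8") as file:
    after_overwrite = file.read()

print(repr(after_append))     # 'Run 1\nRun 2\n'
print(repr(after_overwrite))  # 'Run 3\n'
```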

4. Working with CSV Files

While you will eventually use Pandas for CSVs, it is important to know how the built-in csv module works for basic scripting.

python
import csv

# Writing a CSV
header = ['Name', 'Age', 'Department']
data = [
    ['Alice', 28, 'Sales'],
    ['Bob', 35, 'IT']
]

with open('employees.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header)
    writer.writerows(data)

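# Reading a CSV (the csv module recommends newline='' when reading as well)
with open('employees.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        # Each row is a list of strings; the first row printed is the header
        print(row) # e.g. ['Alice', '28', 'Sales'] (note that 28 comes back as '28')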
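
Because csv.reader returns every value as a string, numeric columns usually need converting before use. One possible approach, sketched here with csv.DictReader, is to read each row as a dictionary keyed by the header and convert the fields you need. The temporary file location is an assumption for the example; the column names match the listing above.

```python
import csv
import os
import tempfile

# Hypothetical copy of employees.csv in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "employees.csv")

with open(path, "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Name", "Age", "Department"])
    writer.writerows([["Alice", 28, "Sales"], ["Bob", 35, "IT"]])

employees = []
with open(path, "r", newline="", encoding="utf-8") as file:
    reader = csv.DictReader(file)  # uses the first row as the keys
    for row in reader:
        row["Age"] = int(row["Age"])  # convert the string '28' to the integer 28
        employees.append(row)

print(employees[0]["Name"], employees[0]["Age"])  # Alice 28

average_age = sum(e["Age"] for e in employees) / len(employees)
print(average_age)  # 31.5
```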

5. Working with JSON Files

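JSON (JavaScript Object Notation) is a standard format for exchanging data on the web. It closely resembles nested Python dictionaries and lists, with a few differences: JSON writes true, false, and null where Python writes True, False, and None, and JSON strings must use double quotes. The json module converts between JSON text and Python objects such as dictionaries and lists.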

python
import json

# Python Dictionary
user_data = {
    "id": 101,
    "name": "Jane",
    "is_active": True
}

# 1. Write Dictionary to a JSON file
with open('user.json', 'w') as file:
    # json.dump serializes the dictionary to text
    json.dump(user_data, file, indent=4)

# 2. Read JSON file back into a Dictionary
with open('user.json', 'r') as file:
    loaded_data = json.load(file)
    print(loaded_data['name']) # Output: Jane
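
When the JSON arrives as a string rather than a file (for example, from a web response), json.loads and json.dumps do the same conversion in memory. The sketch below uses a made-up record to show how JSON's true and null map to Python's True and None; the "manager" and "tags" fields are assumptions added for illustration.

```python
import json

# Hypothetical JSON text, as it might arrive from a web API.
json_text = '{"id": 101, "name": "Jane", "is_active": true, "manager": null, "tags": ["a", "b"]}'

# loads = "load from string": JSON text -> Python objects
record = json.loads(json_text)
print(record["is_active"])  # True
print(record["manager"])    # None
print(record["tags"])       # ['a', 'b']

# dumps = "dump to string": Python objects -> JSON text
round_trip = json.dumps(record)
print(round_trip == json_text)  # True
```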

6. File Paths

You must tell Python where the file is.

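  • Relative Path: Resolved relative to the Current Working Directory, the folder Python was launched from, which is not necessarily the folder that contains the script (e.g., data/sales.csv). *This is preferred!*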
  • Absolute Path: The full path (e.g., C:/Users/Name/Documents/data/sales.csv).

python
import os

# Check if a file exists before trying to open it
if os.path.exists("employees.csv"):
    print("File found!")
else:
    print("File missing!")
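
To see how Python would resolve a relative path, you can expand it against the Current Working Directory. This is a small sketch using the data/sales.csv path from the text above; the file does not need to exist for the path to be resolved.

```python
import os

# Build the relative path portably (os.path.join picks the right separator).
relative = os.path.join("data", "sales.csv")

# abspath expands a relative path against the Current Working Directory.
resolved = os.path.abspath(relative)

print(os.getcwd())              # where Python is currently running
print(resolved)                 # the full path Python would actually open
print(os.path.isabs(relative))  # False
print(os.path.isabs(resolved))  # True
```

If the printed full path is not where your file lives, either change the working directory or adjust the relative path.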

7. Mini Project: Notes Manager System

Let's build a simple script that logs daily notes and timestamps them.

python
import os
from datetime import datetime

filename = "daily_log.txt"

def add_log(note):
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    log_entry = f"[{timestamp}] {note}\n"
    
    # Use 'a' to append, so we don't overwrite yesterday's notes
    with open(filename, 'a') as file:
        file.write(log_entry)
    print("Log saved.")

def view_logs():
    if not os.path.exists(filename):
        print("No logs found.")
        return
        
    with open(filename, 'r') as file:
        print("--- DAILY LOGS ---")
        print(file.read())

# Simulate usage
add_log("Started data cleaning phase.")
view_logs()
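
As a possible extension, each saved line can be turned back into a timestamp and a note for later analysis. The helper name parse_log_line and the sample line are hypothetical; the sketch only assumes the "[timestamp] note" format written by add_log above.

```python
from datetime import datetime

def parse_log_line(line):
    """Split a '[timestamp] note' line into a (datetime, str) pair."""
    # Split once at the first '] ' so the note itself may contain brackets.
    stamp, note = line.rstrip("\n").split("] ", 1)
    when = datetime.strptime(stamp.lstrip("["), "%Y-%m-%d %H:%M:%S")
    return when, note

# Made-up sample line in the same format that add_log writes.
when, note = parse_log_line("[2026-05-18 09:30:00] Started data cleaning phase.\n")
print(when.date())  # 2026-05-18
print(note)         # Started data cleaning phase.
```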

8. Common Mistakes

  • FileNotFoundError: You type open('data.csv'), but Python is running from a different folder than the one that contains the file. Relative paths are resolved against the Current Working Directory, so check it with os.getcwd().
  • Overwriting Data: Opening an existing file in 'w' mode instantly wipes it clean. Use 'a' if you want to keep the historical data and add to it.
  • Forgetting newline='' in CSVs: On Windows, writing CSVs without newline='' often results in blank rows between every line of data.
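
For the first mistake above, an alternative to checking beforehand is to attempt the open and handle the error. The helper name read_text_or_none and the file name are hypothetical, and the sketch assumes no file with that name exists in the working directory.

```python
import os

def read_text_or_none(path):
    """Hypothetical helper: return the file's text, or None if it is missing."""
    try:
        with open(path, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        # Printing the working directory usually reveals the cause.
        print(f"Could not find {path!r}. Current working directory: {os.getcwd()}")
        return None

# Assumed not to exist, so the except branch runs.
result = read_text_or_none("this_file_should_not_exist_12345.csv")
print(result)  # None
```

Catching the exception also covers the case where the file disappears between an existence check and the open() call.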

9. MCQs

Question 1

What does the with keyword do when opening files?

Question 2

Which mode should you use in open() to completely overwrite an existing file?

Question 3

Which mode should you use to add text to the bottom of an existing file?

Question 4

What built-in Python module is used to handle Comma Separated Values?

Question 5

JSON data most closely resembles which Python data structure?

Question 6

What does json.load() do?

Question 7

What is a "Relative Path"?

Question 8

How can you check if a file exists before opening it to avoid a crash?

Question 9

When reading a file line-by-line using a for loop, what method removes the invisible newline character (\n)?

Question 10

What happens if you forget to close a file (for example, by not using with)?

10. Interview Questions

  • Q: Explain the difference between opening a file in 'w' mode versus 'a' mode.
  • Q: When sharing code with a team, why is it better to use relative paths (data/file.csv) instead of absolute paths (C:/Users/Dave/data/file.csv)?

11. Summary

File I/O is a foundational skill. Use with open('file', 'mode') to ensure files are safely closed. Use 'r' to read, 'w' to overwrite, and 'a' to append. For structured data, utilize the built-in csv and json modules. Always design your scripts using relative paths so they work on any computer, and use os.path.exists() to write defensive code that doesn't crash if a file is missing.

12. Next Chapter Recommendation

In Chapter 10: Introduction to NumPy, we transition from standard Python programming into true data science, learning the foundational math engine that powers high-performance data arrays.
