File Handling and Notebook Management
# CHAPTER 9
1. Chapter Introduction
Much of data science comes down to processing files. You read data in from a file, analyze it, and write the results to a new file. This chapter covers the basics of Python file I/O (Input/Output), how file paths work within the Jupyter environment, and how Jupyter manages your notebook files using Checkpoints.
2. Understanding File Paths in Jupyter
When you launch Jupyter, it opens a specific directory (folder) on your computer. By default, the code in a notebook runs with its Current Working Directory (CWD) set to the folder that contains that notebook.
Cell 1:
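A minimal sketch of what such a cell might contain, assuming only the standard-library os and pathlib modules. It prints the CWD and a few of the files Python can see there; the limit of ten entries is an arbitrary choice for this illustration.

```python
import os
from pathlib import Path

# The current working directory as a plain string.
cwd = os.getcwd()
print("CWD:", cwd)

# pathlib reports the same location as a Path object.
print("Same via pathlib:", Path.cwd())

# Show up to ten entries from that directory (arbitrary limit for readability).
print(sorted(os.listdir(cwd))[:10])
```

If the file you want to open does not appear in that listing, a bare file name such as data.csv will not find it.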
When you try to open a file, you must tell Jupyter where it is.
- Relative Path: Looks for the file *relative* to the current working directory, which is normally the folder where the notebook is saved (e.g., data.csv assumes the file is in the exact same folder as the notebook).
- Absolute Path: The full path from the root of your hard drive (e.g., C:/Users/Name/Documents/data.csv).
*Best Practice:* Keep your data files in a folder named data right next to your notebook, and use relative paths like data/my_file.csv. As long as the folder structure is preserved, your code keeps working when you send the project to a colleague.
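The difference between the two path styles can be sketched with pathlib. The folder and file names come from the text above; the file does not need to exist for this illustration to run.

```python
from pathlib import Path

# A relative path: interpreted from the current working directory.
relative_path = Path("data") / "my_file.csv"
print(relative_path.is_absolute())   # False

# Joining it onto the CWD gives the absolute location Python would look in.
absolute_path = Path.cwd() / relative_path
print(absolute_path.is_absolute())   # True
print(absolute_path)

# Check whether the file is really there before trying to open it.
print("Exists:", relative_path.exists())
```

Only the relative form is portable: the absolute form contains the folder names of one particular machine.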
3. Reading from Text Files
Python has a built-in open() function to read files. We use the with statement because it automatically closes the file when the block ends, even if an error occurs, which releases the file handle back to the operating system.
Cell 2:
*(Assuming you have a file named sample.txt in the same folder)*
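A minimal sketch of a reading cell follows. The file name sample.txt comes from the note above; the setup step that creates the file when it is missing, and its two lines of text, are assumptions added only so the cell runs anywhere.

```python
from pathlib import Path

sample_path = Path("sample.txt")

# Demo setup only: create a small file if none exists yet.
if not sample_path.exists():
    sample_path.write_text("Hello, Jupyter!\nThis is line two.\n", encoding="utf-8")

# Read the whole file at once; it is closed when the with block ends.
with open(sample_path, "r", encoding="utf-8") as f:
    content = f.read()
print(content)

# Read line by line, which is gentler on memory for large files.
with open(sample_path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]
print(len(lines), "lines read")

# The file object still exists, but it is closed.
print("Closed:", f.closed)
```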
4. Writing to Text Files
To save your analytical results, you write to a file.
Cell 3:
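A minimal sketch of a writing cell, assuming a hypothetical output file named results.txt and placeholder text. It shows the two modes that matter most: 'w' overwrites the file, while 'a' appends to it.

```python
results_path = "results.txt"  # hypothetical output file name

# Mode 'w' creates the file, or replaces its contents if it already exists.
with open(results_path, "w", encoding="utf-8") as f:
    f.write("first line\n")

# Mode 'a' adds to the end of the file instead of overwriting it.
with open(results_path, "a", encoding="utf-8") as f:
    f.write("second line (appended)\n")

# Read the file back to confirm what was saved.
with open(results_path, "r", encoding="utf-8") as f:
    saved = f.read()
print(saved)
```

Running the cell a second time produces the same two lines, because the 'w' step wipes the file before the 'a' step appends.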
5. Jupyter Notebook Files (.ipynb)
When you save a notebook, it is saved as a .ipynb file.
If you try to open this file in a normal text editor (like Notepad), you will see a massive, messy JSON file.
Do not edit .ipynb files in a standard text editor! A small mistake in the JSON structure can corrupt the notebook. Open and edit them in a notebook-aware tool such as the Jupyter interface or VS Code.
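Looking at the JSON without changing it is harmless. The sketch below builds a tiny stand-in notebook by hand (an assumption for illustration, following the nbformat 4 layout), saves it to a temporary folder under the hypothetical name demo.ipynb, and then reads it back with the standard json module.

```python
import json
import tempfile
from pathlib import Path

# A hand-built, minimal stand-in for a notebook (assumed nbformat 4 layout).
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["print('hello')"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

with tempfile.TemporaryDirectory() as tmp:
    nb_path = Path(tmp) / "demo.ipynb"  # hypothetical file name
    nb_path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")

    # Read-only inspection: load the JSON into ordinary Python objects.
    with open(nb_path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(sorted(loaded.keys()))
print(cell_types)
```

Every cell, its source text, and its outputs live inside that one JSON document, which is why a stray bracket or comma can make the whole notebook unreadable.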
6. The Power of Checkpoints
By default, Jupyter auto-saves your notebook every 120 seconds. It also has a separate feature called Checkpoints.
When you click the "Save" icon (or press Ctrl+S), Jupyter explicitly creates a Checkpoint. A checkpoint is a hidden backup of your file at that exact moment.
How to use Checkpoints: If you write some code that breaks your notebook, and you want to go back in time:
- 1. Go to the top menu.
- 2. Click File -> Revert to Checkpoint.
- 3. Select the timestamp you want to revert to.
*Note:* Jupyter only keeps the *single most recent* checkpoint by default.
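Assuming the default setup, in which Jupyter stores checkpoints in a hidden .ipynb_checkpoints folder next to the notebook, the following sketch lists whatever checkpoint files exist in the current working directory. If the folder is absent (for example, because nothing has been saved yet or checkpoints are stored differently), the list is simply empty.

```python
from pathlib import Path

# Default location of checkpoint files (assumption: standard file-based storage).
checkpoint_dir = Path(".ipynb_checkpoints")

if checkpoint_dir.is_dir():
    checkpoint_files = sorted(p.name for p in checkpoint_dir.iterdir())
else:
    checkpoint_files = []

print("Checkpoint files:", checkpoint_files)
```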
7. Mini Project: File Analyzer
Let's build a quick notebook tool that reads a text file and counts the words.
Cell 4:
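One possible version of the analyzer is sketched below. The function name analyze_file, the demo file name, and the sample sentence are all assumptions for illustration. Words are split on whitespace only, so punctuation stays attached to the word it touches.

```python
import tempfile
from collections import Counter
from pathlib import Path


def analyze_file(path):
    """Return (line_count, word_count, word_frequencies) for a text file."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    words = text.lower().split()  # naive split on whitespace
    return len(text.splitlines()), len(words), Counter(words)


# Demo on a throwaway file so the cell runs without any existing data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.txt"  # hypothetical file name
    demo_path.write_text(
        "the quick brown fox\njumps over the lazy dog\n", encoding="utf-8"
    )
    line_count, word_count, frequencies = analyze_file(demo_path)

print(f"Lines: {line_count}, words: {word_count}")
print("Most common:", frequencies.most_common(1))
```

To analyze your own file, call analyze_file with a relative path such as the sample.txt used earlier.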
8. Common Mistakes
- FileNotFoundError: You type open('data.csv'), but you get an error. A very common cause is that the file is somewhere else, for example in your "Downloads" folder while your Jupyter Notebook is saved in your "Documents" folder. Move the CSV into the same folder as the notebook, or change the path to point to where the file actually is.
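- Forgetting to close files: If you don't use the with open() syntax and instead do f = open('file.txt'), the file stays open until you call f.close(). On some systems (notably Windows), other programs may be unable to modify or delete it in the meantime, and written data may not reach the disk until the file is closed.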
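When a FileNotFoundError appears, it helps to ask Python where it actually looked. The sketch below uses a deliberately missing, hypothetical file name to trigger the error and then prints the working directory and its contents.

```python
import os

file_name = "data_that_does_not_exist.csv"  # hypothetical name, chosen to trigger the error

try:
    with open(file_name, "r", encoding="utf-8") as f:
        data = f.read()
except FileNotFoundError:
    data = None
    # Show where Python looked and what is actually in that folder.
    print("Not found:", file_name)
    print("Current working directory:", os.getcwd())
    print("Files here:", sorted(os.listdir(".")))
```

Because the file is opened inside a with block, there is nothing left to close by hand, whether or not the read succeeds.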
9. MCQs
What is the Current Working Directory (CWD) in Jupyter?
Which path format is best for sharing projects with colleagues?
What does the with keyword do when opening files in Python?
If you open a file using mode 'w' (Write), what happens if the file already exists?
Which mode should you use to add a new line of text to the end of an existing file?
What format are Jupyter Notebook files saved in under the hood?
Is it safe to edit a .ipynb file using Windows Notepad? a) Yes b) No, modifying the raw JSON structure will likely corrupt the notebook. Answer: b
What happens when you manually click the "Save" icon in Jupyter?
How do you recover a notebook to its last saved state if you make a terrible mistake?
What Python module allows you to check your current working directory?
10. Interview Questions
- Q: Explain the difference between an Absolute Path and a Relative Path. Why are relative paths preferred in data science projects?
- Q: Why is it dangerous to open a file in 'w' mode? What should you use instead if you want to keep historical log data?
11. Summary
File management in Jupyter revolves around understanding your Working Directory. Keep your data files alongside your .ipynb notebook, and use Relative Paths to access them. Use Python's with open(...) syntax to safely read and write data. Finally, use Jupyter's manual "Save" button to create Checkpoints, allowing you to roll back your notebook if you make a mistake.