CHAPTER 10 Beginner

Data Analysis with Pandas in Jupyter

Updated: May 18, 2026

1. Chapter Introduction

Python's built-in file handling works well for basic text, but it becomes unwieldy for large, spreadsheet-like datasets. Enter Pandas. Pandas is a third-party Python library that provides the DataFrame: a powerful, programmable table of rows and columns. Jupyter and Pandas pair well because Jupyter automatically renders Pandas DataFrames as styled HTML tables.

2. Installing and Importing Pandas

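If you installed Anaconda, Pandas is already included. If not, run %pip install pandas in a cell. The older !pip install pandas also works, but %pip installs into the environment that the notebook's kernel is actually using.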

Cell 1:

python
import pandas as pd
# 'pd' is the universal industry standard alias for pandas

3. The Pandas DataFrame

A DataFrame is a 2-dimensional labeled data structure: rows and columns, where each column has a name and its own data type. Think of it much like an Excel spreadsheet or a SQL table.

Cell 2:

python
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'IT'],
    'Salary': [55000, 75000, 60000, 80000]
}

df = pd.DataFrame(data)

# In Jupyter, simply typing 'df' at the end of the cell 
# renders a beautiful, styled HTML table!
df
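
To make the "labeled" part of that definition concrete, the sketch below rebuilds the same table and looks at its column labels, row labels, and per-column data types. The variable names column_names, row_labels, and column_types are chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'IT'],
    'Salary': [55000, 75000, 60000, 80000],
})

# Column labels come from the dictionary keys
column_names = list(df.columns)

# Row labels default to 0, 1, 2, ... when no index is given
row_labels = list(df.index)

# Each column carries its own data type
column_types = df.dtypes

print(column_names)
print(row_labels)
print(column_types)
```

The row labels (the index) are what df.loc uses to address rows later in this chapter.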

4. Reading Data from a CSV

In the real world, you don't type data out by hand. You load it from a CSV (Comma Separated Values) file.

Cell 3:

python
# Load a CSV file into a DataFrame
# (Assuming 'employees.csv' is in your folder)
# df = pd.read_csv('employees.csv')

# Look at the first 3 rows
df.head(3)
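
Because employees.csv is not supplied with this chapter, here is a minimal sketch that runs without any file on disk. It assumes a CSV whose contents mirror the earlier table (the text below is made up for illustration) and feeds it to pd.read_csv through io.StringIO, which behaves like an open file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'employees.csv'
csv_text = """Name,Age,Department,Salary
Alice,25,Sales,55000
Bob,30,IT,75000
Charlie,35,HR,60000
David,40,IT,80000
"""

# read_csv accepts a file path or any file-like object
employees = pd.read_csv(io.StringIO(csv_text))

# Preview the first 3 rows
preview = employees.head(3)
print(preview)
```

With a real file, the call would simply be pd.read_csv('employees.csv'), as in the commented-out line above.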

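Jupyter Pro-Tip: For large datasets (say, 10,000 rows or more), don't rely on typing df alone to inspect the data. By default Pandas truncates the output to a handful of rows from the top and bottom, but if the display limits have been raised (for example, display.max_rows set to None), Jupyter will try to render the whole table and may freeze your browser. Use df.head() to preview the first 5 rows, or df.tail() to see the last 5.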
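
A small sketch of the tip above, using a synthetic 10,000-row table invented for this example. It reads the display.max_rows option, which controls how many rows Pandas shows before truncating, and confirms that head() and tail() return 5 rows each by default.

```python
import pandas as pd

# A synthetic 10,000-row table, made up purely for illustration
big = pd.DataFrame({'row_id': range(10_000)})

# How many rows Pandas shows before it truncates the output (60 by default)
max_rows = pd.get_option('display.max_rows')
print("display.max_rows:", max_rows)

# head() and tail() return small slices regardless of the table's size
first_rows = big.head()
last_rows = big.tail()
print(len(first_rows), len(last_rows))
```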

5. Data Inspection

Once data is loaded, you must inspect it to understand its shape and data types.

Cell 4:

python
# Check how many rows and columns (Rows, Columns)
print("Shape:", df.shape)

# Get a summary of data types and non-null values
df.info()

# Get statistical summaries of numeric columns (mean, min, max, etc.)
df.describe()
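
The results of these inspection calls are ordinary Python objects that you can use in later code. As a sketch on the sample employee table: df.shape is a tuple, and df.describe() is itself a DataFrame that can be indexed with .loc. The names n_rows, n_cols, mean_age, and max_salary are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'IT'],
    'Salary': [55000, 75000, 60000, 80000],
})

# shape is a (rows, columns) tuple, so it can be unpacked
n_rows, n_cols = df.shape

# describe() returns a DataFrame covering only the numeric columns here
summary = df.describe()
mean_age = summary.loc['mean', 'Age']
max_salary = summary.loc['max', 'Salary']

print(n_rows, n_cols)
print(mean_age, max_salary)
```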

6. Filtering and Querying Data

Pandas allows you to slice and dice your data easily.

Cell 5:

python
# Select a single column (returns a Pandas Series)
ages = df['Age']

# Filter rows based on a condition (e.g., IT department only)
it_staff = df[df['Department'] == 'IT']

# Display the filtered data
it_staff
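
Filters can also be combined. The sketch below, run on the same sample table, shows a few common variations: & for "and", | for "or" (each condition needs its own parentheses), .loc to filter rows and pick a column in one step, and query() as a string-based alternative. The thresholds are arbitrary values picked for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Department': ['Sales', 'IT', 'HR', 'IT'],
    'Salary': [55000, 75000, 60000, 80000],
})

# AND: IT staff earning more than 76,000
it_high = df[(df['Department'] == 'IT') & (df['Salary'] > 76000)]

# OR: anyone under 30, or anyone in HR
young_or_hr = df[(df['Age'] < 30) | (df['Department'] == 'HR')]

# Filter rows and select one column at the same time
it_names = df.loc[df['Department'] == 'IT', 'Name']

# The same kind of filter written as a query string
over_30 = df.query('Age > 30')

print(it_high)
print(young_or_hr)
print(it_names)
print(over_30)
```

Using Python's plain and / or here raises an error, because Pandas needs the element-wise operators & and |.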

7. Basic Data Cleaning

Data is rarely perfect. Pandas provides tools to handle missing values (NaN).

Cell 6:

python
import numpy as np

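# Let's introduce a missing value for demonstration.
# NaN is a float, so convert the integer Salary column to float first.
df['Salary'] = df['Salary'].astype(float)
df.loc[1, 'Salary'] = np.nan
print("Data with missing salary:")
display(df)  # display() renders the HTML table even when it is not the last line of the cell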

# Option A: Fill missing values with a default number
df_filled = df.fillna(0)

# Option B: Drop any row that contains a missing value
df_dropped = df.dropna()
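
Filling with 0 or dropping every incomplete row are blunt tools. As a hedged sketch of some alternatives, the example below counts missing values per column, fills the gap with the column mean, and drops rows only when a specific column is missing. The small table is a cut-down version of the sample data with one salary already missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [55000.0, np.nan, 60000.0, 80000.0],
})

# Count the missing values in each column
missing_per_column = df.isna().sum()

# mean() skips NaN by default: (55000 + 60000 + 80000) / 3
mean_salary = df['Salary'].mean()

# Fill only the Salary column, using the mean instead of 0
df_mean_filled = df.fillna({'Salary': mean_salary})

# Drop a row only if its Salary is missing
df_salary_known = df.dropna(subset=['Salary'])

print(missing_per_column)
print(df_mean_filled)
print(df_salary_known)
```

Both fillna() and dropna() return new DataFrames here; the original df still contains the NaN.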

8. Mini Project: Student Analytics Notebook

Create a new notebook and run this workflow.

Cell 1:

python
import pandas as pd
# Load student grades
grades = pd.DataFrame({
    'Student': ['Tom', 'Jane', 'Bill', 'Sara'],
    'Math': [88, 92, 79, 95],
    'Science': [85, 90, 82, 98]
})

Cell 2:

python
# Calculate Average Grade per student
grades['Average'] = (grades['Math'] + grades['Science']) / 2
grades

Cell 3:

python
# Find Top Performers (Average > 90)
top_students = grades[grades['Average'] > 90]
print("Students eligible for honors:")
display(top_students)
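
One possible extension of the project, offered as a sketch rather than part of the original workflow: rank the students by average and compute the class mean for each subject. It uses mean(axis=1) for the per-student average, which gives the same result as the manual formula in Cell 2. The names ranked and subject_means are illustrative.

```python
import pandas as pd

grades = pd.DataFrame({
    'Student': ['Tom', 'Jane', 'Bill', 'Sara'],
    'Math': [88, 92, 79, 95],
    'Science': [85, 90, 82, 98],
})

# Row-wise mean across the two subject columns
grades['Average'] = grades[['Math', 'Science']].mean(axis=1)

# Rank students from highest to lowest average
ranked = grades.sort_values('Average', ascending=False)

# Column-wise mean: the class average for each subject
subject_means = grades[['Math', 'Science']].mean()

print(ranked)
print(subject_means)
```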

9. Common Mistakes

  • Printing DataFrames: Beginners often write print(df). This outputs ugly, raw text aligned with spaces. Instead, use display(df) or simply leave df on the last line of the cell. Jupyter will render it as a styled HTML table.
  • Forgetting the alias: Always import pandas as pd. If you just import pandas, you have to type pandas.DataFrame every time instead of pd.DataFrame, which gets tedious.
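
To see why print(df) and a bare df look different, this sketch compares the two representations a DataFrame offers. _repr_html_() is the hook that Jupyter's display machinery looks for; it is called directly here only to inspect its output, which is not something you would normally do in a notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
})

# Plain-text form: this is what print(df) writes
text_version = str(df)

# HTML form: this is what Jupyter renders for display(df) or a trailing df
html_version = df._repr_html_()

print(text_version)
print(html_version[:80])
```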

10. MCQs

Question 1

What is the standard industry alias for importing Pandas?

Question 2

What is a Pandas DataFrame?

Question 3

How do you load a CSV file into a DataFrame?

Question 4

To view only the first 5 rows of a large dataset to prevent Jupyter from freezing, you use:

Question 5

Which function provides a statistical summary (mean, min, max) of numeric columns?

Question 6

How do you extract a single column named 'Age' from the DataFrame?

Question 7

What does df.shape return?

Question 8

Why is it better to type df on the last line of a cell instead of print(df)?

Question 9

Which function forces Jupyter to render the styled HTML table in the middle of a code cell (not just on the last line)?

Question 10

What does df.dropna() do?

11. Interview Questions

  • Q: Explain the difference between a Pandas Series and a Pandas DataFrame.
  • Q: You load a CSV into Jupyter and it has 5 million rows. What commands do you run first to understand the data without crashing the notebook?
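
As a starting point for both questions, here is a minimal sketch under stated assumptions: a small synthetic CSV (generated in memory, with made-up id and score columns) stands in for the 5-million-row file. It reads only the first few rows with nrows, shows that a single column is a 1-dimensional Series while the table is a 2-dimensional DataFrame, and then walks the full data in chunks with chunksize.

```python
import io
import pandas as pd

# Small synthetic CSV standing in for a very large file (hypothetical data)
csv_text = "id,score\n" + "\n".join(f"{i},{i % 7}" for i in range(1000))

# Read only the first few rows to learn the column names and types
sample = pd.read_csv(io.StringIO(csv_text), nrows=5)

# A single column is a Series (1-D); the whole table is a DataFrame (2-D)
column = sample['score']
print(type(column).__name__, column.ndim)
print(type(sample).__name__, sample.ndim)

# Process the full file in fixed-size chunks instead of loading it at once
total_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    total_rows += len(chunk)

print("Total rows:", total_rows)
```

On a real file you would pass the file path instead of io.StringIO(...), and a much larger chunksize would be typical.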

12. Summary

Pandas is the engine of data science in Python. It provides the DataFrame for tabular data. By combining Pandas with Jupyter Notebooks, you get a highly visual, interactive spreadsheet environment. Use pd.read_csv() to load data, df.head() to preview it, and display(df) to ensure Jupyter renders it beautifully.

13. Next Chapter Recommendation

In Chapter 11: NumPy Integration in Jupyter, we will explore the math engine that powers Pandas under the hood, enabling lightning-fast calculations on massive datasets.
