Installing Python and Data Science Environment
# CHAPTER 2
1. Chapter Introduction
Before you can analyze data, you need to set up your workshop. Data scientists do not use standard text editors like Notepad. We use specialized environments that can handle massive datasets, execute code interactively, and manage hundreds of third-party libraries. This chapter guides you through installing Python via Anaconda, setting up Jupyter Notebooks, and using VS Code.
2. The Anaconda Distribution (Recommended)
Python on its own is just a bare-bones programming language. If you install standard Python from python.org, you will have to manually install Pandas, NumPy, and Jupyter one by one via the command line.
Anaconda is a widely used Python distribution for Data Science. It is a large package that installs Python *and* many popular data science libraries (including Pandas, NumPy, and Jupyter) in a single step.
Step-by-step Installation:
- 1. Go to anaconda.com/download.
- 2. Download the installer for Windows, Mac, or Linux.
- 3. Run the installer. Leave all settings on their defaults.
- 4. Once installed, search your computer for Anaconda Navigator and open it.
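Once the installer finishes, a quick way to confirm the setup is to ask Python itself. The sketch below is a minimal check, assuming you run it with the Python that Anaconda installed. The helper name check_libraries is made up for illustration, and the list of libraries is only the three named earlier in this chapter.

```python
import importlib.util
import sys


def check_libraries(names):
    """Return a dict mapping each library name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Which Python is running, and where it lives on disk.
    print("Python version:", sys.version.split()[0])
    print("Interpreter:", sys.executable)

    # Import names are lowercase: Pandas -> pandas, NumPy -> numpy.
    # "notebook" is the import name of Jupyter Notebook.
    for name, found in check_libraries(["pandas", "numpy", "notebook"]).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If the interpreter path printed does not point inside your Anaconda folder, you are probably running a different Python than the one you just installed.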
3. Launching Jupyter Notebook
Jupyter Notebook is the primary tool you will use to write Python code in this course. It runs in your web browser but processes data locally on your computer.
- 1. Open Anaconda Navigator.
- 2. Find the tile labeled Jupyter Notebook.
- 3. Click Launch.
- 4. A black terminal window will open (do not close this!). Then, a new tab will open in your web browser (usually at localhost:8888). This is your Jupyter file browser.
4. VS Code: The Professional IDE
While Jupyter Notebook in the browser is great for beginners, most professional Data Scientists and Machine Learning Engineers eventually migrate to Visual Studio Code (VS Code). It is a powerful, free code editor built by Microsoft.
How to integrate VS Code with Data Science:
- 1. Download VS Code from code.visualstudio.com.
- 2. Open VS Code, go to the Extensions tab (square boxes icon on the left).
- 3. Search for and install the Python extension.
- 4. Search for and install the Jupyter extension.
- 5. You can now create files ending in .ipynb and run Jupyter notebooks directly inside VS Code!
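It may help to know what an .ipynb file actually is: a JSON text file that stores a list of cells. The sketch below builds a minimal one-cell notebook using only the standard library. The helper name make_minimal_notebook is hypothetical, and the fields shown are a minimal subset of notebook format version 4, not everything Jupyter or VS Code writes.

```python
import json


def make_minimal_notebook(code_lines):
    """Build a one-cell notebook as a plain dictionary (notebook format 4)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],            # results are stored here after a run
                "source": code_lines,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


if __name__ == "__main__":
    notebook = make_minimal_notebook(["print('Hello, data science!')"])
    # Print the JSON text that would be stored inside the .ipynb file.
    print(json.dumps(notebook, indent=1))
```

Because the code and its saved outputs live together in this one file, a notebook can be reopened later with its results still visible.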
5. Managing Libraries with pip and conda
If you need a library that Anaconda didn't install, you must download it from the internet using a package manager. You have two options: pip (Python's default) and conda (Anaconda's default).
To install a package, open your Anaconda Prompt (Windows) or Terminal (Mac or Linux) and type pip install seaborn (to use pip) or conda install seaborn (to use conda).
*Note: In Jupyter Notebook, you can run these commands directly in a code cell by adding an exclamation mark: !pip install seaborn.*
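One thing to watch when installing from inside a notebook: the package should land in the same Python that the notebook is running. Below is a minimal sketch of how to check that, using only the standard library. The helper names are hypothetical, and seaborn is just the example package from the note above. Running pip as sys.executable -m pip is a common way to target one specific interpreter.

```python
import sys
from importlib import metadata


def pip_install_command(package):
    """Build the pip command that targets the Python running this code."""
    return [sys.executable, "-m", "pip", "install", package]


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("seaborn")
    if version is None:
        # Print the command instead of running it, so nothing is installed
        # without you choosing to.
        print("seaborn is missing. To install it, run:")
        print(" ".join(pip_install_command("seaborn")))
    else:
        print("seaborn", version, "is already installed")
```

The script only prints the install command, so it is safe to run as a check before deciding whether to install anything.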
6. Virtual Environments
Imagine Project A needs Pandas version 1.0, but Project B requires Pandas version 2.0. A single Python installation can hold only one version of a library at a time, so if you only have one installation of Python on your computer, you can't run both projects.
Virtual Environments solve this. They create an isolated, "fenced-off" version of Python for each project.
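Each environment has its own folder and its own copy of Python, so you can ask Python which environment it is currently running in. The sketch below is a minimal check. It assumes conda sets the CONDA_DEFAULT_ENV variable when an environment is activated, and the function name describe_environment is made up for illustration.

```python
import os
import sys


def describe_environment(environ=None):
    """Report the active conda environment name (if any) and Python's location."""
    environ = os.environ if environ is None else environ
    return {
        # Assumed to be set by conda on activation; None when it is absent.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        # Folder of the Python installation running this code.
        "python_prefix": sys.prefix,
    }


if __name__ == "__main__":
    info = describe_environment()
    print("Active conda environment:", info["conda_env"] or "(none detected)")
    print("Python lives in:", info["python_prefix"])
```

Running this from two different environments should print two different folders, which is the "fence" in practice.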
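Creating an environment with Conda: type conda create --name project1 python to create an environment named project1, then conda activate project1 to switch into it.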
7. Common Mistakes
- Installing Multiple Pythons: Beginners often install Python from python.org, then install Anaconda, then install Python from the Microsoft Store. This creates massive conflicts. Choose ONE ecosystem (Anaconda is best) and stick to it.
- Closing the Jupyter Terminal: When you launch Jupyter, a scary-looking black terminal window stays open in the background. If you close it, your notebook will disconnect and stop working. Minimize it, don't close it!
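If you suspect the first mistake applies to you, the sketch below is one rough way to look for it: it scans the folders on your PATH for Python executables. The function name and the list of file names it looks for are assumptions chosen for illustration, and it will not catch every way Python can be installed.

```python
import os


def find_pythons_on_path(path_value=None):
    """List python executables found in the folders of a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    candidates = ("python", "python3", "python.exe")
    found = []
    for folder in path_value.split(os.pathsep):
        for name in candidates:
            full_path = os.path.join(folder, name)
            # Skip empty PATH entries and avoid listing the same file twice.
            if folder and os.path.isfile(full_path) and full_path not in found:
                found.append(full_path)
    return found


if __name__ == "__main__":
    pythons = find_pythons_on_path()
    for path in pythons:
        print(path)
    folders = {os.path.dirname(path) for path in pythons}
    if len(folders) > 1:
        print("Python found in more than one folder, "
              "which may mean several installations.")
```

Seeing python and python3 in the same folder is normal. Entries in different folders are the ones worth a closer look.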
8. MCQs
What is Anaconda?
Where does Jupyter Notebook natively open its interface?
What must you install in VS Code to run data science notebooks?
What command do you use to install a new Python library using the default Python package manager?
Why do data scientists use Virtual Environments?
If you want to run a terminal command (like pip install) directly inside a Jupyter Notebook cell, you prefix it with?
What happens if you close the background terminal window that launched Jupyter Notebook?
What file extension is used for Jupyter Notebooks?
Which IDE is currently the most popular choice for professional Python developers?
How do you activate a Conda virtual environment named 'project1'?
9. Interview Questions
- Q: Explain the concept of Virtual Environments. Why are they a critical best practice in professional data science teams?
- Q: What is the difference between writing code in a standard .py script versus a .ipynb Jupyter Notebook?
10. Summary
Setting up your environment correctly is critical. Use the Anaconda Distribution to get Python, Jupyter, and essential libraries installed simultaneously. For your code editor, start with Jupyter Notebook in the browser, and eventually transition to VS Code. When working on real projects, always use Virtual Environments (conda create) to keep your library dependencies organized and prevent version conflicts.