
Installing Python Packages#

Overview#

Once the versions of Python you need are installed, the next step is to install packages. One important difference between R and Python: R packages are typically installed within an active R session, as in,

R Console

> install.packages("dplyr")

By contrast, Python packages are usually installed from the command line using a module[1] called pip.

Terminal

$ python -m pip install pandas
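
When a script needs to run the same installation step, one option is to build the command from the interpreter that is running the script. This is a minimal sketch, not a required pattern; the helper name pip_command is hypothetical. It illustrates why the python -m pip form is useful: pip is always the one that belongs to the chosen interpreter.

```python
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this script."""
    # sys.executable is the path of the current Python interpreter, so
    # "-m pip" runs the pip module that belongs to that interpreter.
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Print the command rather than running it. Passing the list to
    # subprocess.run(..., check=True) would perform the installation.
    print(pip_command("install", "pandas"))
```

The command is only printed here so the sketch has no side effects.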

Note

Many Python packages are installed under one name but imported in code under another. For example, python -m pip install python-dotenv installs the package python-dotenv, which is referenced in code via import dotenv. Try searching on PyPI if you're unsure about a package name.
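
For packages that are already installed, the mapping from import name to install name can be looked up from Python. The sketch below assumes Python 3.10 or newer, where importlib.metadata.packages_distributions() is available; the helper name distributions_for and its mapping parameter are hypothetical and exist only for illustration.

```python
from importlib import metadata


def distributions_for(import_name, mapping=None):
    """Return the installed distribution names that provide an import name."""
    if mapping is None:
        # packages_distributions() requires Python 3.10 or newer.
        mapping = metadata.packages_distributions()
    return sorted(set(mapping.get(import_name, [])))


if __name__ == "__main__":
    # An empty list means no installed distribution provides this import name.
    print(distributions_for("dotenv"))
```

This only inspects the current environment, so it cannot replace a PyPI search for packages you have not installed yet.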

As with installing Python itself, install packages in a way that makes it easy to work on several projects concurrently. Before you install any packages, the first step is to create a "virtual environment".

The Iron Law of Python Management#

Create a virtual environment for every project.

You can do this by running python -m venv .venv. This runs the Python module venv, which creates a virtual environment in the folder .venv/. The name .venv is one of a few conventional names for directories that contain virtual environments. These directories contain a python executable (or a link to one), a copy of pip, and activation scripts, as the directory listings below show.

If Python has been installed according to the Python installation directions, you can use the versions in /opt/python to create a virtual environment for your project:

Terminal

rstudio@e6a5639b8fca:~$ mkdir data-science-project
rstudio@e6a5639b8fca:~$ cd data-science-project
rstudio@e6a5639b8fca:~/data-science-project$ /opt/python/3.7.7/bin/python -m venv .venv
rstudio@e6a5639b8fca:~/data-science-project$ tree -aL 3
.
`-- .venv
    |-- bin
    |   |-- activate
    |   |-- activate.csh
    |   |-- activate.fish
    |   |-- easy_install
    |   |-- easy_install-3.7
    |   |-- pip
    |   |-- pip3
    |   |-- pip3.7
    |   |-- python -> /opt/python/3.7.7/bin/python
    |   `-- python3 -> python
    |-- include
    |-- lib
    |   `-- python3.7
    |-- lib64 -> lib
    `-- pyvenv.cfg

Terminal

WDAGUtilityAccount@mvp MINGW64 ~/Documents
$ mkdir data-science-project

WDAGUtilityAccount@mvp MINGW64 ~/Documents
$ cd data-science-project

WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project
$ python -m venv .venv

WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project
$ tree -aL 3
.
`-- .venv
    |-- Include
    |-- Lib
    |   `-- site-packages
    |-- Scripts
    |   |-- Activate.ps1
    |   |-- activate
    |   |-- activate.bat
    |   |-- deactivate.bat
    |   |-- easy_install-3.9.exe
    |   |-- easy_install.exe
    |   |-- pip.exe
    |   |-- pip3.9.exe
    |   |-- pip3.exe
    |   |-- python.exe
    |   `-- pythonw.exe
    `-- pyvenv.cfg
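
The same kind of environment can also be created from Python through the standard-library venv module. This is a minimal sketch under stated assumptions: the helper name create_env is hypothetical, and the demo uses a temporary directory with with_pip=False so it runs quickly and leaves nothing behind.

```python
import tempfile
import venv
from pathlib import Path


def create_env(project_dir, with_pip=True):
    """Create <project_dir>/.venv and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # venv.create is the standard-library convenience function that
    # builds the environment directory.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # with_pip=False skips bootstrapping pip, which keeps the demo fast.
        env_dir = create_env(tmp, with_pip=False)
        print(sorted(p.name for p in env_dir.iterdir()))
```

For everyday project work the command-line form is usually simpler; the function form is mainly useful inside setup scripts.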

Once your virtual environment is created, activate it to isolate your project. The location of the activation script depends on your platform:

$ source .venv/bin/activate      # Linux and macOS
$ source .venv/Scripts/activate  # Windows (Git Bash)

Your shell may indicate that you are working in a virtual environment by adding (.venv) to the prompt.[2] Some IDEs detect that you have created a virtual environment and activate it for you. When your virtual environment is active, which python should return the path to the Python executable inside your project's .venv directory. You can call deactivate to return to your shell's default version of Python.

Terminal

rstudio@e6a5639b8fca:~/data-science-project$ source .venv/bin/activate

(.venv) rstudio@e6a5639b8fca:~/data-science-project$ which python
/home/rstudio/data-science-project/.venv/bin/python

(.venv) rstudio@e6a5639b8fca:~/data-science-project$ deactivate
rstudio@e6a5639b8fca:~/data-science-project$ which python

Terminal

WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project
$ source .venv/Scripts/activate

(.venv)
WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project
$ which python
/c/Users/WDAGUtilityAccount/Documents/data-science-project/\Users\WDAGUtilityAccount\Documents\data-science-project\.venv/Scripts/python

(.venv)
WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project
$ deactivate

WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project
$ which python
/c/Users/WDAGUtilityAccount/scoop/apps/pyenv/current/pyenv-win/shims/python
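
A similar check can be made from inside Python. The sketch below relies on sys.prefix and sys.base_prefix, which differ when the running interpreter belongs to a virtual environment created by venv; the function name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the Python installation it was made from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtual_environment())
```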

Virtual environment directories should not be checked into version control, so add the location of your virtual environment to your .gitignore.

Installing Python packages#

Once your virtual environment is active, you can begin installing packages. It can sometimes be helpful to start by updating your version of pip, and other packages whose job is to help install packages:

WDAGUtilityAccount@mvp MINGW64 ~/Documents 
$ mkdir data-science-project

WDAGUtilityAccount@mvp MINGW64 ~/Documents 
$ cd data-science-project/

WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project 
$ python -m venv .venv 

WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project 
$ source .venv/Scripts/activate 

(.venv)
WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project (master)
$ python -m pip install -U pip setuptools wheel
...
Successfully installed pip-21.1.2 setuptools-57.0.0 wheel-0.36.2

After that, you can install data science packages:

(.venv) 
WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project 
$ python -m pip install pandas 
Collecting pandas 
  Downloading pandas-1.2.4-cp39-cp39-win_amd64.whl (9.3 MB) 
Collecting python-dateutil>=2.7.3 
  Downloading python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB) 
Collecting numpy>=1.16.5 
  Downloading numpy-1.20.3-cp39-cp39-win_amd64.whl (13.7 MB) 
Collecting pytz>=2017.3 
  Downloading pytz-2021.1-py2.py3-none-any.whl (510 kB) 
Collecting six>=1.5 
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB) 
Installing collected packages: six, pytz, python-dateutil, numpy, pandas 
Successfully installed numpy-1.20.3 pandas-1.2.4 python-dateutil-2.8.1 pytz-2021.1 six-1.16.0 

You can print a table showing the packages installed in the active virtual environment.

(.venv) 
WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project 
$ pip list 
Package         Version 
--------------- ------- 
numpy           1.20.3 
pandas          1.2.4 
pip             21.1.2 
python-dateutil 2.8.1 
pytz            2021.1 
setuptools      57.0.0 
six             1.16.0 
wheel           0.36.2

You can also produce a machine-readable version of this list:

(.venv) 
WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project 
$ pip freeze 
numpy==1.20.3 
pandas==1.2.4 
python-dateutil==2.8.1 
pytz==2021.1 
six==1.16.0 

You can redirect[3] this machine-readable version to a requirements.txt file.

(.venv) 
WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project 
$ pip freeze > requirements.txt 

(.venv) 
WDAGUtilityAccount@mvp MINGW64 ~/Documents/data-science-project 
$ cat requirements.txt 
numpy==1.20.3 
pandas==1.2.4 
python-dateutil==2.8.1 
pytz==2021.1 
six==1.16.0

Commit the requirements.txt file to version control.
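
Because each line of pip freeze output has the form name==version, the file is easy to read back in a script. The following is a rough sketch with a hypothetical helper parse_pins; it handles only exact pins and skips comments, blank lines, and any other requirement syntax.

```python
def parse_pins(text):
    """Map package name to pinned version for 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and comments; lines without '==' are ignored here.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


if __name__ == "__main__":
    sample = "numpy==1.20.3\npandas==1.2.4\n"
    print(parse_pins(sample))
```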

You can follow the same steps when collaborating on Python projects.

Clone the project and set up a virtual environment in the project directory:

WDAGUtilityAccount@mvp MINGW64 ~/Documents
$ git clone https://github.com/sol-eng/python-examples
Cloning into 'python-examples'...
...
Resolving deltas: 100% (351/351), done.

WDAGUtilityAccount@mvp MINGW64 ~/Documents
$ cd python-examples/flask-restx/

WDAGUtilityAccount@mvp MINGW64 ~/Documents/python-examples/flask-restx (master)
$ python -m venv .venv

WDAGUtilityAccount@mvp MINGW64 ~/Documents/python-examples/flask-restx (master)
$ source .venv/Scripts/activate

Use pip to install the dependencies from the requirements.txt file:

(.venv)
WDAGUtilityAccount@mvp MINGW64 ~/Documents/python-examples/flask-restx (master)
$ pip install -r requirements.txt
Collecting flask-restx
  Downloading flask_restx-0.4.0-py2.py3-none-any.whl (5.3 MB)
Collecting sklearn
  Downloading sklearn-0.0.tar.gz (1.1 kB)
Collecting Flask<2.0.0,>=0.8
  Downloading Flask-1.1.4-py2.py3-none-any.whl (94 kB)
...
Successfully installed Flask-1.1.4 Jinja2-2.11.3 MarkupSafe-2.0.1 aniso8601-9.0.1 attrs-21.2.0 click-7.1.2 flask-restx-0.4.0 itsdangerous-1.1.0 joblib-1.0.1 jsonschema-3.2.0 numpy-1.20.3 pyrsistent-0.17.3 pytz-2021.1 scikit-learn-0.24.2 scipy-1.6.3 six-1.16.0 sklearn-0.0 threadpoolctl-2.1.0 werkzeug-1.0.1

(.venv)
WDAGUtilityAccount@mvp MINGW64 ~/Documents/python-examples/flask-restx (master)
$ pip list
Package       Version
------------- -------
aniso8601     9.0.1
attrs         21.2.0
click         7.1.2
Flask         1.1.4
flask-restx   0.4.0
itsdangerous  1.1.0
Jinja2        2.11.3
joblib        1.0.1
jsonschema    3.2.0
MarkupSafe    2.0.1
numpy         1.20.3
pip           20.2.3
pyrsistent    0.17.3
pytz          2021.1
scikit-learn  0.24.2
scipy         1.6.3
setuptools    49.2.1
six           1.16.0
sklearn       0.0
threadpoolctl 2.1.0
Werkzeug      1.0.1

If you run pip freeze and see a number of Python dependencies that you don't remember installing that have nothing to do with your project, you have probably forgotten to activate the virtual environment for your project.
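
One way to notice this situation from a script is to compare the packages visible to the running interpreter with the versions your project expects. This is a sketch rather than a complete checker: the helper name check_pins is hypothetical, the pins in the demo are copied from the requirements.txt shown earlier, and importlib.metadata requires Python 3.8 or newer.

```python
from importlib import metadata


def check_pins(pins):
    """Return {name: problem} for pins the current environment does not satisfy."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    # An empty dict means every pin matches the active environment.
    print(check_pins({"pandas": "1.2.4", "numpy": "1.20.3"}))
```

A long list of problems is a hint that the interpreter running the script is not the one in your project's .venv directory.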

Reticulate#

Create a virtual environment in the folder containing your reticulated R project. Add a .Renviron file to your R project in which RETICULATE_PYTHON is set to the path of the Python executable in the virtual environment directory (.venv/bin/python on Linux and macOS, .venv/Scripts/python.exe on Windows). R reads this file when a new session starts and adds the variable to the session's environment. Finally, add a line for .Renviron to your .gitignore file.

Terminal

WDAGUtilityAccount@mvp MINGW64 ~/Documents
$ mkdir my-reticulated-project

WDAGUtilityAccount@mvp MINGW64 ~/Documents
$ cd my-reticulated-project

WDAGUtilityAccount@mvp MINGW64 ~/Documents/my-reticulated-project
$ python -m venv .venv

WDAGUtilityAccount@mvp MINGW64 ~/Documents/my-reticulated-project
$ echo "RETICULATE_PYTHON=.venv/Scripts/python.exe" >> .Renviron

WDAGUtilityAccount@mvp MINGW64 ~/Documents/my-reticulated-project
$ echo ".Renviron" >> .gitignore

To confirm that this is set correctly, retrieve the value of RETICULATE_PYTHON from the R console.

R

> Sys.getenv("RETICULATE_PYTHON")
[1] ".venv/Scripts/python.exe"
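
The location of the Python executable inside a virtual environment depends on the platform, so the value written to .Renviron does too. The sketch below shows one way to build the line for either layout; the helper names venv_python and renviron_line are hypothetical, and the os_name parameter exists only so both cases can be demonstrated on one machine.

```python
import os
from pathlib import PurePosixPath


def venv_python(env_dir=".venv", os_name=os.name):
    """Return the relative path of the interpreter inside a venv directory."""
    # venv puts executables in Scripts/ on Windows and in bin/ elsewhere.
    if os_name == "nt":
        return str(PurePosixPath(env_dir) / "Scripts" / "python.exe")
    return str(PurePosixPath(env_dir) / "bin" / "python")


def renviron_line(env_dir=".venv", os_name=os.name):
    """Build the RETICULATE_PYTHON line for a .Renviron file."""
    return f"RETICULATE_PYTHON={venv_python(env_dir, os_name)}"


if __name__ == "__main__":
    print(renviron_line(os_name="posix"))
    print(renviron_line(os_name="nt"))
```

Forward slashes are used for both layouts so the same string format works in a .Renviron file on either platform.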

A closing xkcd#

Virtual environments are like git: if you make a mistake, you can always blow it away and start a new one.


  1. You may also see this written as simply pip install pandas

  2. Installing a helper program like starship can make it easier to keep track of whether a virtual environment is active. 

  3. https://devhints.io/bash#redirection