# Package Management with Conda and Pip

# Anaconda vs Miniconda vs “conda”

Anaconda is a free and open-source distribution of the Python programming language for scientific computing. Anaconda includes a wide selection of Python packages that are installed by default, with the ability to install more packages using the “conda” package manager program.

Miniconda is a minimal alternative to the Anaconda distribution. It provides the “conda” package manager, but does not include the large collection of scientific Python packages that Anaconda installs by default.

“conda” is simply the package and environment manager program that allows new software to be installed. The “conda” program is available whether you choose to install Anaconda or Miniconda.

# pip

pip is a simpler package manager than conda. It installs software from PyPI (the Python Package Index) as well as directly from git repositories such as those hosted on GitHub. It works particularly well for pure Python packages, but things can get complicated when compiled code and external (non-Python) dependencies are involved.

Not all packages are available through conda, so pip is still useful even if you’re primarily using conda. All conda environments that have Python installed should also include pip by default.

`pip install -e 'git+https://github.com/NCAR/esmlab.git#egg=esmlab'`

`-e` installs in “editable” mode

`git+` at the beginning of a URL installs from a git repository

# Installing and testing conda

Download and run the Miniconda installer script for your platform. On macOS:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
sh ./Miniconda3-latest-MacOSX-x86_64.sh

On Linux:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh ./Miniconda3-latest-Linux-x86_64.sh
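For scripted setups, the choice between the two installers above can be made from the output of `uname -s`. The following is a minimal sketch, assuming x86_64 hardware as in the commands above. The variable names `installer` and `url` are illustrative, and the download step is left commented out.

```bash
# Pick the Miniconda installer that matches the current operating system.
# Assumption: x86_64 hardware, as in the commands above.
case "$(uname -s)" in
    Darwin) installer="Miniconda3-latest-MacOSX-x86_64.sh" ;;
    Linux)  installer="Miniconda3-latest-Linux-x86_64.sh" ;;
    *)      installer="" ;;
esac

if [ -n "$installer" ]; then
    url="https://repo.anaconda.com/miniconda/$installer"
    echo "installer URL: $url"
    # Uncomment to download and run the installer:
    # wget "$url" && sh "./$installer"
else
    echo "no installer selected for $(uname -s)" >&2
fi
```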

Then open a new terminal and check that the conda program exists:

conda --version
which conda

The output should look similar to this (the version and path will differ):

conda 4.6.14
/Users/hallock/miniconda3/bin/conda

# conda gotchas

Note that the conda installation instructions recommend running `conda init {shell}`. Doing so will likely cause the new conda-provided Python to override whichever Python installation was previously used by default. The safest way to install conda without interfering with existing Python installations is to add the directory `/path/to/miniconda/condabin` to your `PATH` environment variable, which provides just the `conda` program but not `python`.
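The PATH-only approach can be sketched as follows. The install location `$HOME/miniconda3` is an assumption (substitute your own), and `MINICONDA_DIR` is an illustrative variable name, not one that conda itself reads.

```bash
# Assumed install location; replace with your actual Miniconda directory.
MINICONDA_DIR="${MINICONDA_DIR:-$HOME/miniconda3}"

# Append condabin so that the conda command becomes available without
# putting the conda-provided python ahead of any existing Python.
# The case statement avoids adding the directory twice.
case ":$PATH:" in
    *":$MINICONDA_DIR/condabin:"*) ;;
    *) PATH="$PATH:$MINICONDA_DIR/condabin" ;;
esac
export PATH
```

Placing these lines in a shell startup file such as `~/.bashrc` applies them to every new terminal. Afterwards, `which python` should still report the previously used Python.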

Behavior with shells other than bash (tcsh in particular) is somewhat inconsistent. `conda activate` does not seem to work properly in tcsh. As a workaround, you can manually set your `PATH` environment variable to include the appropriate environment’s `bin` directory.

# conda “channels”

“Conda packages are downloaded from remote channels, which are URLs to directories containing conda packages. The conda command searches a default set of channels, and packages are automatically downloaded and updated from https://repo.anaconda.com/pkgs/. You can modify what remote channels are automatically searched. You might want to do this to maintain a private or internal channel. For details, see how to modify your channel lists.” - conda documentation

Conda “channels” are intended to provide packages that are guaranteed to be compatible with each other. Mixing and matching packages between channels is a common source of frustration for users, so using a single channel for all of your packages is preferred where possible.
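To see whether an environment already mixes channels, the channel column of `conda list --show-channel-urls` can be tallied. The helper below is a rough sketch: the function name `channel_summary` is hypothetical, and it assumes the four-column layout (name, version, build, channel) that `conda list` prints.

```bash
# Hypothetical helper: count installed packages per channel.
# Extra arguments (for example: -n env1) are passed through to conda list.
channel_summary() {
    conda list --show-channel-urls "$@" |
        awk '!/^#/ && NF >= 4 { count[$4]++ }
             END { for (c in count) print count[c], c }' |
        sort -rn
}
```

Running `channel_summary` in the active environment, or `channel_summary -n env1` for a named one, prints one line per channel with the number of packages installed from it. More than one line suggests that channels have been mixed.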

# “conda-forge”

The conda-forge channel is a community-led collection of recipes and packages. As of June 4, 2019, there are 6862 repositories (nearly all of which represent unique conda packages) and 1373 members in the conda-forge organization on GitHub.

I usually recommend configuring conda to use conda-forge by default:

conda config --add channels conda-forge
Warning: 'conda-forge' already in 'channels' list, moving to the top

# Installing packages

<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```bash
conda activate base
conda install -y python=3
conda install -y numpy xarray

# Output:
#   Collecting package metadata: done
#   Solving environment: done
#
#   # All requested packages already installed.
#
#   Collecting package metadata: done
#   Solving environment: done
#
#   # All requested packages already installed.
```
</div>
</div>
</div>

# Upgrading packages

<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```bash
conda activate base
conda update -y python numpy xarray

# Output:
#   Collecting package metadata: done
#   Solving environment: done
#
#   # All requested packages already installed.
```
</div>
</div>
</div>

# Conda environments

By default, conda operates in the `base` environment. However, this means that all packages installed in `base` must be compatible with one another, even if they are not all used for the same projects. Installing packages into a separate environment for each project or task avoids conflicts between packages that are needed by different projects.

`conda env create -f /path/to/environment.yml # .yml file contains env name and packages to be installed`
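As an illustration, a minimal environment file might look like the one written below. The file name `example-environment.yml`, the environment name `demo_env`, and the package list are assumptions chosen for this sketch. The `conda env create` call is left commented out so that the snippet only writes the file.

```bash
# Write an example environment file (contents are illustrative).
cat > example-environment.yml <<'EOF'
name: demo_env
channels:
  - conda-forge
dependencies:
  - python=3
  - numpy
  - xarray
EOF

# Uncomment to create the environment described by the file:
# conda env create -f example-environment.yml
```

Keeping such a file under version control alongside a project makes it straightforward to recreate the same environment on another machine.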

<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```bash
# Create two separate environments with different Python versions and packages
conda create -y --name env1 python=2.7 numpy >/dev/null
conda create -y --name env2 python=3 numpy xarray >/dev/null
```
</div>
</div>
</div>

# Conda environment demo

<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```bash
conda activate env1
# Print the Python, numpy, and xarray versions (xarray is not installed in env1)
python -c 'from __future__ import print_function;import numpy, platform;print("python version: %s" % platform.python_version());print("numpy version: %s" % numpy.__version__);import xarray;print("xarray version: %s" % xarray.__version__)'
conda deactivate

# Output:
#   python version: 2.7.15
#   numpy version: 1.16.4
#   Traceback (most recent call last):
#     File "<string>", line 1, in <module>
#   ImportError: No module named xarray
```
</div>
</div>
</div>

<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```bash
conda activate env2
# Same check as above; env2 has xarray installed
python -c 'from __future__ import print_function;import numpy, platform;print("python version: %s" % platform.python_version());print("numpy version: %s" % numpy.__version__);import xarray;print("xarray version: %s" % xarray.__version__)'
conda deactivate

# Output:
#   python version: 3.7.3
#   numpy version: 1.16.4
#   xarray version: 0.12.1
```
</div>
</div>
</div>

# Fixing a broken environment

<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```bash
conda deactivate
conda env remove -n broken >/dev/null 2>&1
conda env create -f broken.yml >/dev/null 2>&1 # this will fail because there is no broken.yml file included
conda activate broken
echo "broken NCL..."
ncl -V

# Output:
#   broken NCL...
#   dyld: Library not loaded: @rpath/libpoppler.71.dylib
#     Referenced from: /Users/hallock/miniconda3/envs/broken/lib/libgdal.20.dylib
#     Reason: image not found
#   Abort trap: 6
```
</div>
</div>
</div>

<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```bash
echo 'running "conda update" to fix NCL'
conda update --all -y >/dev/null 2>&1
echo "fixed NCL"
ncl -V
conda deactivate

# Output:
#   running "conda update" to fix NCL
#   fixed NCL
#   6.6.2
```
</div>
</div>
</div>

# Reproducible Science

Back up a working production environment using `conda create` with the `--clone` option, update or install packages in the clone as needed, and then test the clone to ensure everything still works as expected. Once the clone has been verified, use `conda env remove` to delete the original environment, clone the new environment back to the original name, and verify that everything is still working.
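The verify-then-swap part of this procedure can be wrapped in a small helper so that the original environment is only removed once the checks pass. This is a sketch under assumptions: `promote_env` is a hypothetical function name, and it relies on `conda run` (available in conda 4.6 and later) to execute the check command inside an environment without activating it.

```bash
# Hypothetical helper: replace ORIGINAL with a clone of CANDIDATE, but only
# after the given check command succeeds inside CANDIDATE.
# Usage: promote_env ORIGINAL CANDIDATE CHECK_COMMAND [ARGS...]
promote_env() {
    local original="$1"
    local candidate="$2"
    shift 2

    # Run the check command inside the candidate environment.
    if ! conda run -n "$candidate" "$@"; then
        echo "checks failed in $candidate; $original left untouched" >&2
        return 1
    fi

    # Checks passed: swap the verified clone in under the original name,
    # then repeat the check in the recreated environment.
    conda env remove -y --name "$original" &&
        conda create -y --name "$original" --clone "$candidate" &&
        conda run -n "$original" "$@"
}
```

For example, `promote_env original_env temp_env python -c 'import dask'` would replace `original_env` only if `dask` can be imported in `temp_env`.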

<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```bash
conda create -y --name original_env python=3 numpy xarray >/dev/null
conda activate original_env
# Print package versions; dask is not installed yet, so the last import fails
python -c 'from __future__ import print_function;import numpy, platform;print("python version: %s" % platform.python_version());print("numpy version: %s" % numpy.__version__);import xarray;print("xarray version: %s" % xarray.__version__);import dask;print("dask version: %s" % dask.__version__)'
conda deactivate

# Output:
#   python version: 3.7.3
#   numpy version: 1.16.4
#   xarray version: 0.12.1
#   Traceback (most recent call last):
#     File "<string>", line 1, in <module>
#   ModuleNotFoundError: No module named 'dask'
```
</div>
</div>
</div>

### Clone original environment to temporary environment

<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```bash
conda create -y --name temp_env --clone original_env >/dev/null
conda activate temp_env
conda update -y --all # update any packages
conda install -y dask >/dev/null
# run tests
python -c 'from __future__ import print_function;import numpy, platform;print("python version: %s" % platform.python_version());print("numpy version: %s" % numpy.__version__);import xarray;print("xarray version: %s" % xarray.__version__);import dask;print("dask version: %s" % dask.__version__)'
conda deactivate

# Output:
#   Collecting package metadata: done
#   Solving environment: done
#
#   # All requested packages already installed.
#
#   python version: 3.7.3
#   numpy version: 1.16.4
#   xarray version: 0.12.1
#   dask version: 1.2.2
```
</div>
</div>
</div>

### Remove original environment, replace with clone of temporary environment

<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```bash
conda env remove --name original_env
conda create --name original_env --clone temp_env >/dev/null
conda activate original_env
# run tests again
python -c 'from __future__ import print_function;import numpy, platform;print("python version: %s" % platform.python_version());print("numpy version: %s" % numpy.__version__);import xarray;print("xarray version: %s" % xarray.__version__);import dask;print("dask version: %s" % dask.__version__)'
conda env remove --name temp_env
conda deactivate

# Output:
#   Remove all packages in environment /Users/hallock/miniconda3/envs/original_env:
#
#   python version: 3.7.3
#   numpy version: 1.16.4
#   xarray version: 0.12.1
#   dask version: 1.2.2
#   Remove all packages in environment /Users/hallock/miniconda3/envs/temp_env:
```