How to Install Python Packages
Creating and saving a Nextjournal environment
Each Nextjournal code cell runs in a runtime, and each runtime has an environment, which is a Docker container with its own filesystem. In any environment we can install whatever system or language packages we need, modify configuration files, and set up directory and data file structures however we like. We can then save and export the whole environment, both for future reproducibility and for use by others.
Let's configure an environment for mapmaking, with the geoplot package. We'll install packages in a runtime we name geoplot, and set it to use Nextjournal's default Python environment. This Python 3 environment—as well as its Python 2 counterpart—has a variety of packages installed, including numpy, matplotlib, and plotly:
pip freeze
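The same listing can be produced from inside a Python cell, which is handy when we want to inspect versions programmatically. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_packages` is our own, not part of Nextjournal or pip.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings, roughly like `pip freeze`."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        # Skip distributions with incomplete metadata.
        if name and version:
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    for pin in installed_packages():
        print(pin)
```

Unlike `pip freeze`, this sketch does not special-case editable or VCS installs; it only reports the name and version recorded in each distribution's metadata.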
When we need additional Python packages, they can be installed in multiple ways. The easiest way is to use conda, which will attempt to install all packages and dependencies in a consistent manner, including system packages and libraries. The Anaconda Cloud has a searchable database of packages and channels; by default we will select only from the anaconda channel.
conda install -y descartes pysal
If we need a different version or a more esoteric package, we can add other channels.
conda install -y -c conda-forge cartopy
For packages and versions unavailable via conda, or for installing packages in wheel files, pip is available. For any packages that require compilation, we can install gcc first.
apt-get update > /dev/null
apt-get install -y gcc
pip install quilt
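If an environment happens to contain more than one Python interpreter, a bare `pip` may not belong to the interpreter our cells actually run. One common safeguard is to invoke pip as a module of the running interpreter. The sketch below illustrates this from a Python cell; `pip_command` and `pip_install` are hypothetical helper names introduced here for illustration.

```python
import subprocess
import sys


def pip_command(*requirements):
    """Build a pip invocation bound to the currently running interpreter."""
    return [sys.executable, "-m", "pip", "install", *requirements]


def pip_install(*requirements):
    """Run the install; raises CalledProcessError if pip exits non-zero."""
    subprocess.check_call(pip_command(*requirements))


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is installed here.
    print(" ".join(pip_command("quilt")))
```

Calling `pip_install("quilt")` would then perform the same install as the shell cell above, but against the interpreter that is executing the cell.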
We can also use pip to install development versions from GitHub, though we have to install git first.
apt-get install -y git
pip install git+https://github.com/geopandas/geopandas
Finally, if a package has a setup.py, we can download and install with that.
git clone https://github.com/ResidentMario/geoplot
cd geoplot
python setup.py install
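After installing from several sources, it can be worth confirming that everything is actually importable before the environment is saved. Below is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_modules` is our own, and the module list simply mirrors the packages installed above.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    wanted = ["descartes", "pysal", "cartopy", "quilt", "geopandas", "geoplot"]
    missing = missing_modules(wanted)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All packages found.")
```

This only checks that each module can be located, not that it imports without error, so a package with a broken compiled dependency could still pass.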
Once everything is set up to our liking, we can save and export the runtime's end state as a new environment using its configuration panel. Using the saved geoplot environment as our Main runtime's environment then ensures that the versions of programs and packages that the article is developed on will be preserved for future reproducibility, even through a Remix. Once the article is published, our exported environment will also be available for other articles to use via transclusion.
Here's an example from the geoplot gallery:
quilt install ResidentMario/geoplot_data
# Load the data (uses the `quilt` package).
import geopandas as gpd
from quilt.data.ResidentMario import geoplot_data

continental_cities = gpd.read_file(
    geoplot_data.usa_cities()).query('POP_2010 > 100000')
continental_usa = gpd.read_file(geoplot_data.contiguous_usa())

# Plot the figure.
import geoplot as gplt
import geoplot.crs as gcrs
import matplotlib.pyplot as plt

poly_kwargs = {'linewidth': 0.5, 'edgecolor': 'gray', 'zorder': -1}
point_kwargs = {'linewidth': 0.5, 'edgecolor': 'black', 'alpha': 1}
legend_kwargs = {'bbox_to_anchor': (0.9, 0.9), 'frameon': False}

ax = gplt.polyplot(
    continental_usa,
    projection=gcrs.AlbersEqualArea(central_longitude=-98,
                                    central_latitude=39.5),
    **poly_kwargs)
gplt.pointplot(
    continental_cities, projection=gcrs.AlbersEqualArea(), ax=ax,
    scale='POP_2010', limits=(1, 80),
    hue='POP_2010', cmap='Blues',
    legend=True, legend_var='scale',
    legend_values=[8000000, 6000000, 4000000, 2000000, 100000],
    legend_labels=['8 million', '6 million', '4 million', '2 million',
                   '100 thousand'],
    legend_kwargs=legend_kwargs, **point_kwargs)

plt.title("Large cities in the contiguous United States, 2010")
plt.savefig("/results/map.svg", bbox_inches='tight', pad_inches=0.1)