Python Environments
This notebook describes and creates the default Python 2 and 3 environments in Nextjournal. See the Showcase section for what the environment contains, and the Setup section for how it is built.
Showcase
The Python 3 environment runs Python 3.7. Its installed packages can be listed with:
pip freeze
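For reference, here is a rough Python analogue of `pip freeze` that reads installed-distribution metadata through `importlib.metadata`. That module is in the standard library from Python 3.8; on this Python 3.7 environment the sketch assumes the `importlib_metadata` backport (which appears in this notebook's pre-import list) is installed. Unlike pip, it does not hide pip or setuptools and does not mark editable installs.

```python
# Rough analogue of `pip freeze`: list installed distributions as name==version.
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib_metadata backport
    import importlib_metadata as metadata


def freeze():
    """Return sorted 'name==version' strings for installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            lines.add("{}=={}".format(name, dist.version))
    return sorted(lines, key=str.lower)


for line in freeze():
    print(line)
```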
System Packages and Basics
A wide variety of support libraries are installed, as well as gcc v7.
Python packages are installed using conda or pip, and setuptools is available for additional installs.
Plotting
The default environment comes with plotly and matplotlib.
Plotly
Plot two overlaid histograms using Plotly, a plotting library for making interactive graphs.
import plotly.graph_objs as go
import numpy as np
x0 = np.random.randn(500)
x1 = np.random.randn(500)+1
trace1 = go.Histogram(x=x0, opacity=0.75)
trace2 = go.Histogram(x=x1, opacity=0.75)
layout = go.Layout(barmode='overlay')
go.Figure(data=[trace1, trace2], layout=layout)
Matplotlib
Plot a 5 Hz sine wave using matplotlib, a Python plotting library.
import matplotlib.pyplot as plt, numpy as np
# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin((5 * 2) * np.pi * t)
# Note that using plt.subplots below is equivalent to using
# fig = plt.figure() and then ax = fig.add_subplot(111)
_, ax = plt.subplots()
ax.plot(t, s)
ax.set(xlabel='time (s)', ylabel='voltage (mV)', title='Sine Wave')
ax.grid()
plt.show()
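The "5 Hz" comes from writing the sine argument as 2π·f·t with f = 5, which is what `(5 * 2) * np.pi * t` expands to. As a sanity check, the pure-Python sketch below samples the same signal and finds its dominant frequency with a naive discrete Fourier transform. The DFT is written out by hand only to avoid dependencies; `numpy.fft` would be the usual tool.

```python
import math

# Same sampling as the matplotlib example: 0.00 to 1.99 s in 0.01 s steps.
dt = 0.01
t = [i * dt for i in range(200)]
s = [1 + math.sin((5 * 2) * math.pi * ti) for ti in t]


def dominant_frequency(samples, dt):
    """Return the frequency (Hz) of the largest non-DC bin of a naive DFT."""
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2):  # skip k = 0, which holds the constant offset of 1
        re = sum(x * math.cos(2 * math.pi * k * j / n) for j, x in enumerate(samples))
        im = -sum(x * math.sin(2 * math.pi * k * j / n) for j, x in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k / (n * dt)  # bin k corresponds to k / (n * dt) Hz


print(dominant_frequency(s, dt))  # → 5.0
```

Because the 2 s window holds exactly 10 full cycles, all of the sine's energy lands in bin 10, which maps to 10 / 2.0 = 5.0 Hz.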
Data Structures
Nextjournal's default Python environment contains several packages for data manipulation and parsing.
The SciPy ecosystem is available, including scipy, numpy, and pandas. simplejson makes it easy to encode and decode JSON data structures, and six is included to help smooth over differences between Python 2 and 3.
Numpy
NumPy's main object is an N-dimensional array, useful for linear algebra, Fourier transforms, and random number capabilities. Here it is used to compute a Mandelbrot set, which is then plotted using matplotlib.
import numpy as np, matplotlib.pyplot as plt
def mandelbrot(h, w, maxit=10):
    y, x = np.ogrid[-1.4:1.4:h*1j, -2:0.8:w*1j]
    c = x + y*1j
    z = c
    divtime = maxit + np.zeros(z.shape, dtype=int)
    for i in range(maxit):
        z = z**2 + c
        diverge = z * np.conj(z) > 2**2  # who is diverging
        div_now = diverge & (divtime == maxit)  # who is diverging now
        divtime[div_now] = i + 100  # note when
        z[diverge] = 2  # avoid diverging too much
    return divtime
plt.subplots(1,figsize=(20,20))
plt.imshow(mandelbrot(1000,1000))
plt.axis('off')
plt.show()
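The vectorized function above applies the escape-time rule to every grid point at once. For a single point c, the same rule can be sketched in plain Python as below. This is an illustration, not the notebook's code: `escape_time` is a hypothetical name, and it returns the raw iteration index, whereas the notebook stores `i + 100` so escaped points are offset from the never-escaped value `maxit`.

```python
def escape_time(c, maxit=10):
    """Iteration index at which z escapes |z| > 2, or maxit if it never does."""
    z = c  # like the notebook, start from z = c
    for i in range(maxit):
        z = z * z + c
        if abs(z) > 2:  # same test as z * conj(z) > 2**2
            return i
    return maxit


# Coarse ASCII rendering over the notebook's region (x from -2 to 0.8, y from 1.4 to -1.4);
# '#' marks points that never escape within maxit iterations.
for row in range(11):
    y = 1.4 - row * 0.28
    print("".join("#" if escape_time(complex(-2 + col * 0.07, y)) == 10 else "."
                  for col in range(41)))
```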
Pandas
pandas makes data analysis easier in Python. For example, a single pandas Series object holds both the index labels and the data values. Below, 1000 random values are generated with numpy, turned into a random walk with cumsum, and plotted with matplotlib.
import pandas as pd, matplotlib.pyplot as plt, numpy as np
ts = pd.Series(np.random.randn(1000),
index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
_, ax = plt.subplots()
ts.plot(ax=ax)  # draw on the axes created above
plt.show()
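To make the `cumsum` step concrete, here is a standard-library sketch of the same random walk: normally distributed daily steps, a running total, and one date label per value. The plain dict only loosely stands in for a Series; pandas adds label alignment, vectorized arithmetic, and plotting on top. The fixed seed is an assumption added for reproducibility.

```python
import datetime
import itertools
import random

random.seed(0)  # fixed seed so the walk is reproducible

# 1000 standard-normal steps, like np.random.randn(1000)
steps = [random.gauss(0.0, 1.0) for _ in range(1000)]
# Running total of the steps, like ts.cumsum()
walk = list(itertools.accumulate(steps))
# One label per day from 2000-01-01, like pd.date_range('1/1/2000', periods=1000)
start = datetime.date(2000, 1, 1)
dates = [start + datetime.timedelta(days=i) for i in range(1000)]

series = dict(zip(dates, walk))  # label -> value pairs
print(dates[0], dates[-1], round(walk[-1], 3))
```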
Simplejson
Import and export JSON on Nextjournal using simplejson. In the example below, a Python data structure is encoded as a JSON string; the conversion of None to null shows the change clearly.
import simplejson as json
json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
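Encoding is only half of the round trip. The sketch below decodes the string again; it falls back to the standard-library `json` module (which has a compatible `dumps`/`loads` API) when simplejson is not installed. Note that the tuple comes back as a list, since JSON has a single array type.

```python
try:
    import simplejson as json  # as in the notebook
except ImportError:
    import json  # standard-library fallback with the same dumps/loads API

data = ['foo', {'bar': ('baz', None, 1.0, 2)}]
encoded = json.dumps(data)
print(encoded)  # ["foo", {"bar": ["baz", null, 1.0, 2]}]

decoded = json.loads(encoded)
print(decoded)  # ['foo', {'bar': ['baz', None, 1.0, 2]}]
```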
Six
six makes it easy to write Python code that is compatible with both Python 2 and Python 3.
For example, Python 2's urllib, urllib2, and urlparse modules were combined into the urllib package in Python 3. The six.moves.urllib package provides a version-independent location for this functionality, so the same code runs under both versions.
Python 2:
from __future__ import print_function
from six.moves.urllib.request import urlopen
url = urlopen("http://nextjournal.com")
print(url.read())
Python 3:
from __future__ import print_function
from six.moves.urllib.request import urlopen
url = urlopen("http://nextjournal.com")
print(url.read())
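Under the hood, the idea is to pick whichever module location exists on the running interpreter. A hand-rolled sketch of that fallback for two of the names six.moves.urllib covers (six itself handles many more):

```python
# Try the Python 3 locations first, then the Python 2 ones.
try:
    from urllib.request import urlopen   # Python 3
    from urllib.parse import urlparse
except ImportError:
    from urllib2 import urlopen          # Python 2
    from urlparse import urlparse

parts = urlparse("http://nextjournal.com")
print("%s %s" % (parts.scheme, parts.netloc))  # http nextjournal.com
```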
Data Storage
Apache Arrow
import numpy as np
import pandas as pd
import pyarrow as pa
# Converting Pandas Dataframe to Apache Arrow Table
df = pd.DataFrame({"one": [20, np.nan, 2.5],
"two": ["january", "february", "march"],
"three": [True, False, True]},index=list("abc"))
table = pa.Table.from_pandas(df)
# Writing a parquet file from Apache Arrow
import pyarrow.parquet as pq
pq.write_table(table, "/shared/example.parquet")
# Reading a parquet file
table2 = pq.read_table("/shared/example.parquet")
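# Converting the Arrow Table back to a pandas DataFrame
df_new = table2.to_pandas()
# True if values, dtypes, and index survived the round trip (NaNs in the same place count as equal)
df_new.equals(df)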
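Element-wise `==` between two frames reports False in any cell where both hold NaN, even after a lossless round trip, because NaN never compares equal to itself. pandas documents `DataFrame.equals` as treating NaNs in the same location as equal. The pure-Python sketch below illustrates both rules on the `one` column (`nan_aware_equal` is a hypothetical helper).

```python
import math


def nan_aware_equal(a, b):
    """Treat two NaNs as equal; otherwise fall back to ordinary ==."""
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        return True
    return a == b


nan = float("nan")
original = [20.0, nan, 2.5]     # column "one" before writing
read_back = [20.0, nan, 2.5]    # column "one" after reading

plain = [x == y for x, y in zip(original, read_back)]
aware = [nan_aware_equal(x, y) for x, y in zip(original, read_back)]
print(plain)  # [True, False, True]
print(aware)  # [True, True, True]
```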
Setup
Build a Minimal Python 3 Environment
Download and install conda.
CONDA_VER="4.8.3"
PYTHON_VER="py37"
file="Miniconda3-${PYTHON_VER}_${CONDA_VER}-Linux-x86_64.sh"
wget -q --show-progress --progress=bar:force -P /results \
https://repo.continuum.io/miniconda/${file}
bash "/results/${file}" -b -p /opt/conda
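The installer filename is assembled from the two version variables, and the Python 2 section uses the same scheme with Miniconda2 and py27. A small Python sketch of that naming scheme, handy for checking the name before downloading (`miniconda_installer` is a hypothetical helper, and the URL prefix is simply the one used above):

```python
def miniconda_installer(python_ver="py37", conda_ver="4.8.3"):
    """Return (filename, URL) for a Linux x86_64 Miniconda installer.

    Reproduces this notebook's naming: Miniconda3 for py3x, Miniconda2 for py27.
    """
    major = "3" if python_ver.startswith("py3") else "2"
    filename = "Miniconda{}-{}_{}-Linux-x86_64.sh".format(major, python_ver, conda_ver)
    return filename, "https://repo.continuum.io/miniconda/" + filename


print(miniconda_installer())
print(miniconda_installer("py27"))
```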
Add symlinks so that conda's Python and pip supersede the system versions for non-absolute, non-versioned calls.
ln -s /opt/conda/bin/pip /opt/conda/bin/pip3
ln -s /opt/conda/bin/pip /opt/conda/bin/pip3.7
ln -s /opt/conda/bin/python3.7 /opt/conda/bin/python3m
ln -s /opt/conda/bin/python3.7m-config /opt/conda/bin/python3m-config
Add conda's library directory so that ldconfig will pick it up, set the conda configuration, and make sure pip is reasonably up to date. We also pin Python to the installed minor version, allowing only patch-version upgrades and downgrades.
# make this the last alphabetically => lowest precedence libraries
echo "/opt/conda/lib" >> /etc/ld.so.conf.d/zz-conda.conf
mkdir -p ~/.conda/pkgs # prevent a warning
conda config --set always_yes True
pip_ver=$(pip --version | sed 's/pip \(.*\) from.*/\1/')
echo "pip >=$pip_ver" > /opt/conda/conda-meta/pinned # prevent pip downgrade
# upgrade Python within minor version
python_minor=$(python --version | sed 's/Python \(.*\)\..*/\1/')
echo "python =$python_minor" >> /opt/conda/conda-meta/pinned
conda update python pip
conda update -yn base conda
conda clean -qtipy
ldconfig
python -V
pip -V
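The two `sed` expressions above extract the pip version and the Python minor version from the `--version` output. A Python sketch of the same extraction (`pinned_specs` is a hypothetical helper and the sample version strings are illustrative) shows why the greedy `\(.*\)\.` keeps "3.7" from "3.7.6": the capture runs up to the last dot. In conda's match syntax, `python =3.7` then matches any 3.7.x release.

```python
import re


def pinned_specs(pip_version_output, python_version_output):
    """Rebuild the two lines the setup script writes to conda-meta/pinned."""
    # sed 's/pip \(.*\) from.*/\1/'  -> text between "pip " and " from"
    pip_ver = re.sub(r"pip (.*) from.*", r"\1", pip_version_output.strip())
    # sed 's/Python \(.*\)\..*/\1/'  -> greedy capture drops only the last ".patch"
    python_minor = re.sub(r"Python (.*)\..*", r"\1", python_version_output.strip())
    return ["pip >=" + pip_ver, "python =" + python_minor]


print(pinned_specs(
    "pip 20.0.2 from /opt/conda/lib/python3.7/site-packages/pip (python 3.7)",
    "Python 3.7.6",
))  # ['pip >=20.0.2', 'python =3.7']
```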
Package up the installation for use in other environments.
du -hsx /
tar -zcPf /results/minimal-python3.tgz /opt/conda
Build the Default Python 3 Environment
Install
Only a few system libraries are needed, particularly for HDF5 support.
apt-get -qq update
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
libxext6 libhdf5-100
apt-get clean
rm -r /var/lib/apt/lists/*
This default image supports a number of general-use packages, including pandas, scipy, scikit-learn, scikit-image, and opencv-python. For graphical output, matplotlib and plotly are installed. We also install some basic utilities, setuptools to make additional installs easier, Jedi to provide code completion for Python, and Jupyter to support notebook imports.
conda install -c plotly \
setuptools six simplejson dill pillow pytables h5py \
plotly matplotlib tqdm termcolor tabulate \
python-dateutil more-itertools toolz cython cffi attrs decorator jedi \
numpy scipy patsy statsmodels pandas pandas-datareader seaborn \
scikit-learn scikit-image \
jupyter
conda clean -qtipy
ldconfig
# make sure jupyter components are up-to-date
# also add non-anaconda-main packages here (conda-forge packages can be broken)
pip install --upgrade altair pandas pyarrow feather-format \
pipenv jupyter-client jupyter-core
python -V
pip -V
jupyter --version
jupyter kernelspec list
jupyter --paths
Next, install the unofficial pre-built OpenCV wheel (the headless variant, which omits GUI dependencies).
pip install opencv-python-headless
Pre-import packages to speed up cold boot time.
PI_PKGS="altair, backcall, bleach, certifi, cffi, chardet, cloudpickle, conda, conda_package_handling, cryptography, cycler, cython, cytoolz, dask, decorator, defusedxml, dill, entrypoints, feather, h5py, idna, imageio, importlib_metadata, ipykernel, ipython_genutils, ipywidgets, jedi, jinja2, joblib, jsonschema, jupyter, jupyter_client, jupyter_console, jupyter_core, kiwisolver, lxml, markupsafe, matplotlib, mistune, mkl_fft, mkl_random, mock, more_itertools, nbconvert, nbformat, networkx, notebook, numexpr, numpy, olefile, cv2, pandas, pandas_datareader, pandocfilters, parso, patsy, pexpect, pickleshare, PIL, pipenv, plotly, prometheus_client, prompt_toolkit, ptyprocess, pyarrow, pycosat, pycparser, pygments, OpenSSL, pyparsing, pyrsistent, socks, dateutil, pytz, pywt, zmq, qtconsole, requests, retrying, ruamel_yaml, skimage, sklearn, scipy, seaborn, send2trash, simplejson, six, statsmodels, tables, tabulate, termcolor, terminado, testpath, toolz, tornado, tqdm, traitlets, urllib3, virtualenv, wcwidth, webencodings, widgetsnbextension, zipp, ipykernel.pylab.backend_inline"
python -c "import $PI_PKGS"
pkgs=$(echo $PI_PKGS | sed 's/,//g')
for pkg in $pkgs; do
python -c "from $pkg import *"
done
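The single `import $PI_PKGS` line stops at the first module that fails to import. A Python sketch of the same pre-import step that tries every module and reports all failures at once (`preimport` is a hypothetical helper; it mirrors the plain `import`, not the `from … import *` loop, and the module list is shortened for illustration):

```python
import importlib


def preimport(names):
    """Import each module once; return {name: error message} for failures."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # keep going so one broken package doesn't hide others
            failures[name] = "{}: {}".format(type(exc).__name__, exc)
    return failures


PI_PKGS = "json, os, sqlite3, ipykernel.pylab.backend_inline"  # shortened, illustrative
print(preimport([p.strip() for p in PI_PKGS.split(",")]))
```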
Finally, set up default fonts for matplotlib.
mkdir -p ~/.config/matplotlib/
echo 'font.family: sans-serif
font.sans-serif: Fira Sans, PT Sans, Open Sans, Roboto, DejaVu Sans, Liberation Sans, sans-serif
font.serif: PT Serif, Noto Serif, DejaVu Serif, Liberation Serif, serif
font.monospace: Fira Mono, Roboto Mono, DejaVu Sans Mono, Liberation Mono, Fixed, Terminal, monospace' > ~/.config/matplotlib/matplotlibrc
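Each matplotlibrc line is a `key: value` pair, and the font keys take comma-separated fallback lists, which matplotlib is expected to try in order until it finds an installed font. The sketch below parses the four lines written above to show that structure. It is a simplified stand-in, not matplotlib's own reader; `matplotlib.rc_params_from_file` is the real API for loading such a file with validation.

```python
# The same four settings the echo command writes.
RC_TEXT = """font.family: sans-serif
font.sans-serif: Fira Sans, PT Sans, Open Sans, Roboto, DejaVu Sans, Liberation Sans, sans-serif
font.serif: PT Serif, Noto Serif, DejaVu Serif, Liberation Serif, serif
font.monospace: Fira Mono, Roboto Mono, DejaVu Sans Mono, Liberation Mono, Fixed, Terminal, monospace
"""


def parse_rc(text):
    """Parse 'key: value, value, ...' lines into {key: [values]}, ignoring comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = [item.strip() for item in value.split(",")]
    return settings


rc = parse_rc(RC_TEXT)
print(rc["font.family"])         # ['sans-serif']
print(rc["font.sans-serif"][0])  # Fira Sans
```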
Check the environment size and run final tests.
python -V
pip -V
conda -V
du -hsx /
Incremental Additions
pip install vega_datasets
Test
python --version
jupyter kernelspec list
jupyter --version
jupyter --paths
import platform; platform.python_version()
import pip; pip.__version__
import plotly; plotly.__version__
import numpy as np; np.__version__
import matplotlib; matplotlib.__version__
import setuptools; setuptools.__version__
import six; six.__version__
import simplejson; simplejson.__version__
import pandas; pandas.__version__
import scipy; scipy.__version__
Minimal Python 2
Download and install conda.
CONDA_VER="4.8.3"
PYTHON_VER="py27"
file="Miniconda2-${PYTHON_VER}_${CONDA_VER}-Linux-x86_64.sh"
wget -q --show-progress --progress=bar:force -P /results \
https://repo.continuum.io/miniconda/${file}
bash "/results/${file}" -b -p /opt/conda
Set up conda, ld, and pip.
# make this the last alphabetically => lowest precedence libraries
echo "/opt/conda/lib" >> /etc/ld.so.conf.d/zz-conda.conf
mkdir -p ~/.conda/pkgs # prevent a warning
conda config --set always_yes True
pip_ver=$(pip --version | sed 's/pip \(.*\) from.*/\1/')
echo "pip >=$pip_ver" > /opt/conda/conda-meta/pinned # prevent pip downgrade
echo "python =2.7" >> /opt/conda/conda-meta/pinned # Stick to Python 2.7
conda update python pip
conda update -yn base conda
conda clean -qtipy
ldconfig
python -V
pip -V
du -hsx /
Default Python 2
Install
apt-get -qq update
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
libxext6 libhdf5-100
apt-get clean
rm -r /var/lib/apt/lists/*
conda install -c plotly \
setuptools six simplejson dill pillow pytables h5py \
plotly matplotlib tqdm termcolor tabulate \
python-dateutil more-itertools toolz cython cffi attrs decorator jedi \
numpy scipy patsy statsmodels pandas pandas-datareader seaborn \
scikit-learn scikit-image \
jupyter
conda clean -qtipy
ldconfig
# make sure jupyter components are up-to-date
# also add non-anaconda-main packages here (conda-forge packages can be broken)
pip install --upgrade altair pandas pyarrow feather-format \
pipenv jupyter-client jupyter-core
pip install opencv-python-headless
mkdir -p ~/.config/matplotlib/
echo 'font.family: sans-serif
font.sans-serif: Fira Sans, PT Sans, Open Sans, Roboto, DejaVu Sans, Liberation Sans, sans-serif
font.serif: PT Serif, Noto Serif, DejaVu Serif, Liberation Serif, serif
font.monospace: Fira Mono, Roboto Mono, DejaVu Sans Mono, Liberation Mono, Fixed, Terminal, monospace' > ~/.config/matplotlib/matplotlibrc
python -V
pip -V
jupyter --version
jupyter kernelspec list
jupyter --paths
du -hsx /
PI_PKGS="altair, bleach, certifi, cffi, chardet, cloudpickle, conda, conda_package_handling, cryptography, cycler, cython, cytoolz, dask, decorator, defusedxml, dill, entrypoints, feather, h5py, idna, imageio, importlib_metadata, ipykernel, ipython_genutils, ipywidgets, jedi, jinja2, jsonschema, jupyter, jupyter_client, jupyter_console, jupyter_core, kiwisolver, lxml, markupsafe, matplotlib, mistune, mkl_fft, mkl_random, mock, more_itertools, nbconvert, nbformat, networkx, notebook, numexpr, numpy, olefile, cv2, pandas, pandas_datareader, pandocfilters, parso, patsy, pexpect, pickleshare, PIL, pipenv, plotly, prometheus_client, prompt_toolkit, ptyprocess, pyarrow, pycosat, pycparser, pygments, OpenSSL, pyparsing, pyrsistent, socks, dateutil, pytz, pywt, zmq, qtconsole, requests, retrying, ruamel_yaml, skimage, sklearn, scipy, seaborn, send2trash, simplejson, six, statsmodels, tables, tabulate, termcolor, terminado, testpath, toolz, tornado, tqdm, traitlets, urllib3, virtualenv, wcwidth, webencodings, widgetsnbextension, zipp, ipykernel.pylab.backend_inline"
python -c "import $PI_PKGS"
Test
python --version
jupyter kernelspec list
jupyter --version
jupyter --paths
import platform; platform.python_version()