Plotting Data

The environments underlying Nextjournal’s default language runtimes come with a variety of plotting libraries pre-installed. This guide collects plotting examples that use different libraries for each programming language Nextjournal currently offers.

Most plots will use the following data sets:

artist_data.csv
artwork_data.csv
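Before plotting, it can help to confirm that a file contains the columns the examples below read. The following is a minimal sketch using only the Python standard library. The helper name missing_columns, the REQUIRED mapping, and the two-line sample text are illustrative assumptions; only the column names come from the examples in this guide.

```python
import csv
import io

# Columns that the examples in this guide read from each file.
REQUIRED = {
    "artist_data.csv": {"yearOfBirth", "gender"},
    "artwork_data.csv": {"medium", "year", "acquisitionYear", "height", "width"},
}

def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return sorted(set(required) - set(header))

# Made-up sample standing in for the real file (hypothetical row, not real data).
sample = "id,name,gender,yearOfBirth\n1,Example Artist,Female,1900\n"
print(missing_columns(sample, REQUIRED["artist_data.csv"]))  # prints []
```

With a real file, the text would come from reading the file instead of the inline sample.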

R

For plotting in R, Nextjournal supports the default R graphics package (graphics), ggplot2 (ggplot()), and Plotly (plot_ly() and ggplotly()) with no additional installation required.

The Default R Graphics Package

This first example uses the standard smoothScatter() function to plot the birth years of artists represented in the Tate Museum's permanent collection. Note that smoothScatter() does not require loading any additional packages.

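# Assumes artist_data.csv is available in the working directory
artists <- read.csv("artist_data.csv", header=TRUE)
born <- artists$yearOfBirth
# Plot birth year against row index as a smoothed density scatter
smoothScatter(born, seq_along(born),
              axes=FALSE,
              xlab="Year", ylab="",
              main="Distribution of Artists' Birth Years at the Tate")
# Draw only the x-axis, with blue tick marks
axis(1, col.ticks="blue")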
artists

Working With Dependencies

Plotly and ggplot2 are external dependencies that offer more features than the default R graphics.

Load the tidyverse collection of R packages, which includes two dependencies used in the upcoming sections, ggplot2 (ggplot()) and readr (read_csv()). The Plotly package provides two important plotting functions, plot_ly() and ggplotly().

library(tidyverse)
library(plotly)

ggplot2

ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.

# Assumes artist_data.csv is available in the working directory
artists <- read_csv("artist_data.csv")
born <- artists$yearOfBirth
df <- data.frame(born)
# One point per artist: birth year on x, row index on y
ggplot(df, mapping=aes(x = born, y = as.numeric(row.names(df)))) +
           geom_point(size=2.2, alpha=0.4, shape=15) +
           # y = NULL removes the y-axis label
           labs(x = "Year", y = NULL,
                title = "Distribution of Artists' Birth Years at the Tate",
                subtitle = "From the Museum's Permanent Collection") +
           theme_bw() +
           theme(axis.text.y = element_blank(),
                 axis.ticks.y = element_blank(),
                 panel.grid.minor=element_blank(),
                 panel.grid.major.y=element_blank())

Plotly

The plot_ly() function transforms data into a Plotly object to enable interactive graphics and advanced plotting features.

These overlaid histograms compare the male and female artists in the collection by year of birth. Plotly's interactive features are useful here because the two distributions overlap. Turning off the male histogram gives a better sense of the growth in acquisitions of work by female artists; turning it back on shows how far institutions have yet to go.

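# Assumes artist_data.csv is available in the working directory
artists <- read_csv("artist_data.csv")
# filter() drops rows with a missing gender; logical indexing would keep them as all-NA rows
female_artists <- dplyr::filter(artists, gender == "Female")
male_artists <- dplyr::filter(artists, gender == "Male")
# Overlay two semi-transparent histograms of birth year
plot_ly(alpha=0.6) %>%
  add_histogram(data=female_artists, x=~yearOfBirth, name="Females") %>%
  add_histogram(data=male_artists, x=~yearOfBirth, name="Males") %>%
  layout(barmode="overlay", xaxis=list(title="Year of Birth"))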

ggplotly

The ggplotly() function transforms a static ggplot object into a Plotly object. More detailed information is available at the Plotly ggplot2 Library documentation.

# Assumes artist_data.csv is available in the working directory
artists <- read_csv("artist_data.csv")
born <- artists$yearOfBirth
df <- data.frame(born)
id <- as.numeric(row.names(df))
# Build a static ggplot, then convert it to an interactive Plotly object
ggplotly(ggplot(df, mapping=aes(x = born, y = id)) +
           geom_point(size=1.5, alpha=0.4, shape=15) +
           labs(x = "Year", y="",
                title = "Distribution of Artists' Birth Years at the Tate") +
           theme_bw() +
           theme(axis.text.y = element_blank(),
                 axis.ticks.y = element_blank(),
                 panel.grid.minor=element_blank(),
                 panel.grid.major.y=element_blank()))

Multiple Plots

A Nextjournal cell can show multiple graphs. The runner detects each new figure automatically and displays the figures in order.

# Assumes artwork_data.csv is available in the working directory
artworks <- read_csv("artwork_data.csv")
drop <- c("accession_number", "artistRole", "artistId", "dateText", "creditLine", "units", "inscription", "thumbnailCopyright", "thumbnailUrl", "url")
artworks_rem <- artworks[ , !(names(artworks) %in% drop)]
# Keep only rows where height, width, and year are all present
artworks_size <- artworks_rem[complete.cases(artworks_rem[, c("height", "width", "year")]), ]
artworks_size$size <- artworks_size$height * artworks_size$width
# %in% treats a missing medium as no match, so no all-NA rows are produced
metal <- artworks_size[artworks_size$medium %in% c("Steel", "Bronze"), ]
# First figure: distribution of acquisition years
plot_ly(data=metal, x=~acquisitionYear, name="Sculptural Acquisitions")
# Second figure: 3D scatter of creation year, acquisition year, and size
plot_ly(data=metal, x=~year, y=~acquisitionYear, z=~size, color=~medium,
        colors = c('#BF382A', '#0C4B8E'), text=~artist,
        marker=list(size=4, opacity=0.5)) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Year Created'),
                     yaxis = list(title = 'Year of Acquisition'),
                     zaxis = list(title = 'Size')),
         annotations = list(
           x = 1.13,
           y = 1.05,
           text = 'Material',
           xref = 'paper',
           yref = 'paper',
           showarrow = FALSE
         ))

Python

Matplotlib

Matplotlib is a plotting library for Python with a MATLAB-like interface:

import matplotlib.pyplot as plt
import pandas as pd
# Assumes artwork_data.csv is available in the working directory
artwork_data = pd.read_csv("artwork_data.csv")
artwork_data.drop(columns=["accession_number", "artistRole", "artistId", "dateText", "acquisitionYear", "dimensions", "width", "height", "depth", "creditLine", "units", "inscription", "thumbnailCopyright", "thumbnailUrl", "url"], inplace=True)
# Drop rows whose medium is NaN; otherwise indexing the oil, acrylic, and watercolour artworks yields the error "ValueError: cannot index with vector containing NA / NaN values." Replace this line with something more sensible to get a more complete dataset.
artwork_data.dropna(subset=['medium'], inplace=True)
# Coerce non-numeric years to NaN instead of raising an error
artwork_data["year"] = pd.to_numeric(artwork_data["year"], errors="coerce")
oil = artwork_data[artwork_data["medium"].str.contains("oil", case=False)]
acrylic = artwork_data[artwork_data["medium"].str.contains("acrylic", case=False)]
watercolour = artwork_data[artwork_data["medium"].str.contains("watercolour", case=False)]
fig, ax = plt.subplots()
ax.set(xlabel='year', ylabel='number of works',
       title='Paintings at the Tate, by Medium')
# Drop missing years before binning, and label each series for the legend
ax.hist([oil["year"].dropna(), acrylic["year"].dropna(), watercolour["year"].dropna()],
        stacked=True, label=["oil", "acrylic", "watercolour"])
ax.legend();
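The comment in the cell above invites a more complete alternative to dropping every row with a missing medium. One possible approach, sketched here under the assumption that a missing medium should simply count as "no match", is the na parameter of pandas' str.contains. The rows below are made up for illustration and are not taken from artwork_data.csv.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from artwork_data.csv.
sample = pd.DataFrame({
    "medium": ["Oil paint on canvas", None, "Watercolour on paper", "Acrylic paint on board"],
    "year": ["1805", "1910", "no date", "1972"],
})

# Non-numeric years become NaN instead of raising an error.
sample["year"] = pd.to_numeric(sample["year"], errors="coerce")

# na=False treats a missing medium as "no match", so the mask stays boolean
# and rows with a missing medium remain in `sample` for other uses.
oil = sample[sample["medium"].str.contains("oil", case=False, na=False)]
watercolour = sample[sample["medium"].str.contains("watercolour", case=False, na=False)]

print(len(sample), len(oil), len(watercolour))
```

The row with the missing medium is excluded from both subsets but is not removed from the data frame, so it stays available for plots that do not depend on the medium.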

Plotly

Plotly's Python graphing library creates interactive, publication-quality graphs online.

import plotly.express as px
px.line(x=["a","b","c"], y=[1,3,2], title="sample figure")

Information can also be displayed as a table using Plotly’s Figure Factory module.

import pandas as pd
# plotly imports
import plotly.figure_factory as ff
# plotly.graph_objs contains all the helper classes to make/style plots
import plotly.graph_objs as go
# Assumes artist_data.csv is available in the working directory
artist_data = pd.read_csv("artist_data.csv")
# Display the first 12 rows and 3 columns of the dataframe
ff.create_table(artist_data.iloc[:12,:3], index=False)

Plot two histograms that compare the number of male and female artists in the Tate collection, distributed by year of birth.

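import numpy as np
# Assumes artist_data.csv is available in the working directory;
# pd and go are imported in the previous cell
artist_data = pd.read_csv("artist_data.csv")
# Boolean masks; rows with a missing gender match neither
male = artist_data['gender'] == 'Male'
female = artist_data['gender'] == 'Female'
# One histogram trace per gender, binned by year of birth
trace1 = go.Histogram(
    x=np.array(artist_data[female]['yearOfBirth']),
    name='Female')
trace2 = go.Histogram(
    x=np.array(artist_data[male]['yearOfBirth']),
    name='Male')
trace_data = [trace1, trace2]
# bargroupgap sets the spacing between bars within the same bin
layout = go.Layout(
    bargroupgap=0.3)
go.Figure(data=trace_data, layout=layout)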

Hovering over a bar shows its values, both here and in the published view. Traces can also be toggled on and off by clicking their entries in the legend.
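To make concrete what each histogram trace summarizes, the counting can be sketched in plain Python. This is only an illustration: the helper count_by_decade and the fixed ten-year bin width are assumptions (Plotly chooses its bin width automatically), and the birth years are made up.

```python
from collections import Counter

def count_by_decade(years):
    """Count values per decade, skipping missing entries (None)."""
    return Counter((int(y) // 10) * 10 for y in years if y is not None)

# Made-up birth years for illustration only.
female = [1882, 1887, 1903, None, 1931]
male = [1775, 1880, 1889, 1901, 1905, 1936]

female_counts = count_by_decade(female)
male_counts = count_by_decade(male)

# Print both series side by side on a shared set of decade bins.
for decade in sorted(set(female_counts) | set(male_counts)):
    print(decade, female_counts[decade], male_counts[decade])
```

Each printed row corresponds to one bin: the two counts are the heights of the female and male bars for that decade.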

For more examples and details about this library, please refer to the official Plotly Python Open Source Graphing Library documentation.

Julia

Plots offers the most flexible way to visualize data using Julia in Nextjournal. This preinstalled library provides a unified interface to different plotting libraries, including plotly and gr. Plotly graphs are interactive, while gr is faster for large data sets.

While documentation exists for both the Plotly Julia Library and the Julia package GR, these examples use Plots, so the Plots documentation offers the most useful supplementary information.

Plotly

using Plots; plotly()
scatter(rand(10), rand(10), title="Plot.ly Backend")

The Plotly Julia Library offers more documentation examples for reference.

gr

using Plots; gr()

gr produces a PNG file, which Nextjournal displays.

scatter(rand(10), rand(10), title="GR Backend")