Simon Danisch / Oct 09 2019

Julia for Pythonistas

About me

  • Julia expert at Nextjournal
  • worked on Julia's Graphics, GPU and Plotting support for the last ~6 years

Outline

  • What is Julia and when to use it
  • Calling other languages
  • Performance
  • Writing Libraries
  • Noteworthy examples
  • Calling Julia from Python
  • Caveats

What is Julia

  • Python at the speed of C, with elegant syntax for math like MATLAB (see the sketch below)
  • Multi-purpose
  • Multi-paradigm (functional programming, abstract type inheritance)
  • Appeared in 2012
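
A minimal, illustrative sketch of that math-like syntax (nothing here is library-specific):

# Numeric literal coefficients and Unicode names read like the math they implement
σ(x) = 1 / (1 + exp(-x))
f(x) = 3x^2 + 2x + 1

# The dot syntax applies any function elementwise
xs = 0:0.1:1
ys = σ.(f.(xs))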

Noteworthy Features

  • Best of Lisp (macros, code as a data structure, functional programming)
  • Multiple dispatch: functions are dispatched on the types of all arguments (see the sketch below)
  • Optional type system
  • Introspection & reflection
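
A minimal sketch of multiple dispatch with optional type annotations (the `area` functions are hypothetical examples):

# An untyped fallback plus two typed methods; the most specific method is
# chosen based on the types of *all* arguments.
area(x) = error("don't know how to compute an area for $(typeof(x))")
area(r::Number) = π * r^2           # circle of radius r
area(w::Number, h::Number) = w * h  # rectangle

area(2.0)   # 12.566...
area(3, 4)  # 12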

When to use Julia

When performance is key

Why is Performance important?!

  • Python performance is good enough for me!
  • Imagine having to learn a new language at the same time your project runs into performance issues

I'll just call out to C

Rely on mature external libraries for performance-critical code

  • Someone has to write those libraries
  • Some problems are impossible or awkward to express as calls to C libraries

I'll miss all the great Libraries

Julia talks to all major Languages - mostly without overhead!

PyCall

Passing functions:

using PyCall

so = pyimport("scipy.optimize")  # import a Python module

# A plain Julia function can be passed directly as a Python callback:
function mycallback(x)
  cos(x) - x
end

so.newton(mycallback, 1)
# or with an anonymous function
so.newton(x -> cos(x) - x, 1)
0.739085

Inherit from Python classes:

P = pyimport("numpy.polynomial")
# @pydef defines a new Python class from Julia; x2.get / x2.set! implement a property
@pydef mutable struct Doubler <: P.Polynomial
    function __init__(self, x=10)
        self.x = x
    end
    my_method(self, arg1::Number) = arg1 + 20
    x2.get(self) = self.x * 2
    function x2.set!(self, new_val)
        self.x = new_val / 2
    end
end
Doubler().x2
20

Write whole functions

py"""
def pymap(f, A, B):
  for i in range(len(A)):
    A[i] = f(B[i])
"""
A = rand(Float32, 10)
pymap(f, a, b) = (py"pymap"(f, a, b); a)
result = pymap(sin, zeros(Float32, 10), A)
result ≈ sin.(A)
true

Calling C

void c_map(void *fptr, float *A, float *B, int len)
{
	float (*f)(float) = (float (*)(float))fptr;
	for (int i = 0; i< len; ++i)
			A[i] = (*f)(B[i]);
}
cmap.c
# Great shell integration:
run(`gcc -shared -fPIC -O3 /cmap.c -o /cmap.so`)

function cmap(f, A, B)
  # Get a C function pointer from a Julia function
  fptr = @cfunction($f, Float32, (Float32,))
  ccall(
    (:c_map, "/cmap.so"), # function name, library
    Nothing, # return type
    (Ptr{Nothing}, Ptr{Float32}, Ptr{Float32}, Cint), # argument types
    fptr, A, B, length(A) # arguments
  )
  return A
end
result = cmap(sin, zeros(Float32, 10), A)
result ≈ sin.(A)
true

C++

Cxx.jl can compile arbitrary C++ code and link it against Julia and other C++ libraries

using Cxx
cxx"""
void cxxmap(void *fptr, float *A, float *B, int len)
{
  float (*f)(float) = (float (*)(float))fptr;
  for (auto x = 0; x < len; ++x)
  	A[x] = (*f)(B[x]);
}
"""

function cxxmap(f, A, B)
  fptr = @cfunction($f, Float32, (Float32,)).ptr
  @cxx cxxmap(fptr, pointer(A), pointer(B), Cint(length(A)))
  return A
end
result = cxxmap(sin, zeros(Float32, 10), A)
result ≈ sin.(A)
true

Let's compare the performance:

using BenchmarkTools
a, b = zeros(Float32, 10^7), rand(Float32, 10^7)
c_b = @belapsed cmap($sin, $a, $b)
cxx_b = @belapsed cxxmap($sin, $a, $b)
# the Python version is too slow to benchmark repeatedly, so time a single run
python_b = @elapsed pymap(sin, a, b);
julia_b = @belapsed map!($sin, $a, $b);
timings = [python_b, c_b, cxx_b, julia_b]
4-element Array{Float64,1}:
 55.6804
  0.138172
  0.152887
  0.091246
timings[1] ./ timings
4-element Array{Float64,1}:
   1.0
 402.978
 364.194
 610.223

Now, can we go faster?

using BenchmarkTools
function map_threaded!(f, a, b)
  Threads.@threads for i in eachindex(a, b)
    @inbounds a[i] = f(b[i])
  end
  return a
end
map_threaded! (generic function with 1 method)
threaded_b = @belapsed map_threaded!($sin, $a, $b)
0.040066
using CuArrays, CUDAnative
acu, bcu = cu(a), cu(b) # transfer to GPU memory
# calls to cuda are async
cuda_b = @belapsed CuArrays.@sync(map!($(CUDAnative.sin), $acu, $bcu))
0.000748434
push!(timings, threaded_b, cuda_b);
timings[2] ./ timings
6-element Array{Float64,1}:
   0.00248153
   1.0
   0.903757
   1.51429
   3.44862
 184.615

Takeaway

  • State-of-the-art performance
  • Simple to get there
  • Great interop (JavaCall, RCall, MATLAB.jl; see the RCall sketch below)
  • Still interactive & fun
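
A quick, illustrative sketch of interop in the other direction, using RCall (assumes a local R installation):

using RCall

x = randn(1000)
# The R"..." string macro evaluates R code; $x interpolates the Julia array into R
R"mean($x)"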

Building Libraries in Julia

New motto: come for the performance, stay for the syntax!

  • It's nice to have an optional type system & multiple dispatch
  • Julia libraries are very easy to extend
module ColorBase

abstract type AbstractColor end

struct RGB{T} <: AbstractColor
  r::T
  g::T
  b::T
end

Base.Tuple(rgb::RGB) = (rgb.r, rgb.g, rgb.b)

# One generic `+` that works for every color type which defines `Tuple`
function Base.:(+)(a::T, b::T) where T <: AbstractColor
  vals = Tuple(a) .+ Tuple(b)
  return T(vals...)
end

export AbstractColor

end

module ColorExtensions

using ..ColorBase # `..` looks the module up in the enclosing namespace

struct XYZSpace{T} <: AbstractColor
  x::T
  y::T
  z::T
end

Base.Tuple(xyz::XYZSpace) = (xyz.x, xyz.y, xyz.z)
export XYZSpace

end;
using .ColorExtensions, .ColorBase

xyz = XYZSpace(0.1, 0.33, 0.5) + XYZSpace(0.1, 0.2, 0.3)
XYZSpace{Float64}(0.2, 0.53, 0.8)

Same performance as built-in types

colors = [XYZSpace(rand(3)...) for i in 1:10^7]
@btime sum($colors);
builtin = rand(3 * 10^7)
@btime sum($builtin);

Catchy Examples

Data Science

using CSV, DataFrames, Plots, StatsPlots
theme(:wong)

data = CSV.read("emissions.csv")
data.co2 = data.co2 ./ 10^9
co2max = by(data, :country, :co2 => maximum)
sort!(co2max, :co2_maximum, rev = true)
bad_countries = co2max.country[1:10]
the_worst = filter(data) do row
  row.country in bad_countries
end
@df the_worst plot(
  :Year, :co2,
  group = :country,
  legend = :topleft,
  linewidth = 4,
  title = "Total Accumulative Emission"
)

LINQ-like query with Query.jl

using FileIO, Query, VegaLite, Dates
data |>
  @mutate(co2 = _.co2, year = Date(_.Year)) |>
  @groupby({_.country}) |>
  @orderby_descending(maximum(_.co2)) |>
  @take(5) |>
  @mapmany(_, __) |>  # This is essentially ungroup
  DataFrame |>
  @vlplot(
    :line, x=:year, y=:co2, color=:country, title="Emissions",
    width = 500, height = 300
  )

Flux

  • Features comparable to TensorFlow
  • Mostly as fast
  • Julia GPU acceleration
  • Written 100% in Julia, in just ~2000 LOC
  • Works with plain Julia arrays, types and functions

From the Flux model zoo:

using Flux
using Flux: crossentropy, normalise, onecold, onehotbatch, params
using Statistics: mean

labels = Flux.Data.Iris.labels()
features = Flux.Data.Iris.features()

normed_features = normalise(features, dims=2)
klasses = sort(unique(labels))
onehot_labels = onehotbatch(labels, klasses)

train_indices = [1:3:150 ; 2:3:150]
X_train = normed_features[:, train_indices]
y_train = onehot_labels[:, train_indices]
X_test = normed_features[:, 3:3:150]
y_test = onehot_labels[:, 3:3:150]

model = Chain(
    Dense(4, 3), # A normal Julia type, predefined in Flux
    softmax # A normal julia function
)

loss(x, y) = crossentropy(model(x), y)
accuracy(x, y) = mean(onecold(model(x)) .== onecold(y))

data_iterator = Iterators.repeated((X_train, y_train), 110)
Flux.train!(loss, params(model), data_iterator, Descent(0.5))
accuracy_score = accuracy(X_test, y_test)
accuracy_score > 0.8
true

Cassette - Hook into the Compiler!

using Cassette
Cassette.@context Ctx
# this prehook implements simple trace logging for overdubbed functions
Cassette.prehook(::Ctx, f, args...) = println(f, args)
Cassette.overdub(Ctx(), *, 1, 2.0)
2.0
Cassette.overdub(::Ctx, ::typeof(convert), ::Type{Float64}, x::Float64) = x * 3.0
Cassette.overdub(Ctx(), *, 1, 2.0)
6.0

Interactive Graphics & Mimi

using Mimi, MimiDICE2013
using AbstractPlotting, WGLMakie, Markdown
using Observables, Hyperscript, JSServe
using JSServe: with_session, Slider
using JSServe.DOM
using AbstractPlotting: scatter, scatter!
set_theme!(markersize = 5, font = "Dejavu Sans", resolution = (500, 400))

baseline = MimiDICE2013.get_model()
run(baseline)
interacted = MimiDICE2013.get_model()

with_session() do session
  time_tatm = getdataframe(baseline, :climatedynamics, :TATM)
  time = time_tatm.time
  control_rate_s = JSServe.Slider(LinRange(0, 1, 100))
  control_rate = map(control_rate_s) do r
    round(r, digits = 3)
  end
  year = JSServe.Slider(1:length(time))
  data = map(control_rate) do control
    set_param!(interacted, :emissions, :MIU, fill(control, 60))
    run(interacted)
    return something.(interacted[:climatedynamics, :TATM], 0.0)
  end
  scene = scatter(time, something.(time_tatm.TATM, 0.0))
  scatter!(time, data, color = :red)
  scene[Axis].names.axisnames = ("Year", "Temperature Increase")
  scene[Axis].names.textsize = (8, 8)
  dmg_estimated = map(year, control_rate) do year_idx, _
    round(interacted[:damages, :DAMAGES][year_idx], digits = 4)
  end
  selected_year = map(i -> time[i], year)
  b = DOM.font("● baseline", color = :black)
  ec = DOM.font("● with emission control", color = :red)
  return md"""
    # Explore Climate

    Set amount of emission control: $(control_rate_s) $(control_rate)
    # Temperature Increase

    $b | $ec

    $(scene)

    # Estimated damage in year $(selected_year)

    $(year)

    $(dmg_estimated) trillion USD per year.
  """
end

Cool, I'm intrigued, but can't leave Python

  • You can also call Julia from Python
  • Could be used instead of C/Cython
# Internals of the Python package:
import julia # julia package
from julia.api import Julia
jl = Julia(compiled_modules=False)
from julia import Base
from julia import Main
from julia import AbstractPlotting
from julia import WGLMakie
from julia import JSServe

class Plot:
  def __init__(self):
    # Setup Julia
    Main.eval("""
        url = strip(ENV["NEXTJOURNAL_RUNTIME_SERVICE_URL"])
        ENV["JULIA_WEBIO_BASEURL"] = url
        ENV["WEBIO_SERVER_HOST_URL"] = "0.0.0.0"
        ENV["WEBIO_HTTP_PORT"] = 9998
        surl = replace(url, "http" => "ws") * "/webio_websocket/"
        ENV["WEBIO_WEBSOCKT_URL"] = surl
        JSServe.__init__()
		""")
    self.func = None
    self.plot = None
    
  def __getattr__(self, symbol):
    # Return Julia function
    if Main.eval(f"isdefined(AbstractPlotting, :{symbol})"):
      self.func = Main.eval(f"getfield(AbstractPlotting, :{symbol})")
      return self
    else:
      return None
    
  def __call__(self, *args, **kwargs):
    if self.func is not None:
      self.plot = self.func(*args, **kwargs)
    return self
  def _repr_webio_(self):
    if self.plot is not None:
      return Base.repr(
        "application/vnd.webio.application+html", 
        self.plot
      ).decode("utf-8")
    else: 
      return ""
M = Plot()
import numpy as np
X = np.arange(-2, 2, 0.25)
Y = np.arange(-2, 2, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)

M.surface(X, Y, Z, resolution = (500, 400), colormap = "magma")

What's bad

Compile Times

  • Can be compiled ahead of time, though (see the sketch below)
  • With Atom / Revise, only the code you changed gets recompiled
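
A minimal sketch of ahead-of-time compilation with PackageCompiler.jl (assuming a recent version; the package list and file name are placeholders):

using PackageCompiler

# Bake heavy dependencies into a custom system image so they
# don't have to be compiled again at startup:
create_sysimage([:Plots]; sysimage_path = "sys_plots.so")

# Then start Julia with: julia --sysimage sys_plots.so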

Deploying Julia

It's a work in progress...

  • Prototypes already exist and people are deploying Julia
  • File sizes are big and builds are awkward

So when should I use Julia?

  • Now! If you need performance and plan to write your own libraries
  • In ~1-2 years if you want smooth deployment
  • In ~3-5 years if you want a 100% smooth experience

Further reading