Simon Danisch / Jul 18 2019

Julia: A Compiler for the Future

About Me

  • studied Cognitive Science
  • Computer Vision & ML
  • Worked for the Julia Lab

  • Now at Nextjournal

Motivation

  • Nextjournal: interactive workflows, data science
  • Me: interactive workflows for low-level programming
  • Extensible ML in a sane language
  • The Julia Language!
  • Python's adoption is skyrocketing, and it's undeniably successful!
  • we should really try to understand Python

What is Python?

  • interpreted, dynamic, multi-paradigm language

  • convenient for scripts & interactive programming

  • pure Python is really slow (50x-300x slower than C)

  • simple type system, basic metaprogramming
  • people don't care about performance (or do they?)

Data Science & ML as drivers of growth

TensorFlow, PyTorch, Keras, Theano

  • data science usually needs lots of performance...

  • ... and Python is the most popular language for it!?!

C/C++ to the rescue

  • Konrad Hinsen has an explanation for this:

  • Python is amazing at gluing (C/C++) libraries together in scripts

  • NumPy / Pandas are actually written in C/C++

  • 50% of Python's top 10 packages use C/C++

The best of both worlds

  • number crunching happens in fast, compiled libraries

  • scripting happens in a fun, dynamic language without any compilation step

  • Perfect platform to glue libraries together

Is it, though?

  • C++ Packages usually don't work with Python classes
  • Python callbacks are slow (e.g. when used in a solver)

  • inter-procedural optimization (IPO) is inhibited
  • constant need to rewrite packages in C++/Cython
  • two-language problem: hard to maintain, hard to contribute to

Summary

  • people trade execution speed & static typing for interactivity & development speed
  • Core libraries not actually written in Python

What would a language need to look like to write an ML library?

High-level view of a DNN (or most ML):

Inside the tunable function:

Tuning, a.k.a. back-propagation:

include("utils.jl")

function next_position(position, angle)
  position .+ (sin(angle), cos(angle))
end

# Our tunable function ... or chain of flexible links
function predict(chain, input)
  output = next_position(input, chain[1]) # Layer 1
  output = next_position(output, chain[2]) # Layer 2
  output = next_position(output, chain[3]) # Layer 3
  output = next_position(output, chain[4]) # Layer 4
  return output
end

function loss(chain, input, target)
  sum((predict(chain, input) .- target) .^ 2)
end

chain = [(rand() * pi) for i in 1:4]
input, target = (0.0, 0.0), (3.0, 3.0)
weights, s = visualize(chain, input, target)
s
using Zygote
function loss_gradient(chain, input, target)
  # first index, to get gradient of first argument
  Zygote.gradient(loss, chain, input, target)[1]
end
for i in 1:100
  # get gradient of loss function
  angle∇ = loss_gradient(chain, input, target)
  # update weights with our loss gradients
  # this updates the weights in the direction of smaller loss
  chain .-= 0.01 .* angle∇
  # update visualization
  weights[] = chain
  sleep(0.01)
end;

Summary

  • A DNN is built from many small parametrized functions

  • The loss function calls all of those functions

  • Functions could be user-defined or primitives from the ML framework

  • Frameworks need to differentiate the loss function and execute it fast
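To make the last point concrete, here is a minimal numerical sanity check, a sketch that assumes the loss, loss_gradient, chain, input and target cells above have been run (finite_diff_gradient is a hypothetical helper written for this notebook, not part of any framework):

# Sketch: compare the AD gradient with a finite-difference estimate
function finite_diff_gradient(loss, chain, input, target; h = 1e-6)
  map(eachindex(chain)) do i
    shifted = copy(chain)
    shifted[i] += h
    (loss(shifted, input, target) - loss(chain, input, target)) / h
  end
end
# both should agree up to the finite-difference error
isapprox(finite_diff_gradient(loss, chain, input, target),
         loss_gradient(chain, input, target); atol = 1e-3)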

AD needs to transform code

example_function(x) = sin(x) + cos(x)
# needs to be rewritten to:
derivative(::typeof(example_function), x) = cos(x) - sin(x)
derivative (generic function with 1 method)
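As a quick sanity check (Zygote from the earlier cell is assumed to still be loaded), the hand-written derivative agrees with what AD produces for this function:

# Zygote performs this rewrite for us; compare it with the manual version
Zygote.gradient(example_function, 1.5)[1] ≈ derivative(example_function, 1.5)
true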
  • huge call graph that needs to be transformed
  • result of transformation needs to execute with flawless performance
  • you want to do: fusing of calculations, inlining, removing temporaries, executing on the GPU
  • deep in the territory of compiler & language research

Lots of Frameworks recognize this

  • New Intermediate Representations (IR) keep popping up:
  • PyTorch IR (Facebook), [XLA, MLIR, JAX] (Google)
  • These translate the Python API, or Python functions, to the IR
  • Then, on the IR, they do IPO, automatic differentiation, fusing, and compile native code

  • Most use LLVM to generate native CPU/GPU code

What if you're not Google or Facebook?

This sounds like a job for Julia

  • multi-paradigm, dynamic language
  • compiled at runtime -> as fast as C
  • ease of use and elegance on par with Python

  • syntax optimized for writing math
  • has LLVM JIT & Compiler Plugins
  • Coming from Lisp: code as data, compiler & runtime available
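The last bullet is easy to demonstrate, a minimal sketch of code as data:

ex = :(sin(x) + cos(x))      # quoting turns code into a data structure
ex.head, ex.args             # (:call, Any[:+, :(sin(x)), :(cos(x))])
eval(Expr(:call, :+, 1, 2))  # and the runtime can evaluate freshly built code: 3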

Meta-Programming like in Lisp

macro append_arg(expr)
    println("Before transform:")
    Meta.show_sexpr(expr)
    push!(expr.args, " World")
    println("\nAfter transform:")
    Meta.show_sexpr(expr)
    println()
    return expr
end
@append_arg println("Hello")

Compiler Plugins and Reflections

using InteractiveUtils

@code_lowered example_function(1.0)
CodeInfo(
1 ─ %1 = (Main.sin)(x)
│   %2 = (Main.cos)(x)
│   %3 = %1 + %2
└── return %3
)
@code_typed optimize=false example_function(1.0)
CodeInfo(
1 ─ %1 = (Main.sin)(x)::Float64
│   %2 = (Main.cos)(x)::Float64
│   %3 = (%1 + %2)::Float64
└── return %3
) => Float64
@code_llvm debuginfo=:none example_function(1.0)
@code_native example_function(1.0)

Eval & AST manipulations

ssa = (code_lowered(example_function, Tuple{Float64}))[1].code
ast = map(i-> :($(Symbol("var_$i")) = $(ssa[i])), 1:length(ssa))
replace_recursive(f, node) = f(node)
replace_recursive(f, vec::Vector) = map!(x-> replace_recursive(f, x), vec, vec)
replace_recursive(f, node::Expr) = (replace_recursive(f, node.args); node)
replace(x) = x
# easier than transforming it to another call for this simple example
msin(x) = -sin(x) 
replace(x::GlobalRef) = x.name == :sin ? cos : x.name == :cos ? msin : x
replace(x::Core.SSAValue) = Symbol("var_$(x.id)")
replace(x::Core.SlotNumber) = Symbol("arg_$(x.id-1)")
ast2 = replace_recursive(replace, ast)
body = Expr(:block, ast2...)
quote
    var_1 = (cos)(arg_1)
    var_2 = (msin)(arg_1)
    var_3 = var_1 + var_2
    var_4 = return var_3
end
@eval transformed(arg_1) = $body
transformed (generic function with 1 method)
derivative(example_function, 1.5) == transformed(1.5)
true
@code_llvm debuginfo=:none transformed(1.5)

There is a Library for this

using Cassette
import Cassette: @context, overdub

@context Derivative

overdub(::Derivative, ::typeof(sin), arg1) = cos(arg1)
overdub(::Derivative, ::typeof(cos), arg1) = -sin(arg1)

y = overdub(Derivative(), example_function, 1.5)
y == transformed(1.5)
true
@code_llvm debuginfo=:none overdub(Derivative(), example_function, 1.5)

Fit for unique ML challenges

  • Comes with an LLVM-based JIT compiler

  • Lots of tools to implement AD, and lots of packages that implement it

  • State-of-the-art GPU computing (works with AD, user-defined functions & types)
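A minimal sketch of the GPU bullet, assuming a CUDA-capable GPU and the CuArrays package (neither is used elsewhere in this notebook):

using CuArrays

myactivation(x) = x * tanh(x)    # an arbitrary user-defined scalar function
W, x = cu(rand(Float32, 32, 32)), cu(rand(Float32, 32))
y = myactivation.(W * x)         # broadcasting compiles myactivation into a GPU kernel
sum(y)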

Proof: Flux.jl

  • 1485 stars on GitHub, pretty much written by one person

  • Purely written in Julia, easy to extend
  • roughly on par with TensorFlow in terms of features and performance
  • No two language problem, fully optimizable, works nicely with Julia Packages
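A minimal sketch of what that looks like in use (hypothetical layer sizes and dummy data, Flux API as of this writing):

using Flux

model = Chain(Dense(2, 10, relu), Dense(10, 1))        # plain Julia objects
objective(x, y) = Flux.mse(model(x), y)
data = [(rand(Float32, 2, 16), rand(Float32, 1, 16))]  # one dummy batch
Flux.train!(objective, params(model), data, ADAM())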

Good AD support means you can use arbitrary packages

The difference in work needed is immense!

E.g. just imagine not being able to reuse existing libraries:

  • you would need to write your own physics library using only TensorFlow primitives
  • you would need to implement a custom AD kernel
  • only then could you start putting together the layers

  • ...and maybe quickly figure out that your idea wasn't good

Why not an ML language?

  • data scrubbing / cleaning / preparation

  • GUIs / dashboards / web

  • multi-purpose code inside the DNN itself

Summary

the good

  • Julia has great tools to work with code and compiler passes
  • optimal performance of newly generated code
  • solves two language problem
  • has a nice type system
  • interactive workflow

the bad

  • Not easy to create AOT-compiled binaries
  • so you will need to wait for JIT compilation at runtime
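A quick way to see this JIT latency, as a sketch: time the first call to a freshly defined function against the second one.

fresh_function(x) = sin(x) + cos(x)
@time fresh_function(1.0)   # first call includes compilation
@time fresh_function(1.0)   # second call is just the run time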

Final words

  • a fun future: compiling some parts statically while interpreting others
  • basically get Python & C++ in one language ... and Lisp :)