Simon Danisch / Jul 18 2019

Julia: A Compiler for the Future

About Me

  • studied Cognitive Science
  • Computer Vision & ML
  • Worked for the Julia Lab

  • Now at Nextjournal

Motivation

  • Nextjournal: interactive workflows, data science
  • Me: interactive workflows for low-level programming
  • Extensible ML in a sane language
  • The Julia Language!
  • Python's adoption is skyrocketing, and it's undeniably successful!
  • we should really try to understand Python

What is Python?

  • interpreted, dynamic, multi-paradigm language

  • convenient for scripts & interactive programming

  • pure Python is really slow (50x-300x slower than C)

  • simple type system, basic metaprogramming
  • people don't care about performance (or do they?)

Data Science & ML as drivers of growth

TensorFlow, PyTorch, Keras, Theano

  • data science usually needs lots of performance...

  • ... and Python is the most popular language for it!?!

C/C++ to the rescue

  • Konrad Hinsen has an explanation for this:

  • Python is amazing at gluing (C/C++) libraries together in scripts

  • NumPy / Pandas are actually written in C/C++

  • 50% of Python's top 10 packages use C/C++

The best of both worlds

  • number crunching happens in fast, compiled libraries

  • scripting happens in a fun, dynamic language without any compilation step

  • Perfect platform to glue libraries together

Is it, though?

  • C++ Packages usually don't work with Python classes
  • Python callbacks are slow (e.g. when used in a solver)

  • inter-procedural optimization (IPO) is inhibited
  • constant need to rewrite packages in C++/Cython
  • two-language problem: hard to maintain, hard to contribute to

Summary

  • people trade execution speed & static typing for interactivity & development speed
  • Core libraries not actually written in Python

What would a language need to look like to write an ML library?

High-level view of a DNN (or most ML):

Inside the tunable function:

Tuning, a.k.a. back-propagation:

include("utils.jl")

function next_position(position, angle)
  position .+ (sin(angle), cos(angle))
end

# Our tunable function ... or chain of flexible links
function predict(chain, input)
  output = next_position(input, chain[1]) # Layer 1
  output = next_position(output, chain[2]) # Layer 2
  output = next_position(output, chain[3]) # Layer 3
  output = next_position(output, chain[4]) # Layer 4
  return output
end

function loss(chain, input, target)
  sum((predict(chain, input) .- target) .^ 2)
end

chain = [(rand() * pi) for i in 1:4]
input, target = (0.0, 0.0), (3.0, 3.0)
weights, s = visualize(chain, input, target)
s
using Zygote
function loss_gradient(chain, input, target)
  # first index, to get gradient of first argument
  Zygote.gradient(loss, chain, input, target)[1]
end
for i in 1:100
  # get gradient of loss function
  angle∇ = loss_gradient(chain, input, target)
  # update weights with our loss gradients
  # this updates the weights in the direction of smaller loss
  chain .-= 0.01 .* angle∇
  # update visualization
  weights[] = chain
  sleep(0.01)
end;

Summary

  • A DNN is built from many small parametrized functions

  • The loss function calls all of those functions

  • Functions could be user-defined or primitives from the ML framework

  • Frameworks need to differentiate the loss function and execute it fast
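To make the last point concrete, here is a minimal numerical sanity check, a sketch that assumes the loss, loss_gradient, chain, input and target cells above have been run (finite_diff_gradient is a hypothetical helper written for this notebook, not part of any framework):

# Sketch: compare the AD gradient with a finite-difference estimate
function finite_diff_gradient(loss, chain, input, target; h = 1e-6)
  map(eachindex(chain)) do i
    shifted = copy(chain)
    shifted[i] += h
    (loss(shifted, input, target) - loss(chain, input, target)) / h
  end
end
# both should agree up to the finite-difference error
isapprox(finite_diff_gradient(loss, chain, input, target),
         loss_gradient(chain, input, target); atol = 1e-3)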

AD needs to transform code

example_function(x) = sin(x) + cos(x)
# needs to be rewritten to:
derivative(::typeof(example_function), x) = cos(x) - sin(x)
derivative (generic function with 1 method)
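As a quick sanity check (Zygote from the earlier cell is assumed to still be loaded), the hand-written derivative agrees with what AD produces for this function:

# Zygote performs this rewrite for us; compare it with the manual version
Zygote.gradient(example_function, 1.5)[1] ≈ derivative(example_function, 1.5)
true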
  • huge call graph that needs to be transformed
  • result of transformation needs to execute with flawless performance
  • you want to do: fusing of calculations, inlining, removing temporaries, executing on the GPU
  • deep in the territory of compiler & language research

Lots of Frameworks recognize this

  • New Intermediate Representations (IR) keep popping up:
  • PyTorch IR (Facebook), [XLA, MLIR, JAX] (Google)
  • These translate the Python API, or Python functions, to the IR
  • Then, on the IR, they do IPO, automatic differentiation, fusing, and compile native code

  • Most use LLVM to generate native CPU/GPU code

What if you're not Google or Facebook?

This sounds like a job for Julia

  • multi-paradigm, dynamic language
  • compiled at runtime -> as fast as C
  • ease of use and elegance on par with Python

  • syntax optimized for writing math
  • has LLVM JIT & Compiler Plugins
  • Coming from Lisp: code as data, compiler & runtime available
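The last bullet is easy to demonstrate, a minimal sketch of code as data:

ex = :(sin(x) + cos(x))      # quoting turns code into a data structure
ex.head, ex.args             # (:call, Any[:+, :(sin(x)), :(cos(x))])
eval(Expr(:call, :+, 1, 2))  # and the runtime can evaluate freshly built code: 3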

Meta-Programming like in Lisp

macro append_arg(expr)
    println("Before transform:")
    Meta.show_sexpr(expr)
    push!(expr.args, " World")
    println("\nAfter transform:")
    Meta.show_sexpr(expr)
    println()
    return expr
end
@append_arg println("Hello")

Compiler Plugins and Reflections

using InteractiveUtils

@code_lowered example_function(1.0)
CodeInfo(
1 ─ %1 = (Main.sin)(x)
│   %2 = (Main.cos)(x)
│   %3 = %1 + %2
└── return %3
)
@code_typed optimize=false example_function(1.0)
CodeInfo(
1 ─ %1 = (Main.sin)(x)::Float64
│   %2 = (Main.cos)(x)::Float64
│   %3 = (%1 + %2)::Float64
└── return %3
) => Float64
@code_llvm debuginfo=:none example_function(1.0)
@code_native example_function(1.0)

Eval & AST manipulations

ssa = (code_lowered(example_function, Tuple{Float64}))[1].code
ast = map(i-> :($(Symbol("var_$i")) = $(ssa[i])), 1:length(ssa))
replace_recursive(f, node) = f(node)
replace_recursive(f, vec::Vector) = map!(x-> replace_recursive(f, x), vec, vec)
replace_recursive(f, node::Expr) = (replace_recursive(f, node.args); node)
replace(x) = x
# easier than transforming it to another call for this simple example
msin(x) = -sin(x) 
replace(x::GlobalRef) = x.name == :sin ? cos : x.name == :cos ? msin : x
replace(x::Core.SSAValue) = Symbol("var_$(x.id)")
replace(x::Core.SlotNumber) = Symbol("arg_$(x.id-1)")
ast2 = replace_recursive(replace, ast)
body = Expr(:block, ast2...)
quote
    var_1 = (cos)(arg_1)
    var_2 = (msin)(arg_1)
    var_3 = var_1 + var_2
    var_4 = return var_3
end
@eval transformed(arg_1) = $body
transformed (generic function with 1 method)
derivative(example_function, 1.5) == transformed(1.5)
true
@code_llvm debuginfo=:none transformed(1.5)

There is a Library for this

using Cassette
import Cassette: @context, overdub

@context Derivative

overdub(::Derivative, ::typeof(sin), arg1) = cos(arg1)
overdub(::Derivative, ::typeof(cos), arg1) = -sin(arg1)

y = overdub(Derivative(), example_function, 1.5)
y == transformed(1.5)
true
@code_llvm debuginfo=:none overdub(Derivative(), example_function, 1.5)

Fit for unique ML challenges

  • Comes with an LLVM-based JIT compiler

  • Lots of tools to implement AD, and lots of packages that implement it

  • State-of-the-art GPU computing (works with AD, user-defined functions & types)
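A minimal sketch of the GPU bullet, assuming a CUDA-capable GPU and the CuArrays package (neither is used elsewhere in this notebook):

using CuArrays

myactivation(x) = x * tanh(x)    # an arbitrary user-defined scalar function
W, x = cu(rand(Float32, 32, 32)), cu(rand(Float32, 32))
y = myactivation.(W * x)         # broadcasting compiles myactivation into a GPU kernel
sum(y)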

Proof: Flux.jl

  • 1485 stars on GitHub, pretty much written by one person

  • Purely written in Julia, easy to extend
  • roughly on par with TensorFlow in terms of features and performance
  • No two language problem, fully optimizable, works nicely with Julia Packages
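A minimal sketch of what that looks like in use (hypothetical layer sizes and dummy data, Flux API as of this writing):

using Flux

model = Chain(Dense(2, 10, relu), Dense(10, 1))        # plain Julia objects
objective(x, y) = Flux.mse(model(x), y)
data = [(rand(Float32, 2, 16), rand(Float32, 1, 16))]  # one dummy batch
Flux.train!(objective, params(model), data, ADAM())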

Good AD support means you can use arbitrary packages

The difference in work needed is immense!

E.g. just imagine not being able to reuse existing libraries:

  • you would need to write your own physics library using only TensorFlow primitives
  • you would need to implement a custom AD kernel
  • only then could you start putting together the layers

  • ...and maybe quickly figure out that your idea wasn't good

Why not an ML language?

  • data scrubbing / cleaning / preparation

  • GUIs / dashboards / web

  • multi-purpose code inside the DNN itself

Summary

the good

  • Julia has great tools to work with code and compiler passes
  • optimal performance of newly generated code
  • solves two language problem
  • has a nice type system
  • interactive workflow

the bad

  • Not easy to create AOT-compiled binaries
  • so you will need to wait for JIT compilation at runtime
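A quick way to see this JIT latency, as a sketch: time the first call to a freshly defined function against the second one.

fresh_function(x) = sin(x) + cos(x)
@time fresh_function(1.0)   # first call includes compilation
@time fresh_function(1.0)   # second call is just the run time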

Final words

  • a fun future: compiling some parts statically while interpreting others
  • basically get Python & C++ in one language ... and Lisp :)