Julia: A Compiler for the Future
About Me
- studied Cognitive Science
- Computer Vision & ML
- Worked for the Julia Lab
- Now at Nextjournal
Motivation
- Nextjournal: interactive workflows, data science
- Me: interactive workflows for low-level programming
- Extensible ML in a sane language
- The Julia Language!
- Python's adoption is skyrocketing, and it's undeniably successful!
- We should really try to understand Python
What is Python?
- interpreted, dynamic, multi-paradigm language
- convenient for scripts & interactive programming
- pure Python is really slow (50x-300x slower than C)
- simple type system, basic meta-programming
- people don't care about performance (or do they?)
Data Science & ML: the drivers of growth
- TensorFlow, PyTorch, Keras, Theano
- data science usually needs lots of performance...
- ... and Python is the most popular language for it!?!
C/C++ to the rescue
- Konrad Hinsen has an explanation for this:
- Python is amazing at gluing (C/C++) libraries together in scripts
- NumPy / Pandas are actually largely written in C / Cython
- 50% of Python's top 10 packages use C/C++
The best of 2 worlds
- number crunching happens in fast, compiled libraries
- scripting happens in a fun, dynamic language without any compilation
- the perfect platform to glue libraries together
Is it, though?
- C++ packages usually don't work with Python classes
- Python callbacks are slow (e.g. when used in a solver)
- inter-procedural optimization (IPO) is inhibited
- constant need to rewrite packages in C++/Cython
- the two-language problem: hard to maintain, hard to contribute
Summary
- people accept slow execution & no typing in exchange for interactivity & development speed
- core libraries are not actually written in Python
What would a language need to look like to write an ML library in it?
High Level view of a DNN (or most ML):
Inside the tunable Function:
Tuning a.k.a Back-Propagation:
include("utils.jl")

function next_position(position, angle)
    position .+ (sin(angle), cos(angle))
end

# Our tunable function ... or chain of flexible links
function predict(chain, input)
    output = next_position(input, chain[1])  # Layer 1
    output = next_position(output, chain[2]) # Layer 2
    output = next_position(output, chain[3]) # Layer 3
    output = next_position(output, chain[4]) # Layer 4
    return output
end

function loss(chain, input, target)
    sum((predict(chain, input) .- target) .^ 2)
end

chain = [(rand() * pi) for i in 1:4]
input, target = (0.0, 0.0), (3.0, 3.0)
weights, s = visualize(chain, input, target)
s
using Zygote

function loss_gradient(chain, input, target)
    # first index, to get gradient of first argument
    Zygote.gradient(loss, chain, input, target)[1]
end

for i in 1:100
    # get gradient of loss function
    angle∇ = loss_gradient(chain, input, target)
    # update weights with our loss gradients
    # this updates the weights in the direction of smaller loss
    chain .-= 0.01 .* angle∇
    # update visualization
    weights[] = chain
    sleep(0.01)
end;
Summary
- a DNN is built from many small parametrized functions
- the loss function calls all those functions
- functions can be user-defined or primitives from the ML framework
- frameworks need to differentiate the loss function and execute it fast
AD needs to transform code
example_function(x) = sin(x) + cos(x)
# needs to be rewritten to:
derivative(::typeof(example_function), x) = cos(x) - sin(x)
- huge call graph that needs to be transformed
- result of transformation needs to execute with flawless performance
- you want fusing of calculations, inlining, removing temporaries, and execution on the GPU (see the small sketch after this list)
- deep in the territory of compiler & language research
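To make "fusing" and "removing temporaries" concrete, here is a minimal Julia sketch (illustrative only, not taken from any framework): dotted broadcast expressions fuse into a single loop, so the intermediate results never materialize as temporary arrays.

x = rand(10_000)
y = similar(x)

# all dotted operations fuse into one loop over x;
# no temporary arrays are allocated for sin.(x) or cos.(x) .* 2
@. y = sin(x) + cos(x) * 2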
Lots of Frameworks recognize this
- New Intermediate Representations (IR) keep popping up:
- PyTorch IR (Facebook); XLA, MLIR, JAX (Google)
- they translate the Python API, or Python functions, to their IR
- the IR is then used for IPO, automatic differentiation, fusing, and compiling native code
- most use LLVM to generate native CPU/GPU code
What if you're not Google or Facebook?
This sounds like a job for Julia
- multi-paradigm, dynamic language
- compiled at runtime -> as fast as C
- ease of use and elegance on par with Python
- syntax optimized for writing math (see the small sketch after this list)
- has an LLVM JIT & compiler plugins
- Coming from Lisp: code as data, compiler & runtime available
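As a small, hypothetical illustration of "syntax optimized for writing math" (the names below are made up for the example): Unicode identifiers, numeric-literal coefficients, and dot-broadcasting all work out of the box.

σ(x) = 1 / (1 + exp(-x))   # Unicode names are ordinary identifiers
f(x) = 3x^2 + 2x + 1       # juxtaposition reads like the math
xs = -1:0.5:1
σ.(f.(xs))                 # broadcast any function over a collection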
Meta-Programming like in Lisp
macro append_arg(expr)
    println("Before transform:")
    Meta.show_sexpr(expr)
    push!(expr.args, " World")
    println("\nAfter transform:")
    Meta.show_sexpr(expr)
    println()
    return expr
end

@append_arg println("Hello")
Compiler Plugins and Reflection
using InteractiveUtils
@code_lowered example_function(1.0)

@code_typed optimize=false example_function(1.0)

@code_llvm debuginfo=:none example_function(1.0)

@code_native example_function(1.0)
Eval & AST manipulations
ssa = (code_lowered(example_function, Tuple{Float64}))[1].code
ast = map(i-> :($(Symbol("var_$i")) = $(ssa[i])), 1:length(ssa))

replace_recursive(f, node) = f(node)
replace_recursive(f, vec::Vector) = map!(x-> replace_recursive(f, x), vec, vec)
replace_recursive(f, node::Expr) = (replace_recursive(f, node.args); node)

replace(x) = x
# easier than transforming it to another call for this simple example
msin(x) = -sin(x)
replace(x::GlobalRef) = x.name == :sin ? cos : x.name == :cos ? msin : x
replace(x::Core.SSAValue) = Symbol("var_$(x.id)")
replace(x::Core.SlotNumber) = Symbol("arg_$(x.id-1)")

ast2 = replace_recursive(replace, ast)
body = Expr(:block, ast2...)
@eval transformed(arg_1) = $body
derivative(example_function, 1.5) == transformed(1.5)
@code_typed debuginfo=:none transformed(1.5)
There is a Library for this
using Cassette
import Cassette: @context, overdub

@context Derivative
overdub(::Derivative, ::typeof(sin), arg1) = cos(arg1)
overdub(::Derivative, ::typeof(cos), arg1) = -sin(arg1)

y = overdub(Derivative(), example_function, 1.5)
y == transformed(1.5)
@code_typed debuginfo=:none overdub(Derivative(), example_function, 1.5)
Fit for unique ML challenges
- Comes with an LLVM-based JIT compiler
- Lots of tools to implement AD
- Lots of packages that implement AD
- State-of-the-art GPU computing (works with AD, user-defined functions & types; see the sketch below)
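A minimal sketch of the GPU point above, assuming the CUDA.jl package and an NVIDIA GPU (the talk-era equivalent was CuArrays.jl): an ordinary user-defined function can be broadcast over a GPU array, and Julia compiles a GPU kernel for it.

using CUDA

# an ordinary, user-defined scalar function
gaussian(x, σ) = exp(-x^2 / (2σ^2)) / (σ * sqrt(2f0 * π))

xs = CUDA.rand(Float32, 10_000)   # array living in GPU memory
ys = gaussian.(xs, 0.5f0)         # broadcasting compiles a GPU kernel for gaussian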
Proof: Flux.jl
- 1485 stars on GitHub, pretty much written by one person
- Purely written in Julia, easy to extend
- Roughly on par with TensorFlow in terms of features and performance
- No two-language problem, fully optimizable, works nicely with Julia packages
- Good AD support means you can use arbitrary packages
- Libraries don't need to be written with ML in mind
Example:
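Not the original demo, just a hedged sketch of the idea (the simulate function below is made up for illustration): Zygote differentiates straight through plain Julia code that was never written with ML in mind, e.g. a tiny hand-rolled projectile simulation.

using Zygote

# plain Julia code, written with no ML framework in mind
function simulate(angle, v; steps = 100, dt = 0.01)
    x, y = 0.0, 0.0
    vx, vy = v * cos(angle), v * sin(angle)
    for _ in 1:steps
        x += vx * dt
        y += vy * dt
        vy -= 9.81 * dt        # gravity
    end
    return x                   # horizontal distance reached
end

# gradient of the simulated distance w.r.t. launch angle and velocity
Zygote.gradient(simulate, pi / 4, 10.0)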
The difference in work needed is immense!
E.g. just imagine not being able to reuse existing libraries:
- you would need to write your own physics library using only TensorFlow primitives
- you would need to implement a custom AD kernel
- then you could start putting together the layers
- ...and maybe quickly figure out that your idea wasn't good
Why not an ML language?
- data scrubbing / cleaning / preparation
- GUIs / dashboards / web
- multi-purpose code inside the DNN itself
Summary
the good
- Julia has great tools to work with code and compiler passes
- optimal performance of newly generated code
- solves the two-language problem
- has a nice type system
- interactive workflow
the bad
- Not easy to create AOT-compiled binaries
- so you will have to wait for compilation (see the small timing sketch below)
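To make "wait for compilation" concrete, a tiny illustrative timing (exact numbers vary by machine and Julia version): the first call to a function includes JIT compilation, later calls with the same argument types do not.

f(x) = sum(sin, x)

@time f(rand(1000))   # first call: includes compiling f for Vector{Float64}
@time f(rand(1000))   # second call: compilation is cached, only execution is timed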
Final words
- a fun future ahead: compiling some parts statically while interpreting other parts
- basically get Python & C++ in one language ... and Lisp :)