Fold

An Explanation for Composability of Clojure Transducers

In "A Tutorial on the Universality and Expressiveness of Fold", Graham Hutton presents in a clear and understandable way the advantages of programming with folds. Most of the core concepts presented there, among them the universal and fusion properties of folds, can be found in Bird's book. In this short article we'll apply some of those ideas to Clojure transducers, and we'll show how the fusion property implies the efficiency and composability of transducers. It's beyond the scope of this article to compare left and right folds in Clojure and their laziness: Clojure's reduce is (notationally) a left fold, and we'll assume all lists are left lists (say, Clojure vectors). Note also that what follows is not about Clojure's fold as found in clojure.core.reducers.

The Universal Property of Fold

Following Hutton, we can define a fold function on lists by means of the following properties. For sets $X$ and $Y$, denote by $A(X, Y)$ the set of all functions $f : Y \times X \to Y$, which we'll call right actions of $X$ on $Y$. When the function $f$ is clear from the context, we'll just write $y \cdot x$ for $f(y, x)$. Writing $L(X)$ for the set of finite lists over $X$, the $\mathrm{fold}$ function can be defined as

$$\mathrm{fold} : A(X, Y) \times Y \times L(X) \to Y$$

such that, for a given $f \in A(X, Y)$ and $y \in Y$, the following inductive properties hold:

$$\begin{aligned} \mathrm{fold}(f, y, []) &= y && \text{(u1)} \\ \mathrm{fold}(f, y, l \cdot x) &= f(\mathrm{fold}(f, y, l), x) && \text{(u2)} \end{aligned}$$

where $l \cdot x$ is the list obtained by conjoining a list $l$ with an element $x$ of $X$ (in Clojure, (conj l x)).

Now for fixed $f$ and $y$, (u1) and (u2) indeed form a universal property, i.e. if such a function exists then it is unique, and $\mathrm{fold}$ is fully characterized by these properties. Uniqueness is proven by induction: assume there are functions $\phi$ and $\psi$ which satisfy (u1) and (u2) but which disagree, i.e. $\phi(l) \neq \psi(l)$ on some list $l$, and pick such an $l$ of minimal length. If $l$ were non-empty, say $l = l' \cdot x$, then by (u2) the two functions would also need to disagree on the shorter list $l'$, against minimality. Hence $l$ must have length zero, contradicting (u1).

We can also restate (u2) by saying that, for all $x$ in $X$, as a function of $l$ in $L(X)$ the partial application $\mathrm{fold}(f, y, -)$ commutes with conjoining $x$ on lists and acting with $x$ on $Y$, that is:

$$\mathrm{fold}(f, y, l \cdot x) = \mathrm{fold}(f, y, l) \cdot x$$
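The two properties can be spot-checked against Clojure's built-in reduce on a concrete case. This is only a sketch: the action +, the initial value 0 and the sample vector are arbitrary choices, and a passing check on one example is of course not a proof.

```clojure
;; (u1): folding the empty list returns the initial value.
(def u1-holds?
  (= 0 (reduce + 0 [])))

;; (u2): folding (conj l x) equals acting with x on the fold of l.
(def u2-holds?
  (let [l [1 2 3]
        x 4]
    (= (reduce + 0 (conj l x))
       (+ (reduce + 0 l) x))))

(println u1-holds? u2-holds?)
```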

Existence of a fold function is proved by implementation: most programming languages provide one. In Clojure, fold is the reduce function, but the universal properties (u1) and (u2) actually provide a constructive definition:

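(defn fold [f i l]
  ;; (u1): the fold of the empty list is the initial value i.
  ;; (u2): otherwise fold everything but the last element, then act with it.
  ;; Testing (seq l) rather than the last element keeps nil elements working.
  (if (seq l)
    (f (fold f i (butlast l)) (last l))
    i))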

Since fold above satisfies (u1) and (u2) by definition, and we can easily prove them for reduce, the two have to be the same function. Still, we can build a cheap function-equality check on integer vectors

(require '[clojure.test.check :as c])
(require '[clojure.test.check.generators :as gen])
(require '[clojure.test.check.properties :as p])

;; Checks that all fns agree on 100 randomly generated integer vectors.
;; Returns the quick-check result map.
(defn =' [& fns]
  (let [pr (p/for-all [v (gen/vector gen/int)]
             (apply = (map #(% v) fns)))]
    (c/quick-check 100 pr)))

to get a hint that our definition is sound. Let's try it on some concrete examples:

(=' (partial fold + 0)
    (partial reduce + 0))
(=' (partial fold str "")
    (partial reduce str ""))

Expressing List Operations in Terms of Fold: Transducers

It is possible to express a lot of functions on lists in terms of fold. Among the most popular are filter and map:

(defn filter' [pred]
  (fn [xs x] (if (pred x) (conj xs x) xs)))
(fold (filter' odd?) [] (range 10))
(defn map' [phi]
  (fn [xs x] (conj xs (phi x))))
(fold (map' inc) [] (range 9))

where (filter' odd?) and (map' inc) are actions of natural numbers on lists of natural numbers. Clojure transducers take this pattern one step further: they encapsulate list-like operations independently of the reducing function. Formally, transducers are transformations of actions, i.e. functions

$$A(Y, Z) \to A(X, Z)$$

taking actions of $Y$ on $Z$ to actions of $X$ on $Z$, which behave functorially with respect to folds; this will be explained later.
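To make the idea of a transformation of actions concrete, here is a minimal hand-written sketch. The names mapping and filtering are hypothetical (they are not part of clojure.core), and the sketch only covers the two-argument step arity, not the init and completion arities that real Clojure transducers also provide.

```clojure
;; A transducer takes an action rf and returns a new action.
;; mapping and filtering are hypothetical names, not clojure.core functions.
(defn mapping [phi]
  (fn [rf]
    (fn [acc x] (rf acc (phi x)))))

(defn filtering [pred]
  (fn [rf]
    (fn [acc x] (if (pred x) (rf acc x) acc))))

;; The same transformation works with different underlying actions.
(def mapped-vector (reduce ((mapping inc) conj) [] (range 3)))
(def odd-sum       (reduce ((filtering odd?) +) 0 (range 6)))

(println mapped-vector odd-sum)
```

Note that neither function mentions conj or +: the underlying action is supplied later, which is what makes the operation independent of the reducing function.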

Functions like filter and map in Clojure, when given a single argument, return a transducer. For instance, look at

(require 'clojure.repl 'clojure.string)
;; Print the first 420 characters of the source of filter.
(let [s (with-out-str (clojure.repl/source filter))]
  (println (clojure.string/join (take 420 s))))

and let's see it applied to the conj function at first

(fold ((filter odd?) conj) [] (range 6))
(fold ((map inc) conj) [] (range 9))

and later to the + function

(fold ((filter odd?) +) 0 (range 6))

The strong point of transducers in practice is that they let us stack reducing operations in a composable way, visiting the input list just once. Take for instance:

(def coll [{:a 1} {:a 2} {:a 3} {:a 4}])
(->> coll
     (map :a)
     (filter odd?)
     (map inc)
     (reduce + 0))

At each step above a new intermediate sequence is produced and fed to the next computation, which walks through it again. With transducers this doesn't happen: the following snippet reads the input collection just once, encoding the transformations in a single action:

(def xf (comp (map :a)
              (filter odd?)
              (map inc)))
(reduce (xf +) 0 coll)             

which in Clojure is (almost) the same as the simpler form

(transduce xf + 0 coll)
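One place where the "almost" shows up is completion: transduce also calls the one-argument (completion) arity of the transformed function at the end, while a plain reduce over (xf f) never does. The sketch below uses partition-all, which is not otherwise discussed in this article, because it buffers elements and only flushes the remainder on completion.

```clojure
;; reduce only uses the two-argument step, so the trailing partial
;; partition stays in the buffer and is lost.
(def via-reduce
  (reduce ((partition-all 2) conj) [] [1 2 3]))

;; transduce also calls the completion arity, which flushes the buffer.
(def via-transduce
  (transduce (partition-all 2) conj [] [1 2 3]))

(println via-reduce via-transduce)
```

For transducers such as map and filter, which keep no buffer, the two forms give the same result.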

Later we'll also see why transducers compose in this contravariant order: (comp (map :a) (filter odd?) (map inc)) applies (map :a) first, which is not the usual right-to-left order of function composition.

Fusion Property and the Composition of Folds

Having shown that many functions on lists can be expressed in terms of fold, when can we actually assert that a composition of folds is expressible as a fold of a single action? One step in this direction is given by the fusion property.

Given right $X$-actions $f : Y \times X \to Y$ and $g : Z \times X \to Z$, we call a function $h : Y \to Z$ a morphism from $f$ to $g$ if $h(f(y, x)) = g(h(y), x)$ holds for every $x \in X$ and $y \in Y$.

We can prove that $\mathrm{fold}$ is stable under the application of morphisms, i.e. given a morphism $h$ of actions like the one above, we have:

$$h(\mathrm{fold}(f, y, l)) = \mathrm{fold}(g, h(y), l)$$

To prove the above equality we appeal to the universal property: if we can prove (u1) and (u2) for the function $l \mapsto h(\mathrm{fold}(f, y, l))$, with respect to the action $g$ and the initial value $h(y)$, then the equality above must hold for every list in $L(X)$. While (u1) is trivial to see, (u2) follows by combining two commutative diagrams, namely (u2) for $f$ and the morphism property of $h$:

$$h(\mathrm{fold}(f, y, l \cdot x)) = h(f(\mathrm{fold}(f, y, l), x)) = g(h(\mathrm{fold}(f, y, l)), x)$$
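The fusion property can be checked on a small concrete case. In the sketch below the choices are illustrative assumptions, not taken from the article: f is +, h doubles its argument, and g sends (z, x) to z + 2x, so that h(f(y, x)) = 2(y + x) = g(h(y), x) and h is a morphism from f to g.

```clojure
;; Illustrative choices (assumptions, not from the article).
(def f +)
(defn h [y] (* 2 y))
(defn g [z x] (+ z (* 2 x)))

;; h is a morphism from f to g: h(f(y, x)) = g(h(y), x).
(def morphism?
  (every? (fn [[y x]] (= (h (f y x)) (g (h y) x)))
          (for [y (range -3 4) x (range -3 4)] [y x])))

;; Fusion: applying h after folding f equals folding g starting from h(y).
(def fused?
  (every? (fn [l] (= (h (reduce f 5 l))
                     (reduce g (h 5) l)))
          [[] [1] [1 2 3] [4 -7 10 0]]))

(println morphism? fused?)
```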

Now, if we want to compose folds as functions of lists we have to restrict to some specific class of actions. We say that an action on lists $f : L(Y) \times X \to L(Y)$ splits if there exists some function $\phi : X \to Y$ such that $f(l, x) = l \cdot \phi(x)$ for all $l \in L(Y)$ and $x \in X$. Note that the actions defined in the examples above all split (with some formal imagination in the filter case). Given a splitting $X$-action $f$ and an action $g$ of $Y$ on $Z$, we define a function of actions $f^* : A(Y, Z) \to A(X, Z)$ by

$$f^*(g)(z, x) = g(z, \phi(x))$$
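In code, and assuming the splitting action has the shape (conj l (phi x)) for some function phi, one plausible reading of this construction is precomposition of the element argument with phi. The name pull-back below is hypothetical, and the sketch does not cover the filter case, where an element may produce no output at all.

```clojure
;; Given the splitting function phi, turn an action g into
;; the action (z, x) -> g(z, phi(x)).
;; pull-back is a hypothetical name used only for this sketch.
(defn pull-back [phi]
  (fn [g]
    (fn [z x] (g z (phi x)))))

;; With phi = inc this agrees with the step arity of Clojure's (map inc).
(def by-hand  (reduce ((pull-back inc) +) 0 (range 5)))
(def built-in (reduce ((map inc) +) 0 (range 5)))

(println by-hand built-in)
```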

We can now state a transducing property for folds of splitting actions: for every splitting $f$, every $g$ as above and every $z \in Z$ we have:

$$\mathrm{fold}(g, z, \mathrm{fold}(f, [], l)) = \mathrm{fold}(f^*(g), z, l)$$

where inline_formula not implemented is inline_formula not implemented.

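To prove the transducer lemma it's enough to show that $\mathrm{fold}(g, z, -)$ is a morphism between the actions $f$ and $f^*(g)$, after which the lemma follows from the fusion property:

$$\mathrm{fold}(g, z, f(l, x)) = \mathrm{fold}(g, z, l \cdot \phi(x)) = g(\mathrm{fold}(g, z, l), \phi(x)) = f^*(g)(\mathrm{fold}(g, z, l), x)$$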

Let's apply the lemma above to a simple Clojure case where $f$ is ((map inc) conj) and $g$ is +; then $f$ splits (by definition) and $f^*(g)$ is ((map inc) +):

(=' (comp (partial reduce + 0)
          (partial map inc))
    (comp (partial reduce + 0)
          (partial reduce ((map inc) conj) []))
    (partial reduce ((map inc) +) 0))

Now since function composition is associative, repeating the step above we also get for instance

(=' (comp (partial reduce + 0)
          (partial map inc)
          (partial filter odd?))
    (comp (partial reduce ((map inc) +) 0)
          (partial reduce ((filter odd?) conj) []))
    (partial reduce ((filter odd?) ((map inc) +)) 0)
    (partial reduce ((comp (filter odd?) (map inc)) +) 0))

which explains the contravariant behaviour of the composition of transducers with respect to the composition of the non-transduced forms.
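The difference in ordering is easy to observe on a small input. In this sketch (the input range and the var names are arbitrary) the composed transducer applies its leftmost component first, while the composition of plain functions applies its rightmost component first.

```clojure
(def xs (range 6))

;; Transducers: (comp a b) applies a first, then b.
(def inc-then-odd (transduce (comp (map inc) (filter odd?)) + 0 xs))
(def odd-then-inc (transduce (comp (filter odd?) (map inc)) + 0 xs))

;; Ordinary functions: (comp f g) applies g first, then f.
(def plain-inc-then-odd
  ((comp (partial reduce + 0)
         (partial filter odd?)
         (partial map inc))
   xs))

(println inc-then-odd odd-then-inc plain-inc-then-odd)
```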

Stateful Transducers and cat

There are some transducers which escape the pure form of splitting actions as defined above, most notably cat

(clojure.repl/source cat)

which flattens list outputs on the fly:

(let [coll [{:a [1]} {:a [2]} {:a [3]}]
      xf (comp (map :a) cat)]
  (reduce (xf +) 0 coll))

and stateful transducers, like, say, the 0-ary form of distinct

(clojure.repl/source distinct)

or the 1-ary form of take-while

(clojure.repl/source take-while)

which wraps the accumulator in reduced to short-circuit the fold, allowing for very nice things like

(def terms [{:do true :val 1}
            {:do true :val 2}
            {:do true :val 2}
            {:do true :val 1}
            {:do true :val 3}
            {:do false :val 4}
            {:do true :val 5}
            {:do true :val 6}])
(let [xf (comp (take-while :do)
               (map :val)
               (distinct))]
  (reduce (xf +) 0 terms))
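The short-circuiting mechanism can be sketched by hand. The function taking-while below is a hypothetical, simplified stand-in for the transducer arity of take-while (step arity only): once the predicate fails it returns the accumulator wrapped in reduced, and reduce stops there and unwraps it.

```clojure
;; Hypothetical hand-written take-while: stop the fold by wrapping
;; the accumulator in `reduced` as soon as pred fails.
(defn taking-while [pred]
  (fn [rf]
    (fn [acc x]
      (if (pred x)
        (rf acc x)
        (reduced acc)))))

;; reduce stops at -1 and never reaches 100.
(def total (reduce ((taking-while pos?) +) 0 [1 2 -1 100]))

(println total)
```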

If you'd like to discuss this, find me at @lo_zampino on Twitter.

Appendix

{:deps {org.clojure/test.check {:mvn/version "0.9.0"}}}
(require '[clojure.test.check])
(clojure-version)