

Source: http://thegeez.net/2012/06/12/euroclojure_reducers.html

My understanding of reducers after EuroClojure
12 Jun 2012
At EuroClojure Rich Hickey gave a second, unscheduled talk about the new reducers library. The talk clarified a few points for me that I didn't initially get from the explanatory blog posts on clojure.com/blog. For the Amsterdam Clojure Meetup I wrote down these notes:
Reducers vs the seq/lazy-seq API
This is mostly explained in the first blogpost: Reducers - A Library and Model for Collection Processing.
The reducers are partly about de-complecting representation, order and mechanism. For example: the map reducer is only a recipe to apply a function to each item in a collection. Compare the following:
[2 3 4]
=== (vec (map inc [1 2 3]))
=== (into [] (r/map inc [1 2 3]))
=== (reduce conj [] (r/map inc [1 2 3]))
(here map is clojure.core/map and r/map is the new clojure.core.reducers/map)
A reducer is only the recipe step of transforming a collection. Building a collection or a result is an explicit, separate reduce step. What result to build and what to do with the input collection are separate concerns here. When you replace r/map with a composition of various reducers, the actual construction of the result is still only the final step. Compare this to the intermediate lazy seqs that are produced at every step when composing the old seq functions.
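A minimal sketch of the difference (the collection and functions are just illustrative):

(require '[clojure.core.reducers :as r])

;; seq version: each step builds an intermediate lazy seq
(reduce + (filter even? (map inc [1 2 3 4])))       ;=> 6

;; reducer version: r/map and r/filter only compose recipes,
;; nothing is built until the final reduce
(reduce + (r/filter even? (r/map inc [1 2 3 4])))   ;=> 6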
Reducers and fold for parallelism
This is mostly explained in the second blogpost: Anatomy of a Reducer.
The split between building results and the reducer recipes also opens up some nice possibilities for parallelism, through the fork/join framework.
The example Rich gave was (not verbatim this, but close enough):
(def v (vec (range 100000000000)))
(reduce + (map inc v))
vs. (reduce + (r/map inc v))
vs. (r/fold + (r/map inc v))
I think the timings were: reduce with r/map was about twice as fast as reduce with map, and r/fold with r/map was about four times as fast as reduce with map.
Fold is simply the name for the new parallel reduce function; it is not meant to refer to functions with the same name in other languages.
In the example the fold function is not passed a combiner, so the plus function is used as the combiner as well. The seed for the reduce at the leaves is the combiner function called with no arguments.
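A small sketch of that seed rule (the vector is just an example):

(require '[clojure.core.reducers :as r])

;; without an explicit combiner, the reducing function doubles as the
;; combiner, and calling it with no arguments yields the leaf seed
(+)                                      ;=> 0
(r/fold + (r/map inc (vec (range 10))))  ;=> 55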
Fold will try to do the computation in parallel through fork/join, when both the input collection that is asked to apply the reducer to itself and the reducer itself support this. The check for support is done through protocols. For the data structures: PersistentVector extends CollFold. For the reducers: r/map is defined as a folder, while r/take-while is defined as a reducer and does not support fold, because partitioning makes no sense for that computation. When parallel fold is not supported, fold simply falls back to a sequential reduce. See the implementation for details: reducers.clj
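For example, a sketch (both calls return the same result, only the execution strategy differs):

;; r/map over a vector is a folder, so fold can go parallel via fork/join
(r/fold + (r/map inc (vec (range 1000))))                ;=> 500500

;; r/take-while is only a reducer, so fold falls back to a single reduce
(r/fold + (r/take-while #(< % 100) (vec (range 1000))))  ;=> 4950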
A common approach for parallel processing is map+reduce: the work is divided into partitions, map is applied to each partition, and the intermediate results are combined with reduce. In fold the approach is reduce+combine: the work on the smallest partition is done with a reduce rather than with a map. Compare the two approaches by trying to express filter in map+reduce versus reduce+combine. It appeals to functional programming sensibilities that reduce is a better fit for most operations than map.
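To see why, here is filter expressed as a plain reduce (a sketch): the reducing function can decide per item whether to keep it, something a map step cannot do because it must emit a value for every input.

;; filter as a reducing step: keep or drop each item while accumulating
(reduce (fn [acc x] (if (even? x) (conj acc x) acc))
        []
        (range 10))
;=> [0 2 4 6 8]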
Fold works nicely with Clojure's data structures. Many of them are trees, which can easily be partitioned. As Rich explained: explicit parallel data structures (as found in other languages) make no sense, because being parallel is a property of an algorithm, not of the data.
The ultimate benefit of the reducers library is this: in the past you could make your programs faster by simply waiting a while for faster hardware (Moore's law). Fold and reduce+combine bring back that promise for data processing, with more cores rather than faster CPUs.