3

Implementing the Sieve of Eratosthenes with Functional Programming

 2 years ago
source link: http://raganwald.com/2013/02/23/sieve.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

RAGANWALD.COM

Implementing the Sieve of Eratosthenes with Functional Programming


Programming interviews often include a “Fizzbuzz” test, some simple programming task that is intended to establish that the candidate can, in fact, program something trivial. It isn’t intended as a deep architectural dive or as an algorithms challenge. Under ideal, non-pressure situations it shouldn’t take more than fifteen minutes to complete.

One common exercise is to print out some prime numbers, perhaps a hundred or a thousand. We’ll write a JavaScript program to do that, and we’ll implement the well-known Sieve of Eratosthenes algorithm.

functional iterators

Our approach will rest on functional iterators, stateful functions that you call over and over again to get values. Functional iterators return undefined if and when they have no more values to provide. Functional iterators are handy for two reasons: They decouple the mechanism of iterating over a data structure from what you do with its values, and they are also handy for dealing with data structures that may be so large as to be unwieldy, including infinitely large data sets.

To implement our Sieve of Eratosthenes, we’re going to use a few of the tools provided by allong.es, the JavaScript Function Utility Belt. Let’s get started.

unfolding

All implementations of the Sieve rest on filtering numbers that are divisible by any smaller prime number. To be true to the spirit of the Sieve, you must emulate the behaviour of filtering all of the numbers going forward, not iterating forward and checking each number against a list of primes.

So the first thing we’ll need is a collection of natural numbers. Since we don’t know how many numbers we’ll need, we might as well start with all of the numbers from two upwards. That’s very easy to write as an iterator:

function NumbersFromTwo () {
  var number = 2;
  return function () {
    return number++;
  }
};

var i = NumbersFromTwo();
i();
  //=> 2
i();
  //=> 3
i();
  //=> 4

As you can see, we have a stateful function that returns 2 the first time you call it, and thereafter returns the next largest natural number.

NumbersFromTwo is a specific case of a very general pattern: Starting with a seed, generate an iteration where each value generated is calculated by applying a function to the previous value. When you start with a list, collection, or iteration and “reduce” it to a value, computer scientists call that a “fold.” And the reverse operation, its dual, is called an “unfold.”

For a very simple case like this, we don’t really need to be explicit about our unfolding, but for the sake of seeing how things work, let’s use a helper function from the allong.es library and rewrite it as an unfold:

var unfold = require('allong.es').iterators.unfold;

var i = unfold(2, function (value) {
  return value + 1;
});

i();
  //=> 2
i();
  //=> 3
i();
  //=> 4

unfold returns an iterator. The first time you call the iterator, it returns the seed. Thereafter, it applies the function to the previous value to determine the next value to return. It may not seem like much of a win for a very short function, but making an unfold explicit like this also makes it easier to reason about the iterator than if you wrote a stateful function by hand.

Now we have a function that iterates over the natural numbers starting with 2. How would we “sieve” it?

accumulate with return

In The Drunken Walk Programming problem, we were introduced to the accumulate function. It works a lot like a fold, reduce, or inject method: It iterates over a collection, accumulating a state along the way. The special sauce in accumulate is that instead of waiting for the end of the collection and returning a single value, accumulate returns an iterator that iterates over the state value as it iterates over the list. For example:

var accumulate = require('allong.es').iterators.accumulate,
    FlatArrayIterator = require('allong.es').iterators.FlatArrayIterator;

var i = accumulate(FlatArrayIterator([1,2,3,4,5]), function (acc, n) {
  return acc + n;
}, 0);

i();
  //=> 1
i();
  //=> 3
i();
  //=> 6
i();
  //=> 10
i();
  //=> 15

Instead of returning the sum of the array as a single value, this iterator created with accumulate iterates over the running total.

accumulate is a terrific tool, but it does conflate the state with the value returned. Thus, you often find yourself writing a map from the accumulated state to the values you want. In some cases this can be either expensive or unclear, so allong.es provides a function that separates the concerns of accumulating state from generating values.

Here’s a ridiculous “square of the running total” example:

var accumulateWithReturn = require('allong.es').iterators.accumulateWithReturn,
    FlatArrayIterator = require('allong.es').iterators.FlatArrayIterator;

var i = accumulateWithReturn(FlatArrayIterator([1,2,3,4,5]), function (acc, n) {
  acc = acc + n;
  return [acc, acc * acc];
}, 0);

i();
  //=> 1
i();
  //=> 9
i();
  //=> 36
i();
  //=> 100
i();
  //=> 225

accumulateWithReturn takes a function that returns two values as an array. The first is the state for the next iteration, the second is the value to return for that iteration. So how does that help?

Animated Sieve of Eratosthenes

Illustrated Sieve from Wikipedia

sieving numbers

Let’s consider what we want to do to “sieve” all the multiples of a number from the natural numbers. Essentially, we want to count as we go along. If our number is “5,” we count 1, 2, 3, 4, “stroke out!” 1, 2, 3, 4, “stroke out!” We could also choose to count down, as in 5, 4, 3, 2, “stroke out!” 5, 4, 3, 2, “stroke out!” We’ll go with counting down,although you’re welcome to rewrite it counting up and see that we get the same results.

Thus, we want to maintain a counter as we iterate through the numbers. But what do we want to return? If the counter is greater than one, we return the number. But if our counter reaches one, we return some special flag indicating we’ve stroked the number our and reset our count to our number.

Obviously, this could be much easier: We can take the modulus of the number directly with a simple map. But that isn’t the literal Sieve of Eratosthenes. And gosh-diddley-honest, we are going to implement this algorithm literally.

For more on what makes an algorithm a genuine Sieve of Eratosthenes, Melissa O’Neill has written a terrific paper: The Genuine Sieve of Eratosthenes.

Here we are. We’ll use false as our value for numbers being stroked out:

var accumulateWithReturn = require('allong.es').iterators.accumulateWithReturn,
    unfold = require('allong.es').iterators.unfold;
    
function SieveMultiplesOf (iter, aPrime) {
  return accumulateWithReturn(iter, function (counter, number) {
    if (counter-- === 1) {
      counter = aPrime;
      return [counter, false];
    }
    else return [counter, number]
  }, aPrime);
};

var numbersFromOne = unfold(1, function (value) {
  return value + 1;
});
    
var i = SieveMultiplesOf(numbersFromOne, 3);
i();
  //=> 1
i();
  //=> 2
i();
  //=> false
i();
  //=> 4
i();
  //=> 5
i();
  //=> false

Notice that we can sieve an already sieved iteration:

var numbersFromOne = unfold(1, function (value) {
  return value + 1;
});

var sieveThrees = SieveMultiplesOf(numbersFromOne, 3);
    
var i = SieveMultiplesOf(sieveThrees, 5);
i();
  //=> 1
i();
  //=> 2
i();
  //=> false
i();
  //=> 4
i();
  //=> false
i();
  //=> false
i();
  //=> 7
i();
  //=> 8
i();
  //=> false
i();
  //=> false

We’re just about ready!

the final step

With an unfold, we start with a seed and transform it with every iteration into the next value to be returned. That, in a nutshell, is what we’re going to do with an iteration over the numbers from two: transform it with every iteration by sieving it.

So we expect to see something like this in the middle of our code:

remainingNumbers = SieveMultiplesOf(remainingNumbers, nextPrimeNumber);

However, we don’t actually want to return the sieved numbers with each iteration, we want to return each prime. So we need an unfold that separates state from the return value. Hmmm. accumulateWithReturn is just like accumulate, only it separates state from return value. Could it be that unfoldWithReturn is just like unfold but separates state from return value?

var unfoldWithReturn = require('allong.es').iterators.unfoldWithReturn,
    find = require('allong.es').iterators.find;
    
function PrimeNumbers () {
  var numbersFromTwo = unfold(2, function (value) {
    return value + 1;
  });
  
  return unfoldWithReturn(numbersFromTwo, function (remainingNumbers) {
    var nextPrimeNumber = find(remainingNumbers, function (n) { return !!n; });
    remainingNumbers = SieveMultiplesOf(remainingNumbers, nextPrimeNumber);
    return [remainingNumbers, nextPrimeNumber];
  });
};

var i = PrimeNumbers();
i();
  //=> 2
i();
  //=> 3
i();
  //=> 5
i();
  //=> 7
i();
  //=> 11

Our algorithm takes advantage of the fact that find from allong.es calls the iterator until a value matches the predicate function. It will thus skip over all of the false values representing multiples that have been crossed out.

The first number not crossed out is always the next prime to return. So, we return the new state consisting of the remaining numbers with multiples of the new prime number crossed out, and the new prime number as our return value.

the complete solution

var I                    = require('allong.es').iterators,
    accumulateWithReturn = I.accumulateWithReturn,
    unfold               = I.unfold,
    unfoldWithReturn     = I.unfoldWithReturn,
    find                 = I.find,
    take                 = I.take;
    
function PrimeNumbers () {
  var numbersFromTwo = unfold(2, function (value) {
    return value + 1;
  });
  
  function SieveMultiplesOf (iter, aPrime) {
    return accumulateWithReturn(iter, function (counter, number) {
      if (counter-- === 1) {
        counter = aPrime;
        return [counter, false];
      }
      else return [counter, number]
    }, aPrime);
  };
  
  return unfoldWithReturn(numbersFromTwo, function (remainingNumbers) {
    var nextPrimeNumber = find(remainingNumbers, function (n) { return !!n; });
    remainingNumbers = SieveMultiplesOf(remainingNumbers, nextPrimeNumber);
    return [remainingNumbers, nextPrimeNumber];
  });
};

var prime, first100primes = take(PrimeNumbers(), 1000);

while((prime = first100primes()) != null) {
  console.log(prime);
}
  //=> 2, 3, 5, ..., 7901, 7907, 7919

The source code for the utility functions we’re using can all be reviewed online in iterators.js.

(discuss)

bonus

Eratosthenes actually advocated an optimization of this algorithm. I left it out initially because it changes SieveMultiplesOf in such a way that it isn’t as easy to verify its behaviour separately because it’s coupled to the way numbers are taken off the front of the iterators:

function SieveMultiplesOf (iter, aPrime) {
  return accumulateWithReturn(iter, function (counter, number) {
    if (counter-- === 1) {
      counter = aPrime;
      return [counter, false];
    }
    else return [counter, number]
  }, aPrime * (aPrime - 1));
};

post script

I actually failed that particular interview test when I took it. I can’t tell you why, it was “Just one of those days,” I guess. But I’ve never forgotten the fact that no matter how simple the test, interviews are high-pressure situations where anyone can “choke.” Well, maybe not anyone. But I certainly can.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK