48

GitHub - gdejohn/procrastination: Lazily evaluated, memoizing, purely functional...

 5 years ago
source link: https://github.com/gdejohn/procrastination
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

README.md

Artifact repository Javadoc License Build status Code coverage

procrastination

procrastination is a modular, zero-dependency library for Java 11 that provides:

  • lazily evaluated, memoizing, purely functional data structures
  • ad hoc pattern matching
  • an extensible, reusable alternative to Java 8's Stream
  • stack-safe tail-recursive lambda expressions via trampolines and fixed points

Data Structures

Lazy evaluation means that the data structures procrastinate, doing the absolute minimum amount of work required and putting it off for as long as possible, only computing each of their elements on demand. They can also be memoized such that each element is computed at most once, the first time it's requested, and then cached.

And because the data structures are purely functional, they are fully persistent. Instead of mutators, methods are provided that return a new version of a given data structure reflecting the desired changes, leaving the previous version intact. Every version remains accessible. This is implemented efficiently via structural sharing, which is safe because the data structures are structurally immutable (i.e., elements cannot be added, removed, or replaced).

The data structures are designed to emulate algebraic data types, with basic "data constructor" static factory methods mirrored by abstract match() instance methods simulating pattern matching. This is not a general pattern-matching facility; there is no matching against literals, no wildcard patterns, no nested patterns. It simply makes it possible to distinguish which data constructor was used and extract the components in a single step.

The rest of the operations on these data structures are all ultimately defined in terms of the match() methods and data constructors. None of the classes hide anything interesting. They don't declare any instance fields. They each have just one constructor (in the Java sense), declared private, taking no arguments, with an empty body. If some useful operation is missing, anyone can define it externally as a static method just as easily as writing an instance method inside the class.

While it is easy to define new operations on these types, it is impossible to add new cases. Since the classes don't expose their (Java) constructors, they are effectively sealed types, so the match() methods will always exhaustively cover every case.

Sequence

Sequence is an ordered collection of zero or more non-null elements (duplicates allowed). It is recursively defined: a sequence is either empty, or it's constructed from a head element and a tail sequence. Conversely, the instance method Sequence.match(BiFunction,Supplier) pulls a sequence apart: if the sequence is non-empty, it applies the given binary function to the head and tail and returns the result, otherwise it returns a default value produced by the given supplier. Because sequences are lazy, it is perfectly natural to work with infinite sequences. (Just be careful not to fully evaluate them!)

Maybe

Maybe represents a value that may or may not exist. It can be thought of as a sequence with at most one element and is often used to model potential failure. Maybe is a lazy alternative to Optional.

Either

Either is a container with exactly one element that can take on one of two possible values, labeled left and right, which may have different types. Like Maybe, it can be used to model failure, but it allows information to be attached to the failure case (e.g., an exception, or a string error message). In that sense, it is the data-structure analogue of Java's checked exceptions.

Pair

Pair is an ordered collection with exactly two elements which may have different types (i.e., a 2-tuple). Pair is similar to Entry, but by contrast it is lazy and purely functional.

Sequences vs. Streams

Sequence offers an alternative to the Stream API introduced in Java 8. Like streams, there are a variety of methods to go back and forth between sequences and other representations, including collections, arrays, and streams. Unlike streams, sequences can be traversed any number of times (although this does mean that sequences derived from one-shot sources like iterators must be memoized). Sequences also provide a much more comprehensive API, and it's significantly easier to define new functionality for sequences. One of the biggest goals of the Stream API was parallel processing, which is why streams were designed around spliterators. So, processing a given stream in a way that isn't covered by the API means working directly with its spliterator, and creating a new stream in a way that isn't covered by the API means implementing a spliterator. Consider the instance method Sequence.scanLeft(Object,BiFunction), which returns the lazy sequence of intermediate results of a left fold. Here's a basic implementation for streams:

import java.util.Spliterator;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

static <T, R> Stream<R> scanLeft(Stream<T> stream, R initial, BiFunction<R, T, R> function) {
    return StreamSupport.stream(
        new AbstractSpliterator<>(Long.MAX_VALUE /* estimated size */, 0 /* characteristics */) {
            Spliterator<T> spliterator = null;

            R result = initial;

            @Override
            public boolean tryAdvance(Consumer<? super R> action) {
                if (spliterator == null) {
                    spliterator = stream.spliterator();
                } else if (!spliterator.tryAdvance(element -> result = function.apply(result, element))) {
                    return false;
                }
                action.accept(result);
                return true;
            }
        },
        false // parallel
    );
}

It's imperative, stateful, and the control flow is a little hard to follow; not exactly pleasant. (Note that the characteristics and estimated size of the source stream's spliterator are ignored.) On the other hand, because sequences are recursively defined, they admit a concise, natural recursive implementation of scanLeft() that anyone could easily write if it weren't already included:

import io.github.gdejohn.procrastination.Sequence;
import java.util.function.BiFunction;

static <T, R> Sequence<R> scanLeft(Sequence<T> sequence, R initial, BiFunction<R, T, R> function) {
    return Sequence.cons(
        initial,
        () -> sequence.match(
            (head, tail) -> scanLeft(tail, function.apply(initial, head), function),
            () -> Sequence.empty()
        )
    );
}

Compared to the stream implementation, it's declarative, straightforward, and a lot less code; not bad! (With a little bit of code golfing, the method body could even fit comfortably on a single line!) The trade-off is that sequences are, as the name suggests, sequential; there is no parallel processing. The priority here is developer ergonomics. If you ever find yourself reaching for something in the Stream API that isn't there, and your workload doesn't benefit from parallelism, give Sequence a try!

Trampolines

Applying imperative idioms to sequences is ugly and error prone; recursive data types call for recursive algorithms. Unfortunately, Java isn't very recursion friendly: deep call stacks quickly run afoul of stack overflow errors, and tail recursion doesn't help because there's no tail-call elimination. This isn't a problem for lazy operations like Sequence.map(Function), but whenever a potentially large number of elements must be eagerly traversed, as in Sequence.filter(Predicate), stack overflow is waiting to pounce. Enter trampolines.

Trampoline transforms tail recursion into a stack-safe loop. To trampoline a tail-recursive method with return type R, change the return type to Trampoline<R>, wrap the expressions returned in base cases with the static factory method terminate(), suspend recursive calls in Supplier lambda expressions, and wrap the suspended recursive calls with the static factory method call(). For example:

/** True if and only if the sequence contains at least one element that satisfies the predicate. */
static <T> Trampoline<Boolean> any(Sequence<T> sequence, Predicate<T> predicate) {
    return sequence.match(
        (head, tail) -> predicate.test(head) ? terminate(true) : call(() -> any(tail, predicate)),
        () -> terminate(false)
    );
}

To get the result from a trampoline, use the instance method evaluate():

boolean result = any(sequence, predicate).evaluate();

Anonymous Recursion

Tail recursion often requires additional "accumulator" parameters, and trampolining means that the result must be unwrapped. These are irrelevant and burdensome implementation details that shouldn't be exposed to client code, so the usual practice is to delegate to a private helper method. Alternatively, the recursive computation can be defined inline!

Functions.fix() returns the fixed point of a unary operator on functions, enabling recursive lambda expressions (i.e., anonymous recursion). Lambda expressions by definition are unnamed, making explicit recursion impossible. The trick here is to abstract the recursive call by taking the function itself as an argument and letting fix() tie the knot. For example:

Function<Integer, Integer> factorial = fix(f -> n -> n == 0 ? 1 : n * f.apply(n - 1));

And fix() is implemented like this:

static <T, R> Function<T, R> fix(UnaryOperator<Function<T, R>> function) {
    return function.apply(argument -> fix(function).apply(argument));
}

This is almost, but not quite, the fabled Y combinator. Technically, it's not a combinator because it uses explicit recursion. But it doesn't need to be a combinator, it only needs to produce fixed points. And it does!

This also works for trampolined functions and curried functions of arbitrarily many arguments. Trampoline.evaluate() isn't just an instance method, it's also overloaded as an all-in-one static helper method that accepts a trampolined recursive lambda expression and an appropriate number of arguments, fixes the lambda expression, applies it to the arguments, evaluates the resulting trampoline, and returns the unwrapped value. And to complement this approach, the static factory method Trampoline.call() is overloaded to accept curried functions and matching arguments. For example:

static int length(Sequence<?> sequence) {
    return Trampoline.evaluate(
        sequence,
        0,
        f -> seq -> n -> seq.match(
            (head, tail) -> call(f, tail, n + 1),
            () -> terminate(n)
        )
    );
}

That's a stack-safe tail-recursive local helper function taking full advantage of type inference and pattern matching! You might just forget that you're writing Java!

Getting Started

Gradle

Add JitPack to your root build.gradle at the end of the repositories:

allprojects {
    repositories {
        // ...
        maven { url 'https://jitpack.io' }
    }
}

And add the dependency:

dependencies {
    implementation 'io.github.gdejohn:procrastination:0.2.0'
}

Maven

Add the JitPack repository to your pom.xml:

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>

And add the dependency:

<dependency>
    <groupId>io.github.gdejohn</groupId>
    <artifactId>procrastination</artifactId>
    <version>0.2.0</version>
</dependency>

See instructions for other build tools at JitPack. To build from the latest commit, replace the version string for the dependency with master-SNAPSHOT. See the releases for links to the Javadocs for particular versions.

jshell

The included jshell script procrastination.jsh makes it easy to play around with this library, assuming JDK 11 and a recent version of Maven are installed and present on your PATH. Just clone the repository, and from the root directory run mvn compile and jshell procrastination.jsh. The script adds the module to the jshell environment and imports all of the types and static members.

Versioning

This project uses semantic versioning. Check the releases for the available versions.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK