Writing efficient free variable traversals
source link: https://www.tuicool.com/articles/VRraQj3
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
bgamari - 2019-07-28
Imagine that I show you a fragment of a Haskell program:
let x = y + 1 in x
and pose the question: “what are the variables in this program?” You would likely respond x
and y
. Now I ask you: what is the difference between those two variables?
One answer is that x
is bound
by the fragment whereas y
comes from the fragment’s surrounding context. Variables like y
are known as free variables
.
When compiling a language like Haskell you often find yourself needing to enumerate the free variables of types and expressions. Consequently, GHC’s free variable traversals are some of the most heavily used pieces of code in the compiler and we try hard to ensure that they are efficient.
How does one write an efficient free variable traversal? This is a question we will try to answer in this post.
A naïve solution
For the sake of concreteness, let’s consider a toy expression language:
data Var = Var Int data Expr = EVar Var -- a variable expression | ELam Var Expr -- lambda abstraction | EApp Expr Expr -- function application | ELit Integer -- an integer literal
Since we are talking about sets
of variables, we need a way to represent such things (in GHC this is implemented using containers
’ IntMap
, but anything with this interface will do):
data VarSet instance Monoid VarSet instance Semigroup VarSet insertVarSet :: Var -> VarSet -> VarSet delVarSet :: Var -> VarSet -> VarSet unitVarSet :: Var -> VarSet
With this groundwork in place, our first attempt at writing a free variable traversal for such a language might look like this:
-- A naïve free variable traversal. freeVars :: Expr -> VarSet freeVars (EVar v) = unitVarSet v freeVars (ELam v e) = delVarSet (freeVars e) freeVars (EApp fun arg) = freeVars fun <> freeVars arg freeVars (ELit _) = mempty
Loading this into GHCi we see that this indeed gives us the right answer for a simple example program:
>>> f:x:y:z:_ = map (EVar . Var) [0..] -- \x -> (\y -> f x) (f z) >>> e = ELam x ((ELam y (EApp f x)) `EApp` (EApp f z)) >>> freeVars e VarSet (fromList [EVar (Var 0), EVar (Var 3)])
However, this traversal is far from efficient:
-
We build
VarSet
s containing variables for which we have already encountered a binding site. For instance, when computingfreeVars ELam y (EApp x y)
we will build the full free variable set of the subexpressionEApp x y
, then perform yet another allocation to remove the variable that we bind (e.g.y
). It would be better to avoid addingy
to our result in the first place. -
the
EApp
case is not tail recursive, resulting in unnecessary stack and heap allocation. To see this we look at the (cleaned-up) STG produced for the RHS offreeVar
‘s’EApp
case (compiled with-O0
), recalling that in STGlet
bindings correspond directly to thunk allocation:freeVars = \expr -> case expr of EApp fun arg -> let x = freeVars fun -- allocate a thunk for the free vars of fun y = freeVars arg -- allocate a thunk for the free vars of arg in (<>) x y -- call (<>) with the two thunks ...
Note that GHC, when invoked with
-O1
, will eliminate these thunk allocations as itknows that(<>)
is strict in its arguments. This will produce slight less allocation-heavy STG (noting thatcase
in STG corresponds to forcing an expression):freeVars = \expr -> case expr of EApp fun arg -> case freeVars fun of fun_fvs -> case freeVars arg of arg_fvs -> (<>) fun_fvs arg_fvs ...
Now we have two nested
case
analyses, each of which will incur a stack frame push and pop. This cost likely won’t be terrible, but we can nevertheless do better.
Improving upon our attempt
We can address the issues noted above with a two-pronged plan of attack:
-
Carry a set of bound variables downward with us as we traverse the expression tree. Consequently, when we encounter a
EVar
we can first ask whether we have seen a binding site for that variable before committing to allocate a newVarSet
. -
Move to an accumulator style , refactoring
freeVars
to take a partially-built set of free-variables we have seen thusfar as an argument. This allows us to definefreeVars
tail recursively.
An implementation of these optimizations might look like:
freeVars' :: VarSet -- ^ bound variable set -> Expr -- ^ the expresson we want the free variables of -> VarSet -- ^ the accumulator -> VarSet -- ^ the final free variable set freeVars' boundVars (EVar v) acc | memberVarSet v boundVars = acc -- we have seen the binding site for the variable, not free | otherwise = insertVarSet v acc freeVars' boundVars (ELam v e) acc = let -- we are binding 'v' so add it to our bound variable set boundVars' = insertVarSet v boundVars in freeVars' boundVars' e acc freeVars' boundVars (EApp fun arg) acc = freeVars' boundVars fun $ freeVars' boundVars arg acc freeVars' boundVars (ELit _) acc = acc freeVars :: Expr -> VarSet freeVars e = freeVars' mempty e mempty
While a bit more verbose this gives much nicer performance characteristics:
-
we have eliminated the allocation in the common case that we encounter a variable bound in the fragment we are traversing
-
computing the
EApp
case now allocates only one return frame rather than two:case freeVars' boundVars arg acc of arg_vars -> freeVars' boundVars fun acc
Aside: Avoiding redundant inserts
An additional optimization that can also be useful is to check whether we have already added a variable to our accumulator before inserting:
freeVars' boundVars (EVar v) acc | memberVarSet v boundVars = acc -- we have seen the binding site for the variable, not free | memberVarSet v acc = acc -- the variable is free but has already been added to the accumulator | otherwise = insertVarSet v acc
In VarSet
representations like the IntMap
used by GHC this can alleviate pressure on the garbage collector. Given that GC time is often considerable in a heavily-allocating program like GHC this is likely a worthwhile optimization (although I have not yet measured its effect).
However, as it is not critical to the current discussion, we will ignore it for the remainder of the post.
Cleaning up the implementation
It is nice that this new implementation is fast, but wouldn’t it be great if we could recover the readability
of our naïve approach? To do so we might be tempted to define a type and some combinators to capture the patterns seen in freeVars'
above:
-- | A computation to build a free variable set. newtype FV = FV { runFV :: VarSet -- bound variable set -> VarSet -- the accumulator -> VarSet -- the result } instance Monoid FV where -- The empty FV just passes the accumulator along unperturbed mempty = FV $ \_ acc -> acc instance Semigroup FV where -- Unioning two FVs passes the accumulator produced by one to the other fv1 <> fv2 = FV $ \boundVars acc -> runFV fv1 boundVars (runFV fv2 boundVars acc) -- | Consider a variable to be bound in the given 'FV' computation. bindVar :: Var -> FV -> FV bindVar v fv = FV $ \boundVars acc -> runFV fv (insertVarSet v boundVars) acc -- | Take note of a variable reference. unitFV :: Var -> FV unitFV v = FV $ \boundVars acc -> if memberVarSet v boundVars then acc -- variable is known to be bound, ignore else insertVarSet v acc -- variable is free, insert into accumulator fvToVarSet :: FV -> VarSet fvToVarSet fv = runFV fv mempty mempty
With this groundwork in place freeVars
becomes a simple declarative description of how to walk the Expr
AST:
exprFV :: Expr -> FV exprFV (EVar v) = unitFV v exprFV (ELam v e) = bindVar v (exprFV e) exprFV (EApp fun arg) = exprFV fun <> exprFV arg exprFV (ELit _) = mempty freeVars = fvToVarSet . exprFV
Improving code re-use
GHC has used a close relative of the FV
type seen above for its free variable traversals on its Core ASTs for a few years now. However, over the years the varying needs of different parts of GHC have caused the codebase to sprout several additional traverals:
- free variable collection that doesn’t guarantee determinism across compiler runs (as GHC uses non-deterministic unique numbers to identify variables)
- free variable collection which only returns local variables
- free variable collection which only returns coercion variables (a variety of variable beyond the scope of the present discussion; see the System FC paper for details)
- traversal that only returns a boolean reflecting whether an expression has any free variables
Moreover, in GHC each of these must be implemented on each of the three ASTs which comprise Core programs: Expr
, Type
, and Coercion
.
Needless to say, this results in a significant amount of repetition. Is it possible to abstract over the details of each travesal strategy and share the bare skeleton of exprFV
and
get preserve our hard-fought performance improvements? We would be poor functional programmers if we answered “no”. Let’s see how.
Abstracting the traversal strategy
To improve code re-use we want to separate the AST traversal
(that is, how we walk over the various parts of the AST) from what we call the strategy
used to build the free variable set. To guide our abstraction we can start by looking at the operations used by the exprFV
traversal above:
unitFV bindVar Monoid Semigroup
Following a final tagless style, we write down a typeclass capturing all of these operations (N.B. recall that Semigroup
is implied by Monoid
):
class Monoid a => FreeVarStrategy a where unitFV :: Var -> a bindVar :: Var -> a -> a
We can now define our traversals in terms of this class (note that only the type signature has changed from the exprFV
seen above; how nice!):
exprFV :: FreeVarStrategy a => Expr -> a exprFV (EVar v) = unitFV v exprFV (ELam v e) = bindVar v (exprFV e) exprFV (EApp fun arg) = exprFV fun <> exprFV arg exprFV (ELit _) = mempty
Finally we can write some strategies implementing this interface. To start, the naïve implementation from the beginning of this post can be captured as:
newtype Naive = Naive { runNaive :: VarSet } instance Monoid Naive where mempty = Naive mempty instance Semigroup Naive where Naive a <> Naive b = Naive (a <> b) instance FreeVarStrategy Naive where unitFV = Naive . unitVarSet bindVar v (Naive xs) = Naive (delVarSet v xs)
Our optimized implementation is ported with a bit more work:
newtype FV = FV { runFV :: VarSet -- bound variable set -> VarSet -- the accumulator -> VarSet -- the result } instance Monoid FV where mempty = FV $ \_ acc -> acc instance Semigroup FV where fv1 <> fv2 = FV $ \boundVars acc -> runFV fv1 boundVars (runFV fv2 boundVars acc) instance FreeVarStrategy FV where unitFV v = FV $ \boundVars acc -> if memberVarSet v boundVars then acc else insertVarSet v acc bindVar v fv = FV $ \boundVars acc -> runFV fv (insertVarSet v boundVars) acc fvToVarSet :: FV -> VarSet fvToVarSet fv = runFV fv mempty mempty
Further, we can implement a strategy which determines whether an expression has any free variables without writing any code looking at Expr
(using the same bound-variable set optimized as FV
):
newtype NoFreeVars = NoFreeVars { runNoFreeVars :: VarSet -- bound variable set -> Bool -- True is free variable set is empty } instance Monoid NoFreeVars where mempty = NoFreeVars $ const True instance Semigroup NoFreeVars where NoFreeVars f <> NoFreeVars g = NoFreeVars $ \boundVars -> f boundVars && g boundVars instance FreeVarStrategy NoFreeVars where unitFV v = NoFreeVars $ \boundVars -> memberVarSet v boundVars bindVar tv (NoFreeVars f) = NoFreeVars $ \boundVars -> f $ insertVarSet v boundVars noFreeVars :: NoFreeVars -> Bool noFreeVars (NoFreeVars f) = f mempty
Finally, to guarantee that this abstration carries no cost, we can explicitly ask GHC to specialise our exprFV
traversal for these strategies:
{-# SPECIALISE exprFV :: Expr -> Naive #-} {-# SPECIALISE exprFV :: Expr -> FV #-} {-# SPECIALISE exprFV :: Expr -> NoFreeVars #-}
Looking at -ddump-stg
output we can see that GHC has done a wonderful job in optimising our abstration away.
The problem of mutual recursion
While this has all worked out swimmingly so far, things can go awry with more complex ASTs (such as Core). In particular, ASTs where the traversal is part of a mutually recursive group can break the nice performance characteristics that we have seen thusfar.
To see how, let’s augment our expression AST with some (admittedly rather peculiar) dependent type system.
data Expr = EVar Var Type -- a variable expression and its type | ELam Var Expr -- lambda abstraction | EApp Expr Expr -- function application | ELit Integer -- an integer literal data Type = ... -- these constructors are irrelevant | TExpr Expr
Note here that the point here is not the type system itself but rather the fact that our traversals now must be mutually recursive functions For instance, in GHC Core this mutual recursion happens between the Type
and Coercion
types.
Writing traversals for this AST is straightforward:
exprFV :: FreeVarStrategy a => Expr -> a exprFV (EVar v ty) = unitFV v <> typeFV ty exprFV (ELam v e) = bindVar v (exprFV e) exprFV (EApp fun arg) = exprFV fun <> exprFV arg exprFV (ELit _) = mempty typeFV :: FreeVarStrategy a => Type -> a typeFV (TExpr e) = exprFV e -- omitted: traverse other constructors of Type
However, if we look at the STG generated for NoFreeVars
we see that things have gone terribly wrong (again recalling that in STG let
corresponds to heap allocation):
exprFV :: Expr -> NoFreeVars exprFV = \expr -> case expr of EApp fun arg -> let fun_fvs :: NoFreeVars fun_fvs = exprFV fun in let arg_fvs :: NoFreeVars arg_fvs = exprFV arg in let r :: NoFreeVars r = \boundVars -> case fun_fvs boundVars of False -> False True -> arg_fvs boundVars in r ...
Notice that GHC has now allocated a thunk for each application of exprFV
(and typeFV
, although this is not seen in the above excerpt). As we will see below, this is a result of the fact that GHC’s arity analysis does not attempt to handle mutual recursion (see Note [Arity analysis]
in CoreArity.hs
):
In the case of our untyped Expr
AST the simplifier was able to see that exprFV
was of arity 2 (as we look through casts) and consequently eta-expanded its right-hand side to yield the following Core after inlining (ignoring casts):
exprFV = \expr boundVars -> case expr of EApp fun arg -> case fun_fvs boundVars of False -> False True -> arg_fvs boundVars
By contrast, in the typed AST arity analysis conservatively concludes that exprFV
is arity 1 and consequently this eta expansion does not happen. Consequently (<>)
will be inlined and have its arguments let
-bound (which GHC does to ensure inlining does not compromise sharing).
There are two ways via which this can be remedied:
-
Manually eta expand
exprFV
and the like . This is what GHC’s free variable traversal did for many years. Unfortunately this means giving up on the ability tonewtype
FV
. -
Convince GHC to push
fun_fvs
andarg_fvs
in to the lambda inr
. This is what the rest of this post will be about.
The magical oneShot
primitive
GHC calls the transform of pushing a let
binding down an AST tree floating in
. In general GHC will not float a let
binding into a lambda for good reason: doing so may reduce sharing. For instance, if we have:
main = do let nthPrime :: Integer -> Integer nthPrime = let primes :: [Integer] primes = {- insert favorite method for enumerating primes here -} in \n -> primes !! n print (nthPrime 10) print (nthPrime 100)
We must ensure that primes
is only evaluated once, not once for each application. Consequently GHC will not float a let
binding into a lambda unless it can prove that the lambda is applied at most once (such a lambda is said to be a one-shot
lambda).
In the case of our free variable traversal, we indeed expect each NoFreeVars
value to be applied to precisely one bound-variable set (which, recall, is the argument to the function inside a NoFreeVars
). Happily, GHC gives us the means to communicate this expectation via the GHC.Magic.oneShot
combinator:
oneShot :: (a -> b) -> a -> b oneShot = {- there be dragons here -}
Applying this primitive to a lambda expression will cause GHC to consider that lambda to be one-shot, even if it can’t prove this is the case.
Applying this to the (<>)
definition of NoFreeVars
:
instance Semigroup NoFreeVars where NoFreeVars f <> NoFreeVars g = NoFreeVars $ oneShot $ \boundVars -> -- <=== oneShot here f boundVars && g boundVars
This one-line change allows the simplifier to push the let
bindings seen in the EApp
case of exprFV
into the lambda in r
as we expected, recovering the nice efficient code that we saw in the untyped AST traversal.
We can use this same trick to fix the FV
strategy. However, note that oneShot
only affects the lambda binder that it is immediately applied to. Consequently, in the case of FV
the incantation is a bit more verbose as we must break what was previously a single syntactic lambda into several lambdas, punctuated with oneShot
s:
instance Semigroup FV where fv1 <> fv2 = FV $ oneShot $ \boundVars -> oneShot $ \acc -> runFV fv1 boundVars (runFV fv2 boundVars acc)
Summary
In this post we have examined a few ways of writing free variable travesals in Haskell. We started with a simple yet inefficient implementation and then used some simple tricks to improve the efficiency of the produced code at the expense of verbosity. We then abstracted away this verbosity and in so doing allowed for easy implementation of more specialised traversal strategies such as NoFreeVars
.
Finally, we saw how a short-coming of GHC’s simplifier can cause poor code generation for mutually-recursive traversals and described GHC’s oneShot
primitive. This allowed us to inform GHC of an assumption made by our abstraction, enabling GHC to produce the code we would write by hand from our simple, orthogonal specification of an AST traversal and free variable strategy.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK