Skip to content

Understanding Bound

Anupam Jain edited this page Mar 20, 2016 · 1 revision

Understanding Bound

Edward Kmett made a comment on Reddit which was immensely helpful -

This is the entire text of the comment as is. I just converted it to github flavored markdown and added it to the wiki to make it more accessible to users of this library. Everything beyond this paragraph is in Edward's own words. I even took the section headings from the comment text. :)



Bound is just about how to deal with names in your syntax tree.

The jargon arises because this can be a deceptively complicated problem and it has many potential solutions. I'll try to steer clear of jargon:

When you write an interpreter, and put together a syntax tree you 'bind' or abstract names for function arguments, and instantiate them when you call the function.

Monads are all about substitution, so in bound I literally use the Monad on your syntax tree for just that. Using (>>=) substitutes in terms for free variables in the tree. A free variable is just that. The expression a + b has both a and b as free variables in it. You could replace a with (6 * 8) and still have a reasonable expression.

So with bound, you give me an expression type like

data Exp a = Var a | App (Exp a) (Exp a) | Lam (Scope () Exp a)

and I give you a monad transformer Scope b that you can use to capture some variables so that they won't be replaced by (>>=).

You need this because when you have something like (\f x -> f x) those names are captured by the lambdas in question.

In (\f y -> f x y), both f and y are bound, and the lambda acts as a 'binder'. The x remains a free variable and could be captured by a surrounding lambda, or supplied by an environment of some sort. If you were to take the program (\x -> (\x -> x) x) and change the name of the outer x it'd become (\y -> (\x -> x) y) The inner 'x' is bound by something closer to it.

But what matters here is that bound variables are captured by lambdas, let expressions, foralls, or whatever other kind of binder you are using in your code. They arise in all sorts of syntactic constructs.

Getting back to bound and Scope b:

One way to think about Scope b f a is as if it was EitherT b f a or f (Either b a) such that the monad only does the substitution on the 'a' variables, leaving the b variables bound by a given scope alone.

newtype Scope b f a = Scope { runScope :: f (Either b a) } -- simplified

You can build a scope by abstracting:

abstract :: Monad m => (a -> Maybe b) -> f a -> Scope b f a

and you can instantiate them to tear down a scope, e.g. for applying a lambda to a value:

instantiate :: Monad m => (b -> f a) -> Scope b f a -> f a

Note: these two operations work on any monad you give me, and factor out name capture from what you need to worry about.

There is one more big part to bound, a combinator (>>>=) for binding into scopes, patterns, case alternatives, etc.

instance Monad Exp where
   return = Var
   Var a >>= f = f a
   App l r >>= f = App (l >>= f) (r >>= f)
   Lam b >>= f = Lam (b >>>= f)

Note the (>>>=) in the Lam case! thats the only thing you need to change in an otherwise completely obvious monad to support name capture.

You have to write a few boilerplate instances on top of it, but thats pretty much it. The API for bound is around 6 functions and it can be hand implemented at the top of a source file if you don't want to incur an extra entirely haskell 98 dependency. One of my coworkers is fond of doing so when he works in Agda.

That said, the library does take care of some extra tricky details. It deals with the 'equality of names' for you, makes that (>>=) much faster than the naive EitherT version I described here, etc.

Putting the jargon back in

The EitherT straw-man version would give you a "locally nameless de Bruijn indexed" syntax tree. Scope is more efficient than that, because it can figure out that it doesn't need to walk whole branches of the tree to slide a 'Right' constructor for the Either into it, by basically letting it wrap the branch itself. This lets it also know that it can't find any 'Left's inside the branch, so it can avoid skimming it later when it comes time for instantiation. It does this by replacing

newtype Scope b f a = Scope { runScope :: f (Either b a) } -- simplified

from above with

newtype Scope b f a = Scope { runScope :: f (Either b (f a)) }

This actually speeds up de Bruijn indexing an alarming amount. This was first discovered by Bird and Paterson 13 years ago, but the pattern wasn't abstracted out into something reusable until bound, which took it from needing higher rank types and a fairly complicated one-off encoding for each would-be monad to a simple monad transformer you can reuse.