Code Commit (permanently in beta)

Monads Are Not Metaphors

27
Dec
2010

This article is also available in Japanese.

I am about to break a promise. Almost three years ago, I promised myself that I would never write an article about monads. There are too many such articles already; so many, in fact, that people are often confused by the sheer proliferation. Everyone seems to have a different take on the subject, meaning that those attempting to learn the concept for the first time are stuck trying to reason out the commonalities between burritos, space suits, elephants and desert Bedouins.

I’m not going to add to this menagerie of confusing analogies. The fact is that none of these parallels are entirely accurate. None of them convey the whole picture, and some of them are blatantly misleading in important respects. You will never come to understand monads by pondering Mexican food and the Final Frontier. The only way to understand monads is to see them for what they are: a mathematical construct.

Math (or not)

Here’s the thing about monads which is hard to grasp: monads are a pattern, not a specific type. Monads are a shape, they are an abstract interface (not in the Java sense) more than they are a concrete data structure. As a result, any example-driven tutorial is doomed to incompleteness and failure. The only way to really understand is to take a step back and look at what monads mean in the abstract rather than the concrete. Take a look at the following Ruby snippet:

def foo(bar)
  puts bar
  bar.size
end

Just as a quick Ruby refresher, we can rewrite this code in the following way:

def foo(bar)
  puts bar; bar.size
end

Ruby has this neat convention (which is shared by most modern languages) which causes the final expression in a method to be turned into the implicit return statement. Thus, the foo method will take a parameter, print it to standard out and then return its size. Fairly simple, right?

Here’s the puzzler: what is the semicolon (;) doing? It’s tempting to say that it’s just a separator, but theoretically speaking, there’s something much more interesting going on here. Let’s switch to Scala and add some Christmas trimmings:

def foo(bar: String) = {
  ({ () => println(bar) })()
  ({ () => bar.length })()
}

Just in case you’re unfamiliar with Scala, I’d like to make it clear that we are not required to enclose every statement inside its own lambda (anonymous function). I’m just doing that to make a point.

This function does exactly the same thing as the Ruby version. Well, the parameter is a bit more constrained since we require a String rather than accepting anything defines size, but moving past that… The major difference from what we had previously is that each statement is wrapped inside its own anonymous function, which we immediately call. We can again play the same semicolon trick that we used in Ruby. However, because the statements are actually functions, we can go a step further:

def foo(bar: String) = {
  ({ () => println(bar) } andThen { () => bar.length })()
}

(note: the andThen method isn’t defined for functions of 0-arity, but we’re going to pretend that it is and that it works the same as it does for functions of one argument. If it makes you feel better, you can pretend that these are both one-argument functions taking Unit as a parameter, the theoretical implications are the same, it just requires more syntax)

Notice that we haven’t actually used the semicolon (although we could have). Instead, we’re combining two functions together and invoking them at the very end. The semantics we’re using to do the combination are such that the first function will be evaluated, then its result (()) discarded and the second function evaluated with its result returned. For those following along at home, we could easily define andThen in the following way:

def funcSyntax[A](f1: () => A) = new {
  def andThen[B](f2: () => B) = f1(); f2()
}

In a way, we have defined a method which literally encapsulates the effect of the semicolon “operator”, allowing us to apply it directly to functions, rather than dealing with it indirectly at the level of statements. That’s kind of a cool thought, but the important point is that we are first executing the first function, discarding its result and then executing the second function, returning its result.

It should be clear that we could extend this to any number of functions. For example:

def foo(bar: String) = {
  ({ () => println("Executing foo") } andThen
   { () => println(bar) } andThen
   { () => bar.length })()
}

Still with me? Congratulations, you’ve seen your first monad.

You Could Have Invented Monads! (and maybe you already have)

This certainly isn’t a monad in the traditional sense, but if we worked at it, we could show that the monadic axioms do hold. The significant point here is what this monad is doing: combining one thing together with another in sequence. In fact, this is what all monads do, deep down. You start out with Thing One, and you have a function which will (given One) will give you Thing Two. Monads let you combine Thing One and your function together, producing a final resultant Thing. Let’s look at some more code:

case class Thing[+A](value: A)

This is about the simplest container imaginable (in fact, it is precisely the simplest container imaginable, but that’s not relevant now). We can wrap up values inside of Thing, but that’s about it:

val a = Thing(1)
val b = Thing(2)

Now, let’s switch into design mode for a moment. Imagine that we find ourselves writing a lot of code which looks like this:

def foo(i: Int) = Thing(i + 1)
 
val a = Thing(1)
val b = foo(a.value)        // => Thing(2)

We’re starting with a Thing, and then we’re using the value inside of that Thing to call a function which gives us a new Thing. If you think about it, this is actually a very common pattern. We have a value, and then we use that value to compute a new value. Mathematically, this is pretty much the same as the following:

def foo(i: Int) = i + 1
 
val a = 1
val b = foo(a)              // => 2

The only difference between these is that the first version wraps everything in Thing, while the second version is using “bare” values.

Now, let’s extend our imagine just a bit and assume that we have a good reason for wrapping everything inside of Thing. There could of course be any number of reasons for this, but basically it boils down to the notion that Thing might have some extra logic which does interesting things with its value. Here’s the question: can we come up with a nicer way of going from a to b? Basically, we want to encapsulate this pattern as a more general tool.

What we want is a function which pulls the value out of Thing and then calls another function with that value, returning the result of that function call (which will be a new Thing). Since we’re good object-oriented programmers, we will define this as a method on class Thing:

case class Thing[+A](value: A) {
  def bind[B](f: A => Thing[B]) = f(value)
}

So if we have a Thing, we can pull its value out and use it to compute a new Thing, all in one convenient step:

def foo(i: Int) = Thing(i + 1)
 
val a = Thing(1)
val b = a bind foo          // => Thing(2)

Notice that this is a lot cleaner that our original version, while still performing exactly the same function. Thing is a monad.

The Monad Pattern

Any time you start with something which you pull apart and use to compute a new something of that same type, you have a monad. It’s really as simple as that. If it sounds like I’m describing almost all of your code, then good, that means you’re starting to catch on. Monads are everywhere. And by “everywhere”, I do mean everywhere.

To understand why this is, let’s look at what it is that makes Thing a monad:

val a = Thing(1)

The first thing is that I can wrap up a value inside of a new Thing. Object-oriented developers might call this a “constructor”. Monads call it “the unit function”. Haskell calls it “return” (maybe we shouldn’t try to figure out that one just yet). Whatever you call it though, it comes to the same thing. We have a function of type A => Thing; a function which takes some value and wraps it up inside a new Thing.

a bind { i => Thing(i + 1) }

We also have this fancy bind function, which digs inside our Thing and allows a function which we supply to use that value to create a new Thing. Scala calls this function “flatMap“. Haskell calls it “>>=“. Again, the name doesn’t matter. What’s interesting here is the fact that bind is how you combine two things together in sequence. We start with one thing and use its value to compute a new thing.

It’s as simple as that! If you’re like me, then you’re likely asking the following question: if it’s so simple, what’s all the fuss about? Why not just call this the “using one thing to compute another” pattern? Well, for one thing, that’s far too verbose. For another, monads were first defined by mathematicians, and mathematicians love to name things. Mathematics is all about finding patterns, and it’s hard to find patterns if you can’t affix labels once you’ve found them.

More Examples

I said that monads are everywhere (they are!), but we’ve only looked at two examples so far. Let’s see a few more.

Option

This might be the most famous monad of all, quite possibly because it’s one of the easiest to understand and by far the easiest to motivate. Consider the following code:

def firstName(id: Int): String = ...    // fetch from database
def lastName(id: Int): String = ...
 
def fullName(id: Int): String = {
  val fname = firstName(id)
  if (fname != null) {
    val lname = lastName(id)
    if (lname != null)
      fname + " " + lname
    else
      null
  } else {
    null
  }
}

Here again, we have a fairly common pattern. We have two functions (firstName and lastName) which are responsible for producing some data which may or may not be available. If the data is available, then it is returned. Otherwise, the result of these functions will be null. We then use these functions to do something interest (in this case, compute a full name). Unfortunately, the fact that firstName and lastName may or may not produce a useful value needs to be handled explicitly with a set of nested ifs.

At first blush, it seems this is the best that we can do. However, if you look very closely, you can find the monad pattern buried in this code. It’s a little more complicated than last time, but it’s still there. Let’s try wrapping everything in Thing to make it clear:

def firstName(id: Int): Thing[String] = ...    // fetch from database
def lastName(id: Int): Thing[String] = ...
 
def fullName(id: Int): Thing[String] = {
  firstName(id) bind { fname =>
    if (fname != null) {
      lastName(id) bind { lname =>
        if (lname != null)
          Thing(fname + " " + lname)
        else
          Thing(null)
      }
    } else {
      Thing(null)
    }
  }
}

See it now? As I said, monads are everywhere. Here’s the really useful bit though: every time we bind, the very first thing we do inside the function is test the value to see if it is null, why not move that logic into bind? Of course, we can’t really do that without changing Thing into something different, so we will define a new monad called Option:

sealed trait Option[+A] {
  def bind[B](f: A => Option[B]): Option[B]
}
 
case class Some[+A](value: A) extends Option[A] {
  def bind[B](f: A => Option[B]) = f(value)
}
 
case object None extends Option[Nothing] {
  def bind[B](f: Nothing => Option[B]) = None
}

If you block out everything except Some, this looks a lot like our old friend, Thing. The main difference is that Option has two different instantiations: Some, which contains a value, and None, which doesn’t contain a value. Think of None as being just an easier way of writing Thing(null).

What’s interesting is that Some and None need to have two different definitions of bind. The definition of bind in Some looks a lot like the definition in Thing, which makes sense as Some and Thing are almost identical. However, None defines bind to always return None, ignoring the specified function. How does this help us? Well, let’s return to our fullName example:

def firstName(id: Int): Option[String] = ...    // fetch from database
def lastName(id: Int): Option[String] = ...
 
def fullName(id: Int): Option[String] = {
  firstName(id) bind { fname =>
    lastName(id) bind { lname =>
      Some(fname + " " + lname)
    }
  }
}

All of those nasty if statements have disappeared. This works because firstName and lastName now return None when they fail to fetch the database record, rather than Thing(null). Of course, if we try to bind on None, the result will always be None. Thus, the fullName function returns the combination of firstName and lastName inside of a Some instance only if neither firstName nor lastName return None. If either one returns None, then the result of the whole thing will be None.

For those keeping score at home, we have “accidentally” stumbled upon Groovy’s safe-dereference operator (?.), Raganwald’s andand for Ruby and many, many more. See? Monads are everywhere.

IO

Anyone trying to understand monads will inevitably run into Haskell’s IO monad, and the results are almost always the same: bewilderment, confusion, anger, and ultimately Perl. The fact is that IO is a rather odd monad. It is fundamentally in the same category as Thing and Option, but it solves a very different problem.

Here’s the deal: Haskell doesn’t allow side-effects. At all. Functions take parameters and return values; you can’t just “change” something outside the function. As an example of what this means, let’s return to our earlier Ruby code:

def foo(bar)
  puts bar
  bar.size
end

This function takes a value, calls its size method and returns the result. However, it also changes the standard output stream. This is pretty much the same as if we had a global array hanging around which we just mutated in-place:

STDOUT = []
 
def foo(bar)
  STDOUT += [bar]
  bar.size
end

Haskell doesn’t have variables of any sort (imagine if Scala didn’t have var, or if every variable in Java were final). Since we don’t have any variables, we can’t change anything in-place. Since we can’t change anything in-place, there’s no way we can define a puts function. At least, not the puts we’re used to.

Let’s switch back to Scala. Our goal here is to define a println function which doesn’t rely on any sort of mutable state (ignoring for the moment the fact that the standard output stream is always going to be mutable, since there’s no way we can “change” the user’s screen by cloning the physical display). One thing we could do is wrap up our standard output stream as a Vector which we carry along with our functions:

def foo(bar: String, stdout: Vector[String]) = {
  val stdout2 = println(bar, stdout)
  (bar.length, stdout2)
}
 
def println(str: String, stdout: Vector[String]) = stdout + str

Theoretically, we could write all of our println-enabled functions in this way, passing in the current stdout and receiving the new state as a result. At the end of the day, the entire program would produce a result in addition to the final state of stdout, which could be printed to the screen by the language runtime.

This works (at least for println), but it’s ridiculously ugly. I would hate it if I had to write code like this, and early Haskell adopters felt exactly the same way. Unfortunately, things only get worse from there. Our trick with Vector[String] may work for the standard output stream, but what about standard in? At first blush, it seems as if the readLine function wouldn’t be as bad as println, after all, we aren’t changing anything! Unfortunately, something is clearly being changed at some level, otherwise repeated calls to readLine would yield the same result (clearly not the case).

Graphics updates, networking, the list goes on. It turns out that any useful program is going to need to have side-effects, otherwise there’s no way for all of that usefulness to get outside the program where we can see it. So, Haskell’s designers (more specifically, Phillip Wadler) needed to come up with a way to solve this problem not only for standard out, but for all side-effects.

The solution is actually quite simple. To solve the standard out problem with println, we just passed a Vector[String] around, “modifying” it by returning the new state along with our regular return value. Here’s the inspiration: what if we did that for the entire universe? Instead of passing around just a plain Vector[String], we pass around Universe:

def foo(bar: String, everything: Universe) = {
  val everything2 = println(bar, everything)
  (bar.length, everything2)
}
 
def println(str: String, everything: Universe) = everything.println(str)

As long as the language runtime is able to somehow give us an instance of Universe which behaves the way we would expect, then this code will work just fine. Obviously, the runtime can’t really package up the entire cosmos and allow us to get new versions of it, but it can cheat a bit and pretend to give us the entire universe. The language runtime is allowed to implement the println function on the Universe object in whatever way it deems best (hopefully by actually appending to standard out). Thus, we let the runtime perform whatever magic is necessary while we remain blissfully ignorant of any and all side-effects.

This solves our problems alright, but it’s horrible in almost every other respect. We have this manual threading of the Universe, which is both verbose and painful. Even worse, this sort of thing is very error prone (i.e. what happens if we “modify” the universe and then go back and change an older version of it? do we get two, parallel universes?). The heart of our problem now is that we are using the old version of the universe to compute a new version of the universe, and we’re doing that manually. We’re taking something (in this case, the universe) and using its value to compute a new something (the new version of the universe). Sound familiar?

Phillip Wadler’s inspiration was to take advantage of the monad pattern to solve this problem. The result is the IO monad:

def foo(bar: String): IO[Int] = {
  println(bar) bind { _ => IO(bar.length) }
}
 
def println(str: String): IO[Unit] = {
  // TODO insert magic incantations
}

Of course, we can’t actually implement println, since this fake language isn’t allowed to involve itself in side-effects. However, the language runtime could provide a native (read: cheat code) implementation of println which performs its side-effects and then conjures a new version of IO (i.e. the modified universe).

The only catch with this design is that we can never get anything out of IO. Once you start down the dark path, forever will it dominate your destiny. The reason for this is one of language purity. Haskell wants to disallow side-effects, but if it were to allow us to pull values out of IO, then we could very easily bypass its safe-guards:

def readLine(): IO[String] = {
  // TODO insert magic incantations
}
 
def fakeReadLine(str: String): String = {
  val back: IO[String] = readLine()
  back.get      // whew!  doesn't work
}

As you can see, if we could pull values out of IO, then the whole exercise would become a waste of time, since it would be trivially easy to hide side-effects within wrapper functions.

Of course, none of this is particularly relevant to Scala or Ruby, much less Java! Scala doesn’t restrict side-effects. It allows you to call println whenever you feel like it, and it provides a way of declaring mutable variables (var). In a sense, Scala is hiding a hypothetical IO monad. It’s as if all Scala code is implicitly inside of IO, and thus implicitly threading the state of the universe from one moment to the next. In light of this fact, why should we care at all about how IO works? Mostly because it’s a monad which is very different from Thing and Option. I briefly considered using State, but that’s too much ancillary complexity when we’re trying to focus on the core idea of using the value from one thing to compute another thing.

Now What?

So, we’ve identified the monad pattern. We can spot it in code and employ it in an impressive range of situations, but why are we bothering with all the ceremony? If we’re already using monads all over the place without realizing it (e.g. semicolon), then why do we have to bother futzing with all of this nomenclature? Simply put: why do we care that Option is a monad so long as it just “does the right thing”?

Well, the first answer to that lies in the nature of mathematics, and by extension, computer programming. As I said, before, mathematics is all about identifying patterns (well, you also play with those patterns to generate larger, more intricate patterns, but that’s really just a means to an end). Once you identify a pattern, you name it. That’s just what mathematicians do. Imagine if Newton hadn’t named the “derivative”, and he simply insisted on calling it “an expression for the line tangent to a given line relative to a particular valu