Writing State Monads in Ocaml

One of my projects for quite a while is to understand Monads. I have been working on this, off and on, for an embarrassing amount of time. Yes, I could have just learned Haskell and that would have done it but I wanted to be a pain in the arse. Also, I had a horrible experience attempting to learn Haskell a long time ago. Thus, I decided as I use Ocaml for most things then I should be able to do it. First, I read this article a few times and this which is really good and helps hugely in my understanding. Anyway, I will take a step by step approach to showing how to write a State Monad for Ocaml.

So, first thing, what is a Monad? If you remember your undergraduate philosophy classes a Monad is an entity that is basically impenetrable but can communicate. Thus, you don’t know what’s in the monad but you can communicate this. With this in mind, a Monad in code is:

module type MONAD =
  sig
    type 'a t
    val bind : 'a t -> ('a -> 'b t) -> 'b t
    val return : 'a -> 'a t
  end

You will notice here that I am declaring a module type and not a module. Basically, I want to build monads out of things. Thus, Monad in OO terms is an abstract base class. So, if we want to have a “kind of” Monad, we have to build it. In any case, the type 'a t is the type system wrapper around a polymorphic value. Basically, the type system wraps the value in a type so that it can be manipulated at the type level rather than at the value level. return is kind of an odd way to put it but it “return”s a Monad out of a value of some kind (as has been pointed out by pozorvlak, mote is a better term for values of that type and is a common mistake in Monad tutorials); we don’t care what kind (or type) the value is. bind is a way for the user to bind a function to a Monad. What you will get is the Monad feeding the value into the function and expecting a new Monad from the function then it will automatically return that Monad.

So, you can create different kinds of Monads to do different things and one of the side-effects is that side-effects are encapsulated within the Monad itself. This means that you can do things which are not “pure” functional programming within a purely functional system (this isn’t strictly true either). As you can see with bind, you will always return a Monad. However, I can hear you saying, “but how do I get values out of a Monad?” Don’t worry you can do that by having a function like this:

val access : 'a t -> 'a

This function is some times called “run” as well. What it does is it asks the Monad for its computation (aka “a thunk”) as the Monad writer can put off actually doing the computation until this stage in the processing. So, with that knowledge, let’s attempt the next stage:

module type STATE =
  sig
     type t
     val empty : t
  end

Again, this is an abstract base class for “State” and it allows you to define your own state using a functor (I should write another thing on functors). Basically, I don’t want to constrain the kinds of StateMonads and just let my users define what kind of state they need. So, bringing this together with the MONAD declaration above:

module type STATE_MONAD = 
   functor(State : STATE) ->
     sig
       include MONAD
       val access : 'a t -> 'a
       val put : State.t -> unit t
       val get : State.t t
     end

Ok, again another declaration with no real code, please be patient all will be revealed. This has access and put and get functions. The access function, actually does the computation of functions that are bound within the Monad. The put and get is something I am still trying to get my head around but it basically allows you to use the StateMonad as a variable store. You can store things in it like a variable and retrieve the information and return it.

So, now on to some Real Code:

module StateMonad : STATE_MONAD =
   functor(State : STATE) ->
     struct
       type state = State.t
       type 'a t = state -> ('a * state)
       let bind m f =
         fun s ->
           match m s with 
              | (x, s') -> f x s'
       let return a = fun s -> (a, s)
       let access m =
           match m State.empty with
             | (x, s) -> x
       let put s =
           fun _ -> ((), s)
       let get =
           fun s -> (s, s)
     end

So, lets break this down. The functor allows the user to set what kind of state they want. I will demonstrate this below. The next bit is type 'a t = state -> ('a * state). This defines the actual type that is wrapped inside the StateMonad. Basicaly it said “This Monad is a function which takes one state and translates that state into a value and the next state”. Thus, when you put a state in the StateMonad via return, it returns just what it says on the tin a function which takes a state and return a tuple of a value and the state. The next function bind binds a function to the Monad and the monad calls the last function it created (in this case the “m” in the code) and matches the output of “m” with the expected tuple which is then passed to the next function in the chain. The access function takes an empty state and that fires the function chain to come to a final state and value then the final value is returned.

Now, the put and get functions. I am not sure I actually understand these as well as I should so I will tell you now that if what I say is wrong, please let me know. So basically put creates a degenerate return function which puts the value in the same kind of wrapper as a function. The get passes back a function which fires immediately and gives you back it’s value. I am completely unsure of how exactly this works but it seems to work in practice. If someone could enlighten me, that would be wonderful.

So, now I will present a trivial example: IntStateMonad.

   module IntStateMonad = StateMonad(
     struct
       type t = int
       let empty = 0
     end
    )

This tells the StateMonad what kind of state the user wants. This basically fills in the functor State from above. I am still slightly unsure if the empty value is really necessary or if there is some other way I can get the underlying function to fire without it. So here is an example:

let _ =
   let blah = IntStateMonad.return 1 in 
   let blah2 = IntStateMonad.bind blah (fun i -> IntStateMonad.return (succ i)) in 
      print_endline (string_of_int (IntStateMonad.access blah2))

So, can we do something more Haskell like in this instance? Yes, with the “perform” notation that is provided by pamonadcustom, which uses camp4 to parse ocaml and do syntax transformations. So we can do this like so:

let return = IntStateMonad.return

let _ =
   let blah = 
        perform with IntStateMonad.bind in 
           a <-- return 1;
           b <-- return (succ a);
           return b
 in 
    print_endline (string_of_int (IntStateMonad.access blah))

So you have to bind the return and bind functions to something that is accessible by the pa_monad syntax. Otherwise, it takes care of interleaving the binds then it returns the resulting Monad which you can run (or access…whatever you prefer to call it).

As for get and put, you can use it in this way:

let _ =
   let blah = 
      perform with IntStateMonad.bind in 
        a <-- return 1;
        b <-- IntStateMonad.put 5;
        c <-- return (succ a);      
        d <-- IntStateMonad.get;
        return (c + d)
 in
    print_endline (string_of_int (IntStateMonad.access blah))

So, there you have it. This is a Monad which keeps track of the state for you and will return the evaluated functions in a type wrapper. This will also get and store state like a variable wrapped inside the bound function calls. All side-effects are explicitly handled inside the Monad and Leibniz gets to smile. Who said philosophy was useless?

UPDATE Added information on motes and clarified some stuff on side-effects in a purely functional programming language.

5 Comments

Hey Chris.

I think this is a great way to learn monads.

There’s probably a fancy way to explain the implementation of “get” and “put”. Maybe it’d help to write “modify”. The purpose of modify is to mutate the state with an update function.

modify : ('a -> 'a) -> unit m
modify f s = (f s, ())

You could then define put as

put x s = modify (fun _ -> x)

So here, your update function is a constant function.

I’m not sure if the presence of “()” in the definition is distracting. You could write a little wrapper (I don’t know the Camlp4 for this, because I haven’t got it working with HOL Light yet):

modifyget f =
perform ...
() <- modify f
x <- get
return x

which you could implement directly with modifyandget s = let y = f s in (y,y)

But you probably wouldn’t want to do it like this, because you want to say that “modify” and “put” do not return a value: they’re just called for “side-effects.” You know this because the type is m ().

Haskell actually spots this and emits style warnings about it. You want that unit there so that the type system knows you haven’t ignored something important. Think about how Ocaml likes to nag you to use “ignore” sometimes, in case you’ve accidentally forgotten something.

Don’t know if that clarifies anything or you already had this down. I just find a lot of the time my only “understanding” is being able to write translations like this.

I’m not sure you want empty by the way, since you probably won’t have an obvious candidate for this value across all types. If you want the state to be defaulted, you should probably wrap it in an option.

By the way, here’s a cool line by Alan Kay on Smalltalk:

“That the Ideas are themselves manifestations (of the Idea-Idea) and that the Idea-Idea is a-kind-of Manifestation-Idea–which is a-kind-of itself, so that the system is completely self-describing– would have been appreciated by Plato as an extremely practical joke” — Alan Kay

In Haskell, it’s just a parameter to “runState”.

I think what you’re trying to do is avoid the need for this parameter and just have a default value. But I think in that case you want to use option, and empty is then just None (which is literally empty).

Basically, I’d separate out the idea of state from the idea of state + default values. If you start messing with transformers, you can actually express this compositionally. Then in the composed module, you’ll probably define some runDefault function based on

val default : 'a -> 'a option -> 'a

In the case of int above, you’re defaulting to 0.

Or, we could go more abstract (but we might just be getting silly now), and say that this defaulting is based on structure with a zero element (of which option is just one example, and int is another), and we always take the 0 as the default. This is what you end up doing with the writer monad, which you can see in my library:

https://github.com/Chattered/ocaml-monad

The dependency on the structure with the zero gets captured in a functor.

Ack. Of course. “Initial state” is a better term.

So here’s a question: what happens when we want to allow the state to be uninitialised? In most languages, the answer is obvious: we just set the variable to null.

But, of course, in Ocaml, we don’t like null. So instead, if we want a state which can be uninitialised, we want an option.

Leave a comment

About cyocum

user-pic Celticist, Computer Scientist, Nerd, sometimes a poet…