August 2012 Archives

Ocaml, Unicode, and Hashtables

So, Ocaml does not support Unicode out of the box. The “string” type is an 8 bit byte and that’s that. Some find this to be a major failing of the language in general and it is a pain in the ass. However, there is a unicode library for ocaml Camomile which fills the gap.

In the project that I have been working on, I had to read in a Unicode file into Ocaml and create a “seen” hash. Just as you would do in perl normally. However, because Ocaml doesn’t support Unicode natively, you cannot use the generic Hashtable type “(‘a, ‘b) t”, which stands for an arbitrary type for a key (the first ‘a) and an arbitrary type for the value (the second ‘b). The key value types will be filled in by type inference as you use the Hashtable based on what you do with it. This won’t work because the generic Hashtable depends on an equal function that will not conform to the Unicode standard compare.

All is not lost, however! This is where one of the most powerful features of Ocaml comes into its own: the functor. Modules in Ocaml can be parametrized in such a way as the user can redefine a module to meet his/her needs. For Hashtables one can parametrize the Hash on the key value. For instance:

module UTF8Hash = Hashtbl.Make(
  struct
    type t = Camomile.UTF8.t
    let equal a b =
      if (Camomile.UTF8.compare a b) = 0 then
        true
      else
        false
     let hash = Hashtbl.hash
 end
)

Using the Hashtable Make functor, I set the key type to UTF8 then set the equal function to something that makes sense for UTF8 strings. In the end, I left the hash function itself alone as I thought it would probably do the right thing and I didn’t want to get into it so I just defined it as itself.

Doing this creates a Hashtable type of ‘a UTF8Hash.t. The ‘a is now the type of the value as you already know the type of the key for the hash. In addition, doing this, you can create a hashtable that has an arbitrarily complex key type. As long as you can write an equals function, you should be fine.

Ocaml and Hashtables

Coming from a Perl background and seeing the Ocaml Hashtbl documentation for the first time, it can be really confusing. Especially since there is no keys, values, and each functions in the documentation. So, first of all while Ocaml doesn’t give you these functions out of the box and I would argue that it should, you can write them fairly easily yourself using the fold function.

let keys htbl =
  Hashtbl.fold (fun k _ accum -> k::accum) htbl []

As you can see above the fold function gets three params, the first two are the key and value. The third is the “accumulator” or in this case just a list where you will append the k variable. The underscore means “anything” and tells the compiler that you don’t really care about the binding and it is not then necessary.

The values function works similarly.

let values htbl = 
  Hashtbl.fold (fun _ v accum -> v::accum) htbl []

The each function will be slightly different in that we need both k and v and the accum parameters.

let each htbl =
  Hashtbl.fold (fun k v accum -> (k, v)::accum) []

This creates a list of tuples of the key and value. You can then match against it and get the key and value out.

About cyocum

user-pic Celticist, Computer Scientist, Nerd, sometimes a poet…