Ocaml, Unicode, and Hashtables
So, Ocaml does not support Unicode out of the box. The “string” type is an 8 bit byte and that’s that. Some find this to be a major failing of the language in general and it is a pain in the ass. However, there is a unicode library for ocaml Camomile which fills the gap.
In the project that I have been working on, I had to read in a Unicode file into Ocaml and create a “seen” hash. Just as you would do in perl normally. However, because Ocaml doesn’t support Unicode natively, you cannot use the generic Hashtable type “(‘a, ‘b) t”, which stands for an arbitrary type for a key (the first ‘a) and an arbitrary type for the value (the second ‘b). The key value types will be filled in by type inference as you use the Hashtable based on what you do with it. This won’t work because the generic Hashtable depends on an equal function that will not conform to the Unicode standard compare.
All is not lost, however! This is where one of the most powerful features of Ocaml comes into its own: the functor. Modules in Ocaml can be parametrized in such a way as the user can redefine a module to meet his/her needs. For Hashtables one can parametrize the Hash on the key value. For instance:
module UTF8Hash = Hashtbl.Make( struct type t = Camomile.UTF8.t let equal a b = if (Camomile.UTF8.compare a b) = 0 then true else false let hash = Hashtbl.hash end )
Using the Hashtable Make functor, I set the key type to UTF8 then set the equal function to something that makes sense for UTF8 strings. In the end, I left the hash function itself alone as I thought it would probably do the right thing and I didn’t want to get into it so I just defined it as itself.
Doing this creates a Hashtable type of ‘a UTF8Hash.t. The ‘a is now the type of the value as you already know the type of the key for the hash. In addition, doing this, you can create a hashtable that has an arbitrarily complex key type. As long as you can write an equals function, you should be fine.