October 2012 Archives

More OCaml and Unicode

I was wondering how to translate a string from one encoding to another with Camomile (the OCaml Unicode library). Why would you want to do this? Suppose you receive a UTF-16 encoded string and need to run PCRE on it, which only takes UTF-8 input. You then have to transcode the string to UTF-8 before you can match against it.

Camomile uses functors extensively, so you need to know how they work to follow the code example below. It is fairly straightforward once you get your head around it (it took me a while to figure out, which is why I am putting it here).
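
For readers new to functors, here is a minimal sketch of the same pattern using only the standard library rather than Camomile. Set.Make plays the role that CharEncoding.Make plays in the Camomile code: it takes a module as an argument and returns a new module specialised to it. The name StringSet and the sample values are illustrative only.

```ocaml
(* Set.Make is a functor: it takes a module describing the element type
   (String provides the type t and a compare function) and returns a new
   module of sets over that type. *)
module StringSet = Set.Make (String)

let encodings = StringSet.of_list [ "utf8"; "utf16"; "utf8" ]

let () =
  (* Duplicates collapse, so the set holds two elements. *)
  Printf.printf "%d encodings\n" (StringSet.cardinal encodings)
```

Applying CharEncoding.Make to UTF8 works the same way: the result is a module whose decode function returns UTF8 text.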

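open CamomileLibraryDefault
open Camomile

(* CharEncoding.Make is a functor: given a Unicode text module (UTF8 here),
   it returns a module whose decode function produces that text type. *)
module UTF8Decode = Camomile.CharEncoding.Make(UTF8)

let _ =
  try
    (* The first argument names the encoding of the input bytes. This literal
       is already UTF-8; for UTF-16 input, pass Camomile.CharEncoding.utf16. *)
    let changed = UTF8Decode.decode Camomile.CharEncoding.utf8 "神奈川大学" in
      (* validate raises UTF8.Malformed_code if the result is not valid UTF-8. *)
      UTF8.validate changed;
      print_endline "yes!";
      exit 0
  with
      | UTF8.Malformed_code ->
          print_endline "no!";
          exit 1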
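
To tie this back to the UTF-16 case that motivated the post, here is a small sketch of the same functor applied to UTF-16 input. It assumes the same Camomile version and module layout as the code above, and it uses CharEncoding.utf16be so the byte order is explicit. The names UTF8Text, utf16be_bytes and utf8_text are illustrative, and the input bytes are written out by hand rather than read from a real source.

```ocaml
open CamomileLibraryDefault
open Camomile

(* Same functor application as above: decode will return UTF8 text. *)
module UTF8Text = CharEncoding.Make (UTF8)

(* "日本" as big-endian UTF-16: U+65E5 followed by U+672C. *)
let utf16be_bytes = "\x65\xe5\x67\x2c"

(* Interpret the bytes as UTF-16BE and produce a UTF-8 string. *)
let utf8_text = UTF8Text.decode CharEncoding.utf16be utf16be_bytes

let () =
  (* Raises UTF8.Malformed_code if the result is not valid UTF-8. *)
  UTF8.validate utf8_text;
  print_endline utf8_text
```

The resulting utf8_text is an ordinary OCaml string holding UTF-8 bytes, so it can be handed to anything that expects UTF-8 input, PCRE included.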

About cyocum

Celticist, Computer Scientist, Nerd, sometimes a poet…