June 2013 Archives

ruby vs perl / github cannot use utf8

My little Ruby-derived VM cannot parse Unicode code points > 0xFFFF yet.

User sromanov found this little limitation and wanted to file a bug report about it.
Since my VM is hosted on GitHub, GitHub is written in Ruby, and Ruby has the same problem as my app, it turned out to be a catch-22.

See http://www.fileformat.info/info/unicode/utf8.htm

With 3-byte sequences you can represent at most 0xFFFF,
with 4-byte sequences at most 0x10FFFF.

11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (3+6+6+6) = 21 payload bits, capped by Unicode at 0x10FFFF (1,114,111)
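
To make the bit layout concrete, here is a minimal sketch of a hand-written UTF-8 encoder in Python. It is not the code from my VM or from GitHub; the function name utf8_encode is made up for illustration. It only shows how the payload bits of a code point are spread over 1 to 4 bytes, and why anything above 0xFFFF needs the 4-byte form.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (illustrative helper)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp <= 0x7F:
        # 0xxxxxxx: 7 payload bits
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx: 5+6 = 11 payload bits
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx: 4+6+6 = 16 payload bits
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 3+6+6+6 = 21 payload bits
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # Compare against Python's built-in encoder for a few boundary values.
    for cp in (0x41, 0x7FF, 0x2603, 0xFFFF, 0x10000, 0x10FFFF):
        assert utf8_encode(cp) == chr(cp).encode("utf-8")
        print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

A parser that stops at the 3-byte branch handles everything up to 0xFFFF and fails on the first code point that needs the last branch, which is the kind of limitation described here.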

In my case \x{2603} was …