Rakudo.js update - NFG / unicode collation and role bug fixes
Rakudo.js has been in bugfixing mode recently.
Rakudo.js now uses NFG (Normalization Form Grapheme) semantics in some places.
This means some string operations treat strings as sequences of graphemes instead of Unicode code points. Graphemes are "user-perceived characters" (see http://unicode.org/reports/tr29/). This isn't done everywhere yet, but it allows us to pass a bunch of roast tests.
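To make the difference concrete, here is a minimal sketch in plain JavaScript that counts the same string three ways. The helper names are hypothetical, and Intl.Segmenter is used only for illustration; this is not meant to show how Rakudo.js implements NFG internally.

```javascript
// Sketch only: helper names are made up for this example.

// Native JavaScript length counts UTF-16 code units.
function countCodeUnits(s) {
  return s.length;
}

// Iterating a string yields one item per Unicode code point.
function countCodePoints(s) {
  return Array.from(s).length;
}

// Intl.Segmenter splits a string into extended grapheme clusters (UAX #29).
function countGraphemes(s) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let count = 0;
  for (const _segment of segmenter.segment(s)) count++;
  return count;
}

const accented = 'e\u0301';   // 'e' followed by COMBINING ACUTE ACCENT
const emoji = '\u{1F600}';    // a single code point outside the BMP

console.log(countCodeUnits(accented), countCodePoints(accented), countGraphemes(accented));
console.log(countCodeUnits(emoji), countCodePoints(emoji), countGraphemes(emoji));
```

The accented letter is two code units and two code points but a single grapheme, while the emoji is two code units but a single code point and a single grapheme.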
Because JavaScript, unlike MoarVM, doesn't use graphemes underneath in its string implementation, using NFG semantics can be much more expensive.
As such, in low-level setting code we often want to use the native JavaScript semantics when they are good enough.
To make that a choice, I added a bunch of NFG-aware op variants (like nqp::charsnfg) so we pay the price only when it's necessary.
I have also implemented the Unicode Collation Algorithm (http://unicode.org/reports/tr10/), which allows us to sort strings in a Unicode-aware manner.
JavaScript has Intl.Collator functionality built in (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Collator), but unfortunately it uses slightly different semantics, so I had to write my own implementation.
Once documented and optimized, it should be usable independently of the Rakudo.js project (https://www.npmjs.com/package/unicode-collation-algorithm).
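For readers who haven't used the builtin collator, the sketch below contrasts plain code unit ordering with Intl.Collator. It only illustrates what Unicode-aware sorting means in general; it doesn't show the semantic differences that made a separate implementation necessary, and it doesn't use the API of the npm package mentioned above. The 'en' locale is an arbitrary choice for the example.

```javascript
// Sketch only: compares JavaScript's default sort with the builtin Intl.Collator.
const words = ['b', 'a', 'B', 'A'];

// Default sort compares UTF-16 code units, so all uppercase ASCII letters
// come before all lowercase ones.
const byCodeUnit = [...words].sort();

// Intl.Collator compares letters first and only uses case as a tie-breaker.
const collator = new Intl.Collator('en');
const byCollation = [...words].sort(collator.compare);

console.log(byCodeUnit);
console.log(byCollation);

// 'B' is U+0042 and 'a' is U+0061, so code unit order puts 'B' first,
// while the collator puts 'a' before 'B'.
console.log('B' < 'a', collator.compare('a', 'B') < 0);
```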
I have finally figured out a tricky, long-standing bug with roles, which means they now work.
A bunch of other less notable stuff has been fixed.
Roughly 94% of the roast subset specified in the grant deliverables now passes, so I plan to focus on the remaining bugs and then move on to performance and usability.