RFC: Unicode 6.2 changes

Unicode is getting ready to release its next version, 6.2 (likely in September). It includes only a single new character, the Turkish lira sign. But there are also changes to the properties of existing characters, and some of these may be of concern to Perl programmers.

A comment period on the changes has just started, closing in July. Unicode calls these PRIs, or Public Review Issues. This is a link to the page describing the changes and the procedures for commenting.

The issues I think are of most interest to Perlers are the proposed changes of the Unicode General Category for a number of ASCII characters. Follow this link for a list of them. An example is U+0040 ( @ ) COMMERCIAL AT. Unicode proposes to change this to be a Symbol, instead of Punctuation. Perl code is somewhat shielded from this change, as qr/[[:punct:]]/ matches both Symbols and Punctuation in the ASCII range. However, what qr/\p{Punct}/ and qr/\p{Symbol}/ match would change, as would what qr/[[:punct:]]/ matches for non-ASCII characters.
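To see the difference between the three constructs on your own perl, something like the following sketch may help. The classify() helper and the choice of sample characters are mine, purely for illustration; what gets printed for \p{Punct} and \p{Symbol} depends on the Unicode version your perl was built with, which is exactly the part the proposal would affect.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): report which of the three
# constructs match a single character.
sub classify {
    my ($char) = @_;
    return {
        posix_punct => ($char =~ /[[:punct:]]/ ? 1 : 0),
        uni_punct   => ($char =~ /\p{Punct}/   ? 1 : 0),
        uni_symbol  => ($char =~ /\p{Symbol}/  ? 1 : 0),
    };
}

# Samples: two ASCII characters, then two non-ASCII ones
# (U+20AC EURO SIGN, a Symbol, and U+2014 EM DASH, Punctuation).
for my $char ('@', '$', "\x{20AC}", "\x{2014}") {
    my $r = classify($char);
    printf "U+%04X  [[:punct:]]=%d  \\p{Punct}=%d  \\p{Symbol}=%d\n",
        ord($char), @{$r}{qw(posix_punct uni_punct uni_symbol)};
}
```

The ASCII rows should show [[:punct:]] matching whether the character is a Symbol or Punctuation, while the non-ASCII rows show it following \p{Punct} only.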

There are other changes proposed as well. Use the first link above to get the details.

About Karl Williamson

I try to fix Perl 5 to work better with Unicode