Unicode Regexes
Tom Christiansen will give a free workshop at YAPC::NA 2012 described as:
In a world where Unicode is increasingly essential for text processing, Perl offers the best and least painful support of any major language, smoothly integrating Unicode everywhere—including in Perl’s most popular feature: regular expressions.
Simple patterns like [a-z] or \d no longer cut the mustard, partly because Unicode is such a large character set, and partly because of multiple ways of writing characters with diacritics. There are many land mines in regular expressions now that Unicode has to be taken into account.
This session details how to use Perl regular expressions on Unicode text. Augmented versions of familiar idioms now do a lot more than they used to, and brand new ones have been added. Beyond these shortcuts, thousands of Unicode properties are available to let you say exactly what you mean. Learn how to tailor your own properties and character sequences, how to portably handle word and line boundaries, how to match several different kinds of grapheme clusters, and how to define your own character properties.
[From the YAPC::NA Blog.]
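As a rough illustration of the pitfalls the abstract mentions (this is my own sketch, not material from the workshop), here is a small Perl script. It assumes Perl 5.14 or later and the core Unicode::Normalize module; the sample strings and variable names are made up for the example.

```perl
use v5.14;        # enables strict and the unicode_strings feature
use warnings;
use Unicode::Normalize qw(NFC);

# Two ways of writing "café": a precomposed é (U+00E9),
# and a plain "e" followed by a combining acute accent (U+0301).
my $composed   = "caf\x{E9}";
my $decomposed = "cafe\x{301}";

# The strings differ code point by code point until both are normalized.
my $raw_equal = $composed eq $decomposed ? 1 : 0;
my $nfc_equal = NFC($composed) eq NFC($decomposed) ? 1 : 0;

# [a-z] misses the accented letter; the \p{L} (Letter) property does not.
my $ascii_class = $composed =~ /\A[a-z]+\z/ ? 1 : 0;
my $letter_prop = $composed =~ /\A\p{L}+\z/ ? 1 : 0;

# \d matches any Unicode decimal digit, e.g. ARABIC-INDIC DIGIT THREE (U+0663).
# The /a modifier restricts \d to ASCII 0-9.
my $digit       = "\x{663}";
my $unicode_d   = $digit =~ /\A\d\z/  ? 1 : 0;
my $ascii_only  = $digit =~ /\A\d\z/a ? 1 : 0;

# \X matches one extended grapheme cluster, so "e" plus its accent count once.
my $code_points = length $decomposed;
my $clusters    = () = $decomposed =~ /\X/g;

say "raw_equal=$raw_equal nfc_equal=$nfc_equal";
say "ascii_class=$ascii_class letter_prop=$letter_prop";
say "unicode_d=$unicode_d ascii_only=$ascii_only";
say "code_points=$code_points clusters=$clusters";
```

The script prints only ASCII flags and counts, so it runs without any output-encoding setup. Real programs that read or write the text itself would also need to declare encodings on their filehandles.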