Stripping diacritics from input
If your input contains many characters with diacritics and you need to convert them to equivalent ASCII characters, there are several options on CPAN. My module Unicode::Diacritic::Strip offers two: a slow but reliable method that uses Unicode::UCD, and a fast method that uses the tr/// operator.
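To show the trade-off between the two approaches, here is a rough Python analogue. It is not the module's Perl code. `strip_slow` stands in for the reliable method: it decomposes each character with Unicode normalization (NFD) and drops the combining marks. `strip_fast` stands in for tr///: a fixed one-to-one translation table. The function names and the deliberately short `FAST_TABLE` are hypothetical, chosen only to illustrate how a fixed table silently passes through any character it does not list.

```python
import unicodedata

def strip_slow(text: str) -> str:
    """Decompose to NFD and drop nonspacing combining marks (category Mn).

    Covers any precomposed letter that has a canonical decomposition;
    letters without one, such as ø or ł, pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# Hypothetical fast table: a fixed one-to-one mapping, the analogue of tr///.
# It deliberately covers only a few letters, so anything missing from it
# (such as the h with dot below) is left as-is.
FAST_TABLE = str.maketrans("āīūÀÁÂàáâ", "aiuAAAaaa")

def strip_fast(text: str) -> str:
    return text.translate(FAST_TABLE)

name = "Jalālu'd Dīn Muḥammad Rūmī"
print(strip_slow(name))  # Jalalu'd Din Muhammad Rumi
print(strip_fast(name))  # Jalalu'd Din Muḥammad Rumi
```

The slow version pays for a normalization pass over every character. The fast version is a single table lookup, but it is only as good as the table.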
Today I was examining the user logs for a web application and noticed that the fast method had failed on the input "Jalālu'd Dīn Muḥammad Rūmī": it missed the ḥ (h with a dot below) in "Muḥammad". Looking through the Unicode charts, I found a whole block of Latin characters that I had omitted; ḥ (U+1E25) belongs to the Latin Extended Additional block, U+1E00 to U+1EFF. I've now added them for version 0.11.
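One way to avoid gaps like this, sketched here in Python as an assumption rather than as what the module does, is to generate the fast table from the decomposition data over whole Unicode blocks instead of writing it by hand, and to check sample input for non-ASCII characters the table would leave untouched. The helpers `build_table` and `unhandled` are hypothetical names; the block ranges are the standard Unicode ones.

```python
import unicodedata

def build_table(start: int, end: int) -> dict[int, str]:
    """Map each code point in [start, end] whose NFD form is a single ASCII
    letter plus combining marks to that ASCII letter."""
    table = {}
    for cp in range(start, end + 1):
        ch = chr(cp)
        base = "".join(c for c in unicodedata.normalize("NFD", ch)
                       if unicodedata.category(c) != "Mn")
        if base != ch and len(base) == 1 and base.isascii() and base.isalpha():
            table[cp] = base
    return table

def unhandled(text: str, table: dict[int, str]) -> list[str]:
    """Non-ASCII characters in text that the table would leave untouched."""
    return sorted({ch for ch in text if ord(ch) > 127 and ord(ch) not in table})

# Latin-1 Supplement letters plus Latin Extended-A (U+00C0 to U+017F).
latin_1_and_ext_a = build_table(0x00C0, 0x017F)
# Latin Extended Additional (U+1E00 to U+1EFF), where U+1E25 lives.
ext_additional = build_table(0x1E00, 0x1EFF)

name = "Jalālu'd Dīn Muḥammad Rūmī"
print(unhandled(name, latin_1_and_ext_a))  # ['ḥ']
full = {**latin_1_and_ext_a, **ext_additional}
print(name.translate(full))  # Jalalu'd Din Muhammad Rumi
```

The table is built once, so lookups stay as cheap as a hand-written tr///, while the decomposition data decides which characters it covers.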
Excellent :-)