Stripping diacritics from input

If you have input containing lots of Unicode diacritics, and you need to process them into equivalent ASCII characters, there are several options on CPAN. My module Unicode::Diacritic::Strip offers a slow and reliable method involving the use of Unicode::UCD, and a fast method involving a tr/// operator.

Today I was examining user logs for a web application, and I noticed that the fast method had completely failed on input "Jalālu'd Dīn Muḥammad Rūmī" because it had failed to catch the middle h character. Looking at the Unicode characters I found a whole block of Latin characters which I'd omitted. I've now added them to the application for version 0.11

1 Comment

Leave a comment

About Ben Bullock

user-pic Perl user since about 2006, I have also released some CPAN modules.