Stupid CPAN Tricks
I’ve uploaded my slides for 9 Perl modules you should have in your tool belt to the MadMongers file archive.
[From my blog.]
Oops, I meant to leave this comment here, rather than on the post where the talk was announced:
Those are all cool and nifty modules, but one of them screamed out to me as an
outlier: Text::Unidecode.
tchrist can say it better than I ever could:
http://stackoverflow.com/a/6163129/40468
"Code that assumes you can remove diacritics to get at base ASCII letters is
evil, still, broken, brain-damaged, wrong, and justification for capital
punishment."
"Code that tries to reduce Unicode to ASCII is not merely wrong, its
perpetrator should never be allowed to work in programming again. Period. I’m
not even positive they should even be allowed to see again, since it obviously
hasn’t done them much good so far."
So, this module isn't a "useful CPAN trick", but actively spreads
disinformation and bad practices about handling Unicode text.
Did you even read Text::Unidecode's POD? It doesn't claim to cover all cases or even to "reduce Unicode to ASCII". Its goal is to render Unicode characters on a non-Unicode display as best it can. Would you prefer your users to see '????????' instead, or perhaps to display nothing at all "because it's impossible to do things 100% correctly anyway"?
Sure, the module gets things wrong some of the time, but it's still damn useful and cool. I wouldn't execute Sean, but would buy him a beer instead. :-)
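To make the '????????' comparison concrete, here is a minimal sketch, assuming Text::Unidecode is installed from CPAN (Encode ships with Perl). The sample string and the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Text::Unidecode;    # exports unidecode()

# Two CJK characters, written as escapes so the source file stays ASCII.
my $name = "\x{5317}\x{4EB0}";

# Plain encoding to ASCII substitutes '?' for every unmappable character.
my $lossy = encode('ascii', $name);

# unidecode() substitutes a rough ASCII transliteration instead.
my $approx = unidecode($name);

print "$lossy\n";     # prints "??"
print "$approx\n";    # prints something like "Bei Jing"
```

Neither result is "correct", but the second at least gives the reader something to go on.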
That's the problem with the slides from a talk: you miss all the important discussion around the minimal parts that you see.
"Useful", like "best", doesn't have an absolute definition that everyone shares. It's based on context.
The module itself is quite clear about what it's doing, and calls itself a tool "of last resort". It's not spreading any disinformation. The author is fully aware of what he's doing and why. He says "It's better than nothing!", meaning "there's nothing that Text::Unidecode's algorithm is better than".
The bad practice is to not do your research. :)
Ether: There is nothing wrong with what you said. However, you are missing one critical point: pragmatism beats perfection every single time. In this case my web site takes in addresses in UTF-8, but a shipping web service I'm using can't handle UTF-8, or even the entire ASCII range. It only handles standard alpha characters without accents. So I needed to convert UTF-8 and accented characters into standard alphanumeric characters to keep this web service from crashing on me.
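A rough sketch of that kind of conversion, assuming Text::Unidecode is installed from CPAN. The helper name ascii_address_line and the exact set of characters allowed through (letters, digits, single spaces) are assumptions for illustration, not the actual rules of any particular shipping service.

```perl
use strict;
use warnings;
use Text::Unidecode;    # exports unidecode()

# Hypothetical helper: reduce one address line to unaccented letters,
# digits, and single spaces.
sub ascii_address_line {
    my ($text) = @_;
    my $ascii = unidecode($text);        # best-effort ASCII transliteration
    $ascii =~ s/[^A-Za-z0-9 ]+/ /g;      # assumed whitelist: drop everything else
    $ascii =~ s/\s+/ /g;                 # collapse runs of whitespace
    $ascii =~ s/^\s+|\s+$//g;            # trim both ends
    return $ascii;
}

# e-acute, u-umlaut, and sharp s, written as escapes to keep the source ASCII.
print ascii_address_line("Caf\x{E9} M\x{FC}ller, Stra\x{DF}e 5"), "\n";
# prints "Cafe Muller Strasse 5"
```

The conversion is lossy, so the original UTF-8 address should still be what gets stored; the ASCII version is only for the service that can't cope with anything else.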