Tut # 7: jQuery, Ajax, xml, Perl, databases and utf8

I've tried to cover significant interactions between the topics mentioned in the title above:

Tutorial # 7

The tutorial includes references to various now more-or-less standard documents, or collections thereof, pertaining to Perl, databases and utf8.


1) There is no need to use the utf8 pragma unless your literal strings contain UTF-X.

2) Using Encode.pm's utf8 decoder is a bad idea for external/untrusted data: it accepts a superset of UTF-8 and therefore accepts ill-formed UTF-8 (as specified by Unicode and ISO/IEC). Use the UTF-8 decoder instead.

3) US-ASCII (aka ASCII) is a subset of UTF-8 and has the same representation; this was a requirement when UTF-8 was designed.

4) U+00E8 LATIN SMALL LETTER E WITH GRAVE (è) has never been part of the ASCII codespace; it's within the Latin-1 codespace.

5) Why are you writing a tutorial for a topic you clearly don't understand?
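
Points 2 to 4 above can be checked with a few lines of Perl. This is only a minimal sketch using the core Encode module; the helper name `decodes_ok` and the sample byte strings are my own illustrations and do not come from the tutorial.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);

# Returns 1 if $bytes decodes without error under $encoding, else 0.
# $bytes is a private copy, so FB_CROAK consuming it in place is harmless.
sub decodes_ok {
    my ($encoding, $bytes) = @_;
    return eval { decode($encoding, $bytes, FB_CROAK); 1 } ? 1 : 0;
}

# Point 2: U+D800 is a surrogate; its three-byte form is ill-formed UTF-8.
my $surrogate = "\xED\xA0\x80";
my $strict_ok = decodes_ok('UTF-8', $surrogate);   # strict decoder
my $lax_ok    = decodes_ok('utf8',  $surrogate);   # Perl's lax superset

# Point 3: ASCII text has identical bytes in both encodings.
my $same_bytes = encode('ascii', 'Perl') eq encode('UTF-8', 'Perl') ? 1 : 0;

# Point 4: U+00E8 is one byte in Latin-1 and two bytes in UTF-8.
my $e_grave    = "\x{E8}";
my $latin1     = encode('ISO-8859-1', $e_grave);
my $utf8_bytes = encode('UTF-8', $e_grave);

printf "strict accepts: %d, lax accepts: %d\n", $strict_ok, $lax_ok;
printf "ASCII bytes identical: %d\n", $same_bytes;
printf "Latin-1 length: %d, UTF-8 length: %d\n",
    length $latin1, length $utf8_bytes;
```

The strict `UTF-8` decoder should refuse the surrogate, while the lax `utf8` decoder is documented as more liberal in what it accepts, which is the reason for preferring the strict one on untrusted input.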

If I may plug: Mojolicious works very hard to make utf8 handling very simple. This includes adding `use utf8` to all files and templates (when using Mojo::Base) and encoding/decoding at the boundaries of client/server interaction. Obviously the `open` stuff is still needed if you want to print to the terminal or read from a handle, but those aren't as common in a web framework.
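
For anyone not using a framework, the "decode on the way in, encode on the way out" pattern might look roughly like this with plain Encode. It is a sketch only: the variable names and the sample request bytes are hypothetical, and it makes no claim about how Mojolicious implements this internally.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical request body as raw bytes: "naïve" encoded as UTF-8.
my $request_bytes = "na\xC3\xAFve";

# Inbound boundary: bytes -> characters, croaking on ill-formed input.
# LEAVE_SRC keeps $request_bytes intact for the comparison below.
my $text = decode('UTF-8', $request_bytes, FB_CROAK | LEAVE_SRC);

# Inside the application, work on characters, not bytes.
my $reply = uc $text;

# Outbound boundary: characters -> bytes.
my $response_bytes = encode('UTF-8', $reply);

printf "bytes in: %d, characters: %d, bytes out: %d\n",
    length $request_bytes, length $text, length $response_bytes;
# prints "bytes in: 6, characters: 5, bytes out: 6"
```

Because the uppercasing happens on decoded characters, ï becomes Ï rather than being treated as two unrelated bytes.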

I believe that it is correct to say that a program without utf8 strings in the source will behave correctly without `use utf8;`. That said, as chansen has mentioned himself, since UTF-8 is a superset of ASCII, why not? That way when you do add a UTF-8 character somewhere, you don't have to remember to go import the `utf8` pragma. Seems pragmatic to me :-)
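
To make that concrete, here is a small sketch comparing literals with and without the pragma. It assumes the script file itself is saved as UTF-8 and that the è below is the single precomposed code point U+00E8; the hash names are just for illustration.

```perl
use strict;
use warnings;

my (%without_pragma, %with_pragma);

{
    no utf8;   # the default: non-ASCII literals are seen as raw bytes
    %without_pragma = (
        ascii    => length("e"),
        accented => length("è"),
    );
}

{
    use utf8;  # literals in this scope are decoded as UTF-8 characters
    %with_pragma = (
        ascii    => length("e"),
        accented => length("è"),
    );
}

printf "without: ascii=%d accented=%d\n", @without_pragma{qw(ascii accented)};
printf "with:    ascii=%d accented=%d\n", @with_pragma{qw(ascii accented)};
# prints "without: ascii=1 accented=2"
# prints "with:    ascii=1 accented=1"
```

The ASCII literal is unaffected either way; only the non-ASCII literal changes, which is why enabling the pragma everywhere does no harm to ASCII-only source.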


About Ron Savage

I try to write all code in Perl, but find I end up writing in bash, CSS, HTML, JS, and SQL, and doing database design, just to get anything done...