Tut # 7: jQuery, Ajax, xml, Perl, databases and utf8

By Ron Savage on May 8, 2013 5:04 AM

I've tried to cover significant interactions between the topics mentioned in the title above.

The tutorial includes references to various now more-or-less standard documents, or collections thereof, pertaining to Perl, databases and utf8.

6 comments

Christian Hansen | May 8, 2013 8:45 PM | Reply

1) There is no need to use the utf8 pragma unless your literal strings contain UTF-X.

2) Using Encode.pm's utf8 decoder is a bad idea for external/untrusted data, since it's a superset of UTF-8 and therefore accepts ill-formed UTF-8 (as specified by Unicode and ISO/IEC). Use the UTF-8 decoder instead.

3) US-ASCII (aka ASCII) is a subset of UTF-8 and has the same representation; that was a requirement when UTF-8 was designed.

4) U+00E8 LATIN SMALL LETTER E WITH GRAVE (è) has never been part of the ASCII codespace; it's within the Latin-1 codespace.

5) Why are you writing a tutorial for a topic you clearly don't understand?
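
For readers following points 2 to 4, here is a minimal sketch using Encode.pm. It assumes that "the UTF-8 decoder" refers to the encoding Encode registers under the name `UTF-8` (as opposed to the lax `utf8`). The helper `decode_strict` is a hypothetical name introduced only for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode find_encoding FB_CROAK LEAVE_SRC);

# Encode registers two similarly named encodings.
my $lax_name    = find_encoding('utf8')->name;    # Perl's lax superset
my $strict_name = find_encoding('UTF-8')->name;   # strict UTF-8

# Point 3: ASCII text has identical bytes in ASCII and in UTF-8.
my $ascii_same = encode('UTF-8', 'Perl') eq encode('ascii', 'Perl');

# Point 4: U+00E8 is a single byte in Latin-1 but two bytes in UTF-8.
my $latin1_bytes = encode('iso-8859-1', "tr\x{E8}s");   # "tr\xE8s"
my $utf8_bytes   = encode('UTF-8',      "tr\x{E8}s");   # "tr\xC3\xA8s"

# Point 2: hypothetical helper that decodes untrusted bytes strictly.
# Returns the decoded characters, or undef if the input is ill-formed.
sub decode_strict {
    my ($bytes) = @_;
    return scalar eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
}

my $ok  = decode_strict($utf8_bytes);     # four characters
my $bad = decode_strict($latin1_bytes);   # undef: 0xE8 followed by "s" is not valid UTF-8

printf "lax: %s, strict: %s\n", $lax_name, $strict_name;
printf "Latin-1 length: %d bytes, UTF-8 length: %d bytes\n",
    length $latin1_bytes, length $utf8_bytes;
printf "strict decode of Latin-1 bytes: %s\n", defined $bad ? 'accepted' : 'rejected';
```

Passing `FB_CROAK` makes the decoder die on ill-formed input instead of silently substituting a replacement character, which is usually what you want for data arriving from outside the program.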

Joel Berger | May 8, 2013 9:19 PM | Reply

If I may plug: Mojolicious works very hard to make utf8 handling very simple. This includes adding `use utf8` to all files and templates (when using Mojo::Base) and encoding/decoding at the boundaries of client/server interaction. Obviously the `open` stuff is still needed if you want to print to the terminal or read from a handle, but those aren't as common in a web framework.
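
As a rough illustration of "encoding/decoding at the boundaries", here is a plain-Perl sketch. It is not how Mojolicious implements it; the function `handle_request` and the sample data are hypothetical, and the in-memory filehandle merely stands in for a terminal or file.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical handler: bytes come in, bytes go out, characters in between.
sub handle_request {
    my ($request_bytes) = @_;
    my $text  = decode('UTF-8', $request_bytes, FB_CROAK | LEAVE_SRC);  # inbound boundary
    my $reply = uc $text;                                               # work on characters
    return encode('UTF-8', $reply);                                     # outbound boundary
}

my $response_bytes = handle_request("tr\xC3\xA8s");   # "très" as UTF-8 bytes

# For filehandles, an I/O layer performs the encoding step on every print.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer
    or die "Cannot open in-memory handle: $!";
print {$out} "tr\x{E8}s";
close $out;

printf "response is %d bytes, buffer is %d bytes\n",
    length $response_bytes, length $buffer;
```

The same layer can be applied to an existing handle with `binmode(STDOUT, ':encoding(UTF-8)')`, which is the kind of `open` stuff that remains necessary when printing to a terminal.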

Ron Savage | May 9, 2013 12:54 AM | Reply

A reply to J Berger

Thanx for the info re Mojolicious.

I have actually looked into using it a number of times. Indeed, I've even released a little module, MojoX::ValidateHeadLinks, which hopefully is of use to more people than just myself.

I was not aware of Mojo going to such lengths to support utf8.

Basically, though, the current exercise is retraining myself away from YUI and to jQuery, using an existing app as a known base.

I must admit being a bit shocked by the quantity of work I needed to do to switch, and am indirectly grateful I did not simultaneously take on switching the underlying app's basic code!

As for the end result, I'm delighted at how much better it is, both in terms of user-friendliness, and in the code itself, especially the JS side.

Cheers
Ron

Ron Savage | May 10, 2013 12:43 AM | Reply

4 attempts to post a reply to C Hansen have timed out. I give up.

Ron Savage | May 10, 2013 12:44 AM | Reply

A reply to C Hansen

(This is the 5th time I've posted this. Gotta hate
Movable Type for being so time-out happy these days. Seems to be a bug with Preview, so I won't do that).

1) I don't have any utf8 strings in the source code,
so are you saying that if I cut all references to
'use utf8;' that the program would still work
perfectly? I don't believe that, as I've indicated
by advising every module must contain the preamble.
This advice is based on problems with modules where
I had not included that preamble.

2) By "UTF-8 decoder" I assume you mean utf8::decode().
Tom's prescription # 12 contains:
"use Encode qw(encode decode);"
Have you advised him of his error?
Did you submit a POD patch for Encode advising that
it's deprecated vis-a-vis utf8?
Do you realise the docs for utf8 advise using Encode
(for non-utf8 work admittedly)?

3) True, and certainly something people working with
utf8 ought to be aware of, but not really relevant
here.

4) I did not refer to '?', but rather to '?'.
Either way, I repeatedly warned readers not to assume
'?' was part of ASCII. I did use the phrase
'extended ASCII'. This deliberate distortion of my
message is both gratuitous and nasty.

5) Ahhhh. I don't claim to know everything, but I do
claim to be human and fallible - unlike some people :-).
Nevertheless, I believe the article has much in it
which is both correct and relevant to the fraught
topic which is utf8, and its interactions with other
topics such as web clients and databases.
Meaningful comments on what exactly needs to be fixed
in my text will of course be adopted.

Joel Berger | May 13, 2013 1:57 PM | Reply

I believe that it is correct to say that a program without utf8 strings in the source will behave correctly without `use utf8;`. That said, as chansen has mentioned himself, since UTF-8 is a superset of ASCII, why not? That way when you do add a UTF-8 character somewhere, you don't have to remember to go import the `utf8` pragma. Seems pragmatic to me :-)
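
A small sketch of that point, assuming the script file itself is saved as UTF-8. The pragma is lexically scoped, so both behaviours can be shown side by side in one file.

```perl
use strict;
use warnings;

my ($with, $without, $ascii_with, $ascii_without);

{
    use utf8;                        # literals are read as UTF-8 encoded characters
    $with       = length "è";
    $ascii_with = length "e";
}
{
    no utf8;                         # literals are read as raw bytes (the default)
    $without       = length "è";
    $ascii_without = length "e";
}

printf "non-ASCII literal: length %d with utf8, %d without\n", $with, $without;
printf "ASCII literal:     length %d with utf8, %d without\n", $ascii_with, $ascii_without;
```

An ASCII-only literal is unaffected either way, which is why adding the pragma up front costs nothing and only starts to matter once a non-ASCII character appears in the source.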

About Ron Savage

I try to write all code in Perl, but find I end up writing in bash, CSS, HTML, JS, and SQL, and doing database design, just to get anything done...
