Handling character encodings in Perl is very easy, even for beginners.

I feel that Perl users are losing confidence because of negative feedback from other communities.

The opinions of people who intend to harm Perl are 99% useless in my experience.

Handling character encodings is actually simple, because all you have to do is remember the following three things.

1. Write "use utf8" and save the file as UTF-8.

2. If you print text, encode it to the platform charset (Linux is UTF-8, Windows is cp932).

3. If you get text from outside, decode it from the platform charset (Linux is UTF-8, Windows is cp932).
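The three rules can be sketched in a few lines. This is only a minimal illustration, assuming the platform charset is UTF-8; the variable names are made up for the example, and the incoming bytes are hard-coded instead of being read from a real file or terminal.

```perl
use strict;
use warnings;
use utf8;                        # Rule 1: this source file is saved as UTF-8
use Encode qw(decode encode);

# Assumption for this sketch: the platform charset is UTF-8.
# Replace it with whatever encoding your platform really uses.
my $platform_charset = 'UTF-8';

# Rule 3: bytes that arrive from outside are decoded into a Perl string.
my $bytes_in = "\xE3\x81\x82";   # the UTF-8 bytes of "あ"
my $text     = decode($platform_charset, $bytes_in);

# Inside the program you work with characters, not bytes.
my $char_count = length $text;   # one character

# Rule 2: encode back to bytes just before printing.
my $bytes_out = encode($platform_charset, $text . "い");
print $bytes_out, "\n";
```

If the encode step is skipped, print typically emits a "Wide character" warning for a string like this, which is the hint that rule 2 was forgotten.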

If "use v7;" enabled "use utf8", there would be less to remember and fewer mistakes.

Beginners will first remember that Perl source code is stored in UTF-8.

Next, the beginner will notice that using a print statement results in a "Malformed ..." warning.

To get rid of the warning, they will have to encode the text with the platform character encoding.

Once they have learned that, they will naturally learn that they need to decode when they bring in text from the platform.

Everything is simple and natural.

What we lack is only confidence.

All programming languages have problems with character encoding, not only Perl.

In my view, it's pretty good that Perl has UTF-8 as its internal format.

UTF-8 is a sequence of single bytes, so you can send it over the network without paying attention to the network byte order.
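As a rough illustration of the byte order point, the sketch below encodes the same character as UTF-8 and as the two UTF-16 variants and prints the resulting bytes in hex. The choice of UTF-16 as the comparison is mine, not something the argument above depends on.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

my $text = "あ";                               # U+3042

# UTF-8 has a single byte sequence for this character.
my $utf8_hex    = unpack 'H*', encode('UTF-8',    $text);

# UTF-16 uses two-byte units, so the byte order has to be chosen.
my $utf16be_hex = unpack 'H*', encode('UTF-16BE', $text);
my $utf16le_hex = unpack 'H*', encode('UTF-16LE', $text);

print "UTF-8:    $utf8_hex\n";                 # e38182
print "UTF-16BE: $utf16be_hex\n";              # 3042
print "UTF-16LE: $utf16le_hex\n";              # 4230
```

The two UTF-16 results differ only in byte order, while the UTF-8 bytes are the same on every machine.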

2 Comments

3. if you get text from outside, decode text from platform charset(Linux is UTF-8, Windows is cp932)

This is over-simplified. Firstly, different versions of Windows use different encodings (here in the UK, for example, we get cp1252). And secondly, you can't assume that a file you are given to process was created on the same platform you're processing it on.

The only safe advice here is to know the encoding of your input data and decode from that to Perl strings.
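A small sketch of why this matters: the same two bytes decode to different text depending on which encoding you name. The byte values and variable names here are chosen only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two bytes taken from some input whose encoding we must know in advance.
my $bytes = "\x93\xFA";

# Decoded as cp932 they form one character, U+65E5.
my $as_cp932  = decode('cp932',  $bytes);

# Decoded as cp1252 they form two characters, U+201C and U+00FA.
my $as_cp1252 = decode('cp1252', $bytes);

printf "cp932:  %d character(s)\n", length $as_cp932;
printf "cp1252: %d character(s)\n", length $as_cp1252;
```

Neither call reports an error, so a wrong guess about the input encoding silently produces wrong text.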

About Yuki Kimoto

I'm a Perl programmer. I LOVE Perl. I want to contribute to the Perl community and Perl users.