From the user's perspective, Perl strings have no bugs and work well.

I feel that, for the upcoming version of Perl, the core team is using the Unicode bug fix as a reason to break backward compatibility with Perl 5.

Internally, Unicode handling in Perl has some inconsistencies due to conflicts between Latin-1 and UTF-8.

This is true.

On the other hand, from the user's point of view, a Perl string works perfectly well as long as you accept that Perl cannot tell whether a value is a decoded string or bytes.

We solve this problem by convention.

Where do we decide whether a value is a string or bytes?

The inside and outside of the program are completely separate.

When data comes in from outside, we decide whether it is bytes or a string.

If it is bytes (binary data), do nothing.

If it is a string (text), decode it.
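The boundary rule above can be sketched roughly as follows. This is only a minimal illustration using the core Encode module; the variable names and the sample bytes are made up for the example, and real input would come from a file, socket, or request.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

# Bytes as they might arrive from outside the program (hypothetical sample).
# These six bytes are the UTF-8 encoding of two characters, U+65E5 and U+672C.
my $input_bytes = "\xE6\x97\xA5\xE6\x9C\xAC";

# The data is text, so decode it once at the boundary.
my $text = decode('UTF-8', $input_bytes);

# Inside the program, operations work on characters, not bytes.
my $char_count = length $text;          # counts characters
my $byte_count = length $input_bytes;   # counts bytes

# Encode once when the data leaves the program.
my $output_bytes = encode('UTF-8', $text . "\x{8A9E}");
```

Binary data such as an image would skip the decode step and stay as bytes the whole way through.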

This is simple, and it all works well.

In fact, this way is a good one.

Inside the program, we don't need to worry about the character code.

This is because you only need to be aware of encoding when you input/output data to/from the outside world.
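Another way to keep encoding at the edges is to let an I/O layer do the work. The sketch below is an assumption-laden example, not the only way to do it: it uses an in-memory filehandle in place of a real file so that it is self-contained, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use utf8;

# Two characters, U+65E5 and U+672C, written with escapes.
my $text = "\x{65E5}\x{672C}";

# Output boundary: the :encoding(UTF-8) layer encodes on the way out.
# An in-memory scalar stands in for a file on disk.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "open for write: $!";
print {$out} "$text\n";
close($out) or die "close: $!";

# Input boundary: the same layer decodes on the way in.
open(my $in, '<:encoding(UTF-8)', \$buffer) or die "open for read: $!";
my $line = <$in>;
close($in);
chomp $line;

# $buffer holds bytes; $line holds a decoded string again.
my $buffer_length = length $buffer;
my $line_length   = length $line;
```

With the layer in place, the code between the two open calls never has to mention an encoding.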

Perl uses UTF-8 as its internal encoding for decoded strings, and I feel that few languages make text this easy to handle.

The only limitation is that program logic cannot distinguish a decoded string from bytes.
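To see why this has to be handled by convention, here is a small sketch of a mistake that Perl does not catch. It assumes the core Encode module; the variable names are hypothetical.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

# Three bytes: the UTF-8 encoding of one character, U+65E5.
my $bytes = "\xE6\x97\xA5";

# A decoded string holding that one character.
my $text = decode('UTF-8', $bytes);

# Correct: encode the decoded string. The result equals the original bytes.
my $ok = encode('UTF-8', $text);

# Mistake: encode the bytes a second time. Perl treats each byte as a
# Latin-1 character and raises no error, so the data is double-encoded.
my $double = encode('UTF-8', $bytes);

my $ok_length     = length $ok;
my $double_length = length $double;
```

Both `$bytes` and `$text` are ordinary scalars, so nothing stops the second call. Keeping track of which is which is the programmer's job.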

We have the following conventions:

use strict;
use warnings;
use utf8;

We use them in our everyday programming, and some web frameworks enable them by default.

At least these three are necessary conventions for application engineers.
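As a rough illustration of what the third pragma buys you: with use utf8, string literals in a source file saved as UTF-8 are already decoded strings. The sketch below assumes the file is saved as UTF-8, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use utf8;   # the source file itself is read as UTF-8
use Encode qw(encode);

# Because of "use utf8", this literal is a decoded string of 3 characters.
my $word = '日本語';

my $char_count = length $word;                      # characters
my $byte_count = length(encode('UTF-8', $word));    # bytes after encoding

# Output is a boundary, so encode there.
binmode(STDOUT, ':encoding(UTF-8)');
print "$word\n";
```

Without use utf8, the same literal would be seen as a sequence of bytes, and length would count bytes instead of characters.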

About Yuki Kimoto

I'm a Perl programmer. I LOVE Perl. I want to contribute to the Perl community and Perl users.