Perl I/O on scalars for 5.18

By Tony Cook on February 8, 2013 8:55 AM

Starting with perl 5.17.9 (the development series leading to 5.18), the following:

my $scalar;
...
open my $fh, "<", \$scalar or die;

will fail unless $scalar contains only code points 0xFF or lower, i.e. code points that can be represented as bytes.
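A minimal sketch of the failure, assuming perl 5.18 or later. The variable names are illustrative, and warnings are collected through $SIG{__WARN__} only so they can be inspected rather than printed:

```perl
use strict;
use warnings;    # enables the 'utf8' warnings category, among others

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# U+263A is above 0xFF, so on perl 5.18+ this open fails.
my $wide = "smile \x{263A}";
my $ok = open my $fh, "<", \$wide;
print $ok ? "opened\n" : "open failed: $!\n";
print "warning: $_" for @warnings;

# Every code point here is 0xFF or lower, so this open succeeds.
my $narrow = "caf\xE9";
open my $fh2, "<", \$narrow or die "open failed: $!";
print "narrow scalar opened\n";
```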

Perl's I/O will also treat the code points contained in the scalar as bytes: the character \xA1 will be read as the byte 0xA1, whether perl's internal representation of the scalar is UTF-8 or bytes (or, for the internals-minded, whether SVf_UTF8 is on or not).
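To illustrate, here is a small sketch, assuming perl 5.18 or later, that stores the same one-character string in both internal representations (using utf8::upgrade to force UTF-8 storage) and reads each back through an in-memory handle:

```perl
use strict;
use warnings;

# The same one-character string, U+00A1, in both internal representations.
my $as_bytes = "\xA1";
my $as_utf8  = "\xA1";
utf8::upgrade($as_utf8);   # now stored internally as the UTF-8 bytes C2 A1

my @reads;
for my $s ($as_bytes, $as_utf8) {
    open my $fh, "<", \$s or die "open failed: $!";
    my $data = do { local $/; <$fh> };   # slurp the whole in-memory file
    push @reads, $data;
    printf "%d byte(s): %s\n", length $data,
        join " ", map { sprintf "%02X", ord } split //, $data;
}
```

On 5.18 and later both iterations should report a single byte, A1; the internal C2 A1 sequence is never seen.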

Unfortunately, in some cases this leads to a silent change in behaviour.

The change enforces two rules:

1. Files contain only bytes. Since opening a file handle on a scalar is intended to emulate I/O on a file, the scalar you use in a scalar open may only contain code points that fit in a byte.

   If you supply a scalar with code points over 0xFF, open will fail, and if use warnings 'utf8' is in effect, perl will warn.

2. Scalars that compare equal contain the same data when read as a file.

   Previously, the bytes you read from the file depended on the scalar's internal representation; now you always get the code points in the scalar (as modified by any layers).
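As a sketch of the "as modified by any layers" part, assuming perl 5.18+ and the core Encode module: text containing a character above 0xFF is stored as its UTF-8 encoding, and an :encoding(UTF-8) layer on the in-memory handle turns the bytes back into characters, just as it would for a file on disk. The sample string is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Text with a character above 0xFF can't be opened directly, so store its
# UTF-8 encoding instead; every code point in $bytes then fits in a byte.
my $text  = "na\x{EF}ve \x{263A}\n";
my $bytes = encode('UTF-8', $text);

# The :encoding(UTF-8) layer decodes the bytes into characters as they are read.
open my $fh, "<:encoding(UTF-8)", \$bytes or die "open failed: $!";
my $line = <$fh>;
close $fh;

print $line eq $text ? "round trip ok\n" : "mismatch\n";
```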

Unfortunately, this leads to a silent change in behaviour when a scalar containing only code points 0xFF or below, but internally represented in UTF-8, is used in a scalar open.

Previously, the bytes you read from the scalar would be the bytes of that internal UTF-8 encoding, not the code points stored in the scalar.
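One way to spot affected data is a check like the one below. It is a sketch: scalar_open_changed is a hypothetical helper, not part of perl. It uses the observation that pure ASCII is stored identically in both representations, so only UTF-8-stored scalars with some code point in 0x80 to 0xFF read differently. The "old" read is emulated with utf8::encode rather than run on an older perl.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a read handle on $s yields different data
# on perl 5.18+ than on earlier perls. Requires UTF-8 internal storage, all
# code points <= 0xFF (otherwise open now fails outright), and at least
# one code point in 0x80-0xFF.
sub scalar_open_changed {
    my ($s) = @_;
    return 0 unless utf8::is_utf8($s);
    return 0 if $s =~ /[^\x00-\xFF]/;
    return $s =~ /[\x80-\xFF]/ ? 1 : 0;
}

my $s = "caf\xE9";
utf8::upgrade($s);              # internal representation is now 63 61 66 C3 A9
my $changed = scalar_open_changed($s);

# Emulate the pre-5.17.9 read: the internal UTF-8 bytes.
my $old = $s;
utf8::encode($old);

# Actual read on this perl; on 5.18+ it yields the code points as bytes.
open my $fh, "<", \$s or die "open failed: $!";
my $new = do { local $/; <$fh> };

printf "changed=%d old=%d bytes new=%d bytes\n", $changed, length $old, length $new;
```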

If you suspect your code will have this problem, or you're seeing open failures on a scalar in bleadperl, you may need to consider:

- If you're supplying a function or method parameter to open, is that parameter meant to be text, or text encoded to bytes? You may want to document which it is. If it's text, use $scalar = encode('utf-8', $scalar); (encode comes from the Encode module) to make it usable.
- If you're supplying literal text (perhaps in the context of use utf8;), you almost certainly need to encode the text, to give it a consistent representation under both the old and new behaviours.
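Both suggestions can be sketched together, assuming the core Encode module. byte_length_via_handle is a hypothetical function documented as taking text, so it encodes its argument before opening a handle on it. The encoded bytes are the same whatever the caller's internal representation, which is what makes old and new perls agree.

```perl
use strict;
use warnings;
use utf8;                       # string literals below are decoded as UTF-8 text
use Encode qw(encode);

# Hypothetical helper: takes *text* (characters), so it encodes to UTF-8
# bytes before opening an in-memory handle on the result.
sub byte_length_via_handle {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    open my $fh, "<", \$bytes or die "open failed: $!";
    my $data = do { local $/; <$fh> };   # slurp mode
    return length $data;
}

my $literal = "café";           # 4 characters, internally UTF-8 under use utf8
my $wide    = "smile \x{263A}"; # would make a plain scalar open fail on 5.18+

print byte_length_via_handle($literal), "\n";
print byte_length_via_handle($wide), "\n";
```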