Perl 5.18: getline and $/ = \N
Perl 5.18 will ship with a change in behaviour when using getline() (aka the <$handle> operator) on handles marked as returning Unicode where $/ is a reference to an integer.
If you're not familiar with the behaviour, this is how it works for a file with no PerlIO layers:
$/ = \500;
my $x = <$fh>; # read 500 bytes
This won't change in 5.18, but it will if the stream has a layer that internally returns Unicode, such as any of:
- :encoding(iso-8859-1)
- :encoding(utf-16)
- :encoding(utf-8) (ok, any :encoding stream)
- :utf8
In 5.16 and earlier, getline() will read the specified number of bytes from the stream, even if that would fall in the middle of a character.
This leads to a few problems:
- the result can be a UTF-8 marked scalar that doesn't contain valid UTF-8.
- the input stream can be left on a non-character boundary.
- the record read only corresponds to bytes in the file if the file is UTF-8.
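To make the first two problems concrete, here is a minimal sketch of what a byte-counted cut does to multi-byte text. It does not reproduce the 5.16 behaviour itself; it only simulates it by reading from a :raw in-memory handle. The variable names and sample string are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "a" (1 byte), e-acute (2 bytes), euro sign (3 bytes) in UTF-8: 6 bytes total.
my $bytes = encode('UTF-8', "a\x{e9}\x{20ac}");

# A :raw handle always counts bytes, so this mimics a byte-based record read.
open my $raw, '<:raw', \$bytes or die "open: $!";

my $chunk;
{
    local $/ = \2;      # fixed-size records of 2 bytes
    $chunk = <$raw>;    # "a" plus only the first byte of e-acute
}

# utf8::decode() returns false when the octets are not well-formed UTF-8.
my $copy  = $chunk;
my $valid = utf8::decode($copy);

printf "read %d bytes, well-formed UTF-8: %s\n",
    length $chunk, $valid ? 'yes' : 'no';
```

The record ends halfway through a character, and the handle's next read starts on the second byte of that character.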
From 5.18, getline($fh) will behave like read($fh, $out, $$/) -- the specified number of characters will be read from the stream instead of the specified number of bytes.
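A small script along these lines should show the difference. It is a sketch only: the in-memory file, the sample text and the variable names are assumptions made for illustration. On 5.18 or later, the record read through $/ should match what read() returns; on 5.16 and earlier the first read counts bytes instead.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Four accented characters, two bytes each in UTF-8 (8 bytes total).
my $bytes = encode('UTF-8', "\x{e9}\x{e8}\x{ea}\x{eb}");

# Read a fixed-size record through the <$fh> operator.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $record;
{
    local $/ = \3;      # 3 characters on 5.18+, 3 bytes before that
    $record = <$fh>;
}

# Read the same amount with read(), which counts characters on such handles.
open my $fh2, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $via_read;
read($fh2, $via_read, 3);

printf "perl %vd: record has %d character(s), read() returned %d\n",
    $^V, length $record, length $via_read;
```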
For more details (and arguments), see perlbug ticket 79960.
Seems good to me.
Using $/ = \$int and character strings seems like a weird thing to do to begin with, but for those people who do need to do this, I can't imagine the current behaviour is useful for anything.