Perl 5.18: getline and $/ = \N
Perl 5.18 will ship with a change in behaviour when using getline() (aka the <$handle> operator) on handles marked as returning Unicode where $/ is a reference to an integer.
If you're not familiar with the behaviour, here it is for a file with no PerlIO layers:
$/ = \500;
my $x = <$fh>; # read 500 bytes
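As a slightly fuller sketch of the byte-oriented case, the snippet below reads fixed-size records from an in-memory handle rather than a real file. The sample data and the variable names ($data, @sizes) are made up for illustration.

```perl
use strict;
use warnings;

# 1200 bytes of made-up sample data, held in memory instead of a real file.
my $data = 'x' x 1200;

# No layers are pushed here, so the handle is byte-oriented.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @sizes;
{
    local $/ = \500;                      # records of up to 500 bytes
    while (defined(my $record = <$fh>)) {
        push @sizes, length $record;
    }
}
close $fh;

print "@sizes\n";                         # prints "500 500 200"
```

The last record is simply shorter when the data runs out before the requested size is reached.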
This won't change in 5.18, but it will if the stream has a layer that internally returns Unicode, such as any of:
:encoding(iso-8859-1)
:encoding(utf-16)
:encoding(utf-8) (ok, any :encoding stream)
:utf8
In 5.16 and earlier, getline() will read the specified number of bytes from the stream, even if that would fall in the middle of a character.
This leads to a few problems:
- the result can be a UTF-8 marked scalar that doesn't contain valid UTF-8.
- the input stream can be left on a non-character boundary.
- the record size only corresponds to a number of bytes in the file if the file itself is UTF-8; for any other encoding it counts bytes of the decoded data, not of the file.
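To make the first problem concrete without depending on a particular Perl version, here is a rough sketch that cuts a UTF-8 byte string at a fixed byte count and checks whether the piece is still well-formed. The sample string and variable names are my own, and the check uses utf8::decode, which returns false for input that isn't well-formed UTF-8.

```perl
use strict;
use warnings;

# "é" is two bytes in UTF-8 (0xC3 0xA9); four of them plus "abc" is 11 bytes.
my $bytes = ("\xC3\xA9" x 4) . "abc";

# Take the first 3 bytes, as a byte-counted record of size 3 would.
my $chunk = substr $bytes, 0, 3;          # 0xC3 0xA9 0xC3

# Work on a copy, since utf8::decode modifies its argument on success.
my $copy  = $chunk;
my $valid = utf8::decode($copy) ? 1 : 0;

print "first 3 bytes are ", ($valid ? "valid" : "not valid"), " UTF-8\n";
```

The cut lands after the lead byte of the second "é", so the chunk ends in half a character and the rest of the stream starts on a continuation byte.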
From 5.18, getline($fh) will behave like read($fh, $out, $$/): the specified number of characters will be read from the stream instead of the specified number of bytes.
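A small sketch of the new behaviour, assuming Perl 5.18 or later and an in-memory handle with an :encoding(UTF-8) layer. The sample data and variable names are illustrative only. It reads the same data once with getline and a record size of 3, and once with read() and an explicit length of 3.

```perl
use strict;
use warnings;

# Four two-byte "é" characters plus "abc": 11 bytes, 7 characters.
my $bytes = ("\xC3\xA9" x 4) . "abc";

# getline with a record size of 3 on a decoding handle.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my @lengths;
{
    local $/ = \3;
    while (defined(my $rec = <$fh>)) {
        push @lengths, length $rec;       # length in characters
    }
}
close $fh;

# read() with an explicit length of 3 on a fresh handle.
open my $fh2, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $got = read($fh2, my $out, 3);
close $fh2;

print "getline record lengths: @lengths\n";
print "read() returned $got character(s)\n";
```

On 5.18 or later I'd expect the getline records to be 3, 3 and 1 characters long, matching what read() does with the same count. On 5.16 and earlier the getline half counts bytes instead, so its results will differ.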
For more details (and arguments), see perlbug ticket 79960.
Seems good to me.
Using $/ = \$int and character strings seems like a weird thing to do to begin with, but for those people who do need to do this, I can't imagine the current behaviour is useful for anything.