Perl 5.18: getline and $/ = \N

Perl 5.18 will ship with a change in the behaviour of getline() (aka the <$handle> operator) on handles marked as returning Unicode, when $/ is a reference to an integer.

If you're not familiar with the behaviour, for a file with no PerlIO layers:

$/ = \500;
my $x = <$fh>; # read 500 bytes
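As a slightly fuller sketch of the same thing, here is a loop reading fixed-size records from a handle with no layers. The in-memory file, the 5-byte record size and the variable names are just assumptions for illustration:

```perl
use strict;
use warnings;

# A 12-byte in-memory "file", opened with no PerlIO layers.
my $data = "abcdefghijkl";
open my $fh, '<', \$data or die "open: $!";

my @records;
{
    local $/ = \5;    # fixed-size records of 5 bytes
    while (defined(my $rec = <$fh>)) {
        push @records, $rec;
    }
}
close $fh;

# The last record is short because only 2 bytes remain.
print join('|', @records), "\n";    # abcde|fghij|kl
```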

This won't change in 5.18, but it will if the stream has a layer that internally returns Unicode, such as any of:

  • :encoding(iso-8859-1)
  • :encoding(utf-16)
  • :encoding(utf-8) (ok, any :encoding stream)
  • :utf8

In 5.16 and earlier, getline() will read the specified number of bytes from the stream, even if that would fall in the middle of a character.

This leads to a few problems:

  • the result can be a UTF-8 marked scalar that doesn't contain valid UTF-8.
  • the input stream can be left on a non-character boundary.
  • the record read only corresponds to bytes in the file if the file is UTF-8.
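To see why a byte count is a problem, here is a small sketch that doesn't touch file handles at all: it cuts a UTF-8 byte string at a byte offset that lands inside a character, which is roughly what the old getline() could do to the stream. The sample string and the offset are made up for the example:

```perl
use strict;
use warnings;

# "a", then U+00E9 encoded as two UTF-8 bytes, then "b": 4 bytes, 3 characters.
my $bytes = "a\xC3\xA9b";

# Taking 2 bytes splits U+00E9 down the middle.
my $chunk = substr $bytes, 0, 2;

# utf8::decode() returns false when its argument isn't well-formed UTF-8.
my $chunk_copy = $chunk;
my $chunk_ok   = utf8::decode($chunk_copy);

my $whole_copy = $bytes;
my $whole_ok   = utf8::decode($whole_copy);

print "2-byte chunk: ", ($chunk_ok ? "valid" : "invalid"), " UTF-8\n";
print "whole string: ", ($whole_ok ? "valid" : "invalid"), " UTF-8\n";
```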

From 5.18, getline($fh) will behave like read($fh, $out, ${$/}) -- the specified number of characters will be read from the stream instead of the specified number of bytes.
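A minimal sketch of the new behaviour, assuming an in-memory file and an :encoding(UTF-8) layer (the data and the record size of 3 are arbitrary). On 5.18 or later both reads should come back with 3 characters; on 5.16 and earlier the readline result would instead be built from 3 bytes:

```perl
use strict;
use warnings;

# Four copies of U+00E9 encoded as UTF-8: 8 bytes, 4 characters.
my $bytes = "\xC3\xA9" x 4;

# Record read through readline with $/ set to a reference to 3.
open my $in1, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $rec = do { local $/ = \3; <$in1> };
close $in1;

# The same count passed to read(), which counts characters on such a handle.
open my $in2, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $got = read($in2, my $buf, 3);
close $in2;

printf "readline: %d chars, read: %d chars\n", length($rec), length($buf);
```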

For more details (and arguments), see perlbug ticket 79960.

1 Comment

Seems good to me.

Using $/ = \$int and character strings seems like a weird thing to do to begin with, but for those people who do need to do this, I can't imagine the current behaviour is useful for anything.

About Tony Cook

I blog about Perl.