Perl Feature Request, or am I missing something?

One of the simplest things you can do with Perl is process text files. For that, you usually read with the diamond operator, chomp each line, and process the file line by line.

This is great and works most of the time.

Or rather, it works when you can control where the files being processed come from.

Because if you want to write a generic application, you need to know that when running on Unix, chomp will remove the newline but not the carriage return character, unlike when the same script runs under Windows.
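To make the problem concrete, here is a minimal sketch. It assumes a Unix system with the default value of $/, and it uses a literal string to stand in for a line read from a file with Windows line endings.

```perl
use strict;
use warnings;

# Stand-in for a line read on Unix from a file written on Windows,
# where no layer has translated "\r\n" into "\n".
my $line = "first field,second field\r\n";

chomp $line;    # removes only the trailing "\n" (the default value of $/)

# The carriage return is still attached to the last field.
print "line still ends with a carriage return\n" if $line =~ /\r\z/;
```

Any later comparison or hash lookup on the last field then fails silently, because the value carries an invisible trailing "\r".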

And I think that is annoying.

Why should the behavior depend on the platform we are running on and not on the file we are processing?

Is there any situation where, when reading line by line, we want chomp to keep the carriage return if it is there?

I know chomp removes the string defined by $/, and that what counts as a newline on input differs from operating system to operating system.

But is this behavior what we really want?

Or am I missing something, and there is something I can do with file handles (IO::Handle, afaik) that will guess the line ending?

8 Comments

I use the PerlIO::eol module to normalize the line endings in uploaded CSV files, though I'm not sure that will work with the <> operator, which is probably magic.

This kind of question is better asked on the Perl5Porters list. :)

p5p

Hrm, the filter cut out the email address: perl5-porters@perl.org

@Alberto: There has been \R for quite some time. From perlre: "\R will atomically match a linebreak, including the network line-ending \x0D\x0A. Specifically, it is exactly equivalent to
(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])"
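As a rough sketch of how \R could stand in for chomp, the helper below strips one trailing line break of any of the kinds listed above. The name chomp_any is made up for illustration; it is not a core function.

```perl
use strict;
use warnings;
use 5.010;    # \R needs Perl 5.10 or later

# Hypothetical helper: return the line without its trailing line break,
# whether that is "\n", "\r\n", or a lone "\r".
sub chomp_any {
    my ($line) = @_;
    $line =~ s/\R\z//;    # \R matches "\r\n" as a single unit
    return $line;
}
```

Inside a read loop this would be used as `my $clean = chomp_any($_);` in place of `chomp;`. Unlike chomp, it ignores $/ entirely, which may or may not be what you want.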

It would be nice if $/ accepted compiled regular expressions.

$/ = qr/\r?\n/;
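Since $/ is treated as a plain string rather than a pattern, one possible workaround is to slurp the file and split on the pattern yourself. This is only a sketch: read_lines is a made-up name, and it assumes the file is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of a file with "\n" or "\r\n" removed.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> };    # undef $/ slurps the whole file
    close $fh;
    return split /\r?\n/, $content;
}
```

Note that split drops trailing empty fields, so blank lines at the very end of the file are not returned.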

How about using magic to guess the file type and then adding an appropriate IO layer?

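    use File::LibMagic qw(:easy);
    # The description from libmagic mentions "CRLF" for DOS line endings.
    my $layer = MagicFile($file) =~ /\bCRLF\b/ ? ':crlf' : '';
    open my $fh, "<$layer", $file or die "Cannot open $file: $!";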

Then chomp just works normally.
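To see the effect of the layer on its own, without File::LibMagic, here is a small sketch using only core Perl. The helper name chomped_lines is invented for this example, and the layer is passed in by the caller instead of being guessed.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line through the given PerlIO layer
# (for example ':crlf' or ':raw') and chomp each one.
sub chomped_lines {
    my ($path, $layer) = @_;
    open my $fh, "<$layer", $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}
```

With ':crlf', each "\r\n" in the file reaches the program as "\n", so chomp leaves clean lines. With ':raw', the same file yields lines that still end in "\r".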

Re \R:

"Quite some time" in this context means "since 5.9.5"

About Alberto Simões

I blog about Perl. D'uh!