Perl Feature Request, or am I missing something?

By Alberto Simões on March 30, 2011 7:52 PM under Perl, Perl Syntax

One of the simplest things you can do with Perl is process text files. For that, you usually use the diamond operator, chomp each line you read, and process the file line by line.

This is great and works most of the time.

Or rather, it works when you control where the files being processed come from.

Because if you want to write a generic application, you need to know that on Unix chomp removes the newline but not the carriage return character, unlike when the same script runs under Windows.
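A minimal sketch of the problem, assuming a Unix-like system where no :crlf layer is applied by default. The sample data and variable names are made up for illustration, and an in-memory filehandle stands in for a real file that came from Windows:

```perl
use strict;
use warnings;

# Hypothetical sample data: two lines with Windows (CRLF) endings.
my $data = "alpha\r\nbeta\r\n";

# Read through an in-memory filehandle so no real file is needed.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @lines;
while (my $line = <$fh>) {
    chomp $line;              # removes the trailing "\n" only
    push @lines, $line;
}
close $fh;

# Show the leftover carriage return by making it visible.
for my $line (@lines) {
    (my $shown = $line) =~ s/\r/\\r/g;
    print length($line), " chars: $shown\n";
}
```

On such a system each line keeps its trailing carriage return, so "alpha" is reported as 6 characters long instead of 5.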

And I think that is annoying.

Why should the behavior depend on the platform we are running on and not on the file we are processing?

Is there any situation where, when reading line by line, we want chomp to keep the carriage return if it is there?

I know chomp removes whatever string $/ holds, and what a newline looks like on disk differs from operating system to operating system.

But is this behavior the one we really mean?

Or am I missing something, and there is something I can do with file handles (IO::Handle, afaik) that will guess the line ending?

Tagged as:

carriage return, linux, macosx, newline, perl, unix, windows

8 Comments

preaction | March 30, 2011 8:43 PM | Reply

I use the PerlIO::eol module to normalize the line endings in uploaded CSV files, though I'm not sure that will work with the <> operator, which is probably magic.

Alberto Simões replied to comment from preaction | March 30, 2011 8:47 PM | Reply

Hello,
Anything better than manually adding a
s/\012\r|\r\012|\012|\r/\012/g;
is great :)

Mithaldu | March 30, 2011 10:43 PM | Reply

This kind of question is better asked on the Perl5Porters list. :)

p5p

Mithaldu | March 30, 2011 10:44 PM | Reply

Hrm, the filter cut out the email address: perl5-porters@perl.org

Steven Haryanto | March 30, 2011 11:27 PM | Reply

@Alberto: There has been \R for quite some time. From perlre: "\R will atomically match a linebreak, including the network line-ending \x0D\x0A. Specifically, it is exactly equivalent to
(?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])"
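One way \R could stand in for chomp is sketched below. The helper name chomp_any and the sample strings are hypothetical, not part of Perl or of any module mentioned here, and \R needs Perl 5.10 or later:

```perl
use strict;
use warnings;

# Hypothetical helper (not a core function): strip one trailing line
# break of any flavour and return the trimmed copy.
sub chomp_any {
    my ($line) = @_;
    $line =~ s/\R\z//;        # \R matches "\r\n" as a unit, or a lone "\n" or "\r"
    return $line;
}

my @cleaned = map { chomp_any($_) }
    ("unix\n", "windows\r\n", "old mac\r", "no ending");

print join('|', @cleaned), "\n";   # prints: unix|windows|old mac|no ending
```

This only changes the trimming step. A file that uses bare carriage returns is still read as one long line, because $/ is still "\n" while reading.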

Shawn H Corey | March 30, 2011 11:43 PM | Reply

It would be nice if $/ accepted compiled regular expressions.

$/ = qr/\r?\n/;
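As perlvar notes, the value of $/ is a string, not a regex, so the assignment above is a wish rather than working code. A rough workaround, sketched here with made-up sample data, is to read everything at once and split on the same pattern:

```perl
use strict;
use warnings;

# Hypothetical sample data mixing LF and CRLF endings.
my $data = "one\r\ntwo\nthree\r\n";

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $content = do { local $/; <$fh> };   # with $/ undefined, one read returns everything
close $fh;

# Split on the pattern the comment wished $/ could hold.
my @lines = split /\r?\n/, $content;

print scalar(@lines), " lines: @lines\n";   # prints: 3 lines: one two three
```

The trade-off is that the whole input sits in memory, so this suits small files better than large ones.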

oylenshpeegul | March 31, 2011 2:27 AM | Reply

How about using magic to guess the file type and then adding an appropriate IO layer?

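    use File::LibMagic qw(:easy);
    # Add the :crlf layer only when libmagic reports CRLF line terminators.
    my $layer = MagicFile($file) =~ /\bCRLF\b/ ? ':crlf' : '';
    open my $fh, "<$layer", $file or die "Cannot open $file: $!";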

Then chomp just works normally.
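To see the :crlf layer on its own, without the detection step, here is a small sketch that uses only core modules. The temporary file and its contents are invented for the example:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a hypothetical sample file with CRLF endings, byte for byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "first\r\nsecond\r\n";
close $out or die "Cannot close $path: $!";

# The :crlf layer turns each "\r\n" pair into "\n" while reading.
open my $in, '<:crlf', $path or die "Cannot open $path: $!";
my @lines;
while (my $line = <$in>) {
    chomp $line;              # now removes the whole line ending
    push @lines, $line;
}
close $in;

print join('|', @lines), "\n";   # prints: first|second
```

As far as the PerlIO documentation describes it, :crlf only converts CR LF pairs on read and leaves a lone "\n" untouched, so applying it to a file with Unix endings should be harmless.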

Tom Wyant | March 31, 2011 5:12 AM | Reply

Re \R:

"Quite some time" in this context means "since 5.9.5"
