Tie::File: don't while(<>)

Consider the common case where you want to take a text file and walk through it, doing things like stripping whitespace, looking for duplicates, or sorting. I was recently asked to work on a program where part of the process is 'cleaning' uploaded files before doing anything else with them. That cleaning step was over 40 lines of subs that opened old and new filehandles, ran the standard while(<>) {} loop, and then renamed the new file over the old one. If you find yourself wanting to do something to a file that you would normally do to an array, often in place, consider Tie::File before while()ing away at it. The documentation has many examples, and there are more on the web, including at Perl Monks. In this case I was able to whittle the whole thing down to this:

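use Tie::File;
use Regexp::Common qw(whitespace);

my %seen;
tie my @file, 'Tie::File', $file or die "Cannot tie $file: $!";

# Lowercase, trim both ends, sort, then drop duplicates.
@file = grep { !$seen{$_}++ }
        sort
        map  { s/$RE{ws}{crop}//gr }   # /r (Perl 5.14+) returns the trimmed copy
        map  { lc } @file;

untie @file;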
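
If you want to try the idea without installing anything, here is a minimal sketch of the same cleanup that runs with core modules only. The sample data and the temp file are made up for the demo, and a plain regex stands in for $RE{ws}{crop}, so treat it as an illustration rather than the exact code from the project.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical sample data: mixed case, stray whitespace, one duplicate.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "  Pear\n", "apple  \n", "PEAR\n", "banana\n";
close $fh or die "Cannot close $path: $!";

tie my @file, 'Tie::File', $path or die "Cannot tie $path: $!";

my %seen;
@file = grep { !$seen{$_}++ }              # keep the first copy of each line
        sort
        map  { my $line = lc;              # lowercase a copy of the record
               $line =~ s/^\s+|\s+$//g;    # trim both ends (stand-in for the crop pattern)
               $line }
        @file;

untie @file;

# Read the file back the ordinary way to check what was written.
open my $in, '<', $path or die "Cannot open $path: $!";
chomp(my @cleaned = <$in>);
close $in;
print join(',', @cleaned), "\n";
```

The map block returns the modified string explicitly. A bare map s/...//g would return the substitution counts instead of the cleaned lines, which is an easy mistake to make in a chain like this.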

If you're working with CSV or other delimited files, consider Tie::Array::CSV.

Cheers

3 Comments

I think "sort @file" should be simply "sort", right?

Moreover, if you're using strict, the %seen hash must be declared beforehand. I usually prefer List::MoreUtils::uniq instead because it seems simpler and more intuitive.
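
A rough sketch of the uniq approach, assuming List::MoreUtils is installed from CPAN. The @lines array is hypothetical and only stands in for the tied file.

```perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);

# Hypothetical input standing in for the lines of the tied file.
my @lines = ('pear', 'apple', 'pear', 'banana', 'apple');

# uniq keeps the first occurrence of each value and preserves order,
# so no %seen hash has to be declared.
my @unique = uniq sort @lines;

print join(',', @unique), "\n";
```

If you'd rather avoid the extra dependency, List::Util 1.45 and later exports a uniq that should work the same way here.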

I've never used Tie::File before. I'm sure going to give it a try. Thanks!

One of the advantages of Tie::File is that you don't have to slurp the whole file into memory. You can just push() onto it, or walk it item by item with something like while (my ($i, $item) = each @ary).

On the other hand, it's slow compared to low-level routines like open/read/<>.

Note that each(ARRAY) requires Perl 5.12 or later.
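
The record-by-record approach from this comment could look something like the following. It is only a sketch: the temp file and its contents are made up for the demo, and it loops by index rather than with each() so that it also runs on Perls older than 5.12.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Build a small throwaway file so the sketch is self-contained.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "alpha\n", "beta\n", "gamma\n";
close $fh or die "Cannot close $path: $!";

tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";

# Visit records by index; each one is fetched from the file on demand.
for my $i (0 .. $#lines) {
    $lines[$i] = uc $lines[$i] if $lines[$i] =~ /^b/;
}

# push appends a new record to the end of the file.
push @lines, 'delta';

untie @lines;

# Read the file back the ordinary way to see the result.
open my $in, '<', $path or die "Cannot open $path: $!";
chomp(my @result = <$in>);
close $in;
print "$_\n" for @result;
```

Only the matching record is rewritten here. The other lines are read but never stored back.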


About Jesse Shy

Perl pays for my travel addiction.