And Now for Something Not Moose Related

Well, I have been eating way too much over the past few posts, so I thought I would go through today's little upset for a change of pace.

Well, today I was going back in time a little and had to write up a little script that would get the latest file in a dir, count the number of lines in it (don't ask why) and then do a little more processing.

Well, it seemed simple enough and I have done something similar before, but for kicks I decided to do a Google search for 'Get the latest file with Perl'.

Well, my eyes got sore after reading the endless variations on the same theme, from 30-line monstrosities to one-line wonders that would befuddle even Peteris Krumins.

Well, then I harkened back to something that Larry said in his YAPC::EU 2010 keynote talk: that Perl was denounced in the Unix world because it does everything but doesn't do it well.



Then I thought, to do this in Linux is easy: I just do a sorted 'ls'. Then I wondered, 'surely there must be a Linux command for counting lines in a file?', and sure enough there is, and it is called 'wc'.

So why not have Perl, which doesn't do some things well, use something that does do them well!

Now in Perl we have something called the back-tick or '`'. It is the funny apostrophe thingy under the tilde or '~', the squiggly line thingy up in the left-hand corner if you happen to have an English keyboard, and it lets us run system commands and return their output right back into Perl.
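If you have never bumped into them, here is a minimal sketch of backticks at work (the commands are only examples; any shell command would do):

my $now   = `date`;        # scalar context: the whole output as one string
my @lines = `ls -lt /tmp`; # list context: one element per line of output
chomp($now);               # strip the trailing newline

So after about 10 mins of playing around with my two Linux commands, here is my little bit of Perl.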

 

my @mostrecent = `ls -lt $dir | awk \'{print \$9}\'`;
shift(@mostrecent);
foreach my $file (@mostrecent){
    chomp($file);
    next unless(-f $dir."/".$file);
    my $line_count = `wc -l < $dir."/".$file`;
    chomp($line_count);
    ...

So what's going on here? Well, first I use the '`' backtick to run 'ls' with '-lt', which gives me all the files in descending order by date, and then I cheat a little as I use 'awk' to help me get the 9th field in the listing, which is the file name, and then Perl just sucks this into a nice array for me. Now you might also notice I had to escape the ' and the $ in the 'awk'; it is just a necessary evil.
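Mind you, if you do not feel like escaping things for 'awk', the same field grab could be done on the Perl side. Here is a rough sketch of that idea (same result, just using split instead of awk, and it will still trip over file names with spaces):

my @mostrecent = map { (split ' ', $_)[8] } `ls -lt $dir`;  # field 9, same as awk's $9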

Next I do a 'shift' to get rid of the first row, which is always going to be empty (it is the 'total' line of 'ls -l', and it has no 9th field for awk to print). Now I am sorry to say I am going to have to iterate over this array, because in my case I could have folders or symlinks or other nasty Linux stuff in there when I am really only interested in files, so I just have to live with this.

In my loop I chomp the $file to get rid of the nasty little '\n' that Linux tacks on the end of my $file. Next I use the much ignored Perl '-X' file tests, in this case '-f', which tests to see if my $file is a plain file. If it isn't, then I just skip to the next $file in the array.
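For anyone who has not bumped into the '-X' file tests before, here is a quick illustration of a few of the common ones (just a sketch, with $path standing in for whatever name you want to test):

print "plain file\n" if -f $path;   # regular file
print "directory\n"  if -d $path;   # directory
print "symlink\n"    if -l $path;   # symbolic link
print "readable\n"   if -r $path;   # readable by this process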

Then I just use my backticks again to do a quick line count with 'wc -l' and load it into $line_count, and then a final 'chomp' to clean off the nasty '\n' that Linux adds on, and there you have it.

Now of course this is not usable on a Windows box, and perhaps on other systems, but it is just a little script to save me some time.
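If I ever did need it to run on Windows, the 'wc' call is the only part that would have to go; a few lines of plain Perl would do the counting instead (just a sketch, and I have not benchmarked it against 'wc'):

open(my $fh, '<', "$dir/$file") or die "Cannot open $dir/$file: $!";
my $line_count = 0;
$line_count++ while <$fh>;   # count lines the pure-Perl way
close($fh);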


7 Comments

ls will output one filename per line when given the -1 parameter (and not -l). Also, I don't think you can concatenate strings inside backticks like that, and wc will accept an input file as a parameter, so:

my @mostrecent = `ls -1t "$dir"`;
foreach my $file (@mostrecent){
    chomp($file);
    next unless(-f "$dir/$file");
    my $line_count = `wc -l "$dir/$file"`;
    chomp($line_count);
    ...

How about

cd $dir
cat $( ls -lrt | grep ^\- | tail -1 | awk '{ print $9 }' ) | wc -l

?

Here is how to do it the Perl way:

#!/usr/bin/env perl

use 5.010; # // say
use strict;
use warnings;
use autodie;

my $dir = shift // '.';

# Map filename => modification time
my %files =
    # Note that we skip symbolic links as in the original code
    map { (!/^\./ && -f "$dir/$_") ? ($_ => -M _) : () }
    do {
        opendir(my $dh, $dir);
        my @files = readdir $dh;
        closedir $dh;
        @files
    };

# Sort files by descending modification time
foreach my $file (sort { $files{$a} <=> $files{$b} } keys %files) {
    my $lines = `wc -l < "$dir/$file"`;
    chomp $lines;
    say "$file: $lines";
    #...
}

Note: I kept the call to 'wc' (so this code is not fully portable to Windows) because I expect (benchmark!) 'wc' to be faster, at least if files are big.

Movable Type ate a bit of code...

 my $lines = `wc -l < "$dir/$file"`;

As a one-liner:

$ perl -e 'print join("\t",(map{`wc -l $_`}
   grep{-f}map{chomp;$_}(`ls -1t`))[0]=~/(\S+)/g),"\n";'
6	title_12_1832.txt

As a script snippet:

my $dir = ".";
my ($lines,$file) = ( 
       map { `wc -l $_` } 
       grep { -f  "$dir/$_" } 
       map { chomp; $_ } 
       (`ls -1t "$dir"`)
    )[0] =~ /(\S+)/g;

@dolmen: Your code filters away filenames beginning with '.'.

I tried $count++ while(<$filehandle>) against 'wc -l'. On a file with 3 million lines it's 0.466s versus 0.087s on a Mac Air, but on a small file with 6 lines it's 0.006s versus 0.018s. IMHO the pure Perl version has more pros than cons.

Crappy MT ...

$count++ while(<$filehandle>) 


About byterock

Long time Perl guy, a few CPAN mods, a lot of work on DBD::Oracle and a few YAPC presentations.