And Now for Something Not Moose Related

Well, I have been eating way too much over the past few posts, so I thought I would go through today's little upset for a change of pace.

Well, today I was going back in time a little and had to write up a little script that would get the latest file in a dir, count the number of lines in it (don't ask why) and then do a little more processing.

Well, it seemed simple enough and I have done something similar before, but for kicks I decided to do a Google search for 'Get the latest file with Perl'.

Well, my eyes got sore after reading the endless variations on the same theme, from 30-line monstrosities to one-line wonders that would befuddle even Peteris Krumins.

Well, then I harkened back to something that Larry said in his YAPC::EU 2010 keynote talk: that Perl was denounced in the Unix world because it does everything but doesn't do it well.



Then I thought, to do this in Linux is easy: I just do a sorted 'ls'. Then I wondered, 'surely there must be a Linux command for counting lines in a file?', and sure enough there is, and it is called 'wc'.

So why not have Perl, which doesn't do some things well, use something that does do them well!

Now in Perl we have something called the back-tick or '`'. It is the funny apostrophe thingy under the tilde or '~', the squiggly line thingy up in the left-hand corner if you happen to have an English keyboard, and it lets us run system commands and return their output right back into Perl.
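If you have never bumped into them, here is a minimal sketch of backticks at work (the commands are only examples; any shell command would do):

my $now   = `date`;        # scalar context: the whole output as one string
my @lines = `ls -lt /tmp`; # list context: one element per line of output
chomp($now);               # strip the trailing newline

So after about 10 mins of playing around with my two Linux commands, here is my little bit of Perl.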

 

my @mostrecent = `ls -lt $dir | awk \'{print \$9}\'`;
shift(@mostrecent);
foreach my $file (@mostrecent){
    chomp($file);
    next unless(-f $dir."/".$file);
    my $line_count = `wc -l < $dir."/".$file`;
    chomp($line_count);
    ...

So what's going on here? Well, first I use the '`' backtick to run 'ls' with '-lt', which gives me all the files in descending order by date, and then I cheat a little as I use 'awk' to help me get the 9th field in the listing, which is the file name, and then Perl just sucks this into a nice array for me. Now you might also notice I had to escape the ' and the $ in the 'awk'; it is just a necessary evil.
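Mind you, if you do not feel like escaping things for 'awk', the same field grab could be done on the Perl side. Here is a rough sketch of that idea (same result, just using split instead of awk, and it will still trip over file names with spaces):

my @mostrecent = map { (split ' ', $_)[8] } `ls -lt $dir`;  # field 9, same as awk's $9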

Next I do a 'shift' to get rid of the first row, which is always going to be empty (it is the 'total' line of 'ls -l', and it has no 9th field for awk to print). Now I am sorry to say I am going to have to iterate over this array, because in my case I could have folders or symlinks or other nasty Linux stuff in there when I am really only interested in files, so I just have to live with this.

In my loop I chomp the $file to get rid of the nasty little '\n' that Linux tacks on the end of my $file. Next I use the much ignored Perl '-X' file tests, in this case '-f', which tests to see if my $file is a plain file. If it isn't, then I just skip to the next $file in the array.
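For anyone who has not bumped into the '-X' file tests before, here is a quick illustration of a few of the common ones (just a sketch, with $path standing in for whatever name you want to test):

print "plain file\n" if -f $path;   # regular file
print "directory\n"  if -d $path;   # directory
print "symlink\n"    if -l $path;   # symbolic link
print "readable\n"   if -r $path;   # readable by this process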

Then I just use my backticks again to do a quick line count with 'wc -l' and load it into $line_count, and then a final 'chomp' to clean off the nasty '\n' that Linux adds on, and there you have it.

Now of course this is not usable on a Windows box, and perhaps on other systems, but it is just a little script to save me some time.
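If I ever did need it to run on Windows, the 'wc' call is the only part that would have to go; a few lines of plain Perl would do the counting instead (just a sketch, and I have not benchmarked it against 'wc'):

open(my $fh, '<', "$dir/$file") or die "Cannot open $dir/$file: $!";
my $line_count = 0;
$line_count++ while <$fh>;   # count lines the pure-Perl way
close($fh);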


7 Comments

ls will output one filename per line when given the -1 parameter (and not -l). Also, I don't think you can concatenate strings inside backticks like that, and wc will accept an input file as a parameter, so:

my @mostrecent = `ls -1t "$dir"`;
foreach my $file (@mostrecent){
    chomp($file);
    next unless(-f "$dir/$file");
    my $line_count = `wc -l "$dir/$file"`;
    chomp($line_count);
    ...

How about

cd $dir
cat $( ls -lrt | grep ^\- | tail -1 | awk '{ print $9 }' ) | wc -l

?

Here is how to do it the Perl way:

#!/usr/bin/env perl

use 5.010; # // say
use strict;
use warnings;
use autodie;

my $dir = shift // '.';

# Map filename => modification time
my %files =
    # Note that we skip symbolic links as in the original code
    map { (!/^\./ && -f "$dir/$_") ? ($_ => -M _) : () }
    do {
        opendir(my $dh, $dir);
        my @files = readdir $dh;
        closedir $dh;
        @files
    };

# Sort files by descending modification time
foreach my $file (sort { $files{$a} <=> $files{$b} } keys %files) {
    my $lines = `wc -l < "$dir/$file"`;
    chomp $lines;
    say "$file: $lines";
    #...
}

Note: I kept the call to 'wc' (so this code is not fully portable to Windows) because I expect (benchmark!) 'wc' to be faster, at least if files are big.

Movable Type ate a bit of code...

 my $lines = `wc -l < "$dir/$file"`;

As a one-liner:

$ perl -e 'print join("\t",(map{`wc -l $_`}
   grep{-f}map{chomp;$_}(`ls -1t`))[0]=~/(\S+)/g),"\n";'
6	title_12_1832.txt

As a script snippet:

my $dir = ".";
my ($lines,$file) = ( 
       map { `wc -l $_` } 
       grep { -f  "$dir/$_" } 
       map { chomp; $_ } 
       (`ls -1t "$dir"`)
    )[0] =~ /(\S+)/g;

@dolmen: Your code filters away filenames beginning with '.'.

I tried $count++ while(<$filehandle>) against 'wc -l'. On a file with 3 million lines it's 0.466s versus 0.087s on a Mac Air, but on a small file with 6 lines it's 0.006s versus 0.018s. IMHO the pure Perl version has more pros than cons.

Crappy MT ...

$count++ while(<$filehandle>) 


About byterock

Long time Perl guy, a few CPAN mods, a lot of work on DBD::Oracle and a few YAPC presentations.