How can I make this Perl code run faster ?
Hello world and Happy Holidays!! This is my first time blogging in blogs.perl.org and I figure I take this opportunity to ask the Perl community for suggestions on how I can make this Perl code run faster.
https://github.com/itcharlie/IDPcharlie/blob/master/perltools/bundle2pinto.pl
As the name of the script implies, I want to parse a cpan autobundle file so I can generate a list of distribution files from which I can create a Pinto repository. Please note that the script is incomplete and I am just wondering if there is a better approach to generate a list of distribution files.
A sample of an autobundle file can be found in the link below:
https://github.com/itcharlie/IDPcharlie/blob/master/perltools/Snapshot2013121700.pm
Please note that the way to run this program is by passing the autobundle filename as an argument like so:
./bundle2pinto.pl Snapshot2013121700.pm
Below is a copy of the code:
#!/usr/bin/env perl
# This script will parse a cpan bundle file and create a pinto repository
# with modules listed in the bundle file.
use strict;
use Data::Dumper;
use LWP::Simple;
use JSON;
my $file = $ARGV[0];
open( my $fh , "<", $file )
or die "Unable to open $file \n $!";
# Parse bundle file and determine distribution file url for each module version
my %modules =();
my %undef_versions = ();
my $head_cont = 0;
while ( my $line = <$fh>) {
if ( $line =~ /^\=head1\sCONTENTS/ ) {
$head_cont = 1;
next;
}
next if ( $head_cont == 0 || $line =~ /^$/);
last if ( $head_cont && $line =~ /^\=head1/ );
$line =~ s/ +/ /g;
my @fields = split( ' ', $line);
# skip functions
next if $fields[0] =~ /^[a-z]/;
# skip undef module versions
if ( $fields[1] == "undef") {
$undef_versions{$fields[0]} = 1;
next;
}
$modules{$fields[0]}{'VERSION'} = $fields[1];
}
my %dist_archives =();
for my $mod ( keys %modules ) {
# Store the archive url in the hash for the modules that do have versions defined
my $archive_url = dist_archive_url( $mod, $modules{$mod}{'VERSION'} ) ;
next if ( ! $archive_url );
$dist_archives{$archive_url} = 1;
}
print Dumper \%dist_archives;
#print Dumper \%undef_versions;
# Attempt to search for Module archive via cpan api.
sub dist_archive_url {
my ($mod , $version) = @_;
my $json = JSON->new();
my $search_cpan = "http://search.cpan.org/api/";
my $mod_url = $search_cpan . "module/" . $mod;
my $mod_data_json = get( $mod_url);
my $mod_data = $json->decode( $mod_data_json ) ;
my $dist = $mod_data->{'distvname'};
$dist =~ s/\-\d+\.\d+$//; # remove the version number
my $dist_url = "http://search.cpan.org/api/dist/" . $dist ;
my $dist_data_json = get( $dist_url);
my $dist_data = $json->decode( $dist_data_json ) ;
my $archive_url;
for my $release ( @{$dist_data->{releases}} ) {
if ( $release->{version} eq $version ) {
$archive_url = $release->{cpanid} . "/" . $release->{'archive'};
}
}
return $archive_url;
}
# Create a Pinto repo and pass in the ur
Also I would like to know if this something that has been done before and If so how did you solve this problem?