Perl Archives

POD speculation

Pod::Coverage

[...] expects to find either a =head(n>1) or an =item block documenting a subroutine.

In pure TIMTOWTDI spirit, it's up to you to decide which style you want to adopt. There are PROs and CONs, though.

Travis-CI and Perl

If you're interested into using Travis-CI for your Perl projects, here's a few pointers that you should not miss:

Too bad that it took me so long to find them out.

Graphics::Potrace

A few months ago I released Graphics::Potrace, that provides Perl bindings to the potrace library. So, if you want to convert rasters into vectors from Perl... you know where to go.

origami envelopes

I've always been fond of origami, and in some periods I also had time to fold some as a hobby. Alas, this is not the case any more... most of the times.

I'm also proud to produce my greeting cards for birthdays and occasions, when I remember to actually make one (which happens seldom... but happens). Some time ago I stumbled upon a neat design for an origami envelope - although I don't remember where I saw it, I've found a couple of web sites that include it (e.g. here). So... two of "my" things coming together...

Then I'm fond of Perl, of course. So why not kicking it in and use it to add an image to the back of the envelope... automatically?

Parse::RecDescent and number of elements read on the fly

I recently had to develop a small parser for some coworkers and I turned to Parse::RecDescent for handling it. The grammar was not particularly difficult to address, but it had some fields that behave like arrays whose number of elements is declared dynamically, so it's not generally possible to use the repetition facilities provided by Parse::RecDescent, because they require the number of repetitions to be known beforehand.

Logging in Dancer

I don't remember whether I blogged about Dancer::Logger::Log4perl or not, but a recent post by Ovid in Dancer's mailing list made me think that it would fit his use case. Unfortunately it seems that some of my messages did not make it into the mailing list (I didn't find them in the archived thread, anyway), so I'm blogging it here for a wider audience to bother.

If I understood Ovid's needs correctly, he needs an additional logging level in Dancer that allows to pass messages whatever the log level. Apart from the semantic level considerations here, the use case seemed to fit perfectly with using Log::Log4perl because it has a wide range of logging levels and the possibility to separate different logs in different parts.

A possible proof-of-concept is the following example:

#!/usr/bin/env perl
use strict;
use warnings;

use Dancer;
use Log::Log4perl qw< :easy >;

setting log4perl => {
   tiny => 0,
   config => '
      log4perl.logger                      = DEBUG, OnFile, OnScreen
      log4perl.appender.OnFile             = Log::Log4perl::Appender::File
      log4perl.appender.OnFile.filename    = sample-debug.log
      log4perl.appender.OnFile.mode        = append
      log4perl.appender.OnFile.layout      = Log::Log4perl::Layout::PatternLayout
      log4perl.appender.OnFile.layout.ConversionPattern = [%d] [%5p] %m%n
      log4perl.appender.OnScreen           = Log::Log4perl::Appender::ScreenColoredLevels
      log4perl.appender.OnScreen.color.ERROR = bold red
      log4perl.appender.OnScreen.color.FATAL = bold red
      log4perl.appender.OnScreen.color.OFF   = bold green
      log4perl.appender.OnScreen.Threshold = ERROR
      log4perl.appender.OnScreen.layout    = Log::Log4perl::Layout::PatternLayout
      log4perl.appender.OnScreen.layout.ConversionPattern = [%d] >>> %m%n
   ',
};
setting logger => 'log4perl';

get '/' => sub {
   warning 'just a plain warning here';
   error 'just a plain error here';
   content_type 'text/plain';
   return "normal here\n";
};

get '/special' => sub {
   debug 'inside special...';
   ALWAYS 'this is a special BUSINESS-LOGIC message!';
   warning 'inside special...';
   content_type 'text/plain';
   return "special here\n";
};

dance();

Having the full power of Log::Log4perl's configuration capabilities allows sending all the log messages to a file, while keeping the most important ones on the screen. Additionally, the usage of the ALWAYS stealth logger - which sends messages at the OFF logging level - also allows being sure that they are - guess what? - ALWAYS emitted whatever the log level.

As a nice add-on, these messages (that are business-logic related) can be visually distinguished on the terminal from e.g. error messages: in the example, ERRORs are shown in red, while ALWAYS messages are shown in light green. This is an example of what is shown on the screen:

$ perl sample.pl -p 3333

Dancer 1.3093 server 32177 listening on http://0.0.0.0:3333
== Entering the development dance floor ...
[2012/03/28 19:08:03] >>> just a plain error here
[2012/03/28 19:08:04] >>> this is a special BUSINESS-LOGIC message!

and, of course, the file contains the full log:

[2012/03/28 19:23:18] [ INFO] loading Dancer::Handler::Standalone handler
[2012/03/28 19:23:18] [ INFO] loading handler 'Dancer::Handler::Standalone'
[2012/03/28 19:23:25] [ INFO] request: GET / from 127.0.0.1
[2012/03/28 19:23:25] [ INFO] Trying to match 'GET /' against /^\/$/ (generated from '/')
[2012/03/28 19:23:25] [ INFO]   --> got 1
[2012/03/28 19:23:25] [ WARN] just a plain warning here
[2012/03/28 19:23:25] [ERROR] just a plain error here
[2012/03/28 19:23:25] [ INFO] response: 200
[2012/03/28 19:23:28] [ INFO] request: GET /special from 127.0.0.1
[2012/03/28 19:23:28] [ INFO] Trying to match 'GET /special' against /^\/$/ (generated from '/')
[2012/03/28 19:23:28] [ INFO] Trying to match 'GET /special' against /^\/special$/ (generated from '/special')
[2012/03/28 19:23:28] [ INFO]   --> got 1
[2012/03/28 19:23:28] [DEBUG] inside special...
[2012/03/28 19:23:28] [  OFF] this is a special BUSINESS-LOGIC message!
[2012/03/28 19:23:28] [ WARN] inside special...
[2012/03/28 19:23:28] [ INFO] response: 200

So, if you want all the power of Log::Log4perl while playing with Dancer... be sure to check Dancer::Logger::Log4perl out!

Sets operations

To help some coworkers I whipped up a program to perform set operations in Perl. It's quite basic but it's been pretty effective so far and it's on github.

Sets are assumed to be files where each line is a different element. It is assumed that equal lines are either not present or can be filtered out with no consequence. The inner working assumes that at a certain point the input files are sorted, and in general the external sort program is used automatically, which limits the applicability in some platforms.

The three basic operations that are supported are union, intersection and difference.

# intersect two files, also with "intersect", "i",
# "I" (uppercase "i") and "^"
sets file1 & file2

# union of two files, also with "union", "u", "U",
# "v", "V" and "|"
sets file1 + file2

# subtraction of second file from first one, also
# with "minus", "less" and "\"
sets file1 - file2

Other operations, e.g. symmetric difference, can be obtained with a combination of the predefined ones. Operations can be grouped and in general the expression will be provided as a single string to avoid the shell to creep in:

# symmetric difference, alternative 1
sets '(file1 - file2) + (file2 - file1)'

# symmetric difference, how you probably saw it somewhere
sets '(file1 + file2) - (file1 & file2)'

Operations associate from left to right, so the first group above is not needed. Anyway I usually prefer to be explicit.

As anticipated, at some point the program needs to work with a sorted input. The basic motivation for the program is handling operations on files with a few million elements, so putting all the stuff in memory is not an option; on the other hand, sort is quite efficient and reinventing the wheel is not an option as well!

Sorting is usually handled automatically with a call to the external sort utility (with the -u option, because sets are assumed to not contain duplicates); anyway, this can be a time consuming activity that is not necessary if you already know that your inputs are sorted, so you can tell the program when this is actually the case:

sets -s sorted-file1 ^ sorted-file2

When sorting is performed, it is usually done on the fly without saving the intermediate sorted files. They can be useful for following sets operations, or when the same input is used multiple times (e.g. in the case of the symmetric difference examples above), so it is possible to save the sorted files with the same name and a suffix appended:

sets -S .sorted '(file1 - file2) + (file2 - file1)'

If the sorted version of a file is found (i.e. file1.sorted and/or file2.sorted in the example above) it is used with no further sorting, speeding things up automatically.

Sometimes inputs might come from different platforms, so the line terminator would be different. In our case we don't need leading or trailing whitespaces, so there is a trimming options to avoid problems:

sets -t file1-unix - file2-dos

If you think that it can be useful for you, it's possible to download a bundled version that does not need external modules installed anywhere: enjoy sets!

Dist::Zilla, Pod::Weaver and bin

I use Dist::Zilla for managing my distributions. It's awesome and useful, although it lacks some bits of documentation every now and then; this lack is compensated in other ways, e.g. IRC.

I also use the PodWeaver plugin to automatically generate boilerplate POD stuff in the modules. Some time ago I needed to add some programs to a distribution of mine (which I also managed to forget at the moment, but this is another story), and this is where I got hit by all the voodoo.

The first program was actually a Perl program, consisting of a minimal script to call the appropriate run() method in one of the modules of the distro:

$ cat bin/perl-program 
#!/usr/bin/env perl
use prova;
prova->run();

This led Dist::Zilla to complain like this:

$ dzil build
[DZ] beginning to build prova
[DZ] guessing dist's main_module is lib/prova.pm
couldn't determine document name for bin/perl-program at ...

which isn't the best advice in the world (it actually complains about Pod::Weaver), but let's ignore it for the moment. After a bit of googling - or whatever, I actually don't remember - I found that there were basically two options:

  • put an explicit package declaration in the driver program, like this:

    $ cat bin/perl-program 
    #!/usr/bin/env perl
    package prova;
    use prova;
    prova->run();
    
  • put a comment with a PODNAME:

    $ cat bin/perl-program 
    #!/usr/bin/env perl
    # PODNAME: prova
    use prova;
    prova->run();
    

The latter seems slightly less dumb so I opted for it. I say slightly because IMHO the bottom line should be that the name is equal to the filename, but this is (again) another story. Yes, I know, it's open source and I can propose patches - did I say I'm not complaining?

Anyway, I then needed to add a shell script to the lot:

$ cat bin/script.sh 
#!/bin/bash
echo 'Hello, World!'

and again the error popped up:

$ dzil build
[DZ] beginning to build prova
[DZ] guessing dist's main_module is lib/prova.pm
[PodWeaver] [@Default/Name] couldn't find abstract in bin/perl-program
couldn't determine document name for bin/script.sh at ...

Now, I could use the PODNAME trick above:

$ cat bin/script.sh
#!/bin/bash
# PODNAME: script.sh
echo 'Hello, World!'

but it turns out - with little surprise - that the file is considered a Perl one, with the consequence that in the distribution it gets POD added to it:

#!/bin/bash
# PODNAME: script.sh
echo 'Hello, World!'

__END__
=pod

=head1 NAME

script.sh

=head1 VERSION

version 0.1.0

=head1 AUTHOR

Flavio Poletti <polettix@cpan.org>

=head1 COPYRIGHT AND LICENSE

blah blah blah...

=cut

The only thing is that - ehr... - bash does not like POD very much.

Google was not my friend in this case, but I found one in Dist::Zilla's IRC channel (which is #distzilla on irc.perl.org, by the way), which I think (hope) is Christopher J. Madsen, the original contributor of Dist::Zilla::Plugin::FileFinder::ByName.

He instructed me to use that module - which entered the core as of version 4.300003 - to obtain what I was after: keep the PodWeaver plugin working on Perl stuff, while ignoring shell stuff. But, more importantly, he adviced me about why.

Many plugins - including the PodWeaver one - rely upon the Finder role to do their work. This role is something that finds files to be used by the plugin; in the case of PodWeaver, the default is to take whatever module will be installed and whatever stuff is in the bin directory. OK, it can be a bit more complicated than this, but most of the times it's OK. This default is the same as if we configured the following in the dist.ini file:

[PodWeaver]
finder = :InstallModules
finder = :ExecFiles

It turns out that if we explicitly configure a finder all the defaults are wiped away, so - for example - if our bin directory contained shell scripts only we could be happy with this:

[PodWeaver]
finder = :InstallModules

In our case, anyway, this would disable PodWeaver for Perl programs as well, which is not acceptable. This is where FileFinder::ByName kicks in:

[FileFinder::ByName / BinNotShell]
dir = bin
skip = .*\.sh$

This plugin lets us create new finders. In the example above, we are creating a finder that finds stuff in the bin directory, skipping all files that end with .sh (note that skip sets a regular expression). At this point, we're ready for the configuration of the PodWeaver plugin:

[PodWeaver]
finder = :InstallModules
finder = BinNotShell

It's correct, the new finder does not want the initial colon. Now, when it's time for the Pod::Weaver plugin to find files, it will get all the modules that will be installed AND the files in the bin directory whose name does not end with .sh. Yay!

Exclusive Perl Archive Nook

I started working on epan, a (somewhat thin) wrapper around cpanminus to create a version of CPAN trimmed down to your needs for installing specific stuff.

This is what the cool guys probably call DPAN these days, but I found that the whole concept of DarkPAN revolves much around getting your private stuff into the "normal" Perl toolchain, while in this case I need to be able to easily install modules in machines that are out of Internet reach.

To start with an example, suppose you have to install Dancer and a couple of its plugins in a machine that - for good reasons - is not connected to the Internet. It's easy to get the distribution files for Dancer and the plugins... but what about the dependencies? It can easily become a nightmare, forcing you to go back and forth with new modules as soon as you discover the need to install them.

Thanks to cpanminus, this is quite easier these days: it can actually do what's needed with a single command:

      # on the machine connected to the Internet or to a minicpan
      $ cpanm -L xxx --scandeps --save-dists dists \
           Dancer Dancer::Plugin::FlashNote ...

which places all the modules in subdirectory dists (thanks to option --save-dists) with an arrangement similar to what you would expect from a CPAN mirror. Alas, on the target machine, you still have to make some work - e.g. you should collect the output from the invocation of cpanm above to figure out the order to use for installing the distribution files.

Additionally, the directory structure that is generated lacks a proper index file (located in modules/02package.details.txt.gz) so it would be difficult to use the normal toolchain.

epan aims at filling up the last mile to get the job done, providing you with a subdirectory that is ready for deployment, with all the bits in place to push automation as much as possible. So you can do this:

      # on the machine connected to the Internet or to a minicpan
      $ epan create Dancer Dancer::Plugin::FlashNote ...
      $ tar cvzf epan.tar.gz epan

transfer epan.tar.gz to the target machine and...

      # on the target machine
      $ tar xvzf epan.tar.gz
      $ cd epan
      $ ./install.sh

optionally providing an installation target directory:

      $ ./install.sh /path/to/local/perl

The epan directory that is generated should be compatible with other tools - in particular, the modules/02package.details.txt.gz file is generated, so all the toolchain (including cpan) should play nicely with it.

DotCloud::Environment

I'm in the process of releasing DotCloud::Environment, a module that should ease the developer's life with providing a unified entry point to get dotCloud's configurations for an application.

A typical case I had while playing with dotCloud was that I could easily deploy an application, but I had no simple way to setup a basic test environment in my development machine. This is unfortunate because it shifts all testing on the deployed infrastructure.

When you create an application on dotCloud, you're probably going to have some services that resolve to code you have to write, other ones that resolve to data you're going to populate or use. The link that allows a code service to access a data service is the file /home/dotcloud/environment.json (or its equivalent YAML representation /home/dotcloud/environment.yml), so you know where to look for when you are in the deployed environment.

What happens when you are in your development environment? You're basically on your own - you have to figure out that you're not in dotCloud, find some place where to put the configuration, etc. etc. In a few words: boring stuff that you don't need.

DotCloud::Environment's goal is to streamline and simplify the process, providing a unified interface to access dotCloud's configuration in whatever environment you are. It has a default mode that should fit the typical case, while letting you decide that your setup needs some customisations.

A Typical Directory Layout

(for some definition of typical, of course)

In order to keep your code clean, you will probably be dividing it depending on the functional block that will be deployed as a service in dotCloud. Suppose that you have a frontend service, a backend service and a database; you probably have the following directory layout:

project
+- dotcloud.yml
+- backend
|  | ...
|  +- lib
|     +- Backend.pm
+- frontend
|  | ...
|  +- lib
|     +- FrontEnd.pm
+- lib
   +- Shared.pm

Each service is put into a separate directory and all the code that they both use (e.g. functions to connect to databases) is put in a common lib directory.

Where Should I Put My Local Configuration?

The main goal is to let it find the right environment.json (or, equivalently, environment.yml) depending on the environment you are into. If you are in dotCloud there is actually no problem, because by default the right /home/dotcloud/environment.json file is selected; for your local development the best thing to do is to put the configuration file in the project's root directory, which becomes like this:

project
+- dotcloud.yml
+- backend
|  | ... 
|  +- lib
|     +- Backend.pm
+- frontend
|  | ... 
|  +- lib
|     +- FrontEnd.pm
+- lib
|  +- Shared.pm
|     
+- environment.json

Putting the file in that position lets DotCloud::Environment find it by default when no /home/dotcloud/environment.json file (or the equivalent YAML file) is found in the system. Which hopefully is the case of your development environment.

Of course you should customise this environment.json/environment.yml file to suit your needs in the development environment, following the same rules that dotCloud uses to generate it. It's quite straightforward so you should not have problems with this.

And Now?

Now you're ready to use DotCloud::Environment!

In your code services you will probably need to access the shared library only. DotCloud::Environment helps you find the directory to provide to use lib via the path_for function:

# -- in BackEnd.pm and FrontEnd.pm --
use DotCloud::Environment 'path_for';
use lib path_for('lib');
use Shared ...;

On the other hand, in your shared library you will probably need to access the actual configurations for the environment you are in. The most straightforward way to do this is via the dotvars function, that provides you an (anonymous on request) hash containing the relevant configurations for the service you need:

# -- in Shared.pm --
use DotCloud::Environment 'dotvars';

# ... when you need it... 
my $vars = dotvars('service-name');

For example, suppose that you want to implement a function to connect to a Redis service called redisdb and a function to connect to a MySQL service called sqldb:

use DotCloud::Environment 'dotvars';
sub get_redis {
   my $vars = dotvars('redisdb');  # getting an anonymous hash

   require Redis;
   my $redis = Redis->new(server => "$vars->{host}:$vars->{port}");
   $redis->auth($vars->{password});
   return $redis;
}
sub get_sqldb {
   my %vars = dotvars('redisdb'); # getting a hash
   my ($host, $port, $user, $pass)
         = @vars{qw< host port login password >}

   require DBI;
   my $dbh = DBI->connect("dbi:mysql:host=$host;port=$port;database=db",
                          $user, $pass, {RaiseError => 1});
}

Of course you can use dotvars directly in FrontEnd.pm and BackEnd.pm, but you will probably benefit from refactoring your common code to avoid duplications.

Minor Customisations

If you don't like how dotvars (or any other function in the functional interface) is named, you can take advantage of the fact that Sub::Exporter is used behind the scenes. This lets you do this:

use DotCloud::Environment
   dotvars => { -as => 'dotcloud_variables_for' };
my %vars = dotcloud_variables_for('my-service');

Major Customisations

DotCloud::Environment lets you play with an interface that is wider than just using path_for and dotvars, of course, in addition on not strictly requiring you to put the development configuration file in the suggested position. You're encouraged to take a look at the full documentation; in case you want to position your file in some fixed position, you can use the environment variable DOTCLOUD_ENVIRONMENT.

Conclusions

Do you like dotCloud? I do, and now I think that I'll find it a bit easier to develop stuff that is (dot)Cloud-ready (to use some buzz-expression).

About Flavio Poletti

user-pic I blog about Perl.