Perl Dependency Checking

By bigfoot on January 5, 2019 8:42 AM

I'm working on a few projects right now, most notably one that helps me create a CPAN distribution so that I can create a Perl Lambda in the AWS environment. This has led me to some yak shaving, chiefly investigating how to check for Perl dependencies.

Without getting too far into the weeds on Perl Lambdas (that's another blog post in the writing), suffice it to say I need to vendor Perl modules and deploy them in the Lambda environment. I briefly looked at carton and that may solve the problem neatly, but my early dive indicated to me that another path might be a more direct shot on goal and produce a cleaner Lambda deployment methodology.

Back to the issue at hand: this post discusses Perl dependency checking using these tools:

scandeps.pl
Devel::Modlist
/usr/lib/rpm/perl.req

I'm sure I'm overcomplicating things, and experienced CPAN authors will most likely point to the current set of methods available for both dependency checking and building CPAN distributions. Nevertheless, my confusion over the various methods available and the lack of a single authoritative voice compelled me to create something simple (at least from my toolchain's perspective) for creating CPAN distributions from uncomplicated projects.

Of course if it's a simple distribution, why not just create a Makefile.PL as part of your project and be done with it? Sure, that works - but my needs and desire for more automation have gotten the best of me over the holidays - hence make-cpan-dist.

This is not an ad for that tool - I doubt anyone is interested. However, I wanted to blog a bit about dependency checking in the hope that some of you who read these blogs might illuminate the broad, dark corners of my ignorance.

Some background. Where I work we have packaged Perl applications and modules using the Red Hat Package Manager (rpm), as we are closely coupled to the Red Hat/CentOS environment. We have persevered with the system Perl for lo these many years with no major issues.

Aside: Truth be told, it worked for us but we are now moving a lot of our new development to Python for many of the reasons that have been hashed out in many other blogs and internet forums. Personally, I enjoy writing Perl and although I direct development efforts, our needs have dictated rethinking some of our development methods including what language and frameworks to use. We'll still have Perl around for some time and I will continue writing Perl code until I find it no longer meets my needs.

The Red Hat Package Manager includes a script (/usr/lib/rpm/perl.req) that does a fairly decent job of teasing out the direct Perl dependencies of your module. One of its advantages is that it seems to semantically check for dependencies more completely than some other tools. For example, take this simple Perl script:

$ echo "use parent qw/Foo::Bar/;" > foo.pl

The rpm tool reports Foo::Bar as a dependency, whereas scandeps.pl does not.

$ scandeps.pl foo.pl
No modules found!

Now let's try using the -Rc options (don't recurse, compile).

$ scandeps.pl -Rc foo.pl
Can't locate Foo/Bar.pm in @INC (you may need to install the Foo::Bar module) (@INC
contains: /usr/lib/bedrock/perl5 /opt/perl-5.28.1/lib/site_perl/5.28.1/x86_64-linux
/opt/perl-5.28.1/lib/site_perl/5.28.1 /opt/perl-5.28.1/lib/5.28.1/x86_64-linux
/opt/perl-5.28.1/lib/5.28.1) at /opt/perl-5.28.1/lib/5.28.1/parent.pm line 16.
BEGIN failed--compilation aborted at foo.pl line 5.
SYSTEM ERROR in compiling foo.pl: 512 at
/opt/perl-5.28.1/lib/site_perl/5.28.1/Module/ScanDeps.pm line 1448.

Let's use /usr/lib/rpm/perl.req

$ /usr/lib/rpm/perl.req  foo.pl
perl(Foo::Bar)
perl(parent)

Now Devel::Modlist...

$ perl -MDevel::Modlist=nocore foo.pl
Can't locate Foo/Bar.pm in @INC (you may need to install the Foo::Bar module) (@INC
contains: /usr/lib/bedrock/perl5 /opt/perl-5.28.1/lib/site_perl/5.28.1/x86_64-linux
/opt/perl-5.28.1/lib/site_perl/5.28.1 /opt/perl-5.28.1/lib/5.28.1/x86_64-linux
/opt/perl-5.28.1/lib/5.28.1) at /opt/perl-5.28.1/lib/5.28.1/parent.pm line 16.
BEGIN failed--compilation aborted at foo.pl line 4.

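Okay, so the errors clearly indicate that these tools compile the script and then, I suppose, dump %INC (the hash of modules Perl actually loaded, as opposed to the @INC search path shown in the message). So it appears we have at least 2 different methods of finding Perl dependencies.

compile and dump %INC
parse Perl and do semantic checking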

/usr/lib/rpm/perl.req appears to parse the Perl, while scandeps.pl also parses(?) the script but has an option to compile it in order to tease out additional dependencies. However, it misses Foo::Bar when only parsing is requested. Both techniques have their advantages and disadvantages. Consider this:

$ echo "use LWP;" > foo.pl

Using scandeps.pl and allowing it to recurse...but no compilation.

$ scandeps.pl foo.pl | wc -l
96

Wow! Lots of dependencies.

scandeps.pl, no recursing, no compiling

$ scandeps.pl -R foo.pl | wc -l
1

scandeps.pl, no recursing, with compiling

$ scandeps.pl -Rc foo.pl | wc -l
13

...and Devel::Modlist??

$ perl -MDevel::Modlist=nocore foo.pl | wc -l
13

Okay, that's a little better. What about the rpm tool?

$ /usr/lib/rpm/perl.req foo.pl | wc -l

So we know the only direct dependency of foo.pl is LWP, and both /usr/lib/rpm/perl.req and scandeps.pl without compiling can figure that out. However, scandeps.pl does get tripped up by our pathological case:

use parent qw/Foo::Bar/;
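A parse-only scanner can still catch this case if it treats the arguments of parent and base as module names. Here is a rough shell sketch of that idea. The function name scan_uses is made up for illustration, and this is not how perl.req or scandeps.pl are actually implemented. It only looks at use/require statements that start a line, and it only understands quoted lists, qw// and qw(), so anything loaded dynamically will be missed.

```sh
#!/bin/sh
# Rough static scan (illustrative only): read Perl source on stdin and
# list the modules named in use/require statements, without compiling.
scan_uses() {
    sed -n -E '
        # use parent / use base: keep the pragma name and its argument list
        s/^[[:space:]]*use[[:space:]]+(parent|base)[[:space:]]+(.*);.*$/\1 \2/p
        t
        # any other use/require: keep just the module name
        s/^[[:space:]]*(use|require)[[:space:]]+([A-Za-z_][A-Za-z0-9_:]*).*$/\2/p
    ' |
        # split argument lists into one word per line
        tr -s " '\"()/," '\n' |
        # keep only words that look like module names, then drop the qw operator
        grep -E '^[A-Za-z_][A-Za-z0-9_]*(::[A-Za-z_][A-Za-z0-9_]*)*$' |
        grep -v -x 'qw' |
        LC_ALL=C sort -u
}

printf 'use parent qw/Foo::Bar/;\n' | scan_uses
```

For the one-line script above this prints Foo::Bar and parent, the same two names the rpm tool reported.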

Why does any of this matter anyway? I'm not sure, honestly. However, at the end of the day I do want to make sure I articulate module dependencies in my CPAN distribution correctly.

While it's true that the options that recurse and compile will produce a more complete listing of dependencies, that seems redundant (and possibly problematic once we start talking about module versions and what's actually in your current Perl environment). If every CPAN module lists its direct dependencies, then tools like cpanm should have no problem resolving all dependencies for a given Perl module distributed on CPAN. Correct?

So it seems it would be "best" to list only your direct dependencies and hope everyone else has a complete list of their direct dependencies.

To do that (list direct dependencies only), from the tools I've looked at so far (which include others I have not discussed in this blog), /usr/lib/rpm/perl.req did the best job of telling me what the direct dependencies of my module are.

My gut tells me I am missing something and that the greater Perl community as always has the answers as to what's the best practice for describing dependencies in a CPAN distribution.

P.S. This is the most awfulest blog platform ever invented. It's almost impossible to determine how to format things as code or fixed fonts despite the various allegedly supported formats - not to mention the maddening need to reset my password every time I log in (to the same damn password!). It would be nice if Perl had a central place where the community could blog that actually worked well.

6 comments

Vincenzo Buttazzo | January 7, 2019 2:59 AM

Nice article, but I don't understand what's wrong with Carton.
If I create a Perl function and I want to run it in your Lambda environment, I will really need to specify which module versions I use. Running things in an undetermined context could easily break things if some module makes a major update.
It could be acceptable only for small tests, but also in this case having a dependency list is a better approach.
Maybe a compromise could be your service generating the "cpanfile" for me.

bigfoot replied to comment from Vincenzo Buttazzo | January 7, 2019 11:25 AM

There's probably nothing wrong with carton per se, though I would only use it to help prepare the runtime environment. When running a Lambda, one would not want to install the dependencies on invocation.

As I continue to work on the framework for running Lambdas I may create the option to use carton - or make that the primary method - but your point on versions is a good one, and my technique is in fact to determine the versions that are being used in the current environment and specify them in a dependency list that is then used by cpanm to install the files into a local environment for eventual bundling into a Lambda runtime environment. Essentially, I'm just doing what carton does under the covers (I believe) - but attempting to seed the dependency list rather than hand-rolling it.
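As a rough sketch of seeding such a dependency list: given module names and the versions found in the current environment, one could emit requires lines in cpanfile format, which cpanm --installdeps can consume. The to_cpanfile function and the sample versions below are hypothetical, and nothing here is taken from make-cpan-dist.

```sh
#!/bin/sh
# Turn "Module version" pairs on stdin into cpanfile requires lines.
# A line without a version yields an unversioned requirement.
to_cpanfile() {
    while read -r module version; do
        [ -n "$module" ] || continue
        if [ -n "$version" ]; then
            printf "requires '%s', '%s';\n" "$module" "$version"
        else
            printf "requires '%s';\n" "$module"
        fi
    done
}

# Hypothetical input. In practice each version would come from the current
# Perl environment, e.g. perl -MLWP -e 'print LWP->VERSION'.
printf 'LWP 6.36\nJSON::PP\n' | to_cpanfile
```

Note that a bare version in a cpanfile is a minimum, so '6.36' means 6.36 or newer. Pinning exactly, which is closer to the point about versions raised above, would use a range such as '== 6.36'.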

I'm still curious as to how one determines, in a relatively effective way, what the dependencies are for Perl modules in general - although that subject seems well worn. I was trying to tease out of the community the relative effectiveness of using CPAN, the state of Perl's authoritative repository as it were, vs more programmatic techniques that inspect the modules themselves. Thanks for responding...

Grinnz | January 7, 2019 7:58 PM

Some other commonly used options are Perl::PrereqScanner (used by the Dist::Zilla [AutoPrereqs] plugin) and Perl::PrereqScanner::Lite (used by scan-prereqs-cpanfile). Scanning prereqs is a **hard** problem to do perfectly, because modules are not always loaded at compile time, or even always as bareword module names; and even if you catch every possible module that might be loaded, you might have false positives that are never actually loaded for your use case. Personally, I always specify prereqs manually in a cpanfile as I'm developing, and have all of my other tools use that.

Grinnz | January 7, 2019 8:00 PM

Regarding the blog platform, I completely agree that it has many problems - I tend to still post here for greatest visibility, then post the link to reddit r/perl for comments. There is a grant for creating a modern replacement for blogs.perl.org that has had occasional progress.

Aristotle replied to comment from bigfoot | January 12, 2019 11:12 PM

The dependency metadata in a CPAN distribution is independent from the code that actually loads the needed modules at runtime, so the two can diverge, and occasionally they do. But it gets noticed quite quickly, most of the time – simply because if the metadata is incorrect then the module won’t work after installing.

At least usually. Errors that do not produce non-working installations can survive for a long time.

One way that this can happen is if an error is covered up by other parts of the dependency chain. Imagine that module A needs modules B and C, but the metadata only declares a dependency on module B. That error may go unnoticed if module B also needs module C and does declare that dependency. Dependencies get installed before their dependents, so by the time module A gets installed, both of its dependencies will have been installed – module C will have been installed to satisfy the dependency of module B, which will be installed prior to module A, to satisfy A’s dependency. And thus, miraculously, despite the missing dependency in module A’s metadata, installing it works. This kind of thing can be very hard to notice.
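The masking effect can be mimicked with a toy resolver. The sketch below is purely illustrative: the declared table and the installed_for function are invented for this example and have nothing to do with how cpanm resolves prerequisites internally. It follows declared prerequisites transitively and prints the set of distributions that would end up installed (in discovery order, not install order).

```sh
#!/bin/sh
# Declared prerequisites as "dist prerequisite" pairs (hypothetical data
# mirroring the scenario: A really needs B and C but declares only B).
declared='A B
B C'

# Print every dist that gets pulled in when installing dist $1, by
# following the declared prerequisites transitively.
installed_for() {
    set -- "$1"
    seen=
    while [ "$#" -gt 0 ]; do
        cur=$1
        shift
        # skip dists we have already visited
        case " $seen " in
            *" $cur "*) continue ;;
        esac
        seen="$seen $cur"
        # queue the prerequisites that $cur declares
        set -- "$@" $(printf '%s\n' "$declared" | awk -v d="$cur" '$1 == d { print $2 }')
    done
    printf '%s\n' $seen
}

installed_for A
```

This prints A, B and C: C arrives only because B declares it, so the gap in A's metadata never surfaces. If B ever dropped its dependency on C (delete the "B C" line), installing A would no longer bring C along and A would break.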

For the same reason it can be very hard to notice that the metadata lists a superfluous dependency, which the code doesn’t actually require. The most common way for this mistake to be introduced is over time – if a module used to require another module, but was later changed not to need it any more, it can be easy to forget to erase the dependency from the metadata.

Bottom line is that the metadata can be relied on to not have errors that would produce non-working installations at the time that it was created. It’s basically trustworthy, but not completely reliable.

Of course none of that helps one bit for programmatically figuring out what the metadata should look like for code (e.g. your own project) that doesn’t already have it…

bigfoot | January 18, 2019 7:53 PM

Thanks all for your comments...they were helpful. My takeaway regarding dependencies listed on CPAN is essentially, as Aristotle eloquently penned..."It’s basically trustworthy, but not completely reliable."

About bigfoot

I blog about Perl and Bedrock.

rlauer