What if we could drop archives into @INC?

What if I didn't have to install distributions, but instead just dropped the entire archive into a directory, much like a Java JAR file? I don't have a complete idea for this, but I have enough that I want to have public notes on it.

This is something that I think about when I can't do anything else. I'm on a bus or train in Chicago where any sign of Apple technology will get you jacked (the CTA even has signs telling people to be careful with their iPhones. Not Blackberrys or Samsung, or whatever, just iPhones). There's that time between finishing the in-flight magazine and reaching 10,000 feet, or waiting in line for passport control after I wonder if the guys with the guns would really shoot me if I took out my cell phone.

My idea is the confluence of several problems:

  • I'd like a way to have multiple versions installed without a huge @INC
  • Perl stops looking when it finds the first matching namespace
  • Many people want to install tests so they can test later too, mostly to check the tests against newer versions of dependencies
  • Perl 6 wants to load by name, version list, and author list.
  • People hate installing things.

The problems are legion and formidable:

  1. Perl maps namespaces to filenames, unfortunately
  2. You have to extract dependencies and fetch those distros
  3. Pure-Perl Archive::Tar is really slow
  4. You can't compartmentalize a namespace so you can have divergent versions of it
  5. You have to infer from the archive name what might be inside and what version it might be

Let's ignore the first two for a moment. Those are just work, even if that work is annoying. The last one is something I've wanted for a long time. I thought I wrote about this before, but I didn't find it (I can't remember any non-stopwords I would have used).

Instead of loading modules, I'd like to load groups of modules into a variable. This wouldn't really load the distribution. It merely loads a description of the distribution which I can then play with as some sort of loader object:


my $lwp4 = load( 'libwww-perl', '< 5.0', 'GAAS' ); # distro name stub
my $lwp5 = load( 'libwww-perl', '>= 5.0', 'GAAS' ); # co-exists nicely

my %namespaces = $lwp4->list_namespaces; # names and versions
my %dependencies = $lwp4->list_dependencies;
my $file = $lwp4->extract_file( $filename );

my $test_result = $lwp4->run_tests;

# and many other accessors

my $ua = $lwp4->use( 'LWP::UserAgent', @import_list );

That requires a lot of extra bookkeeping to put things in different, but user-hidden namespaces.
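
Here's a minimal sketch of what that bookkeeping might look like for the simplest possible case. It is not how any existing loader works. The My::Agent sources, the load_as_hidden function, and the _v4_0 naming scheme are all made up for the example, and it assumes each module has exactly one package statement at the start of a line.

```perl
use strict;
use warnings;

# Hypothetical sources for two versions of the same module, standing in
# for files pulled out of two different distribution archives.
my %source = (
    '4.0' => q{
        package My::Agent;
        our $VERSION = '4.0';
        sub new   { bless {}, shift }
        sub agent { "My::Agent/$VERSION" }
        1;
    },
    '5.0' => q{
        package My::Agent;
        our $VERSION = '5.0';
        sub new   { bless {}, shift }
        sub agent { "My::Agent/$VERSION" }
        1;
    },
);

# Compile one version under a hidden, version-specific package name and
# return that name so the caller can use it as a class.
sub load_as_hidden {
    my( $package, $version ) = @_;

    ( my $suffix = $version ) =~ s/\W/_/g;       # 4.0 becomes 4_0
    my $hidden = "${package}::_v$suffix";

    # Rewrite only the package statement; everything else is untouched.
    my $code = $source{$version};
    $code =~ s/^(\s*package\s+)\Q$package\E\b/$1$hidden/m;

    eval "$code; 1" or die "Could not compile $package $version: $@";
    return $hidden;
}

my $agent4 = load_as_hidden( 'My::Agent', '4.0' );
my $agent5 = load_as_hidden( 'My::Agent', '5.0' );

print $agent4->new->agent, "\n";
print $agent5->new->agent, "\n";
```

Both versions live side by side because neither one ever defines My::Agent itself. The sketch only rewrites the package statement, so any module that spells out its own fully qualified name elsewhere in its code would still point at the old namespace.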

But, let's say someone figures out that part. It might be really easy to do something like that if everything (and absolutely everything) was a Moose class so Moose can move all the class names around.

The next bits are the mechanics. You want to extract the files from the distribution as you need them. That's not terribly hard, although tar is probably not a good way to do it because you'll want random access and you want to know where all the files are without having to scan the entire archive. Sure, you can kludge some pre-indexing, but let's pretend that we could convert all of CPAN to a different archive format.
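
The pre-indexing kludge might look something like this sketch. It assumes an uncompressed tar with plain ustar headers and names short enough to need no long-name extensions. The archive name, member names, and the index_tar and read_member functions are made up for the example. Archive::Tar is only used to build a file to play with.

```perl
use strict;
use warnings;
use Archive::Tar;
use File::Temp qw(tempdir);

# Build a small uncompressed tar. The names and contents are invented.
my $dir      = tempdir( CLEANUP => 1 );
my $tar_path = "$dir/Demo-Dist-1.0.tar";

my $tar = Archive::Tar->new;
$tar->add_data( 'Demo-Dist-1.0/lib/Demo/Dist.pm', "package Demo::Dist;\n1;\n" );
$tar->add_data( 'Demo-Dist-1.0/t/basic.t',        "print 'ok';\n" );
$tar->write( $tar_path ) or die $tar->error;

# Walk the 512-byte headers once and remember where each member's data
# starts and how long it is.
sub index_tar {
    my( $path ) = @_;
    open my $fh, '<:raw', $path or die "Could not open $path: $!";

    my %index;
    my $offset = 0;
    while( read( $fh, my $header, 512 ) == 512 ) {
        last if $header !~ /[^\0]/;    # an all-NUL block ends the archive

        # ustar layout: name at 0, size (octal text) at 124, prefix at 345
        my( $name, $size_field, $prefix ) =
            unpack 'Z100 x24 Z12 x209 Z155', $header;
        my( $octal ) = $size_field =~ /([0-7]+)/;
        my $size = oct( $octal // 0 );
        $name = "$prefix/$name" if length $prefix;

        $index{$name} = { offset => $offset + 512, size => $size };

        # Member data is padded out to a multiple of 512 bytes.
        $offset += 512 + 512 * int( ( $size + 511 ) / 512 );
        seek( $fh, $offset, 0 ) or die "Could not seek in $path: $!";
    }

    return \%index;
}

# Jump straight to one member without scanning the rest of the archive.
sub read_member {
    my( $path, $index, $name ) = @_;
    my $entry = $index->{$name} or return;

    open my $fh, '<:raw', $path or die "Could not open $path: $!";
    seek( $fh, $entry->{offset}, 0 ) or die "Could not seek in $path: $!";
    read( $fh, my $data, $entry->{size} );
    return $data;
}

my $index = index_tar( $tar_path );
print read_member( $tar_path, $index, 'Demo-Dist-1.0/lib/Demo/Dist.pm' );
```

The scan still has to touch every header once, which is why a format that already carries a table of contents would be nicer. A compressed tarball would not work this way at all, since you can't seek into the middle of the compressed stream.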

So, all of that is solved. Now you get to deal with all of the crappy code in CPAN. These modules use package variables because, probably rightly so, a module gets to do what it likes with its package. But, package variables don't work anymore because packages don't exist in the same way.

And then...well, that's as far as I've thought about this.

15 Comments

I've wanted something related to this quite a few times recently. I'd like to have multiple copies / versions of distributions around, and instantiate the different versions so I can run bake-offs, etc.

brian,

What about using .iso files or something like squashfs? (I didn't see anything on CPAN regarding squashfs.)

I have no idea if .iso is better than tar or not. I'm just spitballing.

A quick google gave:

https://lwn.net/Articles/219827/

Interesting overview of compressed file systems at least.

Awesome idea you have though.

gizmo

It's mostly a matter of an ecosystem. If you want Java-like installation, you'll need a Java-like runtime, with its 2,000 classes and whatnot, and almost everything else being reimplemented by the program creators. Same with something like Python, where they reinvent the wheel so often they could build a train where no wheel is the same.
If you want Perl's "90% of every program has already been written" concept of reuse, you'll have to throw away anything like the Java ecosystem.

I've deployed standalone perl apps and missed the feature greatly back then. It was rather annoying and tedious to hand-roll a self-unpacking perl directory with all modules and whatnot. I've tried PAR. It can make a one-contains-all exe file, but it still unzips all files into a temp dir. Not cool.

While zip support could be too hard to implement, an iso or an uncompressed tar could be easier to hack, possibly even more so with mmap(2).

I'm working on something like this right now.

The basic idea is to create PARs of whatever distributions you need (my system does build-time dependency checking, as it is designed to run on a system unconnected to the public internet) and drop those in $sitelib/pars (with the appropriate arch and perlversion hierarchy).

When building perl, I use -Dusesitecustomize, which causes perl to run $sitelib/sitecustomize.pl at startup.

This contains something like

use Config;
use PAR { repository => "file://$Config{sitelib}/pars" };

I also build PAR, PAR::Dist, PAR::Repository, PAR::Repository::Client, etc. using -Dextras as part of my build.

I haven't finished testing (I was literally sitting down to start this pass of development when I saw your post), but the theory works well enough in that I've done it without sitecustomize and the "use PAR" line in my main script.

The issue I have right now is multiple directories of PARs.

If you have a module you're going to use regularly, throw it in $sitelib/pars and it will be picked up every time. Ideally I'd like to be able to have app-specific PARs loaded from some other dir by having the app also say:

use PAR { repository => "$FindBin::Bin/../lib/pars" };

or something similar. But PAR only supports one implicit PAR::Repository::Client object, so the second one would override the first and block access to the global PARs.

I think I may need to create a multi-par-repo proxy object that can stand in for PAR::Repository::Client but check in multiple repositories.

Actually, looking at the code of PAR.pm, it looks like subsequent calls to

use PAR { repository => ... };

push the constructed object into an array of repo clients that are used, so having a global and per-app repository should work fine.

Hi Folks

I know it's lateral-thinking type pain, but perhaps some people would be happy if they could install multiple Perls, /all the same version/, in different dirs, and then install the packages in parallel, so to speak. perlbrew doesn't seem to support a dest-dir option...

Unless of course you really do want $lwp4 and $lwp5 in the same program.

Cheers
Ron

You can name the Perls created by perlbrew, and that would change the directory they're in. I did this to get threaded, non-threaded, and 32-bit/64-bit variations of the perl 5.12.1 that we use at work.

You can also, iirc, specify that a single directory of @INC be shared between them.

What Brian wants sounds like Python eggs, but what's the benefit over what local::lib could do?

I am curious - because I know that many of you know the trick I'm about to describe... but I don't see why it's not possible to hijack '@INC' just for this functionality. One could create a directory full of the dist archives you want to load, and generate an index mapping module names and versions to dist versions. The magic @INC sub could then unpack and require/import what's necessary based on the module versions specified. Perhaps better would be to add a new use/require function where one can specify the dist version they want to load the modules from...

It wouldn't be fast, and it *certainly* would be awfully difficult (and involve awful hackery!) but I think it *could* work.

The *really* troublesome part, of course, is that when loading different versions of a module, you're going to clobber the symbol tables for that module's namespaces... that *could* be handled via some more magic, like munging the package declarations in loaded modules to indicate the dist version they belong to, and then adding a function that switches the namespaces in and out within the lexical scope or file it's loaded in...

Here be dragons, and deep black magick. Certainly beyond my capabilities, and perhaps impossible, to do properly.

I guess I answered my own question. :-)
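
The magic @INC sub from this comment can be sketched with nothing but core Perl. This is a rough illustration, not a working loader: the Demo::Greeter module, the %index of sources, and the %wanted version table are all made up, and a real index would point into distribution archives instead of holding the code itself.

```perl
use strict;
use warnings;

# Hypothetical index: module file name => version => source text.
my %index = (
    'Demo/Greeter.pm' => {
        '1.0' => q{
            package Demo::Greeter;
            our $VERSION = '1.0';
            sub hello { 'hi' }
            1;
        },
        '2.0' => q{
            package Demo::Greeter;
            our $VERSION = '2.0';
            sub hello { 'hello' }
            1;
        },
    },
);

# Which version this program wants; also an assumption of the sketch.
my %wanted = ( 'Demo/Greeter.pm' => '2.0' );

# A code reference in @INC is called with itself and the file name that
# require is looking for. Returning nothing lets perl keep searching.
unshift @INC, sub {
    my( $self, $file ) = @_;

    my $versions = $index{$file}           or return;
    my $version  = $wanted{$file}          or return;
    my $source   = $versions->{$version}   or return;

    # Hand back a filehandle that reads the chosen source from memory.
    open my $fh, '<', \$source or return;
    return $fh;
};

require Demo::Greeter;
print Demo::Greeter->VERSION, ' says ', Demo::Greeter::hello(), "\n";
```

This picks one version per program. It does nothing about the symbol table problem, so asking for both versions in the same program would still have the second one trample the first.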

If I understand this correctly, I believe I attempted something similar to this a few years ago, with a modicum of success:

http://www.slideshare.net/shixilun/module-versioning-with-apache-1x-and-modperl-1x

I took 30 minutes and wrote a module to hack @INC to load compressed tar archives.

https://github.com/exodist/archlib

I also uploaded it to CPAN as 'archlib', though it will take a couple of hours to replicate out to mirrors.

Example:

use archlib 'path/to/archive';
use Module::In::Archive;
use Another::in::Archive;

Hopefully this is a good start at a module to meet your needs. Obviously it only addresses one part of the situation, but it can be expanded.


About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).