What if we could drop archives into @INC?

By brian d foy on February 14, 2012 12:41 PM

What if I didn't have to install distributions, but instead just dropped the entire archive into a directory, much like a Java JAR file? I don't have a complete idea for this, but I have enough that I want to have public notes on it.

This is something that I think about when I can't do anything else. I'm on a bus or train in Chicago where any sign of Apple technology will get you jacked (the CTA even has signs telling people to be careful with their iPhones. Not Blackberrys or Samsung, or whatever, just iPhones). There's that time between finishing the in-flight magazine and reaching 10,000 feet, or waiting in line for passport control after I wonder if the guys with the guns would really shoot me if I took out my cell phone.

My idea is the confluence of several problems:

I'd like a way to have multiple versions installed without a huge @INC
Perl stops looking when it finds the first matching namespace
Many people want to install tests so they can test later too, mostly to check the tests against newer versions of dependencies
Perl 6 wants to load by name, version list, and author list.
People hate installing things.

The problems are legion and formidable:

Perl maps namespaces to filenames, unfortunately
You have to extract dependencies and fetch those distros
Pure-perl Archive::Tar is really slow.
You can't compartmentalize a namespace so you can have divergent versions of it
You have to infer from the archive name what might be inside and what version it might be
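
To make the namespace-to-filename problem concrete, here is a minimal sketch of how a module name turns into a relative path, and why only the first copy found is ever loaded. The helper names module_to_path and first_match are made up for illustration. The real lookup happens inside require, which also handles code references in @INC that this sketch simply filters out.

```perl
use strict;
use warnings;
use File::Spec;

# Turn a package name into the relative path that require() looks for.
sub module_to_path {
    my ($module) = @_;
    ( my $path = $module ) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the first file in the search list that holds the module,
# mimicking how require() stops at the first match. Later copies are
# never considered, whatever version they contain.
sub first_match {
    my ( $module, @search_dirs ) = @_;
    my $relative = module_to_path($module);
    for my $dir (@search_dirs) {
        my $candidate = File::Spec->catfile( $dir, $relative );
        return $candidate if -f $candidate;
    }
    return;
}

print module_to_path('LWP::UserAgent'), "\n";    # LWP/UserAgent.pm

# Where would "use strict" come from on this machine?
my $found = first_match( 'strict', grep { !ref } @INC );
print "$found\n" if defined $found;
```

Two installed versions of the same module differ only in which directory comes first, which is why having several versions around means juggling @INC.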

Let's ignore the first two for a moment. Those are just work, even if that work is annoying. The last one is something I've wanted for a long time. I thought I wrote about this before, but I didn't find it (I can't remember any non-stopwords I would have used).

Instead of loading modules, I'd like to load groups of modules into a variable. This wouldn't really load the distribution. It merely loads a description of the distribution which I can then play with as some sort of loader object:

my $lwp4 = load( 'libwww-perl', '< 5.0', 'GAAS' );   # distro name stub
my $lwp5 = load( 'libwww-perl', '>= 5.0', 'GAAS' );  # co-exists nicely

my %namespaces   = $lwp4->list_namespaces;    # names and versions
my %dependencies = $lwp4->list_dependencies;
my $file         = $lwp4->extract_file( $file );
my $test_result  = $lwp4->run_tests;
# and many other accessors

my $ua = $lwp4->use( 'LWP::UserAgent', @import_list );

That requires a lot of extra bookkeeping to put things in different, but user-hidden namespaces.
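
A rough sketch of what that bookkeeping could look like, under heavy assumptions: the source of each version is compiled under a private, version-specific package name, and the caller only ever sees the name it gets back. The function load_as and the _vN naming scheme are invented for this example, and the rewrite only copes with a single simple package statement.

```perl
use strict;
use warnings;

# Compile module source under a hidden, version-specific package name.
# Naive on purpose: it rewrites only the first "package Name;" line and
# knows nothing about __END__, nested packages, or hard-coded names.
sub load_as {
    my ( $source, $package, $version ) = @_;
    ( my $suffix = $version ) =~ s/\W/_/g;
    my $hidden = "${package}::_v$suffix";
    ( my $rewritten = $source ) =~ s/^\s*package\s+\Q$package\E\s*;/package $hidden;/m;
    eval "$rewritten; 1" or die "Could not compile $package $version: $@";
    return $hidden;
}

my $old = load_as( "package Greeter;\nsub hello { 'hello from 1.23' }\n1;\n", 'Greeter', '1.23' );
my $new = load_as( "package Greeter;\nsub hello { 'hello from 2.0' }\n1;\n",  'Greeter', '2.0' );

print $old->hello, "\n";    # hello from 1.23
print $new->hello, "\n";    # hello from 2.0
```

Any module that refers to its own package by name, or pokes at package variables, breaks as soon as it is loaded this way.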

But, let's say someone figures out that part. It might be really easy to do something like that if everything (and absolutely everything) was a Moose class so Moose can move all the class names around.

The next bits are the mechanics. You want to extract the files from the distribution as you need them. That's not terribly hard, although tar is probably not a good way to do it because you'll want random access and you want to know where all the files are without having to scan the entire archive. Sure, you can kludge some pre-indexing, but let's pretend that we could convert all of CPAN to a different archive format.
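
For the curious, the pre-indexing kludge might look something like this sketch. It scans an uncompressed tar once, records the byte offset and size of every member, and then reads a single file by seeking straight to it. The function names index_tar and read_member are hypothetical, and the code assumes plain ustar headers (name, size, and prefix fields only), so GNU long-name entries and compressed archives are out of scope.

```perl
use strict;
use warnings;

# Scan an uncompressed tar once and record where each member's data lives.
sub index_tar {
    my ($archive) = @_;
    open my $fh, '<:raw', $archive or die "Cannot open $archive: $!";
    my ( %index, $header );
    my $offset = 0;
    while ( ( read( $fh, $header, 512 ) // 0 ) == 512 ) {
        last if $header !~ /[^\0]/;    # an all-NUL block ends the archive
        # name at byte 0, size at byte 124, ustar prefix at byte 345
        my ( $name, $size, $prefix ) = unpack 'Z100 x24 A12 x209 Z155', $header;
        $name = "$prefix/$name" if length $prefix;
        $size = oct $size;             # sizes are stored as octal text
        $index{$name} = { offset => $offset + 512, size => $size };
        # Member data is padded to a multiple of 512 bytes.
        $offset += 512 + 512 * int( ( $size + 511 ) / 512 );
        seek $fh, $offset, 0 or die "Cannot seek in $archive: $!";
    }
    close $fh;
    return \%index;
}

# Fetch one member without reading anything else in the archive.
sub read_member {
    my ( $archive, $index, $name ) = @_;
    my $entry = $index->{$name} or return;
    open my $fh, '<:raw', $archive or die "Cannot open $archive: $!";
    seek $fh, $entry->{offset}, 0 or die "Cannot seek in $archive: $!";
    read $fh, my $data, $entry->{size};
    close $fh;
    return $data;
}
```

The index would have to be stored somewhere next to the archive to be worth anything, which is the sort of thing a format with a built-in table of contents gives you for free.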

So, all of that is solved. Now you get to deal with all of the crappy code in CPAN. These modules use package variables because, probably rightly so, a module gets to do what it likes with its package. But, package variables don't work anymore because packages don't exist in the same way.

And then...well, that's as far as I've thought about this.

15 comments

Neil Bowers | February 14, 2012 2:10 PM

I've wanted something related to this quite a few times recently. I'd like to have multiple copies / versions of distributions around, and instantiate the different versions so I can run bake-offs, etc.

gizmo_mathboy | February 14, 2012 2:45 PM

brian,

what about using .iso files or something like squashfs (i didn't see anything in the CPAN regarding squashfs)?

I have no idea if .iso is better than tar or not. I'm just spitballing.

A quick google gave:

https://lwn.net/Articles/219827/

Interesting overview of compressed file systems at least.

Awesome idea you have though.

gizmo

Erez Schatz | February 14, 2012 3:57 PM

It's mostly a matter of an ecosystem. If you want Java-like installation, you'll need a Java-like runtime, with its 2,000 classes and whatnot, and almost everything else being reimplemented by the program creators. Same with something like Python, where they reinvent the wheel so often they could build a train where no wheel is the same.
If you want Perl's "90% of every program has already been written" concept of reuse, you'll have to throw away anything like the Java ecosystem.

Dmitry Karasik | February 14, 2012 5:58 PM

I've deployed standalone perl apps and missed the feature greatly back then. It was rather annoying and tedious to hand-roll a self-unpacking perl directory with all modules and what not. I've tried PAR, it can make a one-contains-all exe file, but it still unzips all files in temp dir, not cool.

While zip support could be too hard to implement, an iso or an uncompressed tar could be easier to hack, possibly even more so with mmap(2).

James FitzGibbon | February 14, 2012 6:25 PM

I'm working on something like this right now.

The basic idea is to create PARs of whatever distributions you need (my system does build-time dependency checking, as it is designed to run on a system unconnected to the public internet) and drop those in $sitelib/pars (with the appropriate arch and perlversion hierarchy).

When building perl, I use -Dusesitecustomize, which causes perl to run $sitelib/sitecustomize.pl at startup.

This contains something like

use Config;
use PAR { repository => "file://$Config{sitelib}/pars" };

I also build PAR, PAR::Dist, PAR::Repository, PAR::Repository::Client, etc. using -Dextras as part of my build.

I haven't finished testing (I was literally sitting down to start this pass of development when I saw your post), but the theory works well enough in that I've done it without sitecustomize and the "use PAR" line in my main script.

The issue I have right now is multiple directories of PARs.

If you have a module you're going to use regularly, throw it in $sitelib/pars and it will be picked up every time. Ideally I'd like to be able to have app-specific PARs loaded from some other dir by having the app also say:

use PAR { repository => "$FindBin::Bin/../lib/pars" };

or something similar. But PAR only supports one implicit PAR::Repository::Client object, so the second one would override the first and block access to the global PARs.

I think I may need to create a multi-par-repo proxy object that can stand in for PAR::Repository::Client but check in multiple repositories.

James FitzGibbon replied to comment from James FitzGibbon | February 14, 2012 6:31 PM

Actually, looking at the code of PAR.pm, it looks like subsequent calls to

use PAR { repository => ... };

push the constructed object into an array of repo clients that are used, so having a global and per-app repository should work fine.

Ron Savage | February 14, 2012 9:53 PM

Hi Folks

I know it's lateral-thinking type pain, but perhaps some people would be happy if they could install multiple Perls, /all the same version/, in different dirs, and then install the packages in parallel, so to speak. perlbrew doesn't seem to support a dest-dir option...

Unless of course you really do want $lwp4 and $lwp5 in the same program.

Cheers
Ron

preaction replied to comment from Ron Savage | February 15, 2012 3:08 AM

You can name the Perls created by perlbrew, and that would change the directory they're in. I did this to get threaded, non-threaded, and 32-bit/64-bit variations of the perl 5.12.1 that we use at work.

You can also, iirc, specify that a single directory of @INC be shared between them.

What Brian wants sounds like Python eggs, but what's the benefit over what local::lib could do?

brian d foy replied to comment from Erez Schatz | February 15, 2012 9:11 AM

I don't think we need a Java-like runtime for any of this.

brian d foy replied to comment from James FitzGibbon | February 15, 2012 9:12 AM

This isn't anything like PAR, which still has all of the problems I'm trying to avoid.

brian d foy replied to comment from preaction | February 15, 2012 9:13 AM

local::lib has all of the same problems as regular module installations, as I noted in the first bulleted list of the post. local::lib has all of these problems because it is exactly a regular module installation. There is no magic there.

brian d foy replied to comment from Ron Savage | February 15, 2012 9:15 AM

I really do want to load different module versions in the same program. Consider this scenario: Module A works with Module B 1.23, but Module C needs Module B 2.0. Module A doesn't work with Module B 2.0.

If I could load multiple versions, each module could have exactly the dependency they need without having to share with another module.
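
As an illustration of that scenario, here is a small sketch that picks an archive for each dependent out of a directory listing. It guesses the distribution name and version from the file name alone, which is exactly the inference problem from the second list in the post. The functions parse_archive_name, satisfies, and select_archive are hypothetical, as is the tiny constraint syntax.

```perl
use strict;
use warnings;
use version;

# Split "Module-B-1.23.tar.gz" into a distribution name and a version.
# This trusts the file name, so unusual names will be misread.
sub parse_archive_name {
    my ($file) = @_;
    return unless $file =~ /\A(.+)-v?(\d+(?:\.\d+)*)\.(?:tar\.gz|tgz|zip)\z/;
    return ( $1, version->parse($2) );
}

my %compare = (
    '==' => sub { $_[0] == $_[1] },
    '!=' => sub { $_[0] != $_[1] },
    '>=' => sub { $_[0] >= $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
);

# Check a version object against a constraint such as ">= 2.0".
sub satisfies {
    my ( $version, $constraint ) = @_;
    my ( $op, $wanted ) = $constraint =~ /\A\s*(==|!=|>=|<=|<|>)\s*(\S+)\s*\z/
        or die "Bad constraint '$constraint'";
    return $compare{$op}->( $version, version->parse($wanted) );
}

# Return the newest archive of $dist that meets the constraint, or undef.
sub select_archive {
    my ( $dist, $constraint, @files ) = @_;
    my @matches;
    for my $file (@files) {
        my ( $name, $version ) = parse_archive_name($file) or next;
        next unless $name eq $dist && satisfies( $version, $constraint );
        push @matches, [ $file, $version ];
    }
    my ($best) = sort { $b->[1] <=> $a->[1] } @matches;
    return $best ? $best->[0] : undef;
}

my @archives = qw( Module-B-1.23.tar.gz Module-B-2.0.tar.gz libwww-perl-5.837.tar.gz );
print select_archive( 'Module-B', '== 1.23', @archives ), "\n";    # Module-B-1.23.tar.gz
print select_archive( 'Module-B', '>= 2.0',  @archives ), "\n";    # Module-B-2.0.tar.gz
```

Choosing the archive is the easy half. Keeping both copies of Module B alive in one interpreter is the part that needs the namespace tricks.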

Hercynium | February 15, 2012 3:03 PM

I am curious - because I know that many of you know the trick I'm about to describe... but I don't see why it's not possible to hijack '@INC' just for this functionality. One could create a directory full of the dist archives you want to load, and generate an index mapping module names and versions to dist versions. The magic @INC sub could then unpack and require/import what's necessary based on the module versions specified. Perhaps better would be to add a new use/require function where one can specify the dist version they want to load the modules from...
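
For readers who have not seen the trick, a minimal sketch of such an @INC hook follows. A code reference placed in @INC is called with the requested file name and may return a filehandle for require to compile. Here a plain hash stands in for the index of dist archives, and the module Local::Greeting and its contents are invented for the example. A real loader would pull the source out of an archive at that point.

```perl
use strict;
use warnings;

# Stand-in for an index built from dist archives: module file => source.
my %archive_index = (
    'Local/Greeting.pm' =>
        "package Local::Greeting;\nsub text { 'hello from the archive' }\n1;\n",
);

# require() calls this with the hook itself and the file it wants.
# Returning a filehandle makes require() compile whatever it reads from it.
unshift @INC, sub {
    my ( $self, $file ) = @_;
    my $source = $archive_index{$file};
    return unless defined $source;    # fall through to the rest of @INC
    open my $fh, '<', \$source or die "Cannot open in-memory file: $!";
    return $fh;
};

require Local::Greeting;
print Local::Greeting->text, "\n";    # hello from the archive
```

The hook only decides where the source comes from. It does nothing about two versions fighting over the same package name.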

It wouldn't be fast, and it *certainly* would be awfully difficult (and involve awful hackery!) but I think it *could* work.

The *really* troublesome part, of course, is that when loading different versions of a module, you're going to clobber the symbol tables for that module's namespaces... that *could* be handled via some more magic, like munging the package declarations in loaded modules to indicate the dist version they belong to, and then adding a function that switches the namespaces in and out within the lexical scope or file it's loaded in...

Here be dragons, and deep black magick. Certainly over my capabilities, and perhaps impossible, to do properly.

I guess I answered my own question. :-)

shixilun replied to comment from Hercynium | February 17, 2012 12:31 AM

If I understand this correctly, I believe I attempted something similar to this a few years ago, with a modicum of success:

http://www.slideshare.net/shixilun/module-versioning-with-apache-1x-and-modperl-1x

Chad 'Exodist' Granum | February 28, 2012 5:56 AM

I took 30 minutes and wrote a module to hack @INC to load compressed tar archives.

https://github.com/exodist/archlib

I also uploaded it to CPAN as 'archlib', though it will take a couple of hours to replicate out to mirrors.

Example:

use archlib 'path/to/archive';
use Module::In::Archive;
use Another::in::Archive;

Hopefully this is a good start at a module to meet your needs. Obviously it only addresses one part of the situation, but it can be expanded.

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).