Perl Toolchain Summit 2019 - CPAN Dependencies Graph
I was grateful to attend the Perl Toolchain Summit for the first time; this year it was held at Bisham Abbey in Marlow, UK. I got to meet many of the talented and persistent contributors to the Perl CPAN infrastructure, and also to see a country outside North America for the first time. The Perl Toolchain Summit is a great event, made possible by its organizers and sponsors, that enables contributors to the Perl CPAN infrastructure to get together and do important work. You can see the results of my project this year at https://cpandeps.grinnz.com.
For my first project, which took up the majority of the summit, I decided to work on a replacement for the Stratopan dependency graphs, which have unfortunately gone AWOL. I often used these graphs, linked from MetaCPAN, to visualize the dependency impact of a CPAN distribution.
First, some background on CPAN dependencies. A CPAN distribution, a set of modules uploaded together in one release, specifies its dependencies on modules, not on distributions. This lets you declare which modules you use without caring which distribution provides them, which is especially useful if a module later moves to a different distribution. It also matters because CPAN permissions and indexing are maintained by module name; you don't want to depend on something and later have a rogue upload take precedence when people try to install your dependencies. Dependencies are resolved the same way as when you first install a module with a CPAN client: the package index determines which distribution release provides the latest version of that module, and that tarball is downloaded and installed (along with its own dependencies, and so on). A dependency graph therefore shows the set of distributions ultimately required to install and use a distribution, connected through these intermediary module dependencies.
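To make the module-to-distribution indirection concrete, here is a minimal sketch in Python (the actual site is a Perl/Mojolicious application, and this is not its code). It walks the graph breadth-first, resolving every required module to its providing distribution through an index. The distribution names, module names, and the `DIST_REQUIRES` and `PACKAGES_INDEX` tables are hypothetical stand-ins for CPAN metadata and the packages index.

```python
from collections import deque

# Hypothetical data for illustration only; these are not real CPAN dists.
# Each distribution declares the *modules* it requires, not distributions.
DIST_REQUIRES = {
    "My-App": ["Some::Web", "Some::JSON"],
    "Some-Web": ["Some::Web::Util"],
    "Some-JSON": ["Some::JSON::XS"],
    "Some-JSON-XS": [],
}

# Stand-in for the packages index: module name -> the distribution that
# provides the latest indexed version of that module.
PACKAGES_INDEX = {
    "Some::Web": "Some-Web",
    "Some::Web::Util": "Some-Web",
    "Some::JSON": "Some-JSON",
    "Some::JSON::XS": "Some-JSON-XS",
}


def dependency_graph(root, requires, index):
    """Walk breadth-first from a distribution, resolving each required module
    to its providing distribution through the index.

    Returns a set of (dist, module, providing_dist) edges."""
    edges = set()
    seen = {root}
    queue = deque([root])
    while queue:
        dist = queue.popleft()
        for module in requires.get(dist, []):
            provider = index.get(module)
            if provider is None or provider == dist:
                continue  # unindexed module, or one the dist provides itself
            edges.add((dist, module, provider))
            if provider not in seen:
                seen.add(provider)
                queue.append(provider)
    return edges


if __name__ == "__main__":
    for edge in sorted(dependency_graph("My-App", DIST_REQUIRES, PACKAGES_INDEX)):
        print(edge)
```

Note that `Some-Web` requiring its own `Some::Web::Util` produces no edge: the graph only connects distinct distributions, which is what a user actually has to install.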
With that design in mind, on the first day of the summit I set up the caching layer. As with my other web projects of this kind, I started from a Mojolicious::Lite script and tacked on commands, plugins, templates, and static files as needed. I wanted a cron job to be able to easily pull and cache the dependency data periodically to keep it up to date, so I added a command to wrap this functionality.
On the second day of the summit, I continued working on the caching functionality. I had initially been using the MetaCPAN API to retrieve both the dependencies of each distribution (which are module names) and the distributions that each of these modules belonged to, in order to continue populating the graph. But this led to some inaccurate data: according to MetaCPAN, some modules may be provided by multiple "latest" distribution releases, but only one of them is canonical, and that ruling is made by the CPAN packages index (also known by its filename, 02packages). As it turned out, I had already set up an API into the packages index, but I did not want to query it for module names one at a time, as that would be far too much overhead. So I added functionality to the CPAN Meta Browser API to retrieve the packages index data for multiple modules in one query, and set up the dependency caching to use that API to retrieve distribution names.
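As a rough illustration of the data involved, the sketch below parses text in the 02packages layout (header lines, a blank line, then one `package version path` row per module) and resolves many module names in a single call instead of one lookup per module. The sample rows are made up, and `dist_name` is a deliberately naive simplification of the path parsing that the Perl module CPAN::DistnameInfo handles properly; neither reflects how the CPAN Meta Browser API is implemented.

```python
import re

# Trimmed, made-up excerpt in the 02packages.details.txt layout:
# header lines, a blank line, then "package version path" rows.
SAMPLE = """\
File:         02packages.details.txt
Line-Count:   3

Some::JSON         1.004  A/AU/AUTHOR/Some-JSON-1.004.tar.gz
Some::JSON::PP     undef  A/AU/AUTHOR/Some-JSON-1.004.tar.gz
Some::Web          9.35   O/OT/OTHER/Some-Web-9.35.tar.gz
"""

ARCHIVE_EXT = re.compile(r"\.(?:tar\.gz|tgz|tar\.bz2|zip)$")
DIST_VERSION = re.compile(r"^(?P<name>.+?)-v?\d[\d._]*(?:-TRIAL)?$")


def dist_name(path):
    """Naively extract a distribution name from an index path by dropping
    the author directories, the archive extension, and a trailing version."""
    base = ARCHIVE_EXT.sub("", path.rsplit("/", 1)[-1])
    m = DIST_VERSION.match(base)
    return m.group("name") if m else base


def parse_packages(text):
    """Return {module: (version, dist_name)} from 02packages-style text."""
    _header, _, body = text.partition("\n\n")
    index = {}
    for line in body.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a package row
        module, version, path = fields
        index[module] = (version, dist_name(path))
    return index


def lookup_many(index, modules):
    """Resolve many module names in one call; unknown modules are omitted."""
    return {m: index[m][1] for m in modules if m in index}
```

Batching matters because each module name in a dependency list would otherwise cost a separate round trip; one request carrying the whole list replaces dozens.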
One more problem became apparent when many core modules started showing up in my dependency trees. The Stratopan graphs hid core modules, which was sensible: you don't need to install them, as long as the version of Perl you are using provides a new enough version to satisfy the dependency. But I wanted to let the user specify which version of Perl they were using and make the judgment based on that, so I needed to amend the caching to also store each module dependency and its required version individually, along with the distribution it mapped to. Then, on the request side, I used Module::CoreList to determine which dependencies could be excluded from display.
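The filtering decision can be sketched as follows, again in Python rather than the site's Perl. `CORE` is a hypothetical table shaped loosely like Module::CoreList's per-release data (perl version to bundled module versions); its module names and versions are invented. The sketch also assumes plain decimal version strings, which compare numerically, and ignores dotted-decimal versions and modules that were later removed from core.

```python
from decimal import Decimal

# Hypothetical core-module table, shaped loosely like the per-release data in
# Module::CoreList: perl version -> {module: version bundled with that perl}.
# Module names and versions here are made up for illustration.
CORE = {
    "5.010001": {"Example::Core": "1.20", "Example::Other": "0.50"},
    "5.030000": {"Example::Core": "1.50", "Example::Other": "0.50"},
}


def satisfied_by_core(perl, module, wanted):
    """True if the given perl bundles `module` at a version >= `wanted`.

    Assumes plain decimal version strings (e.g. "1.20"), which compare
    numerically; dotted-decimal versions like "v1.2.3" are not handled."""
    bundled = CORE.get(perl, {}).get(module)
    if bundled is None:
        return False
    return Decimal(bundled) >= Decimal(wanted)


def visible_dependencies(deps, perl):
    """Filter cached (module, required_version, dist) rows for display,
    hiding the ones the user's perl already satisfies."""
    return [
        (module, wanted, dist)
        for module, wanted, dist in deps
        if not satisfied_by_core(perl, module, wanted)
    ]
```

This is why the cache has to keep each module and its required version, not just the resolved distribution: the same dependency can be hidden for one perl and shown for another.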
I also tackled a layout problem: the graph needed a fixed amount of space, but the Bootstrap form would not necessarily be a fixed size. I used a viewport-percentage width and height to maximize the space the graph renders in, giving it 85% of the viewport height on load, and wrapped the form in a div that uses the rest of the height and shows a scrollbar if the form overflows in smaller viewports. It is not a perfect layout solution, but it is good enough for now.
Finally, although I had run the command to go through and cache all distributions currently known to MetaCPAN, which took some hours, continuous updates would be needed so the graphs would show correct data as things change on CPAN. With help from Mickey, the author of MetaCPAN::Client who was conveniently sitting nearby, I added an option to the caching command to cache all releases since a certain time, and set up a cron job to cache new releases every 3 hours.
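The incremental update can be sketched as a time window. In this Python sketch, the 15-minute overlap is an assumed safety margin (not something the site is known to use) so that a late-starting run does not miss releases near the boundary; re-caching a release twice is harmless. The release records are hypothetical stand-ins for what a MetaCPAN release query would return, and the crontab line in the comment is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schedule: run every 3 hours, e.g. from a crontab line such as
#   0 */3 * * *   <command that caches recent releases>
INTERVAL = timedelta(hours=3)
# Assumed safety margin so releases near the window boundary are not missed
# if a run starts late; re-caching a release twice is harmless.
OVERLAP = timedelta(minutes=15)


def window_start(now, interval=INTERVAL, overlap=OVERLAP):
    """Earliest release date the current run should (re)cache."""
    return now - interval - overlap


def distributions_to_cache(releases, since):
    """Distinct distribution names with a release dated at or after `since`.

    `releases` is an iterable of dicts with "distribution" and "date" keys,
    standing in for what a MetaCPAN release query would return."""
    return sorted({r["distribution"] for r in releases if r["date"] >= since})


if __name__ == "__main__":
    print(window_start(datetime.now(timezone.utc)))
```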
By the last day of the summit, I was mostly satisfied with the state of the CPAN Dependencies Graph site, which is now live at https://cpandeps.grinnz.com. I added a cron job that caches random distributions each day, in the hope that it will eventually catch cases where a dependency module moves to a different distribution, though I may need a better solution for this. I also quickly added a feature where entering a name containing a double colon (which is clearly a module name) redirects to the graph for the distribution providing that module. The project could perhaps move to MetaCPAN infrastructure in the future for increased reliability, but it is rather resource-light, so my VPS is more than capable of hosting it for now.
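Both features come down to small decisions, sketched here in Python under stated assumptions. `module_to_dist` is a hypothetical stand-in for a packages-index lookup, and the return tuples are an illustrative convention rather than the site's routes. The random batch uses `random.Random(seed)` so that a given run can be reproduced when debugging.

```python
import random


def resolve_request(name, module_to_dist):
    """Decide what to show for a user-entered name.

    A name containing "::" is clearly a module name, so look up its providing
    distribution and redirect there; anything else is treated as a
    distribution name. A single-segment module name would simply be treated
    as a distribution name."""
    if "::" in name:
        dist = module_to_dist.get(name)
        if dist is None:
            return ("not_found", name)
        return ("redirect", dist)
    return ("distribution", name)


def daily_recache_batch(all_dists, k, seed=None):
    """Pick up to k distinct distributions at random to re-cache today."""
    rng = random.Random(seed)
    pool = sorted(all_dists)  # stable order so a fixed seed is reproducible
    return rng.sample(pool, min(k, len(pool)))
```

Random sampling spreads the refresh cost over many days; a module that moves between distributions is only picked up once its dependents happen to be drawn, which is why a more targeted solution may still be needed.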
I also started work on another project I had considered for the summit: refactoring perldoc.pl's rendering so that it can cache the HTML rendered from POD instead of rendering it on demand for each request. On-demand rendering is a problem for large pages like perltoc, which can take an excessively long time to render from POD, and it is unnecessary work anyway, since the POD for each Perl version will not change. By the end of the day, I had organized the code so that I will be able to hook in both storing the rendered HTML and retrieving it to insert into the template when requested.
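A minimal sketch of that render-once cache, in Python rather than perldoc.pl's Perl: the cache key is the (Perl version, page) pair, and because that POD never changes, no invalidation is needed. The directory layout, the `"::"`-to-`"__"` filename mapping, and the `render` callback are hypothetical choices for illustration. Writing to a per-process temporary file and renaming it into place means readers never see a half-written page on POSIX filesystems.

```python
import os
import tempfile
from pathlib import Path


def cached_html(cache_dir, perl_version, page, render):
    """Return the HTML for a (perl_version, page) pair, rendering it only once.

    `render(perl_version, page)` is a caller-supplied stand-in for the
    POD-to-HTML step. Because the POD shipped with a given perl version never
    changes, the cached file never needs invalidating."""
    path = Path(cache_dir) / perl_version / (page.replace("::", "__") + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = render(perl_version, page)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Per-process temp name, then rename into place, so concurrent readers
    # never see a partially written file.
    tmp = path.with_name(f"{path.name}.{os.getpid()}.tmp")
    tmp.write_text(html, encoding="utf-8")
    tmp.replace(path)
    return html


if __name__ == "__main__":
    calls = []

    def fake_render(version, page):
        calls.append((version, page))
        return f"<h1>{page} ({version})</h1>"

    with tempfile.TemporaryDirectory() as cache:
        cached_html(cache, "5.28.2", "perltoc", fake_render)
        cached_html(cache, "5.28.2", "perltoc", fake_render)
    print(len(calls))  # the second request is served from the cache
```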
I am very grateful for the opportunity for me and others to work on the Perl toolchain in this focused environment. The event ran quite smoothly thanks to the efforts of the organizers (Neil Bowers, Philippe Bruhat, and Laurent Boivin), and it would not be possible without the sponsors.