Faster PDL Development Cycle---But How?

This entry is a repost of the start of a discussion on the PDL developers mailing list, in the hope of getting wider input from the Perl community.

PDL Developers-

With the addition of two active and highly motivated PDL developers (Zakariyya Mughal and Guggle "Ed" Worth) we've made significant progress in cleaning up both the PDL distribution and the development process. PDL is now run through test builds automatically on every git commit via Travis-CI on GitHub. Many perl platforms and PDL configuration options are exercised. PDL-2.013 was the most thoroughly pre-release-tested release ever.

The plan we've been working toward is to make PDL development faster and more responsive by breaking up the current monolithic PDL distribution into a lean core (roughly the current PDL::Core, PDL::PP, and PDL::Slices) and spinning off the other modules for IO, Graphics, and Library interfaces as their own CPAN releases. This would give the separate modules/distributions a faster development-test-release cycle, since that cycle would no longer be held up by testing the full PDL distribution with all its subcomponents, even when those subcomponents are completely unrelated to the change being made.

We're ready to make the split, but there is a catch... How can we have the rapid agile development needed to make the next-generation PDL3 possible without losing the "PDL just works" quality that has been one of the primary focuses of PDL-2.x development since I volunteered as release manager circa PDL-2.4.3 [sic]?

There has been some discussion, largely on #pdl, about how to best proceed. One idea is to move to a constant release mode which could be expedited by adding co-maints to PDL. I've not acted on that largely because I feel that PDL just working, easy to get and start to use, is essential to survive as a minority numeric computation engine (compared with R, NumPy, Octave/MATLAB). How can we grow market share if it takes a perl expert to start using PDL?

That said, I think the "big split" is the best way forward for PDL to grow and thrive. The ideas for the PDL3 core engine show great promise for the kind of dynamic development as occurred when Karl first conceived and implemented the idea that would become PDL. Unfortunately, my experience with rapid sequential releases is a sort of "churn" where it is difficult to know if you'll be able to get a working module at any given release. So what to do...

One idea I had is to change the stable PDL release distribution into a PDL bundle. That would be the "stable PDL" that would be easy to get and install. The sub-modules would then be able to have independent development, forming the "experimental PDL" track. Another way, a bit more crude, would be to make a fixed "stable PDL" release that would be the one to install. Maybe we could use specific version information to work with cpan, cpanm,...

Here's where we need your input for discussion and consensus. Please feel free to comment on any of the above, or to offer your own thoughts. The goal is to select the preferred approach for modern PDL development and move out on it. I would like to complete this discussion process within the next two weeks. At that point we should be able to make a specific plan for any final comments, with the agile development to begin shortly after.

Let the discussions begin!

2 Comments

How about ejecting all of the parts into their own modules, and leaving the main namespace (the "just works" part) as an integration layer that has requires statements in the META files for the distribution?

The CPAN tools handle the META files, install the requirements, and Bob's your uncle (well, mine anyway).

MLX

To respond to MLX's thought, albeit a little while later: that would be more or less the "bundle" option.
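As a rough illustration of that "bundle" idea (not taken from any actual PDL distribution), an integration-layer distribution could be little more than a Makefile.PL that lists the split-out pieces as prerequisites. The distribution name My::PDL::Bundle and the version numbers below are hypothetical; PDL and PDL::Graphics::TriD are simply module names that appear elsewhere in this post.

```perl
# Makefile.PL for a hypothetical "bundle" distribution: it ships little or
# no code of its own and exists mainly so that CPAN clients install its
# prerequisites.
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'My::PDL::Bundle',   # hypothetical distribution name
    VERSION   => '0.001',             # hypothetical version
    ABSTRACT  => 'Pulls in core PDL plus selected add-on distributions',
    PREREQ_PM => {
        'PDL'                 => '2.096',  # minimum version (assumed cut-off)
        'PDL::Graphics::TriD' => 0,        # any version will do
    },
);
```

ExtUtils::MakeMaker records PREREQ_PM in the generated META files, so a CPAN client such as cpanm pointed at a distribution like this would install the listed modules first.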

How it turned out: around the end of 2024, PDL 2.096 was released. In the months beforehand, various modules that had external dependencies (like GSL, OpenGL, Fortran) were released as their own CPAN distributions. This left "main PDL", with a stripped-down, but not fully minimal, set of capabilities, that could be built anywhere that had a C99 compiler (and installed with either that, or a package manager).

It is worth noting that this cut-down PDL closely resembles the PDL you'd have got with the previous kitchen-sink "everything PDL" if none of the external dependencies were present. Those PDL parts that have external dependencies can now hard-require those, which makes troubleshooting specific problems possible; I remember trying to get someone working with the PGPLOT- or OpenGL-needing modules, and having to ask them to install the dependencies, then try installing the whole of "everything PDL" in the hope that this time it would build the extra stuff. No more. Now you can run:

cpanm PDL::Graphics::TriD

And if it works, you will have cool 3D visualisations at your fingertips.

Another benefit of 2.096 is that it introduced some innovations to make the Makefile.PL much easier to understand, which helps packagers. https://repology.org/project/perl:pdl/packages shows the latest PDL (2.103) packaged on the latest Debian, AUR, Fedora, Kali, and openmamba, and quite recent versions on Ubuntu, FreeBSD, MacPorts, and Gentoo.


About Chris Marshall

I blog about Perl and PDL.