CUDA and the Perl Data Language
Yesterday I announced the release of my Perl-accessible bindings for CUDA. CUDA is marketed as a massively parallel, high-performance computing architecture. When you think about Perl and high-performance computing, I would hope that PDL, the Perl Data Language, comes to mind. :-)
PDL is a CPAN distribution that gives Perl the ability to compactly store and speedily manipulate the large N-dimensional data arrays which are the bread and butter of scientific computing. Today I will discuss how CUDA::Minimal and PDL talk with each other. (In case you're curious, tomorrow I will discuss error handling in CUDA::Minimal.)
The common lingo among PDL folk is to call PDL objects 'piddles'. A name change comes up perennially on the mailing list, and until tonight I had never come up with anything better. For this blog post, I'm going to refer to PDL objects as 'pdsets', short for Perl Data Sets.
However, if you visit the mailing lists, you should still use the term 'piddle' when referring to PDL objects.
PDL's automated vectorization is powerful, but it will play no role in today's post. My interest in linking PDL and CUDA::Minimal is in transferring the data held in pdsets to and from the device. The cool part is that nothing changes from the packed-scalar case: just pass the pdset wherever you would have passed the packed scalar, and everything works.
For example, these three sets of code end up with the same data on the device:
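```perl
use PDL;
use CUDA::Minimal;

# Option 1: from a packed scalar of single-precision floats
my $packed_data = pack('f*', 0..24);
my $input_dev_ptr = MallocFrom($packed_data);

# Option 2: from a pdset created with float type
my $pdset = sequence(float, 25);
my $input_dev_ptr = MallocFrom($pdset);

# Option 3: from a temporary pdset converted to float
my $input_dev_ptr = MallocFrom(sequence(25)->float);
```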
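Why does the element type matter? Here is a minimal pure-Perl sketch (no PDL or CUDA needed) that packs the same 25 values as single- and double-precision floats and compares the buffer sizes. PDL's sequence() produces double-precision data unless you ask for another type, so a pdset has to be float-typed to carry the same bytes as pack('f*', ...).

```perl
use strict;
use warnings;

# The same 25 values, 0 through 24, packed two ways.
my $as_float  = pack('f*', 0..24);   # native single precision
my $as_double = pack('d*', 0..24);   # native double precision

# A float is 4 bytes and a double is 8 bytes on typical platforms,
# so the two buffers differ in both size and bit layout.
printf "float: %d bytes, double: %d bytes\n",
    length $as_float, length $as_double;
```

On typical platforms this reports 100 bytes for the floats and 200 bytes for the doubles, which is why a default double pdset would not match the packed float data on the device.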
To copy results from the device back into a pdset, you can use the Transfer function that I discussed in my last post:
```perl
my $pd_results = zeroes(float, $N_data_points);
Transfer($dev_ptr => $pd_results);
```
In short, if you use a pdset in place of a packed scalar, it should just work.
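For contrast, here is a sketch of receiving the same results into a packed scalar instead. It assumes, as with zeroes(float, $N_data_points), that the host buffer must already have the right size before the copy. The Transfer call is commented out because it needs CUDA::Minimal and a real device pointer; everything else is plain Perl.

```perl
use strict;
use warnings;

# Number of single-precision values expected back from the device.
my $N_data_points = 25;

# Preallocate a zero-filled buffer with room for $N_data_points floats,
# the packed-scalar counterpart of zeroes(float, $N_data_points).
my $results = pack('f*', (0) x $N_data_points);

# With CUDA::Minimal loaded and a valid $dev_ptr (assumed here), the copy is:
#   Transfer($dev_ptr => $results);

# The packed bytes still have to be unpacked into Perl numbers,
# a step that using a pdset makes unnecessary.
my @values = unpack('f*', $results);
printf "%d bytes, %d values\n", length $results, scalar @values;
```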
CUDA::Minimal also installs a few methods in the PDL namespace, including get_from and nbytes. You can call get_from directly if you prefer:
```perl
my $pd_results = zeroes(float, $N_data_points);
$pd_results->get_from($dev_ptr);
```
or, more compactly, using the chaining idiom common to PDL code:
```perl
my $pd_results = zeroes(float, $N_data_points)->get_from($dev_ptr);
```
Working with Slices
PDL lets you select a subset of a pdset, called a slice, and manipulate it directly. Changes made through a slice flow back to the original pdset unless you deliberately sever the connection. CUDA::Minimal handles slices correctly in this respect, although not very efficiently.
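The slice behavior described above can be seen in plain PDL, without any CUDA involved. This sketch assigns into a slice, which updates the parent, and into a copy of a slice, which does not. Per the paragraph above, filling a slice from the device with a CUDA::Minimal call would likewise update the parent; that part is not shown because it needs a GPU.

```perl
use strict;
use warnings;
use PDL;

# Parent pdset: the values 0..9 in single precision.
my $data = sequence(float, 10);

# slice() returns a view; assigning into it with .= updates $data.
my $view = $data->slice('2:4');
$view .= 0;

# copy() makes an independent pdset, severing the link to $data.
my $independent = $data->slice('5:7')->copy;
$independent .= -1;

print "$data\n";
```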
Extending to Other Classes
So far I have focused on PDL. But when Transfer or any other CUDA::Minimal function encounters an object (a blessed reference) in a position where it expects a packed scalar or an integer holding a device pointer, it attempts to use the object methods discussed above. CUDA::Minimal can therefore work with any class that supplies the same methods, including nbytes. The details are in the documentation.
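The following is a hypothetical, pure-Perl sketch of the duck-typing idea, not CUDA::Minimal's actual code. A helper checks whether its argument is a blessed reference that can('nbytes'). If so, it asks the object for its size; otherwise it treats the argument as a packed scalar. The class My::FloatBuffer and the helper host_byte_count are invented names for illustration only. A real class would need to supply the full set of methods listed in the CUDA::Minimal documentation, not just nbytes.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical class wrapping a packed buffer of single-precision floats.
package My::FloatBuffer;
sub new {
    my ($class, @values) = @_;
    my $buffer = pack('f*', @values);
    return bless { data => $buffer }, $class;
}
# Size of the wrapped host data in bytes.
sub nbytes { return length $_[0]{data} }

package main;

# Hypothetical helper: prefer the object's nbytes method when given a
# blessed reference that provides one, else treat it as a packed scalar.
sub host_byte_count {
    my ($arg) = @_;
    return $arg->nbytes if blessed($arg) && $arg->can('nbytes');
    return length $arg;
}

my $packed = pack('f*', 1..4);
my $object = My::FloatBuffer->new(1..4);
printf "%d %d\n", host_byte_count($packed), host_byte_count($object);
```

Both calls report the same size because the object and the packed scalar hold identical bytes. Only the route used to measure them differs.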