The Day Perl Stood Still: Unveiling A Hidden Power Over C
Sometimes the unexpected happens and must be shared with the world … this one is such a case.
Recently, I’ve started experimenting with Perl for workflow management and high-level supervision of low-level code for data science applications. A role I’d reserve for Perl in this context is lifecycle management of memory buffers: using the Perl application to “allocate” memory buffers and shuttle them between computing components written in C, Assembly, Fortran and the best hidden gem of the Perl world, the Perl Data Language. There are at least three ways that Perl can be used to allocate memory buffers:
- Generate a list of bytes and use the pack function to convert them to a string.
- Use the repetition operator (x) to generate a string whose length equals the desired buffer size minus one (Perl terminates its string buffers with a null byte, which compensates for the minus one).
- Access an external memory allocator library, either through Inline or FFI::Platypus, to allocate the buffer.
When I took the three methods for a ride as detailed elsewhere, I found that allocating the buffer through a Perl string outperformed C’s malloc by more than 10-fold.
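The first two approaches can be sketched with core Perl alone. This is a minimal illustration, not the benchmark referred to above: the helper names `alloc_pack` and `alloc_repeat`, the buffer size, and the iteration count are all assumptions made for the example, and the third approach is left out because it needs the non-core Inline or FFI::Platypus modules.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Method 1: build a list of zero bytes and pack it into a string.
# (alloc_pack is a hypothetical helper name.)
sub alloc_pack {
    my ($n) = @_;
    return pack "C*", (0) x $n;
}

# Method 2: repeat a one-byte string. Following the text, the string is
# one byte shorter than the requested size, counting on the trailing
# null byte Perl keeps at the end of the string buffer.
# (alloc_repeat is a hypothetical helper name.)
sub alloc_repeat {
    my ($n) = @_;
    return "\0" x ($n - 1);
}

my $size       = 1_000_000;    # bytes per buffer (arbitrary)
my $iterations = 20;           # arbitrary, kept small so the script is quick

for my $method ([pack => \&alloc_pack], [repeat => \&alloc_repeat]) {
    my ($name, $code) = @$method;
    my $start = time;
    for (1 .. $iterations) {
        my $buf = $code->($size);    # allocate and immediately discard
    }
    printf "%-6s %.6f s per buffer\n", $name, (time - $start) / $iterations;
}
```

The timings depend on the Perl build and the machine, so treat the printed numbers as indicative only.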
Not believing the massive performance gain, and thinking I was dealing with a bug in Inline::C, I recoded the allocation in pure C and obtained practically the same results as the Inline::C method.
After researching the issue further, I discovered that the malloc I grew to admire trades speed in memory allocation for generality, and there is a plethora of faster memory allocators out there. It seems that Perl is using one such allocator for its strings, and kicks C’s butt in this task of allocating buffers.
Perl uses malloc() to allocate PVs for SVs.
If you're using constants with "x" to create the PVs:
my $x = "x" x 10000; # constants
then that's done at compile-time, and assigning to another SV:
my $x = "x" x 10000; # constants
my $y = $x; # copy on write, no allocation
there's no allocation done for the PV, though the SV body will be allocated from a pool.
Copy on write also means that the PV of $x is shared with $y. If you use pack "p" to supply the address of the PV for $y to a C function (or badly written XS) that writes through the pointer, it will also write over $x and over the PV of the constant SV in the CONST op used to initialize $x.
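One way to observe the sharing described above is to compare the buffer addresses of the two scalars. The sketch below assumes a hypothetical helper, `pv_address`, that reads back the pointer produced by pack "p" as an integer. Whether the buffers are actually shared after the assignment depends on the Perl version and build, so the first line of output is not guaranteed to say "yes".

```perl
use strict;
use warnings;
use Config;

# Return the address of a scalar's string buffer (its PV) as an integer.
# pack "p" stores a pointer to the string; it is unpacked here as an
# unsigned integer with the same width as a pointer on this build.
# $_[0] is used directly so that the caller's scalar is not copied.
# (pv_address is a hypothetical helper name.)
sub pv_address {
    my $fmt = $Config{ptrsize} == 8 ? "Q" : "L";
    return unpack $fmt, pack("p", $_[0]);
}

my $x = "x" x 10000;    # constants
my $y = $x;             # copy on write, if the build supports it

printf "shared after copy:  %s\n",
    pv_address($x) == pv_address($y) ? "yes" : "no";

# An ordinary Perl-level write gives $y its own buffer first, so $x is
# left untouched. A write through the raw pointer from C gets no such
# protection.
substr($y, 0, 1) = "y";

printf "shared after write: %s\n",
    pv_address($x) == pv_address($y) ? "yes" : "no";
printf "first byte of \$x:   %s\n", substr($x, 0, 1);
```

On a build where the first line reports "yes", the address that pack "p" returns for $y is the same memory that backs $x, which is why handing it to C code that writes through the pointer is unsafe.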