Misusing newSVpv

I managed to cause another set of obscure bugs by misusing newSVpv. The goal of this code is to split the RGB and the alpha (transparent) part of a PNG image, for the use of the PDF::Builder module on CPAN.

Here the SV in newSVpv is "scalar value", and the "pv" means "string". My code says this:

sv = newSVpv ("", len);

and it causes the crash on Solaris and other operating systems because the above statement is a bad idea. What happens is that Perl copies "len" bytes from my string, where "len" might be a very large number, for a large PNG image, but I've given it the string "" to copy from. So Perl tries to copy from uninitialised, or possibly even inaccessible, parts of the computer's memory. I found out my error only after almost giving up, by using valgrind to look for memory errors.

The correct version of this is

 sv = newSV (sv_len);
 SvPOK_on (sv);
 SvCUR_set (sv, sv_len);

We make a new SV, then we tell Perl that our new SV is meant to be a string using SvPOK_on, then we tell Perl the length of our string is sv_len with SvCUR_set.

Here is the manual page from Perl 5.32 (perldoc perlapi) for newSVpv:

**newSVpv** Creates a new SV and copies a string (which may contain "NUL" ("\0") characters) into it. The reference count for the SV is set to 1. If "len" is zero, Perl will compute the length using "strlen()", (which means if you use this option, that "s" can't have embedded "NUL" characters and has to have a terminating "NUL" byte).

It doesn't really say that it is going to copy len bytes of the string, but I suppose that is obvious in retrospect since it does say that the string may contain NUL characters, so the only test it could be making for the end of the string would be the number of bytes.

This function can cause reliability issues if you are likely to pass in empty strings that are not null terminated, because it will run strlen on the string and potentially run past valid memory.

This seems to be warning about what happened to me in Image::PNG::Libpng version 0.55, but what is an "empty string" that is "not null terminated"? An empty string to me is "", which is automatically NUL terminated. `

Using "newSVpvn" is a safer alternative for non "NUL" terminated strings. For string literals use "newSVpvs" instead. This function will work fine for "NUL" terminated strings, but if you want to avoid the if statement on whether to call "strlen" use "newSVpvn" instead (calling "strlen" yourself). SV* newSVpv(const char *const s, const STRLEN len)

I don't really see why that would be safer. Here is the documentation for newSVpvn:

newSVpvn Creates a new SV and copies a string into it, which may contain "NUL" characters ("\0") and other binary data. The reference count for the SV is set to 1. Note that if "len" is zero, Perl will create a zero length (Perl) string. You are responsible for ensuring that the source buffer is at least "len" bytes long. If the "buffer" argument is NULL the new SV will be undefined. SV* newSVpvn(const char *const buffer, const STRLEN len)

It seems to be the same thing except that it doesn't look for the NUL (the zero byte) but insists on having the string length, so the only time it would be "safer" would be if len was accidentally set to zero. So what it meant by "an empty string which is not null (NUL) terminated" was the case where the user put the length as zero, but also passed a pointer which is not NULL, but points to some random place in memory where there is no NUL (zero, '\0') byte to be found, and then newSVpv went on a wild goose chase over random bytes of memory looking for the end of the string. I wonder if that has ever actually happened to anyone, or was it just a possibility that the authors were worried about?

Anyway that doesn't cover dummies like me who send in NUL-terminated strings, but lie about their length to try to get extra memory allocated, but I may have been the first person ever to abuse newSVpv in this way.

Also, wouldn't it be clearer to say that it creates a new SV of length len+1 and copies len bytes of a string into it, if the string (buffer) is not NULL, or no bytes if it is NULL.

It would also be nice if it could have been explained a bit better what to do with the return value of newSV. My initial problem, the reason I ended up abusing newSVpv, was because I couldn't work out how to tell Perl that the thing I'd created with newSV was a string. The answer to this turned out to be SvPOK_on, which is documented, but I found in fact by rooting through the Perl source code trying to work out what SvPOK was checking for. The documentation of newSV in perlguts says this about it:

In the unlikely case of a SV requiring more complex initialization, you can create an empty SV with newSV(len). If len is 0 an empty SV of type NULL is returned, else an SV of type PV is returned with len + 1 (for the NUL) bytes of storage allocated, accessible via SvPVX. In both cases the SV has the undef value.

I don't think my case is particularly unlikely, it seems fairly normal to want to get some memory to write into before actually retrieving the "string" to write, for example in my case I am calling a libpng routine to get the "string" to write into the buffer, so I want to allocate the buffer before retrieving. If I allocate the memory myself then use newSVpv on my allocated memory, there is an unnecessary copy of data, so it seems sensible to just write into the buffer of the SV.

It wasn't very clear to me what an SV of type PV with the undef value was, or how I could turn my SV with the undef value into something which Perl recognised as a string. My own preference would be for the documentation to go into more practical details of how to accomplish tasks.

2 Comments

Thanks! Well written. I hope it finds a resolution.

Leave a comment

About Ben Bullock

user-pic Perl user since about 2006, I have also released some CPAN modules.