Always use `const char *` to refer to the return value from SvPV

Always use const char * to refer to the return value from SvPV.

Yesterday I got a bug report from a user via Github about Text::Fuzzy.

The bug report said that in some cases, when the user computed an edit distance with Unicode strings, the user's input value, $string in the following, seemed to be overwritten and corrupted:

$tf->distance ($string);

I couldn't reproduce the user's bug using the script he supplied, but just in case, I went through the code looking for anywhere that a string might be overwritten, by adding const in front of every char * pointer which was used to store a Perl string.

This led me to the line where the value corresponding to $string in the above is read using SvPV, and to another line where the code overwrites the memory that value points to. The overwrite is a special case which only executes when the user matches a byte string against a character string.

As a fix for the bug, I changed the code to use allocated memory after the test for Unicode, and added a field `allocated` to `tf->b`, set to true or false, so that the allocated memory could be freed. In a later commit I also added a test that the bug was fixed.

However, it would have been better if I had never assigned the return value from SvPV to a char * but had always used a const char *.
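
The following is a minimal sketch of that idea in plain C, not the actual Text::Fuzzy code. The names `tf_string`, `borrow_string`, `tf_string_set` and `tf_string_free` are hypothetical; `borrow_string` merely stands in for SvPV so that the example compiles without the Perl headers. The borrowed pointer is held as a const char *, so an accidental write through it is rejected by the compiler, and any modification happens on an allocated copy tracked by an `allocated` flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one string held by the matching code. */
typedef struct {
    char *text;      /* bytes this code is allowed to modify */
    size_t length;
    int allocated;   /* true if text came from malloc and must be freed */
} tf_string;

/* Stand-in for SvPV: returns a pointer to memory owned by someone else.
   Because the result is const, writing through it does not compile. */
static const char *borrow_string(const char *owner, size_t *length)
{
    *length = strlen(owner);
    return owner;
}

/* Copy the borrowed bytes so that later edits cannot touch the caller's data.
   Returns 0 on success, -1 if memory could not be allocated. */
static int tf_string_set(tf_string *s, const char *owner)
{
    assert(s != NULL && owner != NULL);
    size_t length;
    const char *borrowed = borrow_string(owner, &length);
    /* borrowed[0] = '?';    <- compile error: assignment of read-only location */
    char *copy = malloc(length + 1);
    if (!copy) {
        return -1;
    }
    memcpy(copy, borrowed, length + 1);   /* includes the trailing NUL */
    s->text = copy;
    s->length = length;
    s->allocated = 1;
    return 0;
}

/* Release the copy, if there is one, and reset the structure. */
static void tf_string_free(tf_string *s)
{
    if (s->allocated) {
        free(s->text);
    }
    s->text = NULL;
    s->length = 0;
    s->allocated = 0;
}
```

With this arrangement, code that needs to rewrite the string works on `s->text`, and the caller's original buffer is left untouched.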

According to Ken Thompson,

Const only confuses library interfaces with the hope of catching some rare errors

but I'm not sure I agree with him.

The not-so-great escape

Escaping HTML is the process of converting a user's input into something which can be displayed back to the user in a web browser. For example, in a comment section on a blog, or a wiki editable by users.

Given user input such as <script>, an HTML escaper must output &lt;script&gt; for it to be displayed correctly. The browser then shows this as the literal text <script> rather than treating it as an actual HTML script tag.

But suppose the user inputs &lt;script&gt;. What should be done with it?
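
One common answer, given here as an assumption rather than as this post's conclusion, is that the escaper should also convert & into &amp;, so that whatever the user typed is displayed back exactly as typed. The sketch below shows a minimal escaper of that kind in C; `html_escape` is a hypothetical name, and the code handles only the four characters &, <, > and ".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated, escaped copy of input, or NULL if out of
   memory. The caller is responsible for freeing the result. */
static char *html_escape(const char *input)
{
    assert(input != NULL);
    /* Worst case: every byte is a double quote, which becomes the
       six bytes of "&quot;". */
    char *out = malloc(strlen(input) * 6 + 1);
    if (!out) {
        return NULL;
    }
    char *p = out;
    for (const char *c = input; *c; c++) {
        const char *entity = NULL;
        switch (*c) {
        case '&': entity = "&amp;";  break;
        case '<': entity = "&lt;";   break;
        case '>': entity = "&gt;";   break;
        case '"': entity = "&quot;"; break;
        }
        if (entity) {
            size_t n = strlen(entity);
            memcpy(p, entity, n);
            p += n;
        } else {
            *p++ = *c;   /* ordinary byte, copied unchanged */
        }
    }
    *p = '\0';
    return out;
}
```

Under this scheme `html_escape("<script>")` gives `&lt;script&gt;`, and `html_escape("&lt;script&gt;")` gives `&amp;lt;script&amp;gt;`, which a browser displays as the text &lt;script&gt; that the user originally entered.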

\d does not validate numbers

points us to this Perl FAQ:

Unfortunately, the regular expression part of the above FAQ page is wrong. \d doesn't validate numbers, unless you have already verified that your input contains only ASCII characters.

What \d actually does is test whether a character is regarded as a decimal digit in Unicode. For example, \d will happily match things like U+07C2: '߂' NKO …