\d does not validate numbers

The Stack Overflow question

http://stackoverflow.com/questions/43814055/easy-to-check-if-user-input-is-a-number-in-perl

points us to this entry in the Perl FAQ:

http://perldoc.perl.org/perlfaq4.html#How-do-I-determine-whether-a-scalar-is-a-number%2fwhole%2finteger%2ffloat%3f

Unfortunately, the regular-expression part of that FAQ answer is wrong: \d doesn't validate numbers unless you have already verified that your input contains only ASCII characters.

What \d actually does is check whether a character is classified as a decimal digit in Unicode. For example, \d will happily match characters like U+07C2 '߂' NKO DIGIT TWO or U+096F '९' DEVANAGARI DIGIT NINE, plus 360 other characters that Perl's own string-to-number conversion does not treat as digits. If you need a regular expression to validate whether something is a number, use [0-9] to match digits, not \d.
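A minimal sketch of the difference, assuming the input is already a decoded character string. The helper is_ascii_digits is a hypothetical name, not something from this post or from a module. It runs the NKO and Devanagari digits mentioned above through \d and through [0-9], and also shows what Perl's numeric conversion makes of them:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): true only for a
# non-empty string made of the ASCII digits 0-9 and nothing else.
sub is_ascii_digits {
    my ($str) = @_;
    return defined($str) && $str =~ /\A[0-9]+\z/;
}

# Sample inputs: ASCII digits, NKO DIGIT TWO, DEVANAGARI DIGIT NINE,
# and ASCII digits followed by a Devanagari digit.
my @inputs = ('2017', "\x{07C2}", "\x{096F}", "12\x{096F}");

for my $input (@inputs) {
    my $codes    = join ' ', map { sprintf('U+%04X', ord($_)) } split //, $input;
    my $by_d     = $input =~ /\A\d+\z/     ? 'yes' : 'no';
    my $by_ascii = is_ascii_digits($input) ? 'yes' : 'no';

    # Perl's string-to-number conversion only reads ASCII digits, so a
    # lone Devanagari nine becomes 0 ("isn't numeric" warnings are
    # switched off for this demonstration).
    my $number = do { no warnings 'numeric'; $input + 0 };

    printf "%-27s  \\d: %-3s  [0-9]: %-3s  as number: %s\n",
        $codes, $by_d, $by_ascii, $number;
}
```

The patterns use \z rather than $ because $ also matches just before a trailing newline, so /^[0-9]+$/ would accept "42\n".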

For more examples of what characters match \d, please see

https://www.lemoda.net/perl/what-matches/what-matches.cgi?regex=\d

The source code for the script is here:

https://www.lemoda.net/perl/what-matches/index.html

I became aware of these problems with using \d to validate numbers because I had used it to validate user input on the following web pages:

http://www.sljfaq.org/cgi/numbers.cgi

Before I removed \d everywhere a few years ago, it was not unusual for me to track down bugs caused by people typing in Devanagari or other non-ASCII numerals that had been validated using \d.

7 Comments

That’s what /a is for.

Note that /a requires Perl 5.13.10 or higher.

Then [[:digit:]] is the same as \d, and also not [0-9].

I'm using [0-9].
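A minimal sketch of the /a and [[:digit:]] points in this thread, assuming Perl 5.14 or later (the first stable release with /a). Without /a, both \d and the POSIX class [[:digit:]] match DEVANAGARI DIGIT NINE; with /a they are restricted to ASCII, and [0-9] is ASCII-only whatever modifiers are used:

```perl
use strict;
use warnings;
use 5.014;    # /a first appeared in 5.13.10; 5.14 is the first stable release with it

my $nine = "\x{096F}";    # DEVANAGARI DIGIT NINE

# Default rules: \d and [[:digit:]] follow Unicode for this character.
say 'default \d:          ', ($nine =~ /\A\d\z/          ? 'match' : 'no match');
say 'default [[:digit:]]: ', ($nine =~ /\A[[:digit:]]\z/ ? 'match' : 'no match');

# With /a, \d and the POSIX classes only match ASCII characters.
say '/a \d:               ', ($nine =~ /\A\d\z/a          ? 'match' : 'no match');
say '/a [[:digit:]]:      ', ($nine =~ /\A[[:digit:]]\z/a ? 'match' : 'no match');

# [0-9] names exactly ten ASCII characters, with or without modifiers.
say '[0-9]:               ', ($nine =~ /\A[0-9]\z/ ? 'match' : 'no match');
```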

\d includes Unicode digits.

DEVANAGARI DIGIT NINE, for example, is used by millions of people, millions or perhaps billions of times a day, as an essential component of their numbers. I don't know whether you are being careless with your terminology or wrongly arrogant about the place of [0-9] in the universe.

Unicode::UCD::num(), available since Perl 5.14, can be used to make sure that a string of digits all come from the same script, so it is not a spoofing attempt. It returns the numeric value the string represents, or undef if the string is not valid.
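A short sketch of num() from the core Unicode::UCD module, as described in the comment above; the sample strings are illustrative choices. An all-Devanagari string converts to its value, while a string mixing ASCII and Devanagari digits, or containing a non-digit, gives undef:

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);    # num() is available from Perl 5.14

# Samples: ASCII digits, Devanagari nine followed by Devanagari one,
# ASCII '1' followed by Devanagari one (mixed scripts), and a non-digit.
my @inputs = ('2017', "\x{096F}\x{0967}", "1\x{0967}", '12a');

for my $input (@inputs) {
    my $value = num($input);
    my $codes = join ' ', map { sprintf('U+%04X', ord($_)) } split //, $input;
    printf "%-27s => %s\n", $codes, defined $value ? $value : 'undef';
}
```

Checking defined num($input) can therefore serve as a validator where non-ASCII digits should be accepted. According to the module's documentation, single-character strings are a special case: num() then returns whatever Unicode numeric value the character has, which need not be a decimal digit.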


About Ben Bullock

I blog about Perl.