\d does not validate numbers
http://stackoverflow.com/questions/43814055/easy-to-check-if-user-input-is-a-number-in-perl
points us to this Perl FAQ:
http://perldoc.perl.org/perlfaq4.html#How-do-I-determine-whether-a-scalar-is-a-number%2fwhole%2finteger%2ffloat%3f
Unfortunately, the regular expression part of the above FAQ page is wrong. \d doesn't validate numbers, unless you have already verified that your input contains only ASCII characters.
What \d does is match any character that Unicode regards as a decimal digit. For example, \d will happily match things like U+07C2: '߂' NKO DIGIT TWO, or U+096F: '९' DEVANAGARI DIGIT NINE, and 360 other characters which are not valid as numerals. If you need to use a regular expression to validate whether something is a number, use [0-9] to match digits, not \d.
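The difference can be seen in a few lines of Perl. This is only a minimal sketch: the sub names matches_backslash_d and matches_ascii_digits and the %input hash are made up for illustration, and the non-ASCII digits are written as \x{...} escapes so that the source file itself stays ASCII.

```perl
use strict;
use warnings;

# Sample inputs: ASCII digits, NKO DIGIT TWO (U+07C2),
# and DEVANAGARI DIGIT NINE (U+096F).
my %input = (
    ascii      => "42",
    nko        => "\x{07C2}",
    devanagari => "\x{096F}",
);

# True if the whole string matches \d+. For strings holding code points
# above 0xFF, \d follows Unicode rules.
sub matches_backslash_d { return $_[0] =~ /\A\d+\z/ ? 1 : 0 }

# True only if every character is one of the ASCII digits 0-9.
sub matches_ascii_digits { return $_[0] =~ /\A[0-9]+\z/ ? 1 : 0 }

for my $name (sort keys %input) {
    printf "%-10s \\d: %d  [0-9]: %d\n", $name,
        matches_backslash_d($input{$name}),
        matches_ascii_digits($input{$name});
}
```

Only the "ascii" row should report a match for [0-9], while all three rows should report a match for \d.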
I'm aware of these defects in the use of \d for validating numbers because I used it to validate user input at the following web page:
http://www.sljfaq.org/cgi/numbers.cgi
Before I removed \d everywhere a few years ago, it was not uncommon to unravel bugs resulting from people typing in Devanagari or other non-ASCII numerals which had been validated using \d.
(This post was edited on 4 December 2018 to remove links to some old pages which are now gone from my web site.)
That’s what /a is for.
Note that /a requires Perl 5.13.10 or higher.
Then [:digit:] is \d and also not [0-9].
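For readers who have not met the /a flag, here is a rough sketch of what these comments describe. The variable names are invented for illustration, and the "use 5.014" line is an assumption that the code runs on a stable release that has the flag.

```perl
use strict;
use warnings;
use 5.014;  # assumed minimum: a stable release that supports /a

my $devanagari_nine = "\x{096F}";  # DEVANAGARI DIGIT NINE

# Without /a, \d follows Unicode rules for this string.
my $unicode_match = $devanagari_nine =~ /\A\d+\z/ ? 1 : 0;

# With /a, \d is restricted to the ASCII digits 0-9.
my $ascii_match = $devanagari_nine =~ /\A\d+\z/a ? 1 : 0;

# [[:digit:]] behaves like \d, so it also matches the Devanagari digit.
my $posix_match = $devanagari_nine =~ /\A[[:digit:]]+\z/ ? 1 : 0;

print "\\d: $unicode_match, \\d with /a: $ascii_match, ",
      "[[:digit:]]: $posix_match\n";
```

So /a gives \d the same meaning as [0-9], at the cost of a flag that is easy to forget on any one pattern.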
I'm using [0-9].
\d also matches Unicode digits.
Yes, I switched to using [0-9] almost everywhere. I think it's simpler.
DEVANAGARI DIGIT NINE, for example, is used by millions of people millions, perhaps billions, of times a day as an essential component of their numbers. I don't know if you are being careless with your terminology, or wrongly arrogant about the place in the universe of [0-9].
Unicode::UCD::num(), since Perl 5.14, can be used to make sure that a string of digits are all from the same script, so are not spoofing attempts, returning the numeric value the string represents, or undef if it is illegal.
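A small sketch of how num() might be used for this, assuming Perl 5.14 or later. The wrapper name digits_value and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Unicode::UCD qw(num);

# Returns the numeric value of a string of digits, or undef if num()
# does not consider the whole string a safe, valid number (for example
# because it mixes digits from different scripts).
sub digits_value {
    my ($str) = @_;
    return num($str);
}

my $ascii      = digits_value("123");               # ASCII digits
my $devanagari = digits_value("\x{0967}\x{0968}");  # DEVANAGARI DIGIT ONE, TWO
my $mixed      = digits_value("1\x{0968}");         # ASCII 1 + DEVANAGARI TWO

for my $result ($ascii, $devanagari, $mixed) {
    print defined $result ? "value: $result\n" : "rejected\n";
}
```

The first two strings should come back as 123 and 12, and the mixed-script string should be rejected.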
> That’s what /a is for.
As a followup to this article, I am thinking about making another blog post showing how \d is used to match numbers in actual CPAN modules. It's used for number validation in more than a thousand modules. For example, here are the matches for /\\d\./:
http://grep.cpan.me/?q=%5C%5Cd%5C.
and here are the matches for /\\d\+\./:
http://grep.cpan.me/?q=%5C%5Cd%5C%2B%5C..*%2F%5Bb-z%5D*a
Noting your comment, I tried searching for CPAN modules which use the /a flag to restrict \d so that it only matches ASCII digits. I haven't spent very long, but so far I haven't found any.
The following searches bring up a few false positives, but no actual uses of the flag:
http://grep.cpan.me/?q=%5C%5Cd%5C%2B%5C.%5B%5E%2F%5Cn%5D*%2F%5Bb-z%5D*a
http://grep.cpan.me/?q=%5C%5Cd%5C.%5B%5E%2F%5Cn%5D*%2F%5Bb-z%5D*a
Thanks for any assistance with this.