Stupid Regexp Trick: Fail on match
I found myself one day trying to come up with a regexp that matched numbers more or less the way Perl does. My first cut was
m/ \G \s* (?: (?<oct> 0b[01]+ | 0[0-7]+ | 0x[[:xdigit:]]+ ) | (?<float> [+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)? ) ) (?! \w ) /smxigc
where the <oct> capture represents things that need to be run through the oct built-in, and the <float> capture comes from perlfaq4.
The problem here was that the <float> expression matched things like '09', which was not what I wanted. What I wanted was to have the entire expression fail if it got past the <oct> expression and found something beginning with '0', other than '0' itself.
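The unwanted match is easy to reproduce. What follows is only a sketch: the pattern is the first cut from above, spread over several lines, while the classify() helper and the sample inputs are illustrative additions of mine, not part of the original code.

```perl
use 5.010;    # named captures need Perl 5.10 or later
use strict;
use warnings;

# First-cut pattern, laid out over several lines under /x.
my $first_cut = qr/
    \G \s*
    (?:
        (?<oct> 0b[01]+ | 0[0-7]+ | 0x[[:xdigit:]]+ )
      |
        (?<float> [+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)? )
    )
    (?! \w )
/smxi;

# Hypothetical helper: report which named capture matched, if any.
sub classify {
    my ( $text ) = @_;
    return 'no match' unless $text =~ m/$first_cut/gc;
    return defined $+{oct} ? "oct: $+{oct}" : "float: $+{float}";
}

say "$_ => ", classify( $_ ) for qw( 017 0x1F 3.14 09 );
```

The first three inputs come out as expected (oct, oct, float), but the last line printed is "09 => float: 09", which is the match I did not want.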
After considerable head-scratching, I found an answer in perlre, under Special Backtracking Control Verbs. (*FAIL) was almost what I wanted, but not quite; it just forces backtracking. But (*COMMIT) commits the regexp to the branch it appears on, so that backtracking into it makes the whole match fail, and as the example in the docs makes clear, the combination of the two does the job. The regexp I ended up with was
m/ \G \s* (?: (?<oct> 0b[01]+ | 0[0-7]+ | 0x[[:xdigit:]]+ ) | 0\w (*COMMIT) (*FAIL) | (?<float> [+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)? ) ) (?! \w ) /smxigc
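To check the behaviour, the final pattern can be run against a handful of strings. Again this is a sketch under my own assumptions: classify_number() and the sample inputs are made up for illustration, and the pattern is simply the one above split across lines.

```perl
use 5.010;    # named captures and backtracking verbs need Perl 5.10 or later
use strict;
use warnings;

# Final pattern: the middle branch traps "0 followed by a word character"
# and uses (*COMMIT) (*FAIL) to abandon the whole match at that point.
my $with_commit = qr/
    \G \s*
    (?:
        (?<oct> 0b[01]+ | 0[0-7]+ | 0x[[:xdigit:]]+ )
      |
        0\w (*COMMIT) (*FAIL)
      |
        (?<float> [+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)? )
    )
    (?! \w )
/smxi;

# Hypothetical helper: report which named capture matched, if any.
sub classify_number {
    my ( $text ) = @_;
    return 'no match' unless $text =~ m/$with_commit/gc;
    return defined $+{oct} ? "oct: $+{oct}" : "float: $+{float}";
}

say "$_ => ", classify_number( $_ ) for qw( 0 017 0x1F 0.5 09 018 );
```

With these inputs, '0' and '0.5' are still reported as floats, '017' and '0x1F' as oct candidates, and both '09' and '018' now give "no match" instead of falling through to the <float> branch.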
I am not quite sure where the "Stupid ... Trick" titles came from. It seems to me that they tend to represent techniques just very slightly off the beaten track, and the stupidity is probably my own for not figuring them out a lot sooner.
I've seen the trick with (*SKIP) instead of (*COMMIT); it doesn't seem to change the behaviour here. Also, note that Perl accepts underscores in numeric literals, such as 1_000_000.
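As a small illustration of the underscore point (my own example, not taken from the thread): underscores are accepted in numeric literals in source code, while a string containing the same characters is converted only up to the underscore.

```perl
use strict;
use warnings;

# Underscores are allowed as digit separators in numeric literals.
my $literal = 1_000_000;
print "literal: $literal\n";        # literal: 1000000

# A string is different: numeric conversion stops at the underscore.
my $text        = '1_000';
my $from_string = do { no warnings 'numeric'; $text + 0 };
print "string:  $from_string\n";    # string:  1
```

So a regexp meant to mirror Perl's literal syntax would need to allow for underscores explicitly.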
Is there any reason you could not make use of Scalar::Util::looks_like_number?
My stupid trick does not in fact accommodate underscores in numbers like Perl does, but I believe it could easily be made to do so. I did not do this, frankly, because I did not think about it.
There are several reasons I did not use looks_like_number(): it accepts Inf and NaN; and I wanted to be able to match '2' and '3' in the string '2+3', but looks_like_number() will not do this.
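The behaviour of looks_like_number() on these inputs can be checked with a short script. This is a minimal sketch; the input list is my own choice, with '3.14' added as a control.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Ask looks_like_number() about each whole string in turn.
for my $text ( 'Inf', 'NaN', '2+3', '3.14' ) {
    printf "%-5s %s\n", $text, looks_like_number( $text ) ? 'yes' : 'no';
}
```

It answers yes for 'Inf', 'NaN' and '3.14', and no for '2+3'. Because it only judges a whole string, it cannot pick the '2' and the '3' out of '2+3' the way a \G-anchored match with /gc can.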