Stupid Regexp Trick: Fail on match

I found myself one day trying to come up with a regexp that matched numbers more or less the way Perl does. My first cut was

m/ \G \s* (?:
        (?<oct> 0b[01]+ | 0[0-7]+ | 0x[[:xdigit:]]+ ) |
        (?<float> [+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)? )
    )
    (?! \w )
/smxigc

where the <oct> capture represents things that need to be run through the oct built-in, and the <float> capture comes from perlfaq4.

The problem here was that the <float> expression matched things like '09', which was not what I wanted. What I wanted was to have the entire expression fail if it got past the <oct> expression and found something beginning with '0', other than '0' itself.
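To see the problem concretely, here is a minimal sketch that runs the first-cut pattern against a few strings. The helper name `classify` and the sample inputs are made up for illustration, and the /g and /c modifiers are dropped because each string is matched only once from its start.

```perl
use strict;
use warnings;

# First-cut pattern from above, compiled once (needs Perl 5.10+ for
# named captures). With pos() unset, \G anchors at the start.
my $first_cut = qr/ \G \s* (?:
        (?<oct> 0b[01]+ | 0[0-7]+ | 0x[[:xdigit:]]+ ) |
        (?<float> [+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)? )
    )
    (?! \w )
/smxi;

# Hypothetical helper: report which named capture matched, if any.
sub classify {
    my ( $text ) = @_;
    return 'no match' unless $text =~ $first_cut;
    return defined( $+{oct} )
        ? 'oct: ' . $+{oct}
        : 'float: ' . $+{float};
}

for my $input ( '017', '0x1f', '3.14', '09' ) {
    print $input, ' => ', classify( $input ), "\n";
}
# The last line printed is "09 => float: 09": the unwanted match.
```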

After considerable head-scratching, I found an answer in perlre, under Special Backtracking Control Verbs. (*FAIL) was almost what I wanted, but not quite: on its own it just forces backtracking, so the engine would move on and try the <float> alternative anyway. (*COMMIT), however, makes the whole match fail outright if the engine backtracks into it, with no further alternatives or starting positions tried. As the example in the docs makes clear, the combination of the two does the job. The regexp I ended up with was
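The difference between the two verbs can be seen in a stripped-down sketch. The two-alternative patterns and the string 'ab' below are invented for illustration and are not part of the number-matching regexp.

```perl
use strict;
use warnings;

# (*FAIL) alone: the first alternative fails after matching 'a', the
# engine backtracks, tries the second alternative, and that one matches.
my $fail_only = ( 'ab' =~ m/ \A (?: a (*FAIL) | ab ) /smx ) ? 1 : 0;

# (*COMMIT) (*FAIL): the forced failure backtracks into (*COMMIT),
# which aborts the whole match, so the second alternative is never tried.
my $commit_fail =
    ( 'ab' =~ m/ \A (?: a (*COMMIT) (*FAIL) | ab ) /smx ) ? 1 : 0;

printf "(*FAIL) alone:      %s\n", $fail_only   ? 'match' : 'no match';
printf "(*COMMIT) (*FAIL):  %s\n", $commit_fail ? 'match' : 'no match';
```

The first line reports a match and the second reports no match, which is the "fail on match" behaviour the title refers to.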

m/ \G \s* (?:
        (?<oct> 0b[01]+ | 0[0-7]+ | 0x[[:xdigit:]]+ ) |
        0\w (*COMMIT) (*FAIL) |
        (?<float> [+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)? )
    )
    (?! \w )
/smxigc
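As a rough usage sketch, the pattern can drive a small scanning loop. The helper `parse_numbers` and the sample string are assumptions made for illustration; the conversion through oct follows the description of the <oct> capture above.

```perl
use strict;
use warnings;

# Final pattern from above, compiled once (needs Perl 5.10+).
my $number = qr/ \G \s* (?:
        (?<oct> 0b[01]+ | 0[0-7]+ | 0x[[:xdigit:]]+ ) |
        0\w (*COMMIT) (*FAIL) |
        (?<float> [+-]?(?=\.?\d)\d*\.?\d*(?:e[+-]?\d+)? )
    )
    (?! \w )
/smxi;

# Hypothetical helper: convert the leading run of numbers in $text and
# return them along with whatever the pattern refused to consume.
sub parse_numbers {
    my ( $text ) = @_;
    my @values;
    # /g advances pos() after each match; /c keeps pos() on failure.
    while ( $text =~ m/$number/gc ) {
        push @values,
            defined( $+{oct} ) ? oct( $+{oct} ) : 0 + $+{float};
    }
    my $rest = substr $text, pos( $text ) // 0;
    return ( \@values, $rest );
}

my ( $values, $rest ) = parse_numbers( '017 0x1f 3.14 09 42' );
print "values: @{ $values }\n";    # values: 15 31 3.14
print "rest: '$rest'\n";           # rest: ' 09 42'
```

Scanning stops at '09' because the 0\w (*COMMIT) (*FAIL) branch aborts the match there, and /c leaves pos() just after '3.14'.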

I am not quite sure where the "Stupid ... Trick" titles came from. It seems to me that they tend to represent techniques just very slightly off the beaten track, and the stupidity is probably my own for not figuring them out a lot sooner.

3 Comments

I've seen the trick with (*SKIP) instead of (*COMMIT); it doesn't seem to change the behaviour here. Also, note that Perl accepts underscores in numbers:

say 0b0_1_0_1_0_1

Is there any reason you could not make use of Scalar::Util::looks_like_number?

About Tom Wyant

I blog about Perl.