Repeated Capturing and Parsing
An interesting query was recently posted to the internal Perl mail list at work. The questioner was trying to match a pattern repeatedly, capturing all of the results in an array, but it wasn't doing quite what he expected. The message, with minor edits, went a little something like the following.
I'm trying to extract key/value pairs from a file with the following contents:

    - name = gcc_xo_src_clk, type = rcg
    + name = cxo_clk, type = xo, fgroup = xo, wt = 10, bloo = blah
    ? type = hm_mnd_rcg, name = bo : type = rcg_mn
    + name = pxo_clk

I was hoping to do something like this:
    @list = $_ =~ m{
        ^[-+?] \s* (\S+) \s* = \s* (\S+) \s*
        (?:, \s* (\S+) \s* = \s* (\S+) \s*)*
    }xms;

Thinking @list would be assigned the alternating key/value pairs. But the above doesn't extract anything sane. Adding the /gc modifiers doesn't make any difference.
If I do the following, it extracts the first two key/value pairs correctly (if the line has more than one pair).
    @list = $_ =~ m{
        ^[-+?] \s* (\S+) \s* = \s* (\S+) \s*
        , \s* (\S+) \s* = \s* (\S+) \s*
    }xms;

If I keep repeating the pattern in the second line, it keeps matching more key/value pairs.
I would expect using (?: )* should mean zero or more instances of match inside the parentheses, but obviously it's not working. What am I doing wrong?
When I'm presented with a problem like this, one that involves some kind of structured data, I immediately think of writing a parser. I'll get back to that in a bit, but first I want to address the confusion about capturing in the pattern. In fact, that's how the discussion on the mail list proceeded.
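To make the capturing behaviour concrete, here is a minimal sketch that runs the posted pattern against one of the sample lines. The variable names, the `[^\s,=]+` variant, and the final `/g` approach are illustrative assumptions of mine, not part of the original message. The sketch relies on two facts about Perl regexes: `\S+` also matches a comma, and a capture group inside a quantified group keeps only the text from its last iteration.

```perl
use strict;
use warnings;

# One sample line from the question.
my $line = '+ name = cxo_clk, type = xo, fgroup = xo, wt = 10, bloo = blah';

# The pattern as posted. \S+ also matches the comma, so the second group
# takes "cxo_clk," and the optional (?: ... )* part matches zero times.
my @posted = $line =~ m{
    ^[-+?] \s* (\S+) \s* = \s* (\S+) \s*
    (?: , \s* (\S+) \s* = \s* (\S+) \s* )*
}xms;

# Hypothetical variant: [^\s,=]+ keeps commas out of keys and values, so the
# group does repeat. Each capture still holds only its last iteration.
my @last_only = $line =~ m{
    ^[-+?] \s* ([^\s,=]+) \s* = \s* ([^\s,=]+) \s*
    (?: , \s* ([^\s,=]+) \s* = \s* ([^\s,=]+) \s* )*
}xms;

# One way to collect every pair: strip the leading marker, then match a
# single key/value pattern globally in list context.
(my $rest = $line) =~ s{ \A [-+?] \s* }{}xms;
my @pairs = $rest =~ m{ ([^\s,=]+) \s* = \s* ([^\s,=]+) }xmsg;

# Print each result list, showing undefined captures explicitly.
for my $result (\@posted, \@last_only, \@pairs) {
    print join(' | ', map { defined $_ ? $_ : 'undef' } @$result), "\n";
}
```

The first line of output is `name | cxo_clk, | undef | undef`, which is the "nothing sane" result: the value keeps its trailing comma and the repeated group never matches. The second is `name | cxo_clk | bloo | blah`, the first pair plus only the last repetition. The third lists all five pairs in order. In every case a pattern with four capture groups returns at most four values from a single match, however many times the group repeats.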