Huh. Multiple beginning-of-line anchors work

By brian d foy on April 16, 2015 1:30 AM

I've never had a reason to use multiple beginning-of-line anchors in a regex, so I wondered if it would work. I guess it does.

use v5.10;

my $string = <<'HERE';
This is line one
This is a cat
This is a dog
 This is a lizard
This is a bird
That is a ostrich
HERE

my @matches = $string =~ m/
	^This\ is\ a\ (\S+) \s+
	^This\ is\ a\ (\S+)
	/xmg;

say "Matches are @matches";

It works:

Matches are cat dog

Not that you'd ever want to do this. I was curious, I tried it, and now I know. That probably means I'm going to try to get it into production somehow.

6 comments

https://www.google.com/accounts/o8/id?id=AItOawmEFmsULh5l4dO4ek5K2NQ-8wcJG99nGOA | April 16, 2015 4:53 AM

It seems that a lot of people just use regex modifiers without even knowing what they mean. You use "/xmg" as modifiers. Here is what the /m modifier means.

From the perlre documentation:

> Treat string as multiple lines. That is, change
> "^" and "$" from matching the start of the
> string's first line and the end of its last line
> to matching the start and end of each line within
> the string.

So sure, it does work on multiple lines: you have enabled that feature explicitly.
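To make the quoted behavior concrete, here is a minimal sketch of where ^ can match with and without /m. The three-line sample string and the variable names are invented for illustration; they are not from the post.

```perl
use v5.10;
use strict;
use warnings;

# Hypothetical sample data, not from the post.
my $text = "alpha\nbeta\ngamma\n";

# Without /m, ^ matches only at the very start of the string.
my @without_m = $text =~ /^(\w+)/g;

# With /m, ^ also matches right after each embedded newline.
my @with_m = $text =~ /^(\w+)/mg;

say "without /m: @without_m";   # without /m: alpha
say "with /m:    @with_m";      # with /m:    alpha beta gamma
```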

A. Sinan Unur | April 16, 2015 6:14 AM

I don't think you understand that what brian is demonstrating is not obvious from the documentation you quoted alone. Most people do not automatically think of using multiple ^ anchors to match subsequent lines.

The ^ and $ anchors are generally used as in the simplified example of:

my @matches = $string =~ m/
	^ This\ is\ (\S.+) $
	/xm;

which, with brian's example string, will result in:

Matches are line one a cat a dog a bird

which corresponds to your "So sure, it does work on multiple lines."

brian uses the two ^ anchors to select substrings from two subsequent lines. There is only one pair of lines that match brian's criterion so it prints:

Matches are cat dog

Now, this wasn't surprising to me as I have used this feature before to match meta information in the headers of certain data files, but it is not obvious simply based on the documentation that you can refer to two or more subsequent lines in one go using multiple ^ anchors.
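As a rough sketch of that kind of header matching: the file layout below (a "# name:" line followed directly by a "# units:" line) is entirely made up for illustration, as are the variable names. It only assumes that the two header lines are adjacent.

```perl
use v5.10;
use strict;
use warnings;

# Hypothetical file contents; the header format is invented for this example.
my $data = <<'HERE';
# name: temperature
# units: celsius
12.5
13.1
HERE

# Two ^ anchors under /m: the "units" line must immediately follow the "name" line.
# Under /x a bare # starts a comment, so the literal # and spaces are escaped.
my ($name, $units) = $data =~ m/
	^\#\ name:\  (\S+) \n
	^\#\ units:\ (\S+)
	/xm;

say "name=$name units=$units";   # name=temperature units=celsius
```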

A. Sinan Unur replied to comment from A. Sinan Unur | April 16, 2015 6:40 AM

I messed up the pattern in my example (too early here), but the point still holds:

my @matches = $string =~ m/
	^This\ is\ a\ (\S+)
    /xmg;

will print

Matches are cat dog bird

as opposed to brian's example which only matches in case of two subsequent lines that match the pattern.

https://www.google.com/accounts/o8/id?id=AItOawmEFmsULh5l4dO4ek5K2NQ-8wcJG99nGOA replied to comment from A. Sinan Unur | April 16, 2015 7:43 AM

> I don't think you understand that what brian is demonstrating is not obvious from the documentation you quoted alone. Most people do not automatically think of using multiple ^ anchors to match subsequent lines.

Sure, but it has everything to do with what I quoted. People don't think that way because, without the /m flag, using ^ more than once really is pointless.

Without /m the "^" means "beginning of string". And sure, if it means the beginning of the string, you can't match multiple lines. It also makes no sense to use ^ twice in a regex, because a string never has multiple beginnings; there is just one.

But the whole thing changes when you use the /m modifier. With it, ^ changes from "string beginning" to "line beginning". And because a string can have multiple lines, it is also clear to me that you can use it multiple times. To me, that is the natural consequence.

Why should there be any limitation that says you can't do that? With /m, "^" is an anchor exactly like "\b" is. And it is just as natural that you can use multiple \b.
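A small sketch of that argument, using a made-up two-line string (none of these names come from the post): the same two-anchor pattern is tried without and with /m, next to a pattern with several \b anchors.

```perl
use v5.10;
use strict;
use warnings;

# Hypothetical two-line sample, not from the post.
my $text = "one fish\ntwo fish\n";

# Two ^ anchors without /m: the second ^ would have to match at the start
# of the string after a whole line has been consumed, so this cannot match.
my $without_m = $text =~ /^(\w+).*\n^(\w+)/  ? "match" : "no match";

# The same pattern with /m: each ^ matches at the start of its own line.
my $with_m    = $text =~ /^(\w+).*\n^(\w+)/m ? "match" : "no match";

# Several \b anchors in one pattern work with or without /m.
my $with_b    = $text =~ /\bone\b \s+ \bfish\b/x ? "match" : "no match";

say "two ^ without /m:         $without_m";   # no match
say "two ^ with /m:            $with_m";      # match
say "multiple word boundaries: $with_b";      # match
```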

A. Sinan Unur replied to comment from https://www.google.com/accounts/o8/id?id=AItOawmEFmsULh5l4dO4ek5K2NQ-8wcJG99nGOA | April 16, 2015 7:56 AM

We are in agreement. Just note that any programming language has a bazillion details. No one can remember everything, and the subsets people can constantly keep in their heads vary across time and people.

It so happens that I frequently doubt my memory and/or understanding of various details of a specific language. In those cases, just trying out alternatives and seeing what they do is a great way to remember the detail for the next time.

Aristotle | April 16, 2015 6:16 PM

I would have been surprised if it didn’t work that way.

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).
