Rethinking smart matching

Ricardo wants to fix smart matching, which is horribly broken and always has been, although we're just starting to realize how bad it really is. He reduces the table to just a few operations:

$a      $b              Meaning
======= =======         ======================
Any     undef           ! defined $a
Any     ~~-overloaded   invokes the ~~ overload on the object, $a as arg
Any     Regexp, qr-OL   $a =~ $b
Any     CodeRef         $b->($a)
Any     Any             fatal

In each of these cases it's just as easy to not use the smart match operator, and the reduced table removes the cases that genuinely could have made conditionals easier. It also leaves behind the idea that a smart match can decide on its own how to get the answer. For instance, a smart match might short-circuit, might hand off the work to another machine, or might parallelize it. That is, most of the power of smart matching is in container types. With Ricardo's table, I'd rather see smart matching removed completely.

I'd like to bring back some of what we left behind with Perl 5.10.1, and eliminate the cases that are causing all of the problems. My vision of smart matching is one that simplifies common questions we like to ask in a conditional. As such, if a smart match wouldn't simplify the conditional, I want to remove that feature.

  • Remove the numish stuff. Let Perl compare things as strings in all cases. If you want to compare numbers, don't use smart matching.
  • Don't smart match recursively. Only the top level smart matches.
  • Bring back commutativity.
  • Remove Object and CodeRef features. Use methods for that.

In my fantasy table, these types refer to the low-level types, although you can still use references. An array reference still counts as Array, a hash reference as Hash, and so on. The order of operands does not matter:

$a      $b              Meaning
======= =======         ======================
Scalar  Regexp          Scalar matches Regexp
Scalar  Array           Scalar is an element of Array
Scalar  Hash            Scalar is a key of Hash
Scalar  Scalar          The string values are the same
Array   Regexp          At least one element of Array matches Regexp
Array   Array           Arrays are the same. Reference elements point to the same data.
Array   Hash            Every element of Array is a key in Hash
Hash    Regexp          At least one key of Hash matches Regexp
Hash    Hash            Hashes are the same. Reference elements point to the same data.
                        everything else is an error, as soon as Perl reduces it
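
To make the table concrete, here is a rough sketch of it as an ordinary function that never touches ~~ itself. The names simple_match, _kind, %RANK, and %RULE are made up for illustration, and I'm assuming that "the same" means equal length (or equal key sets) with stringwise-equal elements. References stringify to their address, so reference elements compare equal only when they point to the same data. Undefined elements get no special treatment here.

```perl
use strict;
use warnings;

# Classify an operand by its low-level type; a reference counts as the
# thing it points to. Anything outside the table is an error.
sub _kind {
    my ($value) = @_;
    my $ref = ref $value;
    return 'Array'  if $ref eq 'ARRAY';
    return 'Hash'   if $ref eq 'HASH';
    return 'Regexp' if $ref eq 'Regexp';
    return 'Scalar' if defined $value and not $ref;
    die "simple_match: unsupported operand\n";
}

# Fixed ordering so that each pair of types needs only one rule.
my %RANK = ( Scalar => 0, Array => 1, Hash => 2, Regexp => 3 );

# One rule per row of the table. Every comparison is a string comparison
# (or a regex match) and none of them recurses into nested structures.
my %RULE = (
    'Scalar Scalar' => sub { my ($s, $t)   = @_; $s eq $t },
    'Scalar Array'  => sub { my ($s, $arr) = @_; grep { $_ eq $s } @$arr },
    'Scalar Hash'   => sub { my ($s, $h)   = @_; exists $h->{$s} },
    'Scalar Regexp' => sub { my ($s, $re)  = @_; $s =~ $re },
    'Array Array'   => sub {
        my ($p, $q) = @_;
        @$p == @$q and not grep { $p->[$_] ne $q->[$_] } 0 .. $#$p;
    },
    'Array Hash'    => sub { my ($arr, $h) = @_; not grep { !exists $h->{$_} } @$arr },
    'Array Regexp'  => sub { my ($arr, $re) = @_; grep { $_ =~ $re } @$arr },
    'Hash Hash'     => sub {
        my ($p, $q) = @_;
        keys(%$p) == keys(%$q)
            and not grep { !exists $q->{$_} || $p->{$_} ne $q->{$_} } keys %$p;
    },
    'Hash Regexp'   => sub { my ($h, $re) = @_; grep { $_ =~ $re } keys %$h },
);

sub simple_match {
    my ($x, $y) = @_;
    my ($kx, $ky) = ( _kind($x), _kind($y) );

    # Commutativity: swap the operands into the table's canonical order.
    ($x, $y, $kx, $ky) = ($y, $x, $ky, $kx) if $RANK{$kx} > $RANK{$ky};

    my $rule = $RULE{"$kx $ky"}
        or die "simple_match: no rule for $kx and $ky\n";
    return $rule->($x, $y) ? 1 : 0;
}
```

With this sketch, simple_match('b', ['a', 'b']) and simple_match(['a', 'b'], 'b') both return 1, and simple_match('10', '10.0') returns 0 because nothing is compared as a number.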

For example, with scalar and array operands, the current behavior distributes the ~~ over all elements:

  if( $scalar ~~ @array ) { ... }

  if( $scalar ~~ $array[0] or $scalar ~~ $array[1] ... $scalar ~~ $array[N] ) { ... }

This leads to some odd behavior if an element of @array is a reference, since Perl must then go back to the chart to see what to do. Instead, I want to see those turned into string comparisons:

  if( $scalar ~~ @array ) { ... }

  if( $scalar eq $array[0] or $scalar eq $array[1] ... $scalar eq $array[N] ) { ... }

A possible short circuit is much better than our alternatives:

  # grep checks every element, even after it finds a match
  if( grep { $scalar eq $_ } @array ) { ... }

  use List::Util qw(first);
  # first stops at the first match
  if( first { $scalar eq $_ } @array ) { ... }
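
If a guaranteed short circuit is the goal, newer versions of List::Util also provide any, which stops at the first match and returns a plain true or false value. That matters because first returns the matching element itself, and that element might be false. A small sketch with made-up data:

```perl
use strict;
use warnings;
use List::Util qw(any first);

my @array  = ( 'a', '0', 'b' );    # made-up data
my $scalar = '0';

# any stops at the first match and returns a true or false value
my $found_any = any { $scalar eq $_ } @array;

# first returns the matching element itself, here the false string '0'
my $found_first = first { $scalar eq $_ } @array;

print "any: ",   ( $found_any   ? "found" : "not found" ), "\n";
print "first: ", ( $found_first ? "found" : "not found" ), "\n";
```

Here any reports a match while the first version looks like a miss in boolean context, so a test with first would need defined around it.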

But we might not have to choose one way. People are looking at removing smart matching from the core and putting it into a pragma, possibly lexical, which would let everyone write their own smart match engine (although I predict virtually no one will).

I also realize it's basically too late for any changes. My fundamental goal is to not confuse people. Changing smart matching to the best feature it could ever hope to be is going to do that, no matter how good it is. We already have two versions. We don't need a third. It might be time for this to go the way of pseudohashes.


Ricardo's table takes away the one feature of Smart Match that I actually liked, which is the emulation of Python's "in" keyword.

I propose a table like this...

$a      $b              Meaning
======= =======         ======================
Scalar  Array           Scalar is an element of Array
Scalar  Scalar          The string values are the same
Array   Array           Arrays are the same. 

That's all you really need, right?

Regular expression uses of ~~ look identical to =~ to me.

Hash-related smart matches all seem to be reducible to Array ones from my table with appropriate use of keys() or values(). Plus that way it's explicit whether you are checking the keys or the values, where before you had to remember which one the smart match did.
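
Something like this, with no ~~ at all (just a sketch, %hash and its contents are made up):

```perl
use strict;
use warnings;

my %hash = ( apple => 'red', banana => 'yellow' );    # made-up data

# Is 'apple' a key? Explicit, and no scan of the whole hash.
my $is_key = exists $hash{apple};

# Is 'yellow' one of the values? Explicitly a scan over values().
my $is_value = grep { $_ eq 'yellow' } values %hash;

print "apple is a key\n"    if $is_key;
print "yellow is a value\n" if $is_value;
```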

Coderefs, particularly coderefs distributed over arrays, are complicated enough that you should write a grep or a for loop in that case for the sake of readability.

- Alex

As I had almost nowhere to use 5.10 in production, I never used smart matching. But when I looked at it and tried to explain it, I thought the most natural (or only?) place for it would be the given/when statement.

That means the left hand side is always a scalar.
The right hand side can be all kinds of things.

Is there really any other place where the cleverness of the smart match isn't too clever for a reasonable developer to understand?


That means the left hand side is always a scalar.
The right hand side can be all kinds of things.

No, an arrayref is considered an array, etcetera. Using given/when will still expose you to the full complexity of smartmatching.

Nobody yet mentioned: I, for one, really like this more explicit approach.

I think this is just about the worst possible list you could come up with. Removing Object and CodeRef features is a big part of that; a closed-ended system is not worth using.

The fact that the matches you keep are completely arbitrary only makes it worse. A good example is your claim that with scalar and array operands, the current behavior distributes the ~~ over all elements. That is only one of five things that may happen if you compare a scalar with an array. If the element is an arrayref, a hashref, undefined, or a regexp, something very different will happen. That is the problem: smartmatch is unpredictable in every way. Currently, overloaded objects are the only reliable (except for a recently found bug) way to do anything complex.
looks interesting actually, while methods like
might make even more sense, as I would expect string_length to return the length of a string, e.g.

I do like the idea of having this in a separate module like Smart::Match (as mentioned above), possibly with explicit method calls as alternatives. Re: deprecation (or, going the way of the pseudohash 'dodo'): how would this affect the given/when switch, considering when is defined in terms of the smart match operator? I would hate to lose that...

My use of smart match is $str ~~ ['str1','str3',qr/str3/,...]. It is very useful for this. And it is hard to rewrite, because there is no easy way to check whether a string is a regex or not.
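
For what it's worth, the core re module has is_regexp, which reports whether a value is a compiled qr// object, so this case can be spelled out by hand. A sketch only; the helper name str_in_list is made up:

```perl
use strict;
use warnings;
use re qw(is_regexp);
use List::Util qw(any);

# Hypothetical helper: true if $str equals any plain string in the list
# or matches any qr// object in it.
sub str_in_list {
    my ($str, @candidates) = @_;
    my $hit = any { is_regexp($_) ? $str =~ $_ : $str eq $_ } @candidates;
    return $hit ? 1 : 0;
}

print "matched\n" if str_in_list( 'str3', 'str1', qr/^str\d$/ );
```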

chorny - I have the exact same use case :)

I think removing the recursion is probably not a good idea. It isn't clear to me that doing so would actually simplify things from the user's perspective or fix any of the actual problems that smart-ish matching has right now. However, I do agree that Ricardo's proposed semantics would make the operator pointless. So, I'm glad to see a more thoughtful approach to fixing this problem.

Oh, right, there could be a ref in the given() variable. Anyway, I still don't see uses of ~~ outside of given/when, but as I wrote, I hardly ever used 5.10 or newer due to client limitations.

You say "time to go the way of pseudohashes." Pseudohashes were removed because they slowed down Perl, not because they made the spec too hard for a few simple minds to codify.

About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).