<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Tom Wyant</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/" />
    <link rel="self" type="application/atom+xml" href="https://blogs.perl.org/users/tom_wyant/atom.xml" />
    <id>tag:blogs.perl.org,2009-11-03:/users/tom_wyant//506</id>
    <updated>2023-02-24T22:24:11Z</updated>
    <subtitle>A blog about the Perl programming language</subtitle>
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.38</generator>

<entry>
    <title>Ordering Your Tests</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2023/02/ordering-your-tests.html" />
    <id>tag:blogs.perl.org,2023:/users/tom_wyant//506.11036</id>

    <published>2023-02-24T22:20:14Z</published>
    <updated>2023-02-24T22:24:11Z</updated>

    <summary>By default, the test actions of both ExtUtils::MakeMaker and Module::Build test t/*.t in lexicographic order (a.k.a. ASCIIbetical order). Under this default, some Perl module authors who want tests performed in a given order have resorted to numbering tests: t/01_basic.t, t/10_functional.t,...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>By default, the <code>test</code> actions of both <a href="https://metacpan.org/pod/ExtUtils::MakeMaker"><code>ExtUtils::MakeMaker</code></a> and <a href="https://metacpan.org/pod/Module::Build"><code>Module::Build</code></a> test <code>t/*.t</code> in lexicographic order (a.k.a. ASCIIbetical order). Under this default, some Perl module authors who want tests performed in a given order have resorted to numbering tests: <code>t/01_basic.t</code>, <code>t/10_functional.t</code>, and so on.</p>

<p>My personal preference is to take the lexicographic ordering into consideration when naming test files: <code>t/basic.t</code> through <code>t/whole_thing.t</code>. But the price of this choice is a certain number of contrived test names, and even the occasional thesaurus lookup.</p>

<p>But there is a better way. Both <code>ExtUtils::MakeMaker</code> and <code>Module::Build</code> allow you to specify tests explicitly.</p>

<p>Under <code>ExtUtils::MakeMaker</code> version 6.76 or above, you call <code>WriteMakefile()</code> thus:</p>

<pre>
WriteMakefile(
    ...
    test =&gt; {
        TESTS =&gt; 't/one.t t/two.t t/three.t t/four.t',
    },
    ...
);
</pre>

<p>If you do this, the tests specified (and <strong>only</strong> the tests specified) are performed in the order specified.</p>

<p><code>ExtUtils::MakeMaker</code> version 6.76 was released September 5, 2013 and shipped with Perl 5.19.4, so any reasonably modern Perl (5.20.0 or later) should support this.</p>
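<p>One consequence of listing tests explicitly is that a test file added later runs only if someone remembers to add it to the list. Here is a minimal sketch of one way to guard against that, assuming you are willing to compute the list in <code>Makefile.PL</code>. The helper <code>ordered_tests()</code> is hypothetical and is not part of <code>ExtUtils::MakeMaker</code>. It returns the named tests first, in the order given, followed by any other <code>*.t</code> files in lexicographic order.</p>

<pre>
use strict;
use warnings;

# Hypothetical helper: the named test files first, in the order given,
# then every other *.t file in $dir, in lexicographic order.
sub ordered_tests {
    my ( $dir, @first ) = @_;
    my %listed = map { $_ =&gt; 1 } @first;
    my @rest = grep { ! $listed{$_} }
        sort { $a cmp $b } glob "$dir/*.t";
    return ( @first, @rest );
}

# In Makefile.PL the result would be handed to WriteMakefile(), e.g.
#   test =&gt; { TESTS =&gt; join ' ', ordered_tests( 't', 't/basic.t' ) },
print join( ' ', ordered_tests( 't', 't/basic.t' ) ), "\n";
</pre>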

<p>The equivalent incantation under <code>Module::Build</code> version 0.23 or above is:</p>

<pre>
Module::Build-&gt;new(
    ...
    test_files =&gt; [ qw{
        t/one.t
        t/two.t
        t/three.t
        t/four.t
        } ],
    ...
)-&gt;create_build_script();
</pre>

<p><code>Module::Build</code> version 0.23 was released February 9, 2004.</p>
]]>
        
    </content>
</entry>

<entry>
    <title>Outstanding GitHub Items</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2023/02/outstanding-github-items.html" />
    <id>tag:blogs.perl.org,2023:/users/tom_wyant//506.11029</id>

    <published>2023-02-17T02:30:45Z</published>
    <updated>2023-02-17T02:32:12Z</updated>

    <summary>Recently I received a bump on a GitHub pull request. This surprised me, because I was unaware of anything outstanding. I was even more surprised when I discovered that the distribution in question also had two open issues, one dating...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>Recently I received a bump on a GitHub pull request. This surprised me, because I was unaware of anything outstanding. I was even more surprised when I discovered that the distribution in question also had two open issues, one dating back about three months.</p>

<p>I have no idea why I was oblivious to these, but it made me want to audit myself to see if any other distributions had the same problem. GitHub has these nice links at the top of the page, <cite>Pull requests</cite> and <cite>Issues</cite>, but these show pull requests and issues that I initiated. I found no obvious way to display pull requests or issues filed against my repositories.</p>

<p>Now, maybe it is just me, but I find GitHub's documentation moderately opaque. But with considerable help from DuckDuckGo, I discovered the answer: you type <code>is:open user:&lt;your GitHub user name&gt;</code> into the search box. This gets you both open issues and open pull requests. If you want, you can restrict this further with <code>is:issue</code> (for issues) or <code>is:pr</code> (for pull requests). Do <strong>not</strong> leave off the user name, even if you are logged in. If you forget it you will get every open item on GitHub -- all 102 million of them as of this writing.</p>

<p>Now I am lazy, so I made a browser shortcut to do this for me. I don't think you get private repositories this way, but I was not worried about that. The string will have to be URI-escaped. So now if I want to audit myself I just click on <a href="https://github.com/search?q=user%3Atrwyant+is%3Aopen">https://github.com/search?q=user%3Atrwyant+is%3Aopen</a> and see what I get.</p>
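<p>If you would rather build such a link than type the escapes by hand, something like the following should work. It is a sketch: the subroutine name is hypothetical, it assumes an ASCII query, and it follows the HTML-form convention of encoding spaces as <code>+</code>, which is what the link above uses. For the query above it produces that same link.</p>

<pre>
use strict;
use warnings;

# Hypothetical helper: turn a GitHub search query into a bookmarkable URL.
# Assumes the query is ASCII.
sub github_search_url {
    my ( $query ) = @_;
    # Percent-encode everything except unreserved characters and spaces ...
    ( my $escaped = $query ) =~
        s/([^A-Za-z0-9\-._~ ])/sprintf '%%%02X', ord $1/ge;
    # ... then encode spaces as '+', as HTML forms do.
    $escaped =~ tr/ /+/;
    return "https://github.com/search?q=$escaped";
}

print github_search_url( 'user:trwyant is:open' ), "\n";
</pre>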
]]>
        
    </content>
</entry>

<entry>
    <title>Annotated Test2::Tools Index</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2023/02/annotated-test2tools-index.html" />
    <id>tag:blogs.perl.org,2023:/users/tom_wyant//506.11018</id>

    <published>2023-02-03T02:07:20Z</published>
    <updated>2023-02-03T02:09:21Z</updated>

    <summary>I have very gradually been adopting Test2::V0 as a testing tool. I had a test file that performed a group of tests inside a for loop, and discovered there were circumstances where I wanted to skip an iteration. Well, the...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>I have very gradually been adopting <a href="https://metacpan.org/pod/Test2::V0">Test2::V0</a> as a testing tool. I had a test file that performed a group of tests inside a <code>for</code> loop, and discovered there were circumstances where I wanted to skip an iteration. Well, the <code>skip()</code> provided by <a href="https://metacpan.org/pod/Test2::Tools::Basic">Test2::Tools::Basic</a> operates by executing <code>last SKIP;</code>. If the <code>SKIP:</code> label is on the <code>for</code> loop itself, this skips not only the current iteration but all subsequent iterations.</p>

<p>I wondered if there was a <code>Test2::Tools</code> plugin that did a <code>next SKIP;</code>, so I generated an <a href="https://trwyant.github.io/misc/all-perl-test2-tools.html">annotated index of Test2 tools</a>. This index reports all of them in ASCIIbetical order, with the distribution they are found in and the abstract from the <code>=head1 NAME</code> section of the POD.</p>

<p>I found 44 tools after eliminating a few helper classes that lived in the same name space. None of the 44 appears to do what I want. It would be easy enough to create such a tool, but I doubted that anyone would use it but me. So I indented another level and stuck a <code>SKIP:</code> block inside the <code>for</code> loop.</p>
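<p>The arrangement described above might look something like this sketch. The test data are made up for illustration, and it assumes <code>Test2::V0</code> is installed. Because the <code>SKIP:</code> label is on a bare block inside the loop, the <code>last SKIP;</code> executed by <code>skip()</code> leaves just that block, and the loop goes on to the next case. Had the label been on the <code>for</code> itself, the same <code>skip()</code> would have ended the loop.</p>

<pre>
use strict;
use warnings;
use Test2::V0;

# Made-up cases: [ input, expected square ]; an undefined expectation
# means the case should be skipped.
my @cases = ( [ 2, 4 ], [ 0, undef ], [ 3, 9 ] );

for my $case ( @cases ) {
    my ( $in, $want ) = @{ $case };
    SKIP: {
        # skip() executes "last SKIP;", which leaves only this bare
        # block; the for loop then continues with the next case.
        skip( "no expected value for $in", 1 )
            unless defined $want;
        is( $in * $in, $want, "$in squared" );
    }
}

done_testing;
</pre>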

<p>Like the previous <a href="https://trwyant.github.io/misc/all-perl-critic-policies.html">Annotated Perl::Critic Policy Index</a>, this will be updated approximately weekly. That is, a cron job runs Friday morning, and I push the repository when I get around to it, after reviewing the change and coming up with (I hope) a descriptive commit message.</p>
]]>
        
    </content>
</entry>

<entry>
    <title>My Favorite Modules: PerlIO::via</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2023/01/my-favorite-modules-perliovia.html" />
    <id>tag:blogs.perl.org,2023:/users/tom_wyant//506.11008</id>

    <published>2023-01-25T14:55:34Z</published>
    <updated>2023-01-25T15:02:11Z</updated>

    <summary>OK, I confess: PerlIO::via is not a module that I use every day. It allows you, easily, and with minimal code, to modify an I/O stream before it gets to the reader of the stream, or after the writer has...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>OK, I confess: <a href="https://metacpan.org/pod/PerlIO::via">PerlIO::via</a> is not a module that I use every day. It lets you modify an I/O stream, easily and with minimal code, <strong>before</strong> it gets to the reader of the stream, or <strong>after</strong> the writer has written it. All you do is write (say) <code>My::Module</code> conforming to the parts of the <code>PerlIO::via</code> interface you need, and provide it to the second argument of <code>open()</code> or <code>binmode()</code> as <code>':via(My::Module)'</code>. How cool is that? And how cool is a language that lets you do that with a minimum of fuss, bother, and code?</p>

<p>I encountered this when trying to modify (OK, hack) the behavior of a large and complex hunk of Perl not under my control. Rummaging around in this turned up the fact that all file input went through a single module/object, which had an <code>open()</code> method. I realized if I could insert my own <a href="https://perldoc.perl.org/PerlIO.html">PerlIO</a> layer into the input stream, I would have control over what the <s>victim</s> host code saw.</p>

<p>In the true spirit of the Conan the Barbarian school of programming ("Bash it until it submits!"), I wrote a <code>PerlIO::via</code> module whose <code>import()</code> method monkey-patched that <code>open()</code> method to insert my layer into the stack. All I had to do was launch the host code with <code>-MMy::Module</code> and the dirty deed was done.</p>

<p>If you read the <a href="https://metacpan.org/pod/PerlIO::via">PerlIO::via</a> documentation you see a whole host of methods you can provide. All I wanted to do was modify the input stream, and that can be done by implementing just two or three:</p>

<p>You will have to provide <code>PUSHED()</code>, which is called when your layer is pushed onto the I/O stack; that is, when someone specifies it in the second argument of <code>open()</code> or <code>binmode()</code>. It is called as a class method, and is given an <a href="https://linux.die.net/man/3/fopen"><code>fopen()</code></a>-style mode string (i.e. <code>'r'</code>, <code>'w'</code>, or what have you) and the already-opened handle, which represents the layer below. It needs to instantiate and return an object of the given class. Depending on your needs, this can be as simple as</p>

<pre>
sub PUSHED {
    my ( $class ) = @_;
    return bless {}, $class;
}
</pre>

<p>You have a couple of options for how to get the input, but I opted for <code>FILL()</code>. This is called as a method, and is passed a file handle which is open to the next layer down in the PerlIO stack. This would look something like:</p>

<pre>
sub FILL {
    my ( $self, $fh ) = @_;
    defined( my $data = &lt;$fh&gt; )
        or return;

    # Do your worst to the $data

    return $data;
}
</pre>

<p>A few paragraphs back I said "two or three" methods. For a while I was content with the above two. But then I realized that the caller was getting back bytes even if the file was opened with <code>:encoding(...)</code> specified in a lower layer, and the <code>FILL()</code> method preserved the character-nature of the data. Wrestling with this finally drove me back to the documentation, where I found the <code>UTF8()</code> method.</p>

<p>The <code>UTF8()</code> method is optional, and is called (if it exists) right after <code>PUSHED()</code>. Besides the invocant, it receives one argument, which is interpreted as a Boolean, and is true if the next-lower layer provides characters rather than bytes. The returned value tells <code>PerlIO</code> whether your layer provides characters (if true) or bytes (if false). A minimal-but-sufficient implementation is</p>

<pre>
sub UTF8 {
    my ( undef, $below_flag ) = @_;
    return $below_flag;
}
</pre>

<p><strong>Caveat:</strong> If you apply the encoding and your layer in the same operation (e.g. <code>binmode $fh, ':encoding(utf-8):via(My::Module)';</code>), the <code>UTF8()</code> method will <strong>not</strong> see a true value of <code>$below_flag</code>. There are two ways of dealing with this:</p>

<ul>
    <li>Apply your <code>PerlIO::via</code> layer in a separate call to <code>binmode()</code>, or</li>
    <li>Specify an explicit <code>:utf8</code> after your layer (that is, <code>binmode $fh, ':encoding(utf-8):via(My::Module):utf8';</code>).</li>
</ul>
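<p>Putting the pieces together, here is a minimal sketch of a complete read-side layer. The class name <code>My::Upper</code> and what it does (upper-casing everything read) are invented for illustration. It writes a small UTF-8 file, opens it with <code>:encoding(UTF-8)</code>, and then pushes the layer in a separate <code>binmode()</code> call, per the first workaround above, so that <code>UTF8()</code> sees a true <code>$below_flag</code> and the caller gets characters back.</p>

<pre>
use strict;
use warnings;

package My::Upper;    # hypothetical layer: upper-cases everything read

# Called as a class method when the layer is pushed; returns the layer object.
sub PUSHED {
    my ( $class ) = @_;
    return bless {}, $class;
}

# Called when more input is needed; $fh reads from the layer below.
sub FILL {
    my ( $self, $fh ) = @_;
    defined( my $data = readline $fh )
        or return;    # undef signals end of file
    return uc $data;
}

# Provide characters exactly when the layer below provides characters.
sub UTF8 {
    my ( undef, $below_flag ) = @_;
    return $below_flag;
}

package main;

use File::Temp qw{ tempfile };

# Write a small UTF-8 file containing a non-ASCII character (e-acute).
my ( $tmp, $path ) = tempfile( UNLINK =&gt; 1 );
binmode $tmp, ':encoding(UTF-8)';
print { $tmp } "h\x{e9}llo\n";
close $tmp;

# Decode first, then push the layer in a separate binmode() call.
open my $in, '&lt;:encoding(UTF-8)', $path
    or die "Can not open $path: $!\n";
binmode $in, ':via(My::Upper)'
    or die "Can not push :via(My::Upper): $!\n";
my $line = readline $in;
close $in;

binmode STDOUT, ':encoding(UTF-8)';
print $line;    # the upper-cased line, as characters
</pre>

<p>If <code>UTF8()</code> is removed from this sketch, I would expect <code>$line</code> to come back as UTF-8 bytes rather than characters, which is the symptom described above.</p>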

<p>This is already a longer note than I like, but I have to say something about <code>:utf8</code>. The current documentation calls it a pseudo-layer. What it really is, is a flag on the layer below it, telling <code>PerlIO</code> that that layer provides characters rather than bytes on input, or accepts characters on output. Around Perl 5.8 or 5.10 there was a fair amount of misunderstanding about what <code>:utf8</code> did, and there was actually core Perl documentation that said (or seemed to say) that you did UTF-8 I/O by specifying this layer. Most such instances of <code>:utf8</code> in the core documentation have been replaced by <code>:encoding(utf-8)</code>, but there may still be some <code>:utf8</code> in outlying regions of the documentation.</p>

<p>By using <code>:utf8</code> in the second example above, what I am telling Perl is that <code>:via(My::Module)</code> produces decoded output. It does, because the layer below it (<code>:encoding(utf-8)</code>) does, and <code>:via(My::Module)</code> preserves this property. Without the <code>:encoding(utf-8)</code> below it, it would be an error to tell PerlIO that <code>:via(My::Module)</code> produced characters unless <code>My::Module</code> did the decoding itself.</p>

<p>If you want to see what layers are in effect on file handle <code>$fh</code>, you can call <code>PerlIO::get_layers( $fh )</code>. This returns a list of layer names (without the leading colons), which will include <code>utf8</code> as a separate entry, possibly more than once if more than one layer has that flag set.</p>
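<p>A quick way to see the separate <code>utf8</code> entry is a sketch like the one below, which uses an in-memory file so nothing touches the disk. The exact layer names depend on the kind of handle and the platform, but after the <code>binmode()</code> the list should include <code>utf8</code>, and before it the list should not.</p>

<pre>
use strict;
use warnings;

my $bytes = "caf\x{e9}\n";
utf8::encode( $bytes );    # the in-memory "file" now holds UTF-8 bytes
open my $fh, '&lt;', \$bytes
    or die "Can not open in-memory file: $!\n";

my @before = PerlIO::get_layers( $fh );
binmode $fh, ':encoding(UTF-8)'
    or die "Can not push :encoding(UTF-8): $!\n";
my @after = PerlIO::get_layers( $fh );

print "before: @before\n";
print "after:  @after\n";
</pre>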

<p>Previous entries in this series:</p>

<ol>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2021/09/my-favorite-modules-if.html"><code>if</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2021/10/my-favorite-modules-diagnostics-one.html"><code>diagnostics</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2021/11/my-favorite-modules-termreadlineperl.html"><code>Term::ReadLine::Perl</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/02/my-favorite-modules-re.html"><code>re</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/02/my-favorite-modules-develnytprof.html"><code>Devel::NYTProf</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/05/my-favorite-modules-errno.html"><code>Errno</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/05/my-favorite-modules-timepiece.html"><code>Time::Piece</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/06/core-modules-filetest.html"><code>filetest</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/06/my-favorite-modules-filestat.html"><code>File::stat</code></a></li>
</ol>
]]>
        
    </content>
</entry>

<entry>
    <title>Regexp Delimiters</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2023/01/regexp-delimiters.html" />
    <id>tag:blogs.perl.org,2023:/users/tom_wyant//506.11001</id>

    <published>2023-01-17T01:36:01Z</published>
    <updated>2023-01-17T01:38:22Z</updated>

    <summary>Perl lets you use almost anything as a regular expression delimiter. It is usual to use punctuation of some sort, but characters that match /\w/ can be used provided there is white space between the operator and the delimiter: m...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>Perl lets you use almost anything as a regular expression delimiter. It is usual to use punctuation of some sort, but characters that match <code>/\w/</code> can be used provided there is white space between the operator and the delimiter: <code>m X foo Xsmx</code> compiles and matches <code>'foobar'</code>. In the presence of <code>use utf8;</code> you can go wild.</p>

<p>A query on the Perl 5 Porters Mailing List (a.k.a. 'p5p') a few days ago asked for opinions about <a href="https://www.nntp.perl.org/group/perl.perl5.porters/2023/01/msg265450.html">appropriating the colon (<code>':'</code>) as a delimiter for modifiers to the regular expression operators</a>. This got me wondering about what regular expression delimiters were actually in use.</p>

<p>I scratched that itch by plowing through my local <a href="https://metacpan.org/dist/CPAN-Mini">Mini CPAN</a>, running everything that looked like Perl through <a href="https://metacpan.org/dist/PPI">PPI</a>, and checking anything that parsed to an object of one of the relevant classes. A summary of the results is appended.</p>

<p>It was no surprise that <code>"/"</code> was the overwhelming favorite. The colon (<code>":"</code>) came in 13th. I was a little surprised (after I thought about it) that <code>"'"</code> (7th) was not more popular, since it does not interpolate. After all, why write <code>m/[\@\$]/</code> when you can write <code>m'[@$]'</code>?</p>
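<p>Both points are easy to check. This sketch (the variable names are mine) matches with a word-character delimiter, which needs white space after the operator, and with single-quote delimiters, under which <code>@</code> and <code>$</code> need no backslashes. Each match should succeed, so it prints <code>1 1 1</code>.</p>

<pre>
use strict;
use warnings;

my $string = 'foobar';

# A word character can be a delimiter, given white space after the
# operator; /x makes the spaces around "foo" insignificant.
my $word_delim = ( $string =~ m X foo Xsmx ) ? 1 : 0;

# Single-quote delimiters suppress interpolation, so @ and $ can
# appear unescaped in the character class.
my $quote_delim = ( 'user@example' =~ m'[@$]' ) ? 1 : 0;

# The same class with the default delimiter needs the backslashes.
my $slash_delim = ( 'user@example' =~ m/[\@\$]/ ) ? 1 : 0;

print "$word_delim $quote_delim $slash_delim\n";
</pre>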

<p>You made it to the end of this post. Your prize (if you want to call it that) is the threatened list of regular expression delimiters, in decreasing order of frequency. The delimiters themselves were formatted by running them through <code>B::perlstring()</code>. I suspect most of the single-digit ones are the result of mis-parses, but believe it or not, some of the instances of <code>"\\"</code> are real regular expression delimiters.</p>

<pre>
"/"       1420735
"{"       128788
"!"       36081
"|"       23932
"#"       14893
"("       7369
"'"       5180
"["       4220
","       3376
"&lt;"       2926
"%"       2308
"\@"      1302
":"       1232
"\""      828
"."       349
"~"       313
"-"       249
";"       194
"?"       182
"="       109
"^"       59
"0"       43
"`"       35
"+"       29
")"       18
"&amp;"       17
"o"       15
"n"       14
"]"       14
"r"       13
"*"       11
"\\"      11
"\036"    8
"i"       6
"\$"      6
"\a"      6
""        5
"e"       4
"&gt;"       4
"1"       4
"8"       3
"S"       3
"6"       3
"9"       3
"_"       2
"f"       2
"a"       2
"}"       2
"g"       2
"m"       2
"5"       2
"v"       1
"q"       1
"l"       1
"I"       1
"d"       1
"M"       1
"c"       1
"s"       1
"t"       1
"H"       1
"\247"    1
"u"       1
"x"       1
</pre>
]]>
        
    </content>
</entry>

<entry>
    <title>Making GitHub CI work with Perl 5.8.</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2022/12/making-github-ci-work-with-perl-58.html" />
    <id>tag:blogs.perl.org,2022:/users/tom_wyant//506.10960</id>

    <published>2022-12-01T13:48:01Z</published>
    <updated>2022-12-01T13:52:07Z</updated>

    <summary>A while back, I got a pull request from Gabor Szabo adding a GitHub action to one of my distributions. I have been working with this, but have not (so far) blogged about it because, quite frankly, I am still...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>A while back, I got a pull request from Gabor Szabo adding a GitHub action to one of my distributions. I have been working with this, but have not (so far) blogged about it because, quite frankly, I am still not sure I know what I am doing.</p>

<p>One of my personal desires was to test my distributions on the oldest practicable Perl for each available architecture. For Unix (i.e. Linux and macOS) this is 5.8.8, provided the distribution itself supports that. A couple days ago, though, I pushed a modification to one of my distributions and had the 5.8.8 tests blow up.</p>

<p>The problem turned out to be that <a href="https://metacpan.org/pod/Module::Build">Module::Build</a>, for reasons I have not investigated, has <a href="https://metacpan.org/pod/Pod::Man">Pod::Man</a> as a dependency. The current version of <code>Module::Build</code> requires <code>Pod::Man</code> version 2.17, but according to <a href="https://perldoc.perl.org/corelist">corelist</a> Perl 5.8.8 comes with <code>Pod::Man</code> version 1.37, so <code>cpanm</code> wants to upgrade it.</p>

<p>The problem with this is that as of version 5.0, released November 25, 2022, the <a href="https://metacpan.org/dist/podlators">podlators</a> distribution, which supplies <code>Pod::Man</code>, requires Perl 5.10. So under 5.8.8, <code>cpanm --with-configure --notest --installdeps .</code> dies trying to install <code>podlators</code>.</p>

<p>The solution I came up with was to pre-emptively install <code>RRA/podlators-4.14.tar.gz</code> under Perl 5.8.8. The implementation was in two parts: define an environment variable that recorded whether we were running under Perl 5.10 or later, and define a job step conditioned on that variable to install <code>podlators 4.14</code> if we were using an earlier Perl.</p>

<p>Under GitHub Actions you can define environment variables by appending their definitions to the file whose path is in environment variable <code>GITHUB_ENV</code>. After struggling with PowerShell for the Windows runners, I decided to do that step in Perl. The core of the Perl script is:</p>

<pre>
use strict;
use warnings;
use File::HomeDir;

defined $ENV{GITHUB_ENV}
    and $ENV{GITHUB_ENV} ne ''
    or die "Environment variable GITHUB_ENV undefined or empty\n";
open my $fh, '&gt;&gt;:encoding(utf-8)', $ENV{GITHUB_ENV}
    or die "Can not open $ENV{GITHUB_ENV}: $!\n";

my $home = File::HomeDir-&gt;my_home();
my $is_5_10 = "$]" &gt;= 5.010 ? 1 : '';
my $is_windows = {
    MSWin32 =&gt; 1,
    dos     =&gt; 1,
}-&gt;{$^O} || '';
my $is_unix = $is_windows ? '' : 1;

print $fh &lt;&lt;"EOD";
MY_HOME=$home
MY_IS_UNIX=$is_unix
MY_IS_WINDOWS=$is_windows
MY_PERL_IS_5_10=$is_5_10
EOD
</pre>

<p>Next I had to run this from the YAML file that defined the workflow, and act on the created value. This was done using two steps:</p>

<pre>
    - name: Customize environment
      run: |
        cpanm -v
        cpanm File::HomeDir
        perl .github/workflows/environment.PL
</pre>

<p>and</p>

<pre>
    - name: Install old podlators distro if on old Perl
      if: "! env.MY_PERL_IS_5_10"
      run: cpanm RRA/podlators-4.14.tar.gz
</pre>

<p>The entirety of both the GitHub Actions file <code>ci.yml</code> and the Perl script <code>environment.PL</code> can be found in <a href="https://github.com/trwyant/perl-Astro-Coord-ECI/tree/master/.github/workflows">the GitHub repository for Astro::Coord::ECI</a>. Other, and probably better, implementations can be imagined.</p>]]>
        
    </content>
</entry>

<entry>
    <title>Match Anything, Quickly -- Revision 1</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2022/09/match-anything-quickly----revision-1.html" />
    <id>tag:blogs.perl.org,2022:/users/tom_wyant//506.10881</id>

    <published>2022-09-02T18:17:31Z</published>
    <updated>2022-11-01T05:07:33Z</updated>

    <summary>O wad some Power the giftie gie us To see oursels as ithers see us! It wad frae mony a blunder free us, An&apos; foolish notion: ... My previous blog post, Match Anything, Quickly, brought a number of responses which...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p><cite>O wad some Power the giftie gie us<br />
    To see oursels as ithers see us!<br />
    It wad frae mony a blunder free us,<br />
    An' foolish notion: ...</cite></p>

<p>My previous blog post, <a href="/users/tom_wyant/2022/08/match-anything-quickly.html">Match Anything, Quickly</a>, brought a number of responses which are worth reading in their own right. The one that triggered this post, though, was from <a href="https://nrdvana.net/">Nerdvana</a> and Devin of Cincinnati Perl Mongers, who pointed out an error in my benchmark script. I had left off the intended <code>/smx</code> from the <code>qr/ ... /</code> version of the test, which meant that the regular expression did not in fact match.</p>

<p><strong>Three cheers for code reviews!</strong></p>

<p>The Cincinnati Perl Mongers came up with a further case which combines my two:</p>

<pre>
eval "do { my \$regex = qr/ $re /smx; " .
        "sub { \$MATCH =~ /\$regex/o }};"
</pre>

<p>They benchmarked this as being slightly slower than the case where the regular expression is simply interpolated into the subroutine verbatim.</p>

<p>Interestingly (to me, at least) they reported that the removal of the <code>/o</code> modifier made their case 2-3 times slower. This surprised me somewhat, as I had understood that modern Perls (for some value of "modern") had done things to minimize the performance difference between the presence and absence of <code>/o</code>.</p>
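<p>If you want to try the <code>/o</code> comparison yourself, here is a minimal sketch using the core <a href="https://perldoc.perl.org/Benchmark.html"><code>Benchmark</code></a> module. It builds the Cincinnati Perl Mongers' closure with and without <code>/o</code>, checks that both really match (the lesson of this post), and times them. The pattern, the string matched, and the labels are my own arbitrary choices, and the numbers will depend on your Perl and your machine.</p>

<pre>
use strict;
use warnings;
use Benchmark qw{ cmpthese };

our $MATCH = 'any old string';    # the global the subs match against
my $re = '(?:)';                  # one of the match-anything patterns

# The Cincinnati.pm construction, with and without /o.
my $with_o = eval( "do { my \$regex = qr/ $re /smx; "
    . "sub { \$MATCH =~ /\$regex/o } };" )
    or die $@;
my $without_o = eval( "do { my \$regex = qr/ $re /smx; "
    . "sub { \$MATCH =~ /\$regex/ } };" )
    or die $@;

# Make sure both really match before timing them.
$_-&gt;() or die "No match\n" for $with_o, $without_o;

# Run each for at least one CPU second and compare the rates.
cmpthese( -1, {
        with_o    =&gt; $with_o,
        without_o =&gt; $without_o,
    } );
</pre>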

<p>For the record, the corrected script is also on <a href="https://trwyant.github.io/blog/2022-09-02/match-anything-quickly-rev01.PL">GitHub</a>. The corrections include an option that tests to make sure all benchmarked things actually match. The result of running this with the <code>--test</code> and <code>--html</code> options (on a different machine than the original post) is:</p>

<pre>
ok 1 - sub { 1 }
ok 2 - sub { $MATCH =~ m/ (*ACCEPT) /smx }
ok 3 - qr/ (*ACCEPT) /smx
ok 4 - sub { $MATCH =~ m/ (?) /smx }
ok 5 - qr/ (?) /smx
ok 6 - sub { $MATCH =~ m/ (?:) /smx }
ok 7 - qr/ (?:) /smx
ok 8 - sub { $MATCH =~ m/ .? /smx }
ok 9 - qr/ .? /smx
ok 10 - sub { $MATCH =~ m/ .{0} /smx }
ok 11 - qr/ .{0} /smx
ok 12 - sub { $MATCH =~ m/ \A /smx }
ok 13 - qr/ \A /smx
ok 14 - sub { $MATCH =~ m/ ^ /smx }
ok 15 - qr/ ^ /smx
1..15
</pre>

<table>
<thead>
<tr><th>Implementation</th><th>Rate</th></tr>
</thead>
<tbody>
<tr><td>sub { 1 }</td><td style="text-align: right;">434782608.70/sec</td></tr>
<tr><td>sub { $MATCH =~ m/ \A /smx }</td><td style="text-align: right;">13333333.33/sec</td></tr>
<tr><td>sub { $MATCH =~ m/ ^ /smx }</td><td style="text-align: right;">13315579.23/sec</td></tr>
<tr><td>sub { $MATCH =~ m/ (?:) /smx }</td><td style="text-align: right;">12315270.94/sec</td></tr>
<tr><td>sub { $MATCH =~ m/ (?) /smx }</td><td style="text-align: right;">11173184.36/sec</td></tr>
<tr><td>sub { $MATCH =~ m/ .{0} /smx }</td><td style="text-align: right;">10593220.34/sec</td></tr>
<tr><td>sub { $MATCH =~ m/ .? /smx }</td><td style="text-align: right;">10449320.79/sec</td></tr>
<tr><td>sub { $MATCH =~ m/ (*ACCEPT) /smx }</td><td style="text-align: right;">4380201.49/sec</td></tr>
<tr><td>qr/ ^ /smx</td><td style="text-align: right;">2612330.20/sec</td></tr>
<tr><td>qr/ \A /smx</td><td style="text-align: right;">2603488.67/sec</td></tr>
<tr><td>qr/ (?:) /smx</td><td style="text-align: right;">2586652.87/sec</td></tr>
<tr><td>qr/ (?) /smx</td><td style="text-align: right;">2575991.76/sec</td></tr>
<tr><td>qr/ .{0} /smx</td><td style="text-align: right;">2518891.69/sec</td></tr>
<tr><td>qr/ .? /smx</td><td style="text-align: right;">2510670.35/sec</td></tr>
<tr><td>qr/ (*ACCEPT) /smx</td><td style="text-align: right;">1849796.52/sec</td></tr>
</tbody>
</table>
]]>
        
    </content>
</entry>

<entry>
    <title>Match Anything, Quickly</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2022/08/match-anything-quickly.html" />
    <id>tag:blogs.perl.org,2022:/users/tom_wyant//506.10871</id>

    <published>2022-08-06T00:39:25Z</published>
    <updated>2022-11-01T05:07:27Z</updated>

    <summary>Revision: the Cincinnati Perl Mongers found an error in the benchmark script used for this post. Match Anything Quickly - Revision 1 discusses their findings and links to a revised benchmark script. -- TRW 2022-09-02 Sometimes I want to filter...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p><strong>Revision:</strong> the Cincinnati Perl Mongers found an error in the benchmark script used for this post. <a href="/users/tom_wyant/2022/09/match-anything-quickly----revision-1.html">Match Anything Quickly - Revision 1</a> discusses their findings and links to a revised benchmark script. -- TRW 2022-09-02</p>

<p>Sometimes I want to filter a set of strings, but the details of the filter are not known beforehand. In particular, I may want a null filter, which simply accepts anything.</p>

<p>This looks like a job for a regular expression, but I can think of at least two implementations. One is to pass around regular expression objects. The second is to wrap a match (<code>m//</code>) in a subroutine reference, and pass that around. Given the use of regular expressions, there are a number of possibilities for a regular expression that matches any string.</p>
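<p>In code, the two implementations look something like this sketch, which uses <code>(?:)</code>, one of the candidates benchmarked below, as the match-anything pattern. The variable names are illustrative only.</p>

<pre>
use strict;
use warnings;

my @strings = qw{ apple banana cherry };

# Implementation 1: pass a Regexp object around.
my $regexp_filter = qr/ (?:) /smx;    # matches any string
my @by_regexp = grep { $_ =~ $regexp_filter } @strings;

# Implementation 2: wrap a match in a subroutine reference.
my $sub_filter = sub { $_[0] =~ m/ (?:) /smx };
my @by_sub = grep { $sub_filter-&gt;( $_ ) } @strings;

# In the code-reference style, the null filter need not match at all.
my $null_filter = sub { 1 };
my @by_null = grep { $null_filter-&gt;( $_ ) } @strings;

print "@by_regexp\n@by_sub\n@by_null\n";
</pre>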

<p>I wondered whether one of the alternatives I was choosing among was faster than another, so I decided to <a href="https://perldoc.perl.org/Benchmark.html"><code>Benchmark</code></a> them. Both implementations applied the regular expression to a global variable. In practice this would probably be a localized <code>$_</code>, but my read of the <a href="https://perldoc.perl.org/Benchmark.html"><code>Benchmark</code></a> module says that it also localizes <code>$_</code>, but leaves it <code>undef</code>.</p>

<p><strong>Note</strong> that the empty pattern is not benchmarked, because it is equivalent to the last successfully-matched pattern, if any. The <code>sub { 1 }</code> was included because if we're dealing in code references, the null filter simply needs to return a true value.</p>
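<p>The problem with the empty pattern can be seen in a few lines. In this sketch, a successful <code>m/b/</code> makes the later <code>m//</code> try to match <code>b</code>, so it fails against <code>'xyz'</code>, while the non-empty <code>m/ (?:) /smx</code> matches as intended.</p>

<pre>
use strict;
use warnings;

# A successful match makes m/b/ the "last successfully matched pattern".
'abc' =~ m/b/
    or die "Setup match failed\n";

# An empty pattern reuses m/b/ rather than matching anything,
# so this match fails.
my $empty = ( 'xyz' =~ m// ) ? 'matched' : 'did not match';

# A non-empty pattern that matches anything behaves as intended.
my $explicit = ( 'xyz' =~ m/ (?:) /smx ) ? 'matched' : 'did not match';

print "m//: $empty\n";
print "m/ (?:) /smx: $explicit\n";
</pre>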

<p>Here are the results, obtained with Perl 5.36.0, unthreaded. The script that generated them is on <a href="https://trwyant.github.io/blog/2022-08-05/match-anything-quickly.PL">GitHub</a>.</p>

<table>
<thead>
<tr><th>Implementation</th><th>Rate</th></tr>
</thead>
<tbody>
<tr><td>sub { 1 }</td><td style="text-align: right;">294117647.06/sec</td></tr>
<tr><td>sub { m/ .? /smx }</td><td style="text-align: right;">21645021.65/sec</td></tr>
<tr><td>sub { m/ .{0} /smx }</td><td style="text-align: right;">21598272.14/sec</td></tr>
<tr><td>sub { m/ (*ACCEPT) /smx }</td><td style="text-align: right;">20964360.59/sec</td></tr>
<tr><td>sub { m/ (?) /smx }</td><td style="text-align: right;">20876826.72/sec</td></tr>
<tr><td>sub { m/ \A /smx }</td><td style="text-align: right;">20746887.97/sec</td></tr>
<tr><td>sub { m/ (?:) /smx }</td><td style="text-align: right;">20618556.70/sec</td></tr>
<tr><td>sub { m/ ^ /smx }</td><td style="text-align: right;">20618556.70/sec</td></tr>
<tr><td>qr/ (?) /smx</td><td style="text-align: right;">2344665.89/sec</td></tr>
<tr><td>qr/ (?:) /smx</td><td style="text-align: right;">2344116.27/sec</td></tr>
<tr><td>qr/ ^ /smx</td><td style="text-align: right;">2336448.60/sec</td></tr>
<tr><td>qr/ \A /smx</td><td style="text-align: right;">2315350.78/sec</td></tr>
<tr><td>qr/ .? /smx</td><td style="text-align: right;">2208968.41/sec</td></tr>
<tr><td>qr/ .{0} /smx</td><td style="text-align: right;">2180074.12/sec</td></tr>
<tr><td>qr/ (*ACCEPT) /smx</td><td style="text-align: right;">1717327.84/sec</td></tr>
</tbody>
</table>

<p>Somewhat to my surprise, the subroutine-reference implementation was an
order of magnitude faster than the regular-expression-reference implementation.
I expected that, <code>Regexp</code>s being first-class objects, it would be
pretty much equivalent to <code>m/ ... /</code> wrapped in a subroutine --
maybe even a little faster.</p>

<p>A little messing around with <code>perl -MO=Concise</code> got me the following:</p>

<pre>
$ perl -MO=Concise -e '$_ =~ m/foo/;'
5  &lt;@&gt; leave[1 ref] vKP/REFC -&gt;(end)
1     &lt;0&gt; enter v -&gt;2
2     &lt;;&gt; nextstate(main 1 -e:1) v:{ -&gt;3
4     &lt;/&gt; match(/"foo"/) vKS -&gt;5
-        &lt;1&gt; ex-rv2sv sK/1 -&gt;4
3           &lt;$&gt; gvsv(*_) s -&gt;4
-e syntax OK
$ perl -MO=Concise -e '$_ =~ qr/foo/;'
7  &lt;@&gt; leave[1 ref] vKP/REFC -&gt;(end)
1     &lt;0&gt; enter v -&gt;2
2     &lt;;&gt; nextstate(main 1 -e:1) v:{ -&gt;3
6     &lt;/&gt; match() vKS -&gt;7
-        &lt;1&gt; ex-rv2sv sK/1 -&gt;4
3           &lt;$&gt; gvsv(*_) s -&gt;4
5        &lt;|&gt; regcomp(other-&gt;6) sK -&gt;6
4           &lt;/&gt; qr(/"foo"/) s -&gt;5
-e syntax OK
</pre>

<p>The salient difference, to my eye, was the presence of the <code>regcomp</code> operator in the second case. <a href="https://metacpan.org/pod/Perldoc::Search"><code>perldoc-search</code></a> on this led me eventually to <a href="https://perldoc.perl.org/perlreapi.html"><code>perlreapi</code></a> which says, in part,</p>

<dl>
    <dt>"precomp" "prelen"</dt>
    <dd>
        <p>Used for optimisations. "precomp" holds a copy of the pattern that was compiled and "prelen" its length. When a new pattern is to be compiled (such as inside a loop) the internal "regcomp" operator checks if the last compiled "REGEXP"'s "precomp" and "prelen" are equivalent to the new one, and if so uses the old pattern instead of compiling a new one.</p>
        <p>The relevant snippet from "Perl_pp_regcomp":</p>
<pre>
    if (!re || !re-&gt;precomp || re-&gt;prelen != (I32)len ||
        memNE(re-&gt;precomp, t, len))
    /* Compile a new pattern */
</pre>
    </dd>
</dl>

<p>So I <strong>assume</strong> that the speed difference <strong>might</strong> shrink if the filter were called in a tight enough loop. If so, though, the <a href="https://perldoc.perl.org/Benchmark.html"><code>Benchmark</code></a> loop is not tight enough, and it's pretty tight. On the other hand, maybe the <code>Benchmark</code> loop <strong>is</strong> tight enough, and the extra time is spent determining that a recompilation is not needed. Sorting this out will take deeper knowledge of Perl internals than I possess.</p>
]]>
        
    </content>
</entry>

<entry>
    <title>Numeric Variable Names With Leading Zeroes</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2022/07/numeric-variable-names-with-leading-zeroes.html" />
    <id>tag:blogs.perl.org,2022:/users/tom_wyant//506.10860</id>

    <published>2022-07-26T16:52:03Z</published>
    <updated>2022-07-26T16:54:10Z</updated>

    <summary>Over on the p5p mailing list, a user raised the issue that use of variable $00 is an error starting with Perl 5.32, and asked that this &quot;regression&quot; be fixed. I have always understood that variables whose names begin with...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>Over on the p5p mailing list, a user raised the issue that <a href="https://www.nntp.perl.org/group/perl.perl5.porters/2022/07/msg264450.html">use of variable <code>$00</code> is an error starting with Perl 5.32</a>, and asked that this "regression" be fixed.</p>

<p>I have always understood that variables whose names begin with anything but an alphabetic character or an underscore are reserved to Perl, and you mess with them at your peril. That is also the gist of the Porters' response to the post. Recent versions of <a href="https://perldoc.perl.org/perlvar.html"><code>perlvar</code></a> say this explicitly, though earlier versions of that document restrict themselves to describing currently-implemented special variables.</p>

<p>For what it's worth, <a href="https://perldoc.perl.org/perl5320delta"><code>perl5320delta</code></a> appears <strong>not</strong> to mention this as a new diagnostic.</p>

<p>I wondered how much of this kind of thing was in CPAN, so I whipped up a <code>Perl::Critic</code> policy to try to find it: <a href="https://github.com/trwyant/perl-Perl-Critic-Policy-Variables-ProhibitNumericNamesWithLeadingZero"><code>Variables::ProhibitNumericNamesWithLeadingZero</code></a>. I then ran this against CPAN as it stood on July 23, 2022.</p>

<p>The only violation of this policy that I found was in line 1209 of <a href="https://metacpan.org/pod/Net::Elexol::EtherIO24"><code>Net::Elexol::EtherIO24</code></a>. As of this writing, the most recent release of this module dates from August 11, 2009. The offending line, in context, is</p>

<pre>
1208    $txt .= sprintf("MAC: %02.2x:%02.2x:02.2x:02.2x:02.2x:02.2x  ".
1209                    "Fw: %02.2x.$02.2x",
1210                    unpack("x$len CCCCCCCC", $cmd));
</pre>

<p>and looks to me very much like a typo for <code>%02.2x</code>. The distribution requires a threaded Perl, and CPAN Testers show failures with <code>Error:  Numeric variables with more than one digit may not start with '0' at Net-Elexol-EtherIO24-0.22-0/blib/lib/Net/Elexol/EtherIO24.pm line 1209.</code> for Perl versions 5.32.1 and above. There are no reports for 5.32.0.</p>
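<p>Here is the firmware-version part of that statement with the presumed typo corrected. The two byte values are made up. In the module, they come from the <code>unpack()</code> call.</p>

<pre>
use strict;
use warnings;

# Made-up stand-ins for two of the bytes returned by unpack().
my ( $major, $minor ) = ( 1, 22 );

# With the '%' restored, each byte is formatted as two hex digits.
my $txt = sprintf "Fw: %02.2x.%02.2x", $major, $minor;
print "$txt\n";    # prints 'Fw: 01.16'
</pre>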

<p>Under the circumstances I cannot imagine anyone (other than maybe the original poster on p5p) actually wanting this perlcritic policy published, but I did stick it on GitHub for the curious.</p>
]]>
        
    </content>
</entry>

<entry>
    <title>Sorting Subroutine Results</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2022/07/sorting-subroutine-results.html" />
    <id>tag:blogs.perl.org,2022:/users/tom_wyant//506.10855</id>

    <published>2022-07-20T14:26:27Z</published>
    <updated>2022-07-20T14:29:13Z</updated>

    <summary>The Perl sort built-in is mostly (at least by me) called as sort LIST or sort BLOCK LIST. But there is a third way to call it: sort SUBROUTINE LIST, which actually appears first in the documentation. This is not...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>The Perl <a href="https://perldoc.perl.org/functions/sort"><code>sort</code></a> built-in is mostly (at least by me) called as <code>sort LIST</code> or <code>sort BLOCK LIST</code>. But there is a third way to call it: <code>sort SUBROUTINE LIST</code>, which actually appears first in the documentation.</p>

<p>This is not a blog entry about using the <code>sort SUBROUTINE LIST</code> form of <code>sort</code>. It is more about the need to be aware of this form when writing (or trying to write) the <code>sort LIST</code> form.</p>

<p>Consider the following situation: you have a subroutine <code>foo()</code> which returns an un-ordered list. You need that list sorted. Perl has a sort built-in, so your (or at least my) first reaction is to write <code>my @sorted = sort foo();</code>, run it, and then wonder why <code>@sorted</code> is empty.</p>
 
<p>The problem, of course, is that Perl parses this as <code>sort SUBROUTINE LIST</code> with the SUBROUTINE being <code>foo</code> and the LIST being everything after <code>foo</code>. The contents of the parentheses (if any) are not passed as arguments to <code>foo()</code>, but are consumed by the <code>sort</code>. Subroutine <code>foo()</code> gets called only to order pairs of items in the LIST.</p>
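<p>Here is a minimal reproduction, using a made-up <code>foo()</code> that returns an un-ordered list. Because <code>sort foo()</code> is parsed as <code>sort SUBROUTINE LIST</code> with an empty LIST, <code>foo()</code> is never called at all, and the result is empty:</p>

<pre>
use strict;
use warnings;

# Returns an un-ordered list.
sub foo { return qw{ pear apple fig } }

# Parsed as sort SUBROUTINE LIST: foo is the comparison routine,
# and the LIST (the empty parentheses) has no elements.
my @sorted = sort foo();

print scalar @sorted, "\n";    # prints 0
</pre>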

<p>If you actually want to sort the list returned by <code>foo()</code>, you have to persuade Perl not to parse <code>sort foo()</code> as <code>sort SUBROUTINE LIST</code>. The documentation contains the words <cite>Warning: syntactical care is required when sorting the list returned from a function</cite>, and provides ways to make this happen. They are, basically,</p>

<ul>
    <li>Provide a sort block, e.g. <code>sort { $a cmp $b } foo()</code></li>
    <li>Use a unary plus, e.g. <code>sort +foo()</code></li>
    <li>Call the function with an ampersand, e.g. <code>sort &amp;foo()</code></li>
    <li>Call <code>sort</code> as a function, e.g. <code>sort( foo() )</code></li>
</ul>
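<p>Here is each of the four forms applied to a made-up <code>foo()</code> that returns an un-ordered list. All four sort the list that <code>foo()</code> returns:</p>

<pre>
use strict;
use warnings;

sub foo { return qw{ pear apple fig } }

my @by_block  = sort { $a cmp $b } foo();    # explicit sort block
my @by_plus   = sort +foo();                 # unary plus
my @by_amp    = sort &amp;foo();                 # ampersand call
my @by_parens = sort( foo() );               # sort called as a function

print "@by_plus\n";    # prints 'apple fig pear'
</pre>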

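<p>Which of these you choose is largely a matter of style. A sort block may impose a performance penalty, though as far as I know Perl recognizes trivial blocks such as <code>{ $a cmp $b }</code> and optimizes them into the built-in comparison. Whether any penalty is significant depends on the application.</p>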
]]>
        
    </content>
</entry>

<entry>
    <title>Scalar Context: Lists Versus Arrays</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2022/07/scalar-context-lists-versus-arrays.html" />
    <id>tag:blogs.perl.org,2022:/users/tom_wyant//506.10848</id>

    <published>2022-07-12T18:54:31Z</published>
    <updated>2022-07-12T18:57:03Z</updated>

    <summary>For a long time after I first encountered Perl, I looked on &quot;list&quot; and &quot;array&quot; as essentially interchangeable concepts. A list was simply the source construct corresponding to an array. This idea is mostly correct. But as they say, the...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>For a long time after I first encountered Perl, I looked on "list" and "array" as essentially interchangeable concepts. A list was simply the source construct corresponding to an array. This idea is <strong>mostly</strong> correct. But as they say, the devil is in the details.</p>

<p>One of the differences is what happens to them in scalar context. An array evaluates to the number of elements it contains. A list evaluates to its last element. So:</p>

<pre>
use strict;
use warnings;
use feature qw{ say };

my @array = qw{ one two five };
say scalar @array;  # prints '3'
{
    no warnings 'void'; # Note the need for this
    say scalar( qw{ one two five } ); # prints 'five'
}
</pre>

<p>Okay, that is a trivial example. It becomes more interesting when you consider that subroutines inherit their calling context. If called in scalar context, a subroutine that returns a list behaves differently than one that returns an array:</p>

<pre>
use strict;
use warnings;
use feature qw{ say state };

sub array {
    state $array = [ qw{ one two five } ];
    return @{ $array };
}
sub list {
    return qw{ one two five };
}
say scalar array(); # prints '3'
say scalar list();  # prints 'five'
</pre>

<p>Now, there is some sentiment against subroutines that "behave differently" in scalar and list context. Usually this is thought of in terms of the <a href="https://perldoc.perl.org/functions/wantarray"><code>wantarray()</code></a> built-in, and there is actually a Perl Critic policy, <a href="https://metacpan.org/pod/Perl::Critic::Policy::Community::Wantarray"><code>Perl::Critic::Policy::Community::Wantarray</code></a>, to flag these.</p>

<p>But it seems to me that any Perl subroutine that returns more than one value <strong>will</strong> behave differently in scalar context: it's just a question of whether you want the array behavior, the list behavior, or the arbitrary behavior you can get with <code>wantarray()</code>. The difference between good code and bad code is a matter of choosing this behavior carefully.</p>
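<p>For comparison with the two subroutines above, here is a sketch of the third option. The made-up subroutine <code>chosen()</code> uses <code>wantarray()</code> to pick its own scalar-context behavior, which in this case is a comma-separated string:</p>

<pre>
use strict;
use warnings;
use feature qw{ say };

sub chosen {
    my @values = qw{ one two five };
    # wantarray() is true in list context and false in scalar context.
    return wantarray ? @values : join ', ', @values;
}

my @all = chosen();
say scalar @all;        # prints 3
say scalar chosen();    # prints 'one, two, five'
</pre>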

<p>P.S.</p>

<p>What do you do if you have an array but want list behavior? There is no <code>list</code> built-in corresponding to the <code>scalar</code> built-in. The <a href="https://perldoc.perl.org/functions/scalar">documentation for <code>scalar</code></a> talks about this, but only addresses interpolation. In the general case, though, what seems to work is slicing the entire array:</p>

<pre>
say scalar @array[ 0 .. $#array ]; # prints 'five'
</pre>

<p>Or, if you want to encapsulate this behavior,</p>

<pre>
sub make_list { return @_[0..$#_] }
say scalar make_list( qw{ one two five } ); # prints 'five';
</pre>

<p>No, I did not come up with this on my own. I got it from <a href="https://stackoverflow.com/questions/34685788/convert-array-to-list">Stack Overflow</a>, specifically from <code>user2404501</code>'s response.</p>

<p>Be careful of getting too fancy with this. <code>scalar @array[ 0 .. $#array ]</code> is written <strong>much</strong> more clearly as <code>$array[-1]</code>.</p>
]]>
        
    </content>
</entry>

<entry>
    <title>Announcing perlcritic Policy ValuesAndExpressions::ProhibitFiletest_rwxRWX</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2022/07/announcing-perlcritic-policy-valuesandexpressionsprohibitfiletest-rwxrwx.html" />
    <id>tag:blogs.perl.org,2022:/users/tom_wyant//506.10845</id>

    <published>2022-07-05T16:21:43Z</published>
    <updated>2022-07-05T16:22:52Z</updated>

    <summary>Since several places in the Perl documentation caution against the use of the file access operators (-r and friends), and since I was unable to find a Perl::Critic policy dealing with this, I thought I would make one: Perl::Critic::Policy::ValuesAndExpressions::ProhibitFiletest_rwxRWX. This...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>Since several places in the Perl documentation caution against the use of the file access operators (<code>-r</code> and friends), and since I was unable to find a <a href="https://metacpan.org/pod/Perl::Critic"><code>Perl::Critic</code></a> policy dealing with this, I thought I would make one: <a href="https://metacpan.org/pod/Perl::Critic::Policy::ValuesAndExpressions::ProhibitFiletest_rwxRWX"><code>Perl::Critic::Policy::ValuesAndExpressions::ProhibitFiletest_rwxRWX</code></a>.</p>

<p>This policy is assigned to the <code>'bugs'</code> theme. It has low severity because there are some uses of these operators that seem legitimate to me -- or at least I see no easy way to get around their use.</p>

<p>On the one hand, something like</p>

<pre>
-r $file or die "File $file not readable\n";
open my $handle, '&lt;', $file;
</pre>

<p>is wrong several ways. On the other hand, it is hard to see how to implement <a href="https://metacpan.org/pod/File::Which"><code>File::Which</code></a> without the use of <code>-x</code>. And in fact it <strong>does</strong> use <code>-x</code>.</p>
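<p>A common alternative to the first snippet is to drop the <code>-r</code> test and check the result of <code>open()</code> instead. The open itself tells you whether access was granted, and there is no window between the test and the use. Here is a minimal sketch; the helper name is hypothetical:</p>

<pre>
use strict;
use warnings;

# Hypothetical helper: let open() decide whether the file is
# readable, and report the system error if it is not.
sub open_for_reading {
    my ( $file ) = @_;
    open my $handle, '&lt;', $file
        or die "Unable to open $file: $!\n";
    return $handle;
}
</pre>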

<p>This policy has no configuration options. I can imagine a configuration option to allow some file access operators, but was unsure how much actual need there is for such an option. A configuration option to allow file access operators within the scope of a <code>use filetest 'access';</code> might be possible, but would certainly make the policy much more complex.</p>

<p>Maybe this policy should be in the <code>::BuiltinFunctions::</code> name space, but I decided to follow the precedent established by Kevin Ryde in his <a href="https://metacpan.org/pod/Perl::Critic::Policy::ValuesAndExpressions::ProhibitFiletest_f"><code>Perl::Critic::Policy::ValuesAndExpressions::ProhibitFiletest_f</code></a>.</p>
]]>
        
    </content>
</entry>

<entry>
    <title>Smart Match in CPAN</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2022/06/smart-match-in-cpan.html" />
    <id>tag:blogs.perl.org,2022:/users/tom_wyant//506.10832</id>

    <published>2022-06-29T23:15:18Z</published>
    <updated>2022-06-29T23:19:50Z</updated>

    <summary>There is nothing like looking, if you want to find something. -- The Hobbit, iv, &quot;Over Hill and Under Hill&quot; Recently on the p5p mailing list the topic of removing smart match re-surfaced. There was a fairly vigorous discussion about...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p><cite>There is nothing like looking, if you want to find something. -- <u>The Hobbit</u>, iv, "Over Hill and Under Hill"</cite></p>

<p>Recently on the p5p mailing list the topic of removing smart match re-surfaced. There was a fairly vigorous discussion about the effect this would have on CPAN. So I thought I would look into how many uses there actually were.</p>

<p>Fortunately there are Perl Critic policies for this: Jan Holčapek's <a href="https://metacpan.org/module/Perl::Critic::Policy::ControlStructures::ProhibitSwitchStatements">Perl::Critic::Policy::ControlStructures::ProhibitSwitchStatements</a> and <a href="https://metacpan.org/module/Perl::Critic::Policy::Operators::ProhibitSmartmatch">Perl::Critic::Policy::Operators::ProhibitSmartmatch</a>. All I had to do was run them against my mini-CPAN.</p>

<p>My results:</p>

<ul>
    <li>Total distributions: 40704</li>
    <li>Distributions with violations: 842</li>
    <li>Files with violations: 1568</li>
</ul>

<p>A look at the file names involved says that about two-thirds of the violations are in the published modules themselves, and the rest are in support code (directories <code>t/</code>, <code>inc/</code>, and the like).</p>

<p>It is possible that the results of <code>Perl::Critic::Policy::ControlStructures::ProhibitSwitchStatements</code> contain false positives simply because someone implemented subroutines named <code>given()</code> or <code>when()</code> unrelated to smart matching.</p>

<p>It is hard for me to see how there could be false positives from <code>Perl::Critic::Policy::Operators::ProhibitSmartmatch</code>, though I have learned long since that reality exceeds my ability to imagine it.</p>

<p>Given the nature of Perl, false negatives may have to be detected on a case-by-case basis. I do know that when smart match was briefly removed in a development release a few years back, only one module that I use broke, and I had an alternative for it.</p>

<p>The mini-CPAN repository used for analysis was most recently updated 2022-06-24 08:10Z. The configuration file is</p>

<pre>
remote: https://www.cpan.org/
local: &lt;censored&gt;
exact_mirror: 0
skip_perl: 1
dirmode: 0755
path_filters: /Mail-DeliveryStatus-BounceParser-\d
</pre>

<p>I have unpublished modules in this repository, but they were excluded from the analysis. Also excluded were a few other modules that I have had trouble running Perl Critic against in the past:</p>

<pre>
CMORRIS/Parse-Extract-Net-MAC48-0.01.tar.gz
DOLMEN/Number-Phone-FR-0.0917215.tar.gz
GSLONDON/Parse-Nibbler-1.10.tar.gz
</pre>

<p>A list of the distributions containing violations is at <a href="https://trwyant.github.io/misc/smart-match-in-cpan/distros-with-violations.txt"><code>https://trwyant.github.io/misc/smart-match-in-cpan/distros-with-violations.txt</code></a>.</p>

<p>An ugly JSON file containing the results of the critique is at <a href="https://trwyant.github.io/misc/smart-match-in-cpan/smart-match.json"><code>https://trwyant.github.io/misc/smart-match-in-cpan/smart-match.json</code></a>. By "ugly" I mean non-pretty, non-canonical. This file encodes a hash whose top-level keys are:</p>

<ul>
    <li><code>asof</code> - The ISO time the analysis was run;</li>
    <li><code>critique</code> - A hash reference containing the results of the critique (see below);</li>
    <li><code>policy</code> - An array reference containing the fully-qualified names of the policies used to critique the code.</li>
</ul>

<p>The critique is a set of nested hashes keyed by author name, distribution name, and file name relative to the base directory of the distribution. The value for each file is a reference to an array containing the violations for that file: line number, column number, policy violated, violation description, and violation explanation. For brevity's sake, files without violations are omitted from the output.</p>
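<p>Here is a sketch that walks this structure and counts violations per policy. It <strong>assumes</strong> that each violation is itself an array reference in the order listed above (line, column, policy, description, explanation). The format is described only in prose, so check this against the actual file. The subroutine name is made up.</p>

<pre>
use strict;
use warnings;
use JSON::PP qw{ decode_json };

# ASSUMPTION: each violation is an array reference of the form
# [ line, column, policy, description, explanation ].
sub count_by_policy {
    my ( $data ) = @_;
    my %count;
    foreach my $distros ( values %{ $data-&gt;{critique} } ) {    # per author
        foreach my $files ( values %{ $distros } ) {           # per distribution
            foreach my $violations ( values %{ $files } ) {    # per file
                $count{ $_-&gt;[2] }++ for @{ $violations };
            }
        }
    }
    return \%count;
}

# Usage, assuming a local copy of smart-match.json:
#   open my $fh, '&lt;:raw', 'smart-match.json' or die "smart-match.json: $!\n";
#   my $data = decode_json( do { local $/; &lt;$fh&gt; } );
#   my $count = count_by_policy( $data );
</pre>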
]]>
        
    </content>
</entry>

<entry>
    <title>Annotated Perl::Critic Policy Index</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2022/06/annotated-perlcritic-policy-index.html" />
    <id>tag:blogs.perl.org,2022:/users/tom_wyant//506.10830</id>

    <published>2022-06-24T18:23:49Z</published>
    <updated>2022-06-24T18:25:20Z</updated>

    <summary>In the wake of my postings on the file access tests (-r and friends) I wondered if there was a Perl::Critic policy to find them. So I constructed an annotated index of Perl Critic policies. Because of its size I...</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p>In the wake of my postings on the file access tests (<code>-r</code> and friends) I wondered if there was a <a href="https://metacpan.org/pod/Perl::Critic"><code>Perl::Critic</code></a> policy to find them. So I constructed an <a href="https://trwyant.github.io/misc/all-perl-critic-policies.html">annotated index of Perl Critic policies</a>. Because of its size I stuck it on GitHub rather than putting it inline in this blog post.</p>

<p>This index assumes that any CPAN module whose name begins with <code>Perl::Critic::Policy::</code> is a Perl Critic Policy. The index entry for each module contains the name of the module itself (linked to MetaCPAN), the name of the distribution that contains it, and the module's abstract, if the abstract contains anything other than a repeat of the module name. I suppose the module description could have been added, but I hoped the abstract would be sufficient.</p>
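<p>The post does not say how the module list was gathered. One way to apply the name-prefix rule, <strong>assumed here purely for illustration</strong>, is to filter a CPAN package index in the format of <code>02packages.details.txt</code>. That format consists of header lines, a blank line, and then one whitespace-separated <code>module version path</code> record per line. The subroutine name is made up.</p>

<pre>
use strict;
use warnings;

# Given the lines of a package index in the 02packages.details.txt
# format, return the sorted names of all modules whose names begin
# with Perl::Critic::Policy::.
sub policy_modules {
    my @lines = @_;
    my $in_body;
    my @found;
    foreach my $line ( @lines ) {
        if ( ! $in_body ) {
            # The header ends at the first blank line.
            $in_body = $line =~ m/ \A \s* \z /smx;
            next;
        }
        my ( $module ) = split ' ', $line;
        push @found, $module
            if defined $module
                and $module =~ m/ \A Perl::Critic::Policy:: /smx;
    }
    return sort @found;
}
</pre>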

<p>This operation gave me <strong>341</strong> policies. I did not find the policy I wanted among them. In fact, only Kevin Ryde's <a href="https://metacpan.org/module/Perl::Critic::Policy::ValuesAndExpressions::ProhibitFiletest_f">Perl::Critic::Policy::ValuesAndExpressions::ProhibitFiletest_f</a> came close.</p>

<p>For those who want context, the relevant blog posts are:</p>
<ul>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/06/the-file-access-operators-to-use-or-not-to-use.html">The File Access Operators: To Use, or Not to Use</a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/06/core-modules-filetest.html">Core Modules: <code>filetest</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/06/my-favorite-modules-filestat.html">My Favorite Modules: <code>File::stat</code></a></li>
</ul>
]]>
        
    </content>
</entry>

<entry>
    <title>My Favorite Modules: File::stat</title>
    <link rel="alternate" type="text/html" href="https://blogs.perl.org/users/tom_wyant/2022/06/my-favorite-modules-filestat.html" />
    <id>tag:blogs.perl.org,2022:/users/tom_wyant//506.10824</id>

    <published>2022-06-17T01:23:21Z</published>
    <updated>2022-06-17T01:25:40Z</updated>

    <summary>File::stat overrides the core stat() and lstat() functions. Instead of arrays, the new functions return an object having methods corresponding to the elements of the arrays returned by the original functions. This module has been in core since Perl 5.004....</summary>
    <author>
        <name>Tom Wyant</name>
        
    </author>
    
    
    <content type="html" xml:lang="en" xml:base="https://blogs.perl.org/users/tom_wyant/">
        <![CDATA[<p><a href="https://metacpan.org/pod/File::stat"><code>File::stat</code></a> overrides the core <code>stat()</code> and <code>lstat()</code> functions. Instead of arrays, the new functions return an object having methods corresponding to the elements of the arrays returned by the original functions. This module has been in core since Perl 5.004.</p>

<p>The advantage of this module is clearer code. For example, without it, getting the size of file <code>$file</code> looks something like</p>

<pre>
    my $size = ( stat $file )[7];
</pre>

<p>But with this module the same effect is given by</p>

<pre>
    use File::stat;
    my $size = stat( $file )-&gt;size();
</pre>

<p>Once you have the object in hand, you can query it for any of its properties, so if you want both size and modification time, instead of</p>

<pre>
    my ( $size, $mtime ) = ( stat $file )[ 7, 9 ];
</pre>

<p>you can say</p>

<pre>
    my $st = stat $file;
    my $size = $st-&gt;size();
    my $mtime = $st-&gt;mtime();
</pre>

<p>Starting with <code>File::stat</code> version 1.02 (which ships with Perl 5.12), the returned object overloads the file test operators (<a href="https://perldoc.perl.org/functions/-X"><code>-X</code></a>), so the above example could be extended with something like</p>

<pre>
    my $mine = -o $st;
</pre>

<p>This will not work for <code>-t</code>, <code>-T</code>, and <code>-B</code> because these cannot be determined from the results of a core <code>stat()</code> call.</p>

<p>In addition, <code>File::stat</code> versions 1.02 and above support a <code>cando()</code> method as an alternate implementation to the file access tests <code>-r</code>, <code>-w</code>, <code>-x</code>, <code>-R</code>, <code>-W</code>, and <code>-X</code>. This method takes two arguments. The first is one of the <a href="https://perldoc.perl.org/Fcntl.html"><code>Fcntl</code></a> constants <code>S_IRUSR</code>, <code>S_IWUSR</code>, or <code>S_IXUSR</code>, and the second is a Boolean which selects the effective UID (if true) or the real UID (if false).</p>
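<p>Here is a minimal sketch of <code>cando()</code> that checks read access for the effective UID. The temporary file exists only so the example has something to <code>stat</code>. <code>S_IRUSR</code> is imported through <code>Fcntl</code>'s <code>:mode</code> tag.</p>

<pre>
use strict;
use warnings;
use Fcntl qw{ :mode };
use File::stat;
use File::Temp qw{ tempfile };

# Something to stat: a temporary file, removed at exit.
my ( undef, $file ) = tempfile( UNLINK =&gt; 1 );

my $st = stat( $file ) or die "Unable to stat $file: $!\n";

# First argument: S_IRUSR, S_IWUSR, or S_IXUSR.
# Second argument: true for the effective UID, false for the real UID.
my $readable = $st-&gt;cando( S_IRUSR, 1 ) ? 'yes' : 'no';
print "Readable: $readable\n";    # should be 'yes' for a file we just created
</pre>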

<p>Unfortunately, There Ain't No Such Thing As A Free Lunch. There are a few things to be aware of if you use this module:</p>

<ul>
    <li>The <code>stat()</code> and <code>lstat()</code> functions provided by this module no longer make implicit use of the topic variable <code>$_</code>. Fortunately, calls of these without arguments become syntax errors, and you can always supply <code>$_</code> as an explicit argument.</li>
    <li>The <code>stat()</code> and <code>lstat()</code> functions provided by this module no longer interact with the special file handle <code>_</code>. Fortunately, calls such as <code>stat _</code> are an error if <code>use strict 'subs';</code> is in effect. <strong>Note</strong> that you <strong>can</strong> still use explicit file handles.</li>
    <li>This module's overrides of the file access operators ignore the <a href="https://perldoc.perl.org/filetest.html"><code>filetest</code></a> pragma -- with a warning if <code>use filetest 'access';</code> is in effect. You can, of course, still get this functionality, but you will have to test the original file name.</li>
</ul>

<p>And of course, you can always access the overridden functions if you need to by calling <code>CORE::stat()</code> or <code>CORE::lstat()</code>.</p>

<p>Previous entries in this series:</p>

<ol>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2021/09/my-favorite-modules-if.html"><code>if</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2021/10/my-favorite-modules-diagnostics-one.html"><code>diagnostics</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2021/11/my-favorite-modules-termreadlineperl.html"><code>Term::ReadLine::Perl</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/02/my-favorite-modules-re.html"><code>re</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/02/my-favorite-modules-develnytprof.html"><code>Devel::NYTProf</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/05/my-favorite-modules-errno.html"><code>Errno</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/05/my-favorite-modules-timepiece.html"><code>Time::Piece</code></a></li>
    <li><a href="https://blogs.perl.org/users/tom_wyant/2022/06/core-modules-filetest.html"><code>filetest</code></a></li>
</ol>
]]>
        
    </content>
</entry>

</feed>
