Tom Wyant

Ordering Your Tests

2023-02-24T22:20:14Z

By default, the test actions of both ExtUtils::MakeMaker and Module::Build test t/*.t in lexicographic order (a.k.a. ASCIIbetical order). Under this default, some Perl module authors who want tests performed in a given order have resorted to numbering tests: t/01_basic.t, t/10_functional.t, and so on.

My personal preference is to take the lexicographic ordering into consideration when naming test files: t/basic.t through t/whole_thing.t. But the price of this choice is a certain number of contrived test names, and even the occasional thesaurus lookup.

But there is a better way. Both ExtUtils::MakeMaker and Module::Build allow you to specify tests explicitly.

Under ExtUtils::MakeMaker version 6.76 or above, you call WriteMakefile() thus:

use ExtUtils::MakeMaker;

WriteMakefile(
    ...
    test => {
        TESTS => 't/one.t t/two.t t/three.t t/four.t',
    },
    ...
);

If you do this, the tests specified (and only the tests specified) are performed in the order specified.

ExtUtils::MakeMaker version 6.76 was released September 5 2013 and shipped with Perl 5.19.4, so any reasonably modern Perl should support this.

The equivalent incantation under Module::Build version 0.23 or above is:

Module::Build->new(
    ...
    test_files => [ qw{
        t/one.t
        t/two.t
        t/three.t
        t/four.t
        } ],
    ...
)->create_build_script();

Module::Build version 0.23 was released February 9 2004.
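Because only the listed tests run, a test file that is added later but never added to the list is silently skipped. Here is a minimal sketch of guarding against that in Makefile.PL or Build.PL, assuming the order is kept in a Perl array. ordered_tests() is a hypothetical helper, not part of either module:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of either module): given the hand-ordered
# list of tests and the t/*.t files actually present, return the ordered
# list plus any files that were left out of it.
sub ordered_tests {
    my ( $order, $found ) = @_;
    my %listed  = map { $_ => 1 } @{ $order };
    my @missing = sort { $a cmp $b } grep { ! $listed{$_} } @{ $found };
    return ( [ @{ $order } ], \@missing );
}

my ( $tests, $missing ) = ordered_tests(
    [ qw{ t/one.t t/two.t t/three.t t/four.t } ],
    [ glob 't/*.t' ],
);
warn "Not in the test order: @{ $missing }\n" if @{ $missing };

# ExtUtils::MakeMaker wants a space-separated string:
#     test => { TESTS => $makemaker_tests }
my $makemaker_tests = join ' ', @{ $tests };

# Module::Build takes the array reference as is:
#     test_files => $module_build_tests
my $module_build_tests = $tests;
```

The warning fires at configure time, so a forgotten test shows up when the author runs perl Makefile.PL or perl Build.PL, not after release.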

Outstanding GitHub Items

2023-02-17T02:30:45Z

Recently I received a bump on a GitHub pull request. This surprised me, because I was unaware of anything outstanding. I was even more surprised when I discovered that the distribution in question also had two open issues, one dating back about three months.

I have no idea why I was oblivious to these, but it made me want to audit myself to see if any other distributions had the same problem. GitHub has these nice links at the top of the page, Pull requests and Issues, but these show pull requests and issues that I initiated. I found no obvious way to display pull requests or issues filed against my repositories.

Now, maybe it is just me, but I find GitHub's documentation moderately opaque. But with considerable help from DuckDuckGo, I discovered the answer: type is:open user: immediately followed by your GitHub user name into the search box. This gets you both open issues and open pull requests. If you want, you can restrict this further with is:issue (for issues) or is:pr (for pull requests). Do not leave off the user name, even if you are logged in. If you forget it, you will get every open item on GitHub -- all 102 million of them as of this writing.

Now I am lazy, so I made a browser shortcut to do this for me. I don't think you get private repositories this way, but I was not worried about that. The string will have to be URI-escaped. So now if I want to audit myself I just click on https://github.com/search?q=user%3Atrwyant+is%3Aopen and see what I get.
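If you would rather generate the URI-escaped URL than build it by hand, here is a sketch in core Perl. github_search_url() is a hypothetical helper. It escapes everything outside the RFC 3986 unreserved characters and turns spaces into +, which reproduces the URL above:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a GitHub search query into a search URL.
# Characters outside the RFC 3986 "unreserved" set are %-escaped, and
# spaces become '+', as in form-encoded query strings.
sub github_search_url {
    my ( $query ) = @_;
    my $escaped = $query;
    utf8::encode( $escaped );    # escape UTF-8 bytes, not characters
    $escaped =~ s/([^A-Za-z0-9\-._~ ])/sprintf '%%%02X', ord $1/ge;
    $escaped =~ tr/ /+/;
    return "https://github.com/search?q=$escaped";
}

print github_search_url( 'user:trwyant is:open' ), "\n";
```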

Annotated Test2::Tools Index

2023-02-03T02:07:20Z

I have very gradually been adopting Test2::V0 as a testing tool. I had a test file that performed a group of tests inside a for loop, and discovered there were circumstances where I wanted to skip an iteration. Well, the skip() provided by Test2::Tools::Basic operates by executing last SKIP;. If the for loop itself carries the SKIP: label, this skips not only the current iteration but all subsequent iterations.

I wondered if there was a Test2::Tools plugin that did a next SKIP;, so I generated an annotated index of Test2 tools. This index reports all of them in ASCIIbetical order, with the distribution they are found in and the abstract from the =head1 NAME section of the POD.

I found 44 tools after eliminating a few helper classes that lived in the same name space. None of the 44 appears to do what I want. It would be easy enough to create such a tool, but I doubted that anyone would use it but me. So I indented another level and stuck a SKIP: block inside the for loop.
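Since skip() works by executing last SKIP;, the difference between the two layouts can be shown with plain loop control. This is a minimal sketch with no test framework; the bare last SKIP stands in for a skip() call:

```perl
use strict;
use warnings;

# A SKIP: block inside the loop: last SKIP leaves only the block,
# so the loop goes on to the next iteration.
my @ran;
for my $n ( 1 .. 5 ) {
    SKIP: {
        last SKIP if $n % 2 == 0;    # stand-in for skip()
        push @ran, $n;
    }
}
print "inner SKIP block ran: @ran\n";

# The loop itself labeled SKIP: last SKIP ends the whole loop.
my @ran_labeled;
SKIP: for my $n ( 1 .. 5 ) {
    last SKIP if $n % 2 == 0;        # stand-in for skip()
    push @ran_labeled, $n;
}
print "labeled loop ran: @ran_labeled\n";
```

In a real Test2::V0 test, the last SKIP line would become a call to skip() with a reason, inside the same inner SKIP: block.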

Like the previous Annotated Perl::Critic Policy Index, this will be updated approximately weekly. That is, a cron job runs Friday morning, and I push the repository when I get around to it, after reviewing the change and coming up with (I hope) a descriptive commit message.

My Favorite Modules: PerlIO::via

2023-01-25T14:55:34Z

OK, I confess: PerlIO::via is not a module that I use every day. It allows you, easily and with minimal code, to modify an I/O stream before it gets to the reader of the stream, or after the writer has written it. All you do is write (say) My::Module conforming to the parts of the PerlIO::via interface you need, and supply it in the second argument of open() or binmode() as ':via(My::Module)'. How cool is that? And how cool is a language that lets you do that with a minimum of fuss, bother, and code?

I encountered this when trying to modify (OK, hack) the behavior of a large and complex hunk of Perl not under my control. Rummaging around in this turned up the fact that all file input went through a single module/object, which had an open() method. I realized if I could insert my own PerlIO layer into the input stream, I would have control over what the ~~victim~~ host code saw.

In the true spirit of the Conan the Barbarian school of programming ("Bash it until it submits!") I wrote a PerlIO::via module whose import() method monkey-patched the open() to insert my layer into the stack. All I had to do was launch the host code with -MMy::Module and the dirty deed was done.

If you read the PerlIO::via documentation you see a whole host of methods you can provide. All I wanted to do was modify the input stream, and that can be done by implementing just two or three:

You will have to provide PUSHED(), which is called when your layer is pushed onto the I/O stack; that is, when someone specifies it in the second argument of open() or binmode(). This is called as a class method, and is given an fopen()-style mode string (i.e. 'r', 'w', or what have you) and the already-opened handle, which represents the layer below. This method needs to instantiate and return an object of the given class. Depending on your needs, this can be as simple as

sub PUSHED {
    my ( $class ) = @_;
    return bless {}, $class;
}

You have a couple of options for how to get the input, but I opted for FILL(). This is called as a method, and passed a file handle which is open to the next layer down in the PerlIO stack. This would look something like:

sub FILL {
    my ( $self, $fh ) = @_;
    defined( my $data = <$fh> )
        or return;

    # Do your worst to the $data

    return $data;

}

A few paragraphs back I said "two or three" methods. For a while I was content with the above two. But then I realized that the caller was getting back bytes even if the file was opened with :encoding(...) specified in a lower layer, and the FILL() method preserved the character-nature of the data. Wrestling with this finally drove me back to the documentation, where I found the UTF8() method.

The UTF8() method is optional, and is called (if it exists) right after PUSHED(). It receives one argument, which is interpreted as a Boolean, and is true if the next-lower layer provides characters rather than bytes. The returned value tells PerlIO whether your layer provides characters (if true) or bytes (if false). A minimal-but-sufficient implementation is

sub UTF8 {
    my ( undef, $below_flag ) = @_;
    return $below_flag;
}

Caveat: If you apply the encoding and your layer in the same operation (e.g. binmode $fh, ':encoding(utf-8):via(My::Module)';), the UTF8() method will not see a true value of $below_flag. There are two ways of dealing with this:

Apply your PerlIO::via layer in a separate call to binmode(), or
Specify an explicit :utf8 after your layer (that is, binmode $fh, ':encoding(utf-8):via(My::Module):utf8';).

This is already a longer note than I like, but I have to say something about :utf8. The current documentation calls it a pseudo-layer. What it really is is a bit on the layer below, telling PerlIO that the layer it applies to provides characters rather than bytes on input, or accepts characters on output. Around Perl 5.8 or 5.10 there was a fair amount of misunderstanding about what :utf8 did, and there was actually core Perl documentation that said (or seemed to say) that you did UTF-8 I/O by specifying this layer. Most such instances of :utf8 in the core documentation have been replaced by :encoding(utf-8) but there may still be some :utf8 in outlying regions of the documentation.

By using :utf8 in the second example above, what I am telling Perl is that :via(My::Module) produces decoded output. It does, because the layer below it (:encoding(utf-8)) does, and :via(My::Module) preserves this property. Without the :encoding(utf-8) below it, it would be an error to tell PerlIO that :via(My::Module) produced characters unless My::Module did the decoding itself.

If you want to see what layers are in effect on file handle $fh, you can call PerlIO::get_layers( $fh ). This returns a list of layer names (without the leading colon), which will include utf8 as a separate entry, maybe more than once if more than one layer has that bit set.
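Here is a self-contained sketch that puts PUSHED(), FILL(), and UTF8() together. My::Upcase is a hypothetical layer that upper-cases what it reads. The input is a temporary file made with File::Temp, and the layer is pushed in a separate binmode(), as the caveat above suggests:

```perl
use strict;
use warnings;
use File::Temp qw{ tempfile };
use PerlIO::via ();

package My::Upcase;    # hypothetical layer: upper-cases what it reads

# Called when the layer is pushed; just make an object.
sub PUSHED {
    my ( $class ) = @_;
    return bless {}, $class;
}

# Called to refill the buffer from the layer below ($fh).
sub FILL {
    my ( $self, $fh ) = @_;
    defined( my $data = <$fh> )
        or return;
    return uc $data;
}

# We produce characters exactly when the layer below does.
sub UTF8 {
    my ( undef, $below_flag ) = @_;
    return $below_flag;
}

package main;

# Write a small UTF-8 test file.
my ( $out, $name ) = tempfile( UNLINK => 1 );
binmode $out, ':encoding(utf-8)';
print {$out} "hello\nworld\n";
close $out or die "Can not close $name: $!\n";

# Decoding layer first, then our layer in a separate binmode(),
# so that UTF8() sees a true $below_flag.
open my $in, '<:encoding(utf-8)', $name
    or die "Can not open $name: $!\n";
binmode $in, ':via(My::Upcase)'
    or die "Can not push :via(My::Upcase): $!\n";
my @layers = PerlIO::get_layers( $in );
my @lines  = <$in>;
close $in;

print "Layers: @layers\n";
print @lines;
```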


Regexp Delimiters

2023-01-17T01:36:01Z

Perl lets you use almost anything as a regular expression delimiter. It is usual to use punctuation of some sort, but characters that match /\w/ can be used provided there is white space between the operator and the delimiter: m X foo Xsmx compiles and matches 'foobar'. In the presence of use utf8; you can go wild.

A query on the Perl 5 Porters Mailing List (a.k.a. 'p5p') a few days ago asked for opinions about appropriating the colon (':') as a delimiter for modifiers to the regular expression operators. This got me wondering about what regular expression delimiters were actually in use.

I scratched that itch by plowing through my local Mini CPAN, running everything that looked like Perl through PPI, and checking anything that parsed to an object of one of the relevant classes. A summary of the results is appended.
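The counting relied on PPI and is not reproduced here, but the final step, ordering the tally and formatting each delimiter with B::perlstring(), can be sketched in core Perl. The %count hash is seeded with a few figures copied from the appended list:

```perl
use strict;
use warnings;
use B ();

# A few of the counts from the appended list: delimiter => occurrences.
my %count = (
    '/'    => 1420735,
    '{'    => 128788,
    '@'    => 1302,
    "\\"   => 11,
    "\036" => 8,
);

# Decreasing frequency, ties broken by the delimiter itself.
# B::perlstring() renders each delimiter as a double-quoted Perl string.
my @report = map { sprintf '%-9s %d', B::perlstring( $_ ), $count{$_} }
    sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
print "$_\n" for @report;
```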

It was no surprise that "/" was the overwhelming favorite. The colon (":") came in 13th. I was a little surprised (after I thought about it) not to see "'" (7th) more popular, since it does not interpolate. After all, why write m/[\@\$]/ when you can write m'[@$]'?

You made it to the end of this post. Your prize (if you want to call it that) is the threatened list of regular expression delimiters, in decreasing order of frequency. The delimiters themselves were formatted by running them through B::perlstring(). I suspect most of the single-digit ones are the result of mis-parses, but believe it or not, some of the instances of "\\" are real regular expression delimiters.

"/"       1420735
"{"       128788
"!"       36081
"|"       23932
"#"       14893
"("       7369
"'"       5180
"["       4220
","       3376
"<"       2926
"%"       2308
"\@"      1302
":"       1232
"\""      828
"."       349
"~"       313
"-"       249
";"       194
"?"       182
"="       109
"^"       59
"0"       43
"`"       35
"+"       29
")"       18
"&"       17
"o"       15
"n"       14
"]"       14
"r"       13
"*"       11
"\\"      11
"\036"    8
"i"       6
"\$"      6
"\a"      6
""        5
"e"       4
">"       4
"1"       4
"8"       3
"S"       3
"6"       3
"9"       3
"_"       2
"f"       2
"a"       2
"}"       2
"g"       2
"m"       2
"5"       2
"v"       1
"q"       1
"l"       1
"I"       1
"d"       1
"M"       1
"c"       1
"s"       1
"t"       1
"H"       1
"\247"    1
"u"       1
"x"       1

Making GitHub CI work with Perl 5.8.

2022-12-01T13:48:01Z

A while back, I got a pull request from Gabor Szabo adding a GitHub action to one of my distributions. I have been working with this, but have not (so far) blogged about it because, quite frankly, I am still not sure I know what I am doing.

One of my personal desires was to test my distributions on the oldest practicable Perl for each available architecture. For Unix (i.e. Linux and macOS) this is 5.8.8, provided the distribution itself supports that. A couple of days ago, though, I pushed a modification to one of my distributions and had the 5.8.8 tests blow up.

The problem turned out to be that Module::Build, for reasons I have not investigated, has Pod::Man as a dependency. The current version of Module::Build requires Pod::Man version 2.17, but according to corelist, Perl 5.8.8 comes with Pod::Man version 1.37, so cpanm wants to upgrade it.

The problem with this is that as of version 5.0, released November 25 2022, the podlators distribution, which supplies Pod::Man, requires Perl 5.10. So under 5.8.8, cpanm --with-configure --notest --installdeps . dies trying to install podlators.

The solution I came up with was to pre-emptively install RRA/podlators-4.14.tar.gz under Perl 5.8.8. The implementation was in two parts: define an environment variable that recorded whether we were running under Perl 5.10 or later, and define a job step conditioned on that variable to install podlators 4.14 if we were using an earlier Perl.

Under GitHub Actions you can define environment variables by appending their definitions to the file whose path is in environment variable GITHUB_ENV. After struggling with PowerShell for the Windows runners, I decided to do that step in Perl. The core of the Perl script is:

use strict;
use warnings;
use File::HomeDir;

defined $ENV{GITHUB_ENV}
    and $ENV{GITHUB_ENV} ne ''
    or die "Environment variable GITHUB_ENV undefined or empty\n";
open my $fh, '>>:encoding(utf-8)', $ENV{GITHUB_ENV}
    or die "Can not open $ENV{GITHUB_ENV}: $!\n";

my $home = File::HomeDir->my_home();

# 1 under Perl 5.10 or later, '' (false) under anything older
my $is_5_10 = "$]" >= 5.010 ? 1 : '';

my $is_windows = {
    MSWin32 => 1,
    dos     => 1,
}->{$^O} || '';

my $is_unix = $is_windows ? '' : 1;

# Each NAME=value line becomes an environment variable in later steps
print $fh <<"EOD";
MY_HOME=$home
MY_IS_UNIX=$is_unix
MY_IS_WINDOWS=$is_windows
MY_PERL_IS_5_10=$is_5_10
EOD

close $fh
    or die "Can not close $ENV{GITHUB_ENV}: $!\n";

Next I had to run this from the YAML file that defined the workflow, and act on the created value. This was done using two steps:

    - name: Customize environment
      run: |
        cpanm -v
        cpanm File::HomeDir
        perl .github/workflows/environment.PL

and

    - name: Install old podlators distro if on old Perl
      if: "! env.MY_PERL_IS_5_10"
      run: cpanm RRA/podlators-4.14.tar.gz

The entirety of both the GitHub Actions file ci.yml and the Perl script environment.PL can be found in the GitHub repository for Astro::Coord::ECI. Other, and probably better, implementations can be imagined.

Match Anything, Quickly -- Revision 1

2022-09-02T18:17:31Z

O wad some Power the giftie gie us
To see oursels as ithers see us!
It wad frae mony a blunder free us,
An' foolish notion: ...

My previous blog post, Match Anything, Quickly, brought a number of responses which are worth reading in their own right. The one that triggered this post, though, was from Nerdvana and Devin of Cincinnati Perl Mongers, who pointed out an error in my benchmark script. I had left off the intended /smx from the qr/ ... / version of the test, which meant that the regular expression did not in fact match.

Three cheers for code reviews!

The Cincinnati Perl Mongers came up with a further case which combines my two:

eval "do { my \$regex = qr/ $re /smx; " .
        "sub { \$MATCH =~ /\$regex/o }};"

They benchmarked this as being slightly slower than the case where the regular expression is simply interpolated into the subroutine verbatim.

Interestingly (to me, at least) they reported that the removal of the /o modifier made their case 2-3 times slower. This surprised me somewhat, as I had understood that modern Perls (for some value of "modern") had done things to minimize the performance difference between the presence and absence of /o.

For the record, the corrected script is also on GitHub. The corrections include an option that tests to make sure all benchmarked things actually match. The result of running this with the --test and --html options (on a different machine than the original post) is:

ok 1 - sub { 1 }
ok 2 - sub { $MATCH =~ m/ (*ACCEPT) /smx }
ok 3 - qr/ (*ACCEPT) /smx
ok 4 - sub { $MATCH =~ m/ (?) /smx }
ok 5 - qr/ (?) /smx
ok 6 - sub { $MATCH =~ m/ (?:) /smx }
ok 7 - qr/ (?:) /smx
ok 8 - sub { $MATCH =~ m/ .? /smx }
ok 9 - qr/ .? /smx
ok 10 - sub { $MATCH =~ m/ .{0} /smx }
ok 11 - qr/ .{0} /smx
ok 12 - sub { $MATCH =~ m/ \A /smx }
ok 13 - qr/ \A /smx
ok 14 - sub { $MATCH =~ m/ ^ /smx }
ok 15 - qr/ ^ /smx
1..15

Implementation	Rate
sub { 1 }	434782608.70/sec
sub { $MATCH =~ m/ \A /smx }	13333333.33/sec
sub { $MATCH =~ m/ ^ /smx }	13315579.23/sec
sub { $MATCH =~ m/ (?:) /smx }	12315270.94/sec
sub { $MATCH =~ m/ (?) /smx }	11173184.36/sec
sub { $MATCH =~ m/ .{0} /smx }	10593220.34/sec
sub { $MATCH =~ m/ .? /smx }	10449320.79/sec
sub { $MATCH =~ m/ (*ACCEPT) /smx }	4380201.49/sec
qr/ ^ /smx	2612330.20/sec
qr/ \A /smx	2603488.67/sec
qr/ (?:) /smx	2586652.87/sec
qr/ (?) /smx	2575991.76/sec
qr/ .{0} /smx	2518891.69/sec
qr/ .? /smx	2510670.35/sec
qr/ (*ACCEPT) /smx	1849796.52/sec
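Here is a sketch of the kind of check the corrected script's --test option makes: verify that every candidate actually matches before timing anything. It uses only two of the candidates, and it assumes the qr// object is applied inside a subroutine so that Benchmark can time it. The actual script on GitHub may be organized differently:

```perl
use strict;
use warnings;
use Benchmark qw{ cmpthese };

our $MATCH = 'Any string at all';

# Two of the candidates from the table above. Note that the qr//
# carries the same /smx as the m//; leaving it off was the bug.
my $qr = qr/ \A /smx;
my %candidate = (
    'sub { $MATCH =~ m/ \A /smx }' => sub { $MATCH =~ m/ \A /smx },
    'qr/ \A /smx'                  => sub { $MATCH =~ $qr },
);

# Make sure everything actually matches before timing it.
my @failed = grep { ! $candidate{$_}->() } sort keys %candidate;
die "Did not match: @failed\n" if @failed;

# Run each candidate for at least one CPU second and compare rates.
cmpthese( -1, \%candidate );
```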

Match Anything, Quickly

2022-08-06T00:39:25Z

Revision: the Cincinnati Perl Mongers found an error in the benchmark script used for this post. Match Anything, Quickly -- Revision 1 discusses their findings and links to a revised benchmark script. -- TRW 2022-09-02

Sometimes I want to filter a set of strings, but the details of the filter are not known beforehand. In particular, I may want a null filter, which simply accepts anything.

This looks like a job for a regular expression, but I can think of at least two implementations. One is to pass around regular expression objects. The second is to wrap a match (m//) in a subroutine reference, and pass that around. Given the use of regular expressions, there are a number of possibilities for a regular expression that matches any string.

I wondered whether one of the alternatives I was choosing among was faster than another, so I decided to Benchmark them. Both implementations applied the regular expression to a global variable. In practice this would probably be a localized $_, but my reading of the Benchmark module is that it also localizes $_ and leaves it undef.

Note that the empty pattern is not benchmarked, because it is equivalent to the last successfully-matched pattern, if any. The sub { 1 } was included because if we're dealing in code references, the null filter simply needs to return a true value.
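The empty-pattern rule is easy to trip over, so here is a small sketch of it. The strings are arbitrary:

```perl
use strict;
use warnings;

# The empty pattern reuses the last successfully-matched pattern.
'foo' =~ m/o/
    or die "Expected 'foo' to match m/o/\n";
my $bar = 'bar' =~ m// ? 1 : 0;    # really m/o/, so no match
my $dog = 'dog' =~ m// ? 1 : 0;    # really m/o/, so a match
print "bar: $bar, dog: $dog\n";
```

So m// is not a "match anything" pattern; what it matches depends on whatever matched last.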

Here are the results, obtained with Perl 5.36.0, unthreaded. The script that generated them is on GitHub.

Implementation	Rate
sub { 1 }	294117647.06/sec
sub { m/ .? /smx }	21645021.65/sec
sub { m/ .{0} /smx }	21598272.14/sec
sub { m/ (*ACCEPT) /smx }	20964360.59/sec
sub { m/ (?) /smx }	20876826.72/sec
sub { m/ \A /smx }	20746887.97/sec
sub { m/ (?:) /smx }	20618556.70/sec
sub { m/ ^ /smx }	20618556.70/sec
qr/ (?) /smx	2344665.89/sec
qr/ (?:) /smx	2344116.27/sec
qr/ ^ /smx	2336448.60/sec
qr/ \A /smx	2315350.78/sec
qr/ .? /smx	2208968.41/sec
qr/ .{0} /smx	2180074.12/sec
qr/ (*ACCEPT) /smx	1717327.84/sec

Somewhat to my surprise, the subroutine-reference implementation was an order of magnitude faster than the regular-expression-reference implementation. I expected that, Regexps being first-class objects, it would be pretty much equivalent to m/ ... / wrapped in a subroutine -- maybe even a little faster.

A little messing around with perl -MO=Concise got me the following:

$ perl -MO=Concise -e '$_ =~ m/foo/;'
5  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter v ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
4     </> match(/"foo"/) vKS ->5
-        <1> ex-rv2sv sK/1 ->4
3           <$> gvsv(*_) s ->4
-e syntax OK
$ perl -MO=Concise -e '$_ =~ qr/foo/;'
7  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter v ->2
2     <;> nextstate(main 1 -e:1) v:{ ->3
6     </> match() vKS ->7
-        <1> ex-rv2sv sK/1 ->4
3           <$> gvsv(*_) s ->4
5        <|> regcomp(other->6) sK ->6
4           </> qr(/"foo"/) s ->5
-e syntax OK

The salient difference, to my eye, was the presence of the regcomp operator in the second case. perldoc-search on this led me eventually to perlreapi which says, in part,

"precomp" "prelen"

Used for optimisations. "precomp" holds a copy of the pattern that was compiled and "prelen" its length. When a new pattern is to be compiled (such as inside a loop) the internal "regcomp" operator checks if the last compiled "REGEXP"'s "precomp" and "prelen" are equivalent to the new one, and if so uses the old pattern instead of compiling a new one.

The relevant snippet from "Perl_pp_regcomp":

            if (!re || !re->precomp || re->prelen != (I32)len ||
                memNE(re->precomp, t, len))
            /* Compile a new pattern */

So I assume that the speed difference might be reduced if the filter was called in a tight enough loop. But if so, the Benchmark loop is not tight enough, and it's pretty tight. On the other hand, maybe the Benchmark loop is tight enough, and the extra time is spent determining that a recompilation is not needed. But it will take deeper knowledge of Perl internals than I possess to sort this out.

Numeric Variable Names With Leading Zeroes

2022-07-26T16:52:03Z

Over on the p5p mailing list, a user raised the issue that use of variable $00 is an error starting with Perl 5.32, and asked that this "regression" be fixed.

I have always understood that variables whose names begin with anything but an alphabetic or an underscore are reserved to Perl, and you mess with them at your peril. And this is the gist of the Porters' response to the post. Recent versions of perlvar say this explicitly, though earlier versions of that document restrict themselves to describing currently-implemented special variables.

For what it's worth, perl532delta appears not to mention this as a new diagnostic.

I wondered how much of this kind of thing was in CPAN, so I whipped up a Perl::Critic policy to try to find them: Variables::ProhibitNumericNamesWithLeadingZero. I then ran this against CPAN as it stood July 23 2022.

The only violation of this policy that I found was in line 1209 of Net::Elexol::EtherIO24. The most recent release of this module (as of this writing) is August 11 2009. The line in violation (in context) is

1208    $txt .= sprintf("MAC: %02.2x:%02.2x:02.2x:02.2x:02.2x:02.2x  ".
1209                    "Fw: %02.2x.$02.2x",
1210                    unpack("x$len CCCCCCCC", $cmd));

and looks to me very much like a typo for %02.2x. The distribution requires a threaded Perl, and CPAN Testers show failures for Perl versions 5.32.1 and above, reporting Error: Numeric variables with more than one digit may not start with '0' at Net-Elexol-EtherIO24-0.22-0/blib/lib/Net/Elexol/EtherIO24.pm line 1209. There are no reports for 5.32.0.

Under the circumstances I can not imagine anyone (other than maybe the original poster on p5p) actually wanting this perlcritic policy published, but I did stick it on GitHub for the curious.

Sorting Subroutine Results

2022-07-20T14:26:27Z

The Perl sort built-in is mostly (at least by me) called as sort LIST or sort BLOCK LIST. But there is a third way to call it: sort SUBROUTINE LIST, which actually appears first in the documentation.

This is not a blog entry about using the sort SUBROUTINE LIST form of sort. It is more about the need to be aware of this form when writing (or trying to write) the sort LIST form.

Consider the following situation: you have a subroutine foo() which returns an un-ordered list. You need that list sorted. Perl has a sort built-in, so your (or at least my) first reaction is to write my @sorted = sort foo();, run it, and then wonder why @sorted is empty.

The problem, of course, is that Perl parses this as sort SUBROUTINE LIST with the SUBROUTINE being foo and the LIST being everything after foo. The contents of the parentheses (if any) are not passed as arguments to foo(), but are consumed by the sort. Subroutine foo() gets called only to order pairs of items in the LIST.

If you actually want to sort the list returned by foo(), you have to persuade Perl not to parse sort foo() as sort SUBROUTINE LIST. The documentation contains the words Warning: syntactical care is required when sorting the list returned from a function, and provides ways to make this happen. They are, basically,

Provide a sort block, e.g. sort { $a cmp $b } foo()
Use a unary plus, e.g. sort +foo()
Call the function with an ampersand, e.g. sort &foo()
Call sort as a function, e.g. sort( foo() )
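Here is a small sketch of the trap and the four work-arounds. foo() is a made-up function that returns an unordered list:

```perl
use strict;
use warnings;

sub foo { return qw{ pear apple fig } }

# Parsed as sort SUBROUTINE LIST: foo is the comparison routine,
# and the LIST is the empty ().
my @wrong = sort foo();

my @block = sort { $a cmp $b } foo();
my @plus  = sort +foo();
my @amp   = sort &foo();
my @paren = sort( foo() );

print "wrong: (@wrong)\n";
print "block: (@block)\n";
print "plus:  (@plus)\n";
print "amp:   (@amp)\n";
print "paren: (@paren)\n";
```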

Which of these you choose is largely a matter of style. I believe a sort block imposes a performance penalty, but whether this is significant depends on the application.

Scalar Context: Lists Versus Arrays

2022-07-12T18:54:31Z

For a long time after I first encountered Perl, I looked on "list" and "array" as essentially interchangeable concepts. A list was simply the source construct corresponding to an array. This idea is mostly correct. But as they say, the devil is in the details.

One of the differences is what happens to them in scalar context. An array evaluates to the number of elements it contains. A list evaluates to its last element. So:

use 5.010;          # enables say()
use strict;
use warnings;

my @array = qw{ one two five };
say scalar @array;  # prints '3'
{
    no warnings 'void'; # Note the need for this
    say scalar( qw{ one two five } ); # prints 'five'
}

Okay, that is a trivial example. It becomes more interesting when you consider that subroutines inherit their calling context. If called in scalar context, a subroutine that returns a list behaves differently than one that returns an array:

use 5.010;          # enables say() and state
use strict;
use warnings;

sub array {
    state $array = [ qw{ one two five } ];
    return @{ $array };
}
sub list {
    return qw{ one two five };
}
say scalar array(); # prints 3
say scalar list();  # prints 'five';

Now, there is some sentiment against subroutines that "behave differently" in scalar and list context. Usually this is thought of in terms of the wantarray() built-in, and there is actually a Perl Critic policy, Perl::Critic::Policy::Community::Wantarray, to flag these.

But it seems to me that any Perl subroutine that returns more than one value will behave differently in scalar context: it's just a question of whether you want the array behavior, the list behavior, or the arbitrary behavior you can get with wantarray(). The difference between good code and bad code is a matter of choosing this behavior carefully.

P.S.

What do you do if you have an array but want list behavior? There is no list built-in corresponding to the scalar built-in. The documentation for scalar talks about this, but only addresses interpolation. In the general case, though, what seems to work is slicing the entire array:

say scalar @array[ 0 .. $#array ]; # prints 'five'

Or, if you want to encapsulate this behavior,

sub make_list { return @_[0..$#_] }
say scalar make_list( qw{ one two five } ); # prints 'five';

No, I did not come up with this on my own. I got it from Stack Overflow, specifically from user2404501's response.

Be careful of getting too fancy with this. scalar @array[ 0 .. $#array ] is written much more clearly as $array[-1].

Announcing perlcritic Policy ValuesAndExpressions::ProhibitFiletest_rwxRWX

2022-07-05T16:21:43Z

Since several places in the Perl documentation caution against the use of the file access operators (-r and friends), and since I was unable to find a Perl::Critic policy dealing with this, I thought I would make one: Perl::Critic::Policy::ValuesAndExpressions::ProhibitFiletest_rwxRWX.

This policy is assigned to the 'bugs' theme. It has low severity because there are some uses of these operators that seem legitimate to me -- or at least I see no easy way to get around their use.

On the one hand, something like

-r $file or die "File $file not readable\n";
open my $handle, '<', $file;

is wrong several ways. On the other hand, it is hard to see how to implement File::Which without the use of -x. And in fact it does use -x.
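One commonly recommended alternative to the first snippet is to drop the -r test and let open() itself report the failure, with $! saying why. This also removes the window in which the file can change between the test and the open(). Here is a minimal sketch; open_for_reading() is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: let open() decide whether the file can be read,
# and report the operating system's reason ($!) if it cannot.
sub open_for_reading {
    my ( $file ) = @_;
    open my $handle, '<', $file
        or die "Unable to open $file: $!\n";
    return $handle;
}
```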

This policy has no configuration options. I can imagine a configuration option to allow some file access operators, but was unsure how much actual need there is for such an option. A configuration option to allow file access operators within the scope of a use filetest 'access'; might be possible, but would certainly make the policy much more complex.

Maybe this policy should be in the ::BuiltinFunctions:: name space, but I decided to follow the precedent established by Kevin Ryde in his Perl::Critic::Policy::ValuesAndExpressions::ProhibitFiletest_f.

Smart Match in CPAN

2022-06-29T23:15:18Z

There is nothing like looking, if you want to find something. -- The Hobbit, iv, "Over Hill and Under Hill"

Recently on the p5p mailing list the topic of removing smart match re-surfaced. There was a fairly vigorous discussion about the effect this would have on CPAN. So I thought I would look into how many uses there actually were.

Fortunately there are Perl Critic policies for this: Jan Holčapek's Perl::Critic::Policy::ControlStructures::ProhibitSwitchStatements and Perl::Critic::Policy::Operators::ProhibitSmartmatch. All I had to do was run them against my mini-CPAN.

My results:

Total distributions: 40704
Distributions with violations: 842
Files with violations: 1568

A look at the file names involved says that about two-thirds of the violations are in the published modules themselves, and the rest are in support code (directories t/, inc/, and the like).

It is possible that the results of Perl::Critic::Policy::ControlStructures::ProhibitSwitchStatements contain false positives simply because someone implemented subroutines named given() or when() unrelated to smart matching.

It is hard for me to see how there could be false positives from Perl::Critic::Policy::Operators::ProhibitSmartmatch, though I have learned long since that reality exceeds my ability to imagine it.

Given the nature of Perl, false negatives may have to be detected on a case-by-case basis. I do know that when smart match was briefly removed in a development release a few years back, only one module that I use broke, and I had an alternative for it.

The mini-CPAN repository used for analysis was most recently updated 2022-06-24 08:10Z. The configuration file is

remote: https://www.cpan.org/
local: 
exact_mirror: 0
skip_perl: 1
dirmode: 0755
path_filters: /Mail-DeliveryStatus-BounceParser-\d

I have unpublished modules in this repository, but they were excluded from the analysis. Also excluded were a few other modules that I have had trouble running Perl Critic against in the past:

CMORRIS/Parse-Extract-Net-MAC48-0.01.tar.gz
DOLMEN/Number-Phone-FR-0.0917215.tar.gz
GSLONDON/Parse-Nibbler-1.10.tar.gz

A list of the distributions containing violations is at https://trwyant.github.io/misc/smart-match-in-cpan/distros-with-violations.txt.

An ugly JSON file containing the results of the critique is at https://trwyant.github.io/misc/smart-match-in-cpan/smart-match.json. By "ugly" I mean non-pretty, non-canonical. This file encodes a hash whose top-level keys are:

asof - The ISO time the analysis was run;
critique - A hash reference containing the results of the critique (see below);
policy - An array reference containing the fully-qualified names of the policies used to critique the code.

The critique is a set of nested hashes keyed by author name, distribution name, and file name relative to the base directory of the distribution. The value for each file is a reference to an array containing the violations for that file: line number, column number, policy violated, violation description, and violation explanation. For brevity's sake files without violations are omitted from the output.
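Here is a sketch of walking that structure to count violations by policy. It assumes each violation is itself an array of the five fields listed above, and it decodes a tiny made-up sample with the core JSON::PP module rather than the real file. The author, distribution, and file names in the sample are invented:

```perl
use strict;
use warnings;
use JSON::PP ();

# Made-up sample in the shape described above; the real file would be
# read from disk instead.
my $json = <<'EOD';
{"asof":"2022-06-24T00:00:00Z",
 "policy":["Perl::Critic::Policy::ControlStructures::ProhibitSwitchStatements",
           "Perl::Critic::Policy::Operators::ProhibitSmartmatch"],
 "critique":{"EXAMPLE":{"Example-Dist-0.01":{
   "lib/Example.pm":[
     [10,5,"Perl::Critic::Policy::Operators::ProhibitSmartmatch","desc","expl"],
     [12,1,"Perl::Critic::Policy::ControlStructures::ProhibitSwitchStatements","desc","expl"]],
   "t/basic.t":[
     [3,9,"Perl::Critic::Policy::Operators::ProhibitSmartmatch","desc","expl"]]}}}}
EOD

my $data     = JSON::PP->new()->decode( $json );
my $critique = $data->{critique};

# author => distribution => file => [ violations ]
my %by_policy;
foreach my $author ( keys %{ $critique } ) {
    foreach my $dist ( keys %{ $critique->{$author} } ) {
        foreach my $file ( keys %{ $critique->{$author}{$dist} } ) {
            # Assumed layout: [ line, column, policy, description, explanation ]
            foreach my $violation ( @{ $critique->{$author}{$dist}{$file} } ) {
                $by_policy{ $violation->[2] }++;
            }
        }
    }
}

printf "%5d  %s\n", $by_policy{$_}, $_ for sort keys %by_policy;
```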

Annotated Perl::Critic Policy Index

2022-06-24T18:23:49Z

In the wake of my postings on the file access tests (-r and friends) I wondered if there was a Perl::Critic policy to find them. So I constructed an annotated index of Perl Critic policies. Because of its size I stuck it on GitHub rather than in-line to this blog post.

This index assumes that any CPAN module whose name begins with Perl::Critic::Policy:: is a Perl Critic Policy. The index entry for each module contains the name of the module itself (linked to Meta::CPAN), the name of the distribution which contains it, and the abstract for the module if it contains anything other than a repeat of the module name. I suppose the module description could have been added, but I hoped the abstract would be sufficient.

This operation gave me 341 policies. I did not find the policy I wanted among them. In fact, only Kevin Ryde's Perl::Critic::Policy::ValuesAndExpressions::ProhibitFiletest_f came close.


My Favorite Modules: File::stat

2022-06-17T01:23:21Z

File::stat overrides the core stat() and lstat() functions. Instead of arrays, the new functions return an object having methods corresponding to the elements of the arrays returned by the original functions. This module has been in core since Perl 5.004.

The advantage of this module is clearer code. For example, to get the size of file $file without it is something like

    my $size = ( stat $file )[7];

But with this module the same effect is given by

    my $size = stat( $file )->size();

Once you have the object in hand, you can query it for any of its properties, so if you want both size and modification time, instead of

    my ( $size, $mtime ) = ( stat $file )[ 7, 9 ];

you can say

    my $st = stat $file;
    my $size = $st->size();
    my $mtime = $st->mtime();

Starting with File::stat version 1.02 (which ships with Perl 5.12) the returned object overloads the file test operators (-X), so that the above example could be extended by something like

    my $mine = -o $st;

This will not work for -t, -T, and -B because these can not be determined from the results of a core stat() call.

In addition, File::stat versions 1.02 and above support a cando() method as an alternate implementation to the file access tests -r, -w, -x, -R, -W, and -X. This method takes two arguments. The first is one of the Fcntl constants S_IRUSR, S_IWUSR, or S_IXUSR, and the second is a Boolean which selects the effective UID (if true) or the real UID (if false).
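Here is a sketch pulling these together on a scratch file made with File::Temp. It assumes File::stat 1.02 or later, for the -X overloading and cando():

```perl
use strict;
use warnings;
use File::stat;
use File::Temp qw{ tempfile };
use Fcntl qw{ :mode };

# Make a scratch file of known size to look at.
my ( $fh, $name ) = tempfile( UNLINK => 1 );
print {$fh} 'x' x 42;
close $fh or die "Can not close $name: $!\n";

my $st = stat $name
    or die "Can not stat $name: $!\n";
my $size     = $st->size();
my $mtime    = $st->mtime();
my $mine     = -o $st;                    # overloaded file test
my $can_read = $st->cando( S_IRUSR, 1 );  # 1 = use the effective UID

print "size $size, mtime $mtime, mine ", ( $mine ? 'yes' : 'no' ),
    ", readable ", ( $can_read ? 'yes' : 'no' ), "\n";
```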

Unfortunately, There Ain't No Such Thing As A Free Lunch. There are a few things to be aware of if you use this module:

The stat() and lstat() functions provided by this module no longer make implicit use of the topic variable $_. Fortunately, calls of these without arguments become syntax errors, and you can always supply $_ as an explicit argument.
The stat() and lstat() functions provided by this module no longer interact with special file handle _. Fortunately, calls of (e.g.) stat _ are an error if use strict 'subs'; is in effect. Note that you can still use explicit file handles.
This module's overrides of the file access operators ignore the filetest pragma -- with a warning if use filetest 'access'; is in effect. You can, of course, still get this functionality, but you will have to test the original file name.

And of course, you can always access the overridden functions if you need to by calling CORE::stat() or CORE::lstat().
