CPAN Cleaning Day 2457044: Compiler::Lexer

In my quest to clean up my CPAN distributions and to normalize them, I've been working on CPAN::Critic, my unreleased tool that looks at my directory and complains about things I don't like. That's going nicely for the most part, but I ran into a small problem that's taken me down a bit of a rabbit hole of C++. All this in the same week I wrote a Python program.

I want to check that the minimum version of Perl the code actually requires is the same as the MIN_PERL_VERSION the distribution declares. We have two modules to do that, the PPI-based Perl::MinimumVersion and the Compiler::Lexer-based Perl::MinimumVersion::Fast.

However, I'm in love with postfix dereferencing, a v5.20 feature. Perl::MinimumVersion (through PPI) reports 5.004 and Perl::MinimumVersion::Fast (through Compiler::Lexer) reports 5.008, because neither handles the latest syntax enhancements:

#!/Users/brian/bin/perls/perl5.20.0
use v5.20;

use Perl::MinimumVersion;
use Perl::MinimumVersion::Fast;

my $code = '$array_ref->@*';

my $version = Perl::MinimumVersion->new( \$code )->minimum_version;
say "min version is $version"; # 5.004

my $fast = Perl::MinimumVersion::Fast->new( \$code )->minimum_version;
say "min version is $fast"; # 5.008

I started looking at Compiler::Lexer to see what I would need to do to add postfix dereferencing. If someone wants to take this on as a grant proposal for TPF, that would work best for me. Aside from that, I dusted off one of my C++ books (or, more correctly, found the moving box it never left) and got to work. Instead of fooling with the XS stuff and the Perl interface, I decided to write a short C++ program to dump the tokens.

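#include <cstdio>    // printf
#include <cstddef>   // size_t
#include <lexer.hpp> // Compiler::Lexer's C++ header; name and include path are assumed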

typedef Lexer * Compiler_Lexer;
int show_token( Token *token );

int main() {
    const char *filename = "foo.pl";
    bool verbose = false;
    Lexer *lexer = new Lexer(filename, verbose);

    const char *script = "$array->@*";
    Tokens *tokens = lexer->tokenize( (char *)script);

    size_t size = tokens->size();
    for (size_t i = 0; i < size; i++) {
        Token *token = tokens->at(i);
        show_token( token );
    }

    return 0;
}

int show_token( Token *token ) {
    printf(
        "------------\nStype: %d\nType: %d\nKind: %d\nName: %s\nData: %s\n",
        token->stype,
        token->info.type,
        token->info.kind,
        token->info.name,
        token->_data
    );
    return 1;
}

That took me much longer than I'd like to admit. I wasted most of that time just getting a C++ environment working. I haven't touched the language in over a decade. I kept typing my in front of variable names, leaving off parens, and ending lists in commas.

That's where I am now as I dig into the code to see how it parses and guesses what is going on. After it sees the -> token, I have to tell it that a following sigil starts a dereference, and then parse everything that comes next, including possible slices.


About brian d foy

I'm the author of Mastering Perl, and the co-author of Learning Perl (6th Edition), Intermediate Perl, Programming Perl (4th Edition) and Effective Perl Programming (2nd Edition).