CPAN Cleaning Day 2457044: Compiler::Lexer
In my quest to clean up my CPAN distributions and to normalize them, I've been working on CPAN::Critic, my unreleased tool that looks at my directory and complains about things I don't like. That's going nicely for the most part, but I ran into a small problem that's taken me down a bit of a rabbit hole of C++. All this in the same week I wrote a Python program.
I want to check that the minimum version of Perl the code requires is the same as the MIN_PERL_VERSION. We have two modules to do that: the PPI-based Perl::MinimumVersion and the Compiler::Lexer-based Perl::MinimumVersion::Fast.
However, I'm in love with postfix dereferencing, a v5.20 feature. PPI reports 5.004 and Compiler::Lexer thinks 5.008 because neither handles the latest syntax enhancements:
#!/Users/brian/bin/perls/perl5.20.0
use v5.20;

use Perl::MinimumVersion;
use Perl::MinimumVersion::Fast;

my $code = '$array_ref->@*';
my $version = Perl::MinimumVersion->new( \$code )->minimum_version;
say "min version is $version"; # 5.004

my $fast = Perl::MinimumVersion::Fast->new( \$code )->minimum_version;
say "min version is $fast"; # 5.008
I started looking at Compiler::Lexer to see what I would need to do to add postfix dereferencing. If someone wants to take this on as a grant proposal for TPF, that would work best for me. Aside from that, I dusted off one of my C++ books (or, more correctly, found the moving box it never left) and got to work. Instead of fooling with the XS stuff and the Perl interface, I decided to write a short C++ program to dump the tokens.
#include <stdio.h>
#include <lexer.hpp>

typedef Lexer * Compiler_Lexer;
int show_token( Token *token );

int main() {
    const char *filename = "foo.pl";
    bool verbose = false;
    Lexer *lexer = new Lexer(filename, verbose);

    const char *script = "$array->@*";
    Tokens *tokens = lexer->tokenize( (char *)script);

    size_t size = tokens->size();
    for (size_t i = 0; i < size; i++) {
        Token *token = tokens->at(i);
        show_token( token );
    }

    return 0;
}

int show_token( Token *token ) {
    printf(
        "------------\nStype: %d\nType: %d\nKind: %d\nName: %s\nData: %s\n",
        token->stype,
        token->info.type,
        token->info.kind,
        token->info.name,
        token->_data
    );
    return 1;
}
That took me much longer than I'd like to admit. I wasted most of that time just getting a C++ environment working. I haven't touched the language in over a decade. I kept typing "my" in front of variable names, leaving off parens, and ending lists in commas.
That's where I am now as I dig into the code to see how it parses and guesses what is going on. After it sees the -> token, I have to tell it that a following sigil starts a dereference, and then parse all the stuff, including possible slices, that comes next.
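To make that rule concrete, here is a toy scanner I'd use to think it through. It is only a sketch and has nothing to do with Compiler::Lexer's real internals: the Kind enum, the Tok struct, and the tokenize_sketch function are all names I made up for illustration. The one idea it shows is that a sigil counts as a postfix dereference only when it comes directly after the arrow.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch; not Compiler::Lexer's own types.
enum class Kind { Variable, Arrow, PostfixDeref, PostfixSlice, Other };

struct Tok {
    Kind kind;
    std::string text;
};

// Sigils that can begin a postfix dereference right after "->".
static bool is_deref_sigil(char c) {
    return c == '@' || c == '%' || c == '$' || c == '&' || c == '*';
}

static bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

std::vector<Tok> tokenize_sketch(const std::string &src) {
    std::vector<Tok> out;
    const std::size_t n = src.size();
    std::size_t i = 0;

    while (i < n) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }

        if (src.compare(i, 2, "->") == 0) {
            out.push_back({Kind::Arrow, "->"});
            i += 2;
            // Only here, directly after the arrow, does a sigil mean "dereference".
            if (i < n && is_deref_sigil(src[i])) {
                if (src.compare(i, 3, "$#*") == 0) {
                    // ->$#* : last index of the referenced array
                    out.push_back({Kind::PostfixDeref, "$#*"});
                    i += 3;
                } else if (i + 1 < n && src[i + 1] == '*') {
                    // ->@* ->%* ->$* ->&* ->** : whole-thing dereference
                    out.push_back({Kind::PostfixDeref, src.substr(i, 2)});
                    i += 2;
                } else if ((src[i] == '@' || src[i] == '%') && i + 1 < n &&
                           (src[i + 1] == '[' || src[i + 1] == '{')) {
                    // ->@[...] ->@{...} ->%[...] ->%{...} : slice; emit the
                    // sigil and leave the bracket for the normal scanning below.
                    out.push_back({Kind::PostfixSlice, std::string(1, src[i])});
                    i += 1;
                }
            }
            continue;
        }

        // A plain scalar variable such as $array_ref.
        if (c == '$' && i + 1 < n && is_word_char(src[i + 1])) {
            std::size_t j = i + 1;
            while (j < n && is_word_char(src[j])) ++j;
            out.push_back({Kind::Variable, src.substr(i, j - i)});
            i = j;
            continue;
        }

        // Everything else is passed through one character at a time.
        out.push_back({Kind::Other, std::string(1, c)});
        ++i;
    }
    return out;
}
```

Feeding it the same "$array->@*" string I gave the real lexer yields three tokens: the variable, the arrow, and a single postfix dereference token for "@*". A real fix still has to handle everything this toy skips, such as the contents of a slice.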