Perl 6 Core Hacking: Where's Da Sauce, Boss?
Read this article on Perl6.Party
Imagine you were playing with Perl 6 and you came across a buglet or you were having some fun with the Perl 6 bug queue—you'd like to debug a particular core subroutine or method, so where's the source for it at?
Asked such a question, you might be told it's in the Rakudo compiler's GitHub repository. Depending on how deep down the rabbit hole you wish to go, you may also stop by the repo for NQP, which is a subset of Perl 6 that's used in Rakudo, or the repo for MoarVM, which is the leading virtual machine Perl 6 runs on.
The answer is fine, but we can do better. We'd like to know exactly where da sauce is.
Stick to The Basics
The most obvious way is to just use the grep command in the source repository. The code is likely in the src/ directory, or src/core more specifically.
We'll use a regex that catches sub, method, and multi keywords. For example, here's our search for a path sub or method:
$ grep -nER '^\s*(multi|sub|method|multi sub|multi method)\s+path' src/core
src/core/Cool.pm:229: method path() { self.Stringy.IO }
src/core/CompUnit/Repository/Locally.pm:26: method path-spec(CompUnit::Repository::Locally:D:) {
src/core/CompUnit/Repository/AbsolutePath.pm:46: method path-spec() {
src/core/CompUnit/Repository/NQP.pm:32: method path-spec() {
src/core/CompUnit/Repository/Perl5.pm:46: method path-spec() {
src/core/CompUnit/PrecompilationStore/File.pm:93: method path(CompUnit::PrecompilationId $compiler-id,
src/core/CompUnit/PrecompilationUnit.pm:17: method path(--> IO::Path) { ... }
src/core/IO/Spec/Win32.pm:58: method path {
src/core/IO/Spec/Unix.pm:61: method path {
src/core/IO/Handle.pm:714: method path(IO::Handle:D:) { $!path.IO }
It's not too terrible, but it's a rather blunt tool. We have these problems:
- There are false positives; we have several path-spec methods found
- It doesn't tell us which of the results is for the actual method we have in our code. There's Cool, IO::Spec::Unix, and IO::Handle all with method path in them. If I call "foo".IO.path, which of those gets called?
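The first of these problems, at least, can be reduced by refusing to match when the name carries on with a word character, a hyphen, or an apostrophe, all of which are legal inside a Perl 6 identifier. Here's a minimal sketch of that idea in Perl 6 itself, run against a few made-up lines rather than the real src/core tree (the sample lines and the $name variable are illustrative assumptions):

```perl6
# Made-up declaration lines standing in for lines of files under src/core
my @lines =
    'method path() { self.Stringy.IO }',
    'method path-spec() {',
    'multi method path(IO::Handle:D:) { $!path.IO }',
    'sub pathological() { }';

my $name = 'path';

# Same keywords as the grep pattern, plus a negative lookahead so that
# longer identifiers such as path-spec are rejected
my @hits = @lines.grep: /
    ^ \s* [ 'multi method' | 'multi sub' | 'multi' | 'sub' | 'method' ] \s+
    $name
    <!before <[\w\-']> >
/;

.say for @hits;
```

Only the two genuine path declarations should survive the filter. It does nothing for the second problem, though.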
The last one is particularly irksome, but luckily Perl 6 can tell us where the source is from. Let's ask it!
But here's line number... So code me maybe
The Code class from which all subs and methods inherit provides .file and .line methods that tell which file that particular Code is defined in, including the line number:
say "The code is in {.file} on line {.line}" given &foo;

sub foo {
    say 'Hello world!';
}

# OUTPUT:
# The code is in test.p6 on line 3
That looks nice and simple, but it gets more awkward with methods:
class Kitty {
    method meow {
        say 'Meow world!';
    }
}

say "The code is in {.file} on line {.line}" given Kitty.^can('meow')[0];

# OUTPUT:
# The code is in test.p6 on line 2
We got extra cruft of the .^can metamodel call, which returns a list of Method objects. Above we use the first one to get the .file and .line number from, but is it really the method we were looking for?
Take a look at this example:
class Cuddly {
    method meow ('meow', 'meow') {
        say 'Meow meow meow!';
    }
}

class Kitty is Cuddly {
    multi method meow ('world') {
        say 'Meow world!';
    }

    multi method meow ('meow') {
        say 'Meow meow';
    }
}
We have a method meow in one class and in another class we have two multi methods meow. How can we print the location of the last method, the one that takes a single 'meow' as an argument?
First, let's take a gander at all the items .^can returns:
say Kitty.^can('meow');
# OUTPUT:
# (meow meow)
Wait a minute, we have three methods in our code, so how come we only have two meows in the output? Let's print the .file and .line for both meows:
for 0, 1 {
    say "The code is in {.file} on line {.line}"
        given Kitty.^can('meow')[$_];
}
# OUTPUT:
# The code is in gen/moar/m-CORE.setting on line 587
# The code is in test.p6 on line 2
The second meow gives us a sane result; it's our method defined in class Cuddly. The first one, however, gives us some weird file.
What's happening here is the line is referencing the proto for the multis. Since in this case instead of providing our own proto we use the autogenerated one, the referenced file has nothing to do with our code. We can, of course, add a proto into the code, but then the line number would still reference the proto, not the last meow method. Is there anything that we can do?
You .cando It!
The Routine class, from which both Method and Sub classes inherit, provides the .cando method. Given a Capture, it returns a list of candidates that can handle it, with the narrowest candidate first in the list, and since the returned object is a Code, we can query its specific .file and .line:
class Cuddly {
    method meow ('meow', 'meow') {
        say 'Meow meow meow!';
    }
}

class Kitty is Cuddly {
    multi method meow ('world') {
        say 'Meow world!';
    }

    multi method meow ('meow') {
        say 'Meow meow';
    }
}

my $code = gather {
    for Kitty.^can('meow') -> $meth {
        .take for $meth.cando: \(Kitty, 'meow');
    }
}

say "The code is in {.file} on line {.line}" with $code[0];

# OUTPUT:
# The code is in test.p6 on line 12
Hooray! We got the correct location of the multi we wanted. We still have our two classes with three meow methods total. On lines 17–21 we loop over the two meow Methods the .^can metamodel call gives us. For each of them we call the .cando method with the Capture that matches the multi we want (note that we do need to provide the needed object as the first argument of the Capture). We then .take all found candidates to gather them into the $code variable.
The first value we get is the narrowest candidate and is good 'nuf for us, so we call the .file and .line on it, which gives us the location we were looking for. Sounds like we nailed this .file and .line business down rather well. Let's dive into the core, shall we?
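One aside before we do: since .cando comes from Routine, it should work on plain multi subs too, with no .^can step and no invocant in the Capture. A small sketch with a made-up greet sub (hypothetical, not something from the core):

```perl6
# A made-up multi sub with three candidates
multi sub greet (Int $n) { "number $n" }
multi sub greet (Str $s) { "string $s" }
multi sub greet (Any $x) { "something else" }

# Ask the autogenerated proto which candidates can handle one Int argument
my @candidates = &greet.cando: \(42);

say @candidates.elems;     # how many candidates accept \(42)
say @candidates[0].(42);   # call the narrowest candidate directly
say "{.file}:{.line}" with @candidates[0];
```

The Str candidate can't take 42, so only the Int and Any candidates come back, with the Int one first.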
Can't see the core files for the setting
If this is the first time you're seeing the printout of .file/.line for some core stuff, you're in for a surprise. Actually, we've already seen the surprise, but you may have thought it to be a fluke:
say "{.file}:{.line}" given &say;
# OUTPUT:
# gen/moar/m-CORE.setting:29038
All of the nice, good looking files you see in src/core in the repo actually get compiled into one giant file called the "setting." My current setting is 40,952 lines long and the .line of core subs and methods refers to one of those thousands of lines.
Now sure, we could pop the setting open and watch our editor grind to a stuttering halt (I'm looking at you, Atom!). However, that doesn't help us find the right repo file to edit if we want to make changes to how it works. So what do we do?
A keen eye will look at the contents of the setting or at the file that generates it and notice that for each of the separate files in the repo, the setting has this type of comment before the contents of the file are inserted into the setting:
#line 1 src/core/core_prologue.pm
This means if we're clever enough, we can write a sub that translates a line number in the setting to the separate file we can locate in the repo. Here's a plan of action: we pop open the setting file and read it line by line. When we encounter one of the above comments, we make a note of which file we're in as well as how many lines deep in the setting we're currently at.
The location of the setting file may differ, depending on how you installed Perl 6, but on my system (I use rakudobrew), it's in $*EXECUTABLE.parent.parent.parent.child('gen/moar/m-CORE.setting'), so the code for finding the actual file that defines our core sub or method is this:
sub real-location-for ($wanted) {
    state $setting = $*EXECUTABLE.parent.parent.parent.child: 'gen/moar/m-CORE.setting';
    my ($cur-line-num, $offset) = 0, 0;
    my $file;
    for $setting.IO.lines -> $line {
        # Stop as soon as we reach the setting line we were asked about
        return %( :$file, :line($cur-line-num - $offset), )
            if ++$cur-line-num == $wanted;

        # A marker comment means a new source file starts on the next line
        if $line ~~ /^ '#line 1 ' $<file>=\S+/ {
            $file   = $<file>;
            $offset = $cur-line-num + 1;
        }
    };
    fail 'Were not able to find location in setting.';
}

say "{.<file>}:{.<line>}" given real-location-for &say.line;

# OUTPUT:
# src/core/io_operators.pm:17
The $wanted contains the setting line number given to us by the .line call and the $cur-line-num contains the number of the current line we're examining. We loop until the $cur-line-num reaches $wanted and return a Hash with the results. For each line that matches our special comment, we store the real name of the file the code is from into $file and store the $offset of the first line of the code in that file. Once done, we simply subtract the $offset from the setting $cur-line-num and we get the line number in the source file.
This is pretty awesome and useful, but it's still not what I had in mind when I said we wanted to know exactly where da sauce is. I don't want to clone the repo and go to the repo and open my editor. I want to just look at code.
If it's worth doing, it's worth overdoing
There's one place where we can stare at Rakudo's source code until it blushes and looks away: GitHub. Since our handy sub gives us a filename and a line number, we can construct a URL that points to a specific file and line in the source code, like this one, for example: https://github.com/rakudo/rakudo/blob/nom/src/core/Str.pm#L16
There's an obvious problem with such an approach: the URL points to the master branch (called nom, for "New Object Model," in Rakudo). Commits go into the repo daily, and unless we rebuild our Perl 6 several times a day, there's a good chance the location our GitHub URL points to is wrong.
Not only do we have to point to a specific file and line number, we have to point to the right commit too. On GitHub's end, it's easy: we just replace nom in the URL with the appropriate commit number. We just need Rakudo to tell us what that number is.
The two dynamic variables $*VM and $*PERL contain some juicy information. By introspecting them, we can locate some useful info and what looks like commit prefix parts in version numbers:
say $*VM.^methods;
# (BUILD platform-library-name Str gist config prefix precomp-ext
# precomp-target precomp-dir name auth version signature desc)
say $*VM.version;
# v2016.06
say $*PERL.^methods;
# (BUILD VMnames DISTROnames KERNELnames Str gist compiler name auth version
# signature desc)
say $*PERL.compiler.^methods;
# (BUILD build-date Str gist id release codename name auth version
# signature desc)
say $*PERL.compiler.version;
# v2016.06.10.g.7.cff.429
Rakudo is a compiler and so we're interested in the value of $*PERL.compiler.version. It contains the major release version, followed by the number of commits made since that release, followed by g, followed by the commit prefix of this particular build. The prefix is split up on number-letter boundaries, so we'll need to join up all the bits and split on g. But, take a look at $*VM.version, which is the version of the virtual machine we're running the code on. There aren't any gs and commits in it and for a good reason: it's a tagged major release, and the name of the tag is the version. The same will occur for Rakudo on release builds, like the ones shipped with Rakudo Star. So we'll need to check for such edge cases and this is the code:
my $where = .Str ~~ /g/
    ?? .parts.join.split("g")[*-1]
    !! .Str
given $*PERL.compiler.version;
Given a $*PERL.compiler.version, if it contains the letter g, join up the version bits, split on g, and the last portion will be our commit prefix; if it doesn't contain the letter g, then we're dealing with a release tag, so we'll take it as-is. All said and done, our code for locating source becomes this:
my $where = .Str ~~ /g/
    ?? .parts.join.split("g")[*-1]
    !! .Str
given $*PERL.compiler.version;

say [~] 'https://github.com/rakudo/rakudo/blob/',
        $where, '/', .<file>, '#L', .<line>
given real-location-for &say.line;

# OUTPUT:
# https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L17
Hey! Awesome! We got a link that points to the correct commit and file! Let celebrations begin! Wait. What? You followed the link and noticed the line number is not quite right? What gives? Did we mess up our algorithm?
Crank Up The Insanity
If you take a look again at the script that generates the setting file, you'll notice it strips things: comments and special backend-specific chunks of code.
There are two ways to fix this. The sane approach would be to commit a change that would make that script insert an empty line for each line it skips and then pretend that we didn't commit that just to make our personal project work. Then, there's the Zoffix Way to fix this: we got the GitHub link, so why don't we fetch that code and figure out what the right line number is. Hey! That second way sounds much more fun! Let's do just that!
The one link we've seen so far is this: https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L17. It's not quite what we want, since it's got HTML and bells and whistles in it. We want raw code and GitHub does offer that at a slightly different URL: https://raw.githubusercontent.com/rakudo/rakudo/c843682/src/core/io_operators.pm. The plan of action then becomes:
- Get the line number in the setting
- Use our real-location-for sub to get the filename and sorta-right line number in a source file
- Get the commit our compiler was built with
- Generate a GitHub URL for raw code for that file on that commit and fetch that code
- Use the same algorithm as in the setting generating script to convert the code we fetched into the version that lives in our setting, while keeping track of the number of lines we strip
- When we reach the correct line number in the converted file, we adjust the original line number we had by the number of lines we stripped
- Generate a regular GitHub URL to the commit, file, and corrected line number
- ???
- Profit!
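For the curious, the heart of the URL-building and line-adjusting steps can be sketched without touching the network. What follows is a toy under loud assumptions: the helper names are made up, the source is handed in as a list of lines instead of being fetched, and the only thing treated as stripped is a whole-line comment, which is simpler than what the real generating script does.

```perl6
# Hypothetical helpers; none of these names come from Rakudo or any module

sub raw-url ($commit, $file) {
    "https://raw.githubusercontent.com/rakudo/rakudo/$commit/$file"
}

sub blob-url ($commit, $file, $line) {
    "https://github.com/rakudo/rakudo/blob/$commit/$file#L$line"
}

# Given the original file's lines and a 1-based line number in the stripped
# copy, return the matching 1-based line number in the original.
# ASSUMPTION: only whole-line comments get stripped.
sub original-line (@source, Int $wanted) {
    my $kept = 0;
    for @source.kv -> $index, $line {
        next if $line ~~ /^ \s* '#'/;            # this line was stripped
        return $index + 1 if ++$kept == $wanted; # found the wanted kept line
    }
    fail "Line $wanted is past the end of the stripped source";
}

# Stand-in for code we'd fetch from the raw URL
my @source =
    '# a comment the generator would drop',
    'sub a { }',
    '# another dropped comment',
    'sub b { }';

say raw-url 'c843682', 'src/core/io_operators.pm';

my $line = original-line @source, 2;
say blob-url 'c843682', 'src/core/io_operators.pm', $line;
```

Counting kept lines until we hit the wanted one amounts to the same thing as adding the number of stripped lines to the line number we started with; it just avoids carrying two counters around.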
I could go over the code, but it's just a dumb, unfun algorithm, and most importantly, you don't need to know it. Because... there's a module that does just that!
What Sorcery Is This?
The module is called CoreHackers::Sourcery and when you use it, it'll augment the Code class and all core classes that inherit from it with a .sourcery method, as well as provide a sourcery subroutine.
So, to get the location of the code for the say sub, just run:
use CoreHackers::Sourcery;
&say.sourcery.put;
# OUTPUT:
# src/core/io_operators.pm:20 https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L20
That gives us the correct location of the proto. We can either pop open a file in a repo checkout or view the code at the provided GitHub URL.
Want to get the location of a specific multi? There's no need to mess with .cando! The arguments you give to the .sourcery method will be used to select the best matching multi, so to find the location of the say multi that will handle a say "foo" call, just run:
&say.sourcery("foo").put;
# OUTPUT:
# src/core/io_operators.pm:22 https://github.com/rakudo/rakudo/blob/c843682/src/core/io_operators.pm#L22
That covers the subs. For methods, you can go with the whole .^can meta dance, but we like simple things, and so we'll use the subroutine form of sourcery:
put sourcery Int, 'abs'; # method of a type object
put sourcery 42, 'split'; # method of an Int object
put sourcery 42, 'base', \(16); # best candidate for `base` method called with 16 as arg
This is pretty handy. And the whole hitting the GitHub thing? The module will cache the code fetched from GitHub, so things like this won't take forever:
put "Int.{.name} is at {.sourcery}" for Int.^methods;
However, if you do actually run that code, after some output you'll be greeted with this error:
# Method 'sourcery' not found for invocant of class 'Method+{Callable[Bool:D]}'
# in block at test.p6 line 1
# in block <unit> at test.p6 line 1
The class it mentions is not a pure Method object, but has a mixin in it. While CoreHackers::Sourcery recomposes all core subclasses of the Code class after augmenting it, it doesn't do that for such mixins, so you'd have to recompose them yourself:
for Int.^methods {
    .WHAT.^compose;
    put "Int.{.name} is at {.sourcery}";
}
Or better still, just use the subroutine form of sourcery:
put "Int.{.name} is at {sourcery $_}" for Int.^methods;
Do It For Me
For most stuff, we wouldn't want to do a whole bunch of typing to use a module and call subs and then copy/paste URLs or filenames. You'll notice sourcery returns a list of two items: the filename and the URL. This means we can make some nice and short aliases to call it and automatically pop open either our editor or web browser:
$ alias sourcery='perl6 -MCoreHackers::Sourcery -MMONKEY-SEE-NO-EVAL \
    -e '\''run "atom", "/home/zoffix/rakudo/" \
        ~ EVAL "sourcery(@*ARGS[0])[0]" '\'''
$ alias sourcery-web='perl6 -MCoreHackers::Sourcery -MMONKEY-SEE-NO-EVAL \
    -e '\''run "firefox", EVAL "sourcery(@*ARGS[0])[1]" '\'''

# opens Atom editor at the spot to edit code for Int.base
$ sourcery 'Int, "base"'

# opens Firefox, showing code for Int.base
$ sourcery-web 'Int, "base"'
We EVAL the argument we give to these aliases, so be careful with them. For the sourcery alias, we run the Atom editor and give it the file to open. I prepended the location of my local Rakudo checkout, but you'd use yours. Most editors support opening file:line-number format to open files at a particular spot; if yours doesn't, modify the command.
For sourcery-web we use the URL returned by sourcery and open the Firefox browser at this location. And just like that, with a few keystrokes, we can jump in to view or edit the code for a particular core sub or method in Rakudo!
Conclusion
We've learned where Rakudo's source lives, how to find the commit the current compiler is built off, and how to locate the source code for a particular sub or method in a giant file called the setting. We then further hacked away the inconveniences by getting to the actual place in the source code we can edit, culminating with a shiny module and a couple of handy command line aliases.
Happy hacking!
UPDATE 2016.08.05
Inspired by this blog post, lizmat++ has changed the setting generation script to not skip any lines, so making adjustments to line numbers by fetching source from GitHub is no longer necessary, as the line numbers match up with the original source.