Results matching “Planet Moose”

Planet Moose - September 2014

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Moose

Moose 2.1212 fixes some warnings under the Perl 5.21 development branch. (See also perl RT#121638.) Moose 2.1213 fixes a memory leak throwing exceptions. If you're already using Moose 2.11xx or 2.12xx, then it's probably worth upgrading to Moose 2.1213. If you're on an older version of Moose, then also consider upgrading but first test that your stuff works with a newer Moose.

Planet Moose - August 2014

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

It's now a year since the first Planet Moose. I'm surprised I've managed to keep it up for a whole twelve months. (Though it's not always been posted promptly on the first day of the month!)

If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Moose

Moose 2.1211 has been released containing some documentation updates and minor changes to Moose::Exporter.

More interestingly, three Moose 2.13xx trial releases have been uploaded to CPAN. The major new feature so far is improved support for overloaded operators, especially in roles, eliminating the need for MooseX::Role::WithOverloading.

Planet Moose - July 2014

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Moose

Moose 2.1210 has been released containing some updates to the test suite, and documentation improvements.

Planet Moose - June 2014

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Moose

Moose 2.1208 and 2.1209 have been released containing some minor bugfixes.

Planet Moose - May 2014

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Moose

Moose 2.1206 and 2.1207 have been released; they don't contain any jaw-dropping new features. If you're running an earlier 2.12xx release, you should probably upgrade for bugfixyness, and you shouldn't have to deal with your code breaking because of Moose changes.

Omnia illa, Quae Exspectant

Or to put in other words if you hang around long enough things will change. Well I actually did not have to wait very long at all about a month only. You may remember this post where I was recursively digging in to a directory of code to pre-load a number of Moose classes.

Well I had a solution to fix this problem


Can't locate object method "new_object" via package "Moose::Meta::Role" at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/Moose/Object.pm line 2…

Planet Moose - April 2014

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

Apologies for being a little late delivering the news this month. If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Moose

Version 2.1205 is a minor update containing a small bug fix and an improved test case. One interesting new addition is the is_role function in Moose::Util.

Planet Moose - March 2014

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Moo

Although no new version of Moo has been released so far in 2014, there have been updates in March to a few of the modules it depends on:

Planet Moose - February 2014

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Moose

There have been a couple of minor releases of Moose this month (2.1203 and 2.1204) aimed at squelching a bug exposed by a recent update to Module::Runtime. (See Ovid's blog post on the issue.)

Perl and Me, Part 11: Please Mr. Perl, Will You DWIM?

This is part 11 of an ongoing series where I explore my relationship with Perl.  You may wish to begin at the beginning.

This week we look at trade-offs in expressivity.

Last week I told a story about a CS professor who was inexplicably afraid of recursion.  Other than the obvious moral of “don’t be afraid of things that use underlying complexity to make your external code simpler,”1 what exactly does this have to do with Perl?

We’re programmers.  Nearly every single decision we make involves a trade-off of some sort.  This way will be more efficient, but harder to maintain.  This algorithm will be faster, but use more memory.  This piece of software is more widely used, and therefore easier to find people to work on it, but feature-wise it sucks ass and can’t hold a candle to this other piece of software, which is dead brilliant but almost entirely undocumented and nobody’s ever heard of it.  If we upgrade, we risk introducing bugs and downtime from weird backcompat issues; if we don’t, we risk becoming outdated, insecure, and left behind in the marketplace.  There’s always some trade-off, and it’s our job to pick one.  We don’t always pick the right one.  In fact, most of the time there isn’t really a “right” one.  What seems right may end up being wrong, and what’s actually right today may end up becoming wrong tomorrow.  But we do the best we can.  And when the benefits outweigh the consequences, that doesn’t make the consequences go away.  It just means we have to learn to live with them.

Planet Moose - January 2014

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Moose

There have been a couple of minor releases of Moose this month (2.1201 and 2.1202) but nothing especially exciting.

Planet Moose - December 2013

Happy new year, and welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Perl Advent Calendar

There were a handful of Moose/Moo-related articles in the 2013 Perl Advent Calendar:

Planet Moose - November 2013

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Mouse 2.0.0

Mouse has rolled over to 2.0.0, and now 2.1.0. 2.0.0 introduced a potentially backwards-incompatible change with regards to role composition, and 2.1.0 drops support for Perl 5.6.x.

Planet Moose - October 2013

Welcome to Planet Moose, a brief write up on what's been happening in the world of Moose in the past month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

Thanks for contributions from Damien Krotkine; and thanks to Upasana Shukla for allowing me to interview her. If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Moose

The big news is that Moose now throws blessed exception objects. The 2.1102 trial version is the first release with this feature. (I think MooseX::Exception::Rot13 is the first distribution to include a dependency on Moose::Exception.)

Planet Moose - September 2013

Welcome to the second edition of Planet Moose, a brief write up on what's been happening in the world of Moose this month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page.

Thanks for contributions from Damien Krotkine and Stevan Little. If you'd like to contribute some news for next month's issue, you can do so on the wiki.

Planet Moose - August 2013

Welcome to the first edition of Planet Moose, a brief write up on what's been happening in the world of Moose this month, for the benefit of those of you who don't have their eyes permanently glued to the #moose IRC channel, or the MetaCPAN recent uploads page. The intention is to cover Moose and similar OO frameworks (Mouse, Moo, p5-mop), along with interesting developments to their dependencies and dependent projects.

This edition is a little sparse because it just contains the news that I happen to have noticed. If you have any items you'd like to be included in the September issue, add them to the wiki! That way it might not be so tobyink-focussed next month.

Perl Startups: Lacuna Expanse

Lately while there's been discussions of various new (for varying definitions of 'new') companies who have chosen Perl, there's not been a huge amount of discussion about how they use Perl. I contacted JT Smith of Plain Black Corporation and he graciously agreed to talk about The Lacuna Expanse, a MMORPG with a Perl backend.

On the version number succeeding Perl 5

Recently Ovid broached the topic of allowing a future release of Perl 5 with a version number greater than 6; this is not a new argument. He further explicated the problems:

I just got back from FOSDEM and heard, again, for the umpteenth time, that since Perl had 4 “major” releases (1,2,3,4) in its first few years and hasn’t had a major release since Perl 5 about 20 years ago, it’s clearly “dead”. I hear this every time I go to a conference that’s not dedicated to Perl.

Meanwhile, in the minds of, of, virtually every developer on the planet who’s not in our echo chamber, Perl 6 is, logically, the successor to Perl 5. Since even Duke Nukem Forever managed to get released prior to Perl 6, Perl 6 just isn’t being taken too seriously be many devs. In other words, they think Perl 5 is dying and its successor is DOA.

I agree with him. As I commented:

I think Perl 5 has had to live in the shadow of Perl 6 for too long. Now that its decided that they will be two different languages, I think its only fair to allow Perl to get the cache’ one gets from a major version increment.

This can end the “Perl is dead” chatter (outside the community anyway). After all why would you want to use Perl 5 when Perl 6 is going to replace it (says Joe Pythonista or Jane Rubyist).

Perl 7 is here! Its Plack and Catalyst/Dancer/Mojolicious(+Mango (what’s Mango?)), its Moo(se)? and DBIx::Class. Its CPANtesters and MetaCPAN and cpanm. There is so much the outside world has missed because they are waiting for Perl 6, which may come at some point but it still wont replace Perl 5. Lets show them its safe to come back and that there are LOTS of goodies to find when they do!

That said I want to take it even a little further. Perl has been proud of its backward compatibility, and well it should be! Code from many years ago runs well on very recent Perls. But what can be a blessing can also be a curse.

Perl 5 still supports many conventions and oddities from Perl 4 and before. There are some silly things like ' as a package separator, which most people neither use nor trip over. Then again there are some real things, like indirect objects notation, default script/handle encoding, new keywords.

Olof Stockhaus made a compelling case that Perl 5 might want to use the jump to Perl 7 to make some breaking changes. For example, among other things he suggests waiting for the p5mop project. However, I think I agree with Damien Krotkine in saying that that sounds like a great target for Perl 8 which could come soon afterwards. There are many things we wish for a future Perl, but until they are ready, there still could be some changes. I think Perl 7 could be a great oportunity to do minor, but important, breaking changes. Then let Perl 8 come in later with a MOP or a proper language spec. Either way, without a major version increase its hard to make breaking changes. Without a major version to increase, we have been stuck.

Now, some have suggested that we should drop the 5 from the version number and add it to the name. Therefore we would have Perl5 version 20.0 and Perl6 version 0.1. The problem is that this still doesn’t make the case that Perl6 won’t supplant Perl5. Those people like to make the case that Java dropped the 1 so that Java 1.6 became Java 6. However this comes back to my first point. They realized that 1.5 and 1.6 (or whatever was the actual change) was going to have major changes and should be a major release. Why they didn’t simply call it Java 2 I’m not sure. What I do know is that they weren’t trying to convince people to use Java1 version 6 when something called Java2 existed.

To comment on using the year as the version number I think there are two problems with that. First, it doesn’t really signify that Perl6 is not the successor. Secondly, what it does seem to signify is that the Perl community is going to abandon the stability that is one of its strongest suits. Yes I’m advocating some breaking changes above, but rather small ones, things that will help more than they hurt to fix. I don’t think we want to imply that there will be breaking changes once per year, we can still use feature to scope changes to Perl 7 through sub-major version releases. That said, breaking changes once a decade is probably ok.

Perhaps this is the root problem. The semantics of versioning are an important tool to inform the outside world what the state of Perl is!

Anyway here’s what I propose. Say we do implement the above, so now we have two languages Perl5 and Perl6. However, since Perl5 version 20 is mostly compatible with Perl version 5 perhaps it could be allowed to keep the name. Then it seems odd that Perl should jump versions from 5 to 20, so why not just move to the next available number 7. So now we have Perl 7 successor to Perl 5 and Perl6 a new language.

This does not signify the death of Perl 6 but the life of Perl. It just so happened that Perl 6 became its own language and started its own branch in the Perl family tree. Now that we know that the success of Perl 5 and Perl 6 no longer depend on each other, why should their version numbers?


After the break, my initial list of realistic goals for Perl 7

Optimizing compiler benchmarks (part 2)

nbody - unboxed inlined arithmetic 2x faster

In the first part I showed some problems and possibilities of the B::C compiler and B::CC optimizing compiler with an example which was very bad to optimize, and promised for the next day an improvement with "stack smashing", avoiding copy overhead between the compiler stacks and global perl data.

The next days I went to Austin to meet with the perl11.org group, which has as one of the goals an optimizing compiler for perl5, and to replace all three main parts of perl: the parser, the compiler/optimizer and the vm (the runtime) at will. You can do most of it already, esp. replace the runloop, but the 3 parts are too intermingled and undocumented.

So I discussed the "stack smashing" problem with Will and my idea on the solution.

1. The "stack smashing" problem

B::CC keeps two internal stacks to be able to optimize arithmetic and boolean operations on numbers, int IV and double NV.

The first stack, called "stack", keeps the perl operand stack, the arguments for each push/pop runtime pp_* function. The perl AST, called optree, is a bit disfunctional, as it not able to generate optimized variants on the operand types. So e.g there are integer optimized versions, used with "use integer", e.g. an i_add variant for the add operator, which are used if both operands are known to be integers or integer constants at compile-time. There are no variants for strict NV, and most importantly, there are no variants for degenerated arguments, arguments with magic. Because you can add magic at run-time, which the compiler (op.c) does not know about (which is super lame), all argument types need to be checked at run-time, and you always have to go the slow general path.

This B::CC stack, implemented in B::Stackobj keeps track of the types during the lifetime and can optimize and pessimize the used type of each stack variable. It can esp. optimize on bool, int and double, needs to exchange values on stringification and pessimizes on magic, esp. tie (a variable) and overload (an op). There is no nice picture in illguts describing operand stacks.

The second stack holds lexicals. For all perl lexical variables the same is done, the B::Stackobj::Padsv lexical stack mimics every function's PADLIST. Contrary to the stack, the access to this list is optimized by the compiler (op.c) already at compile-time in the PL_curpad array, since it defines lexicals variables which are fully known at compile-time. So each op knows the exact index into the padlist, stored in the fields "targ" or "padix". Pad types are dynamic as on the stack, but there is an additional list, the comppadnames, which holds optional type information, essentially a pointer to packagename for each typed variable. It is just unused yet. See illguts for a picture.

B::Stackobj::Padsv can use the perl type information for the packages int and double, but not yet for number and bool or CTypes nor Moose types. So my int $i already specifies an optimized integer, as an IV during the scope of use integer.

Specifying types via perl attributes, like my $i :int; would be nice also, but is technically impossible, since there is no compile-time method for attributes to be checked; only at run-time.

B::CC keeps a list of good ops (pp_ functions), where the type or at least the number of arguments or return types is known at compile time. The same information is also available in the Opcodes module on CPAN. Defined are op lists like, no_stack, skip_stack, skip_lexicals, skip_invalidate, need_curcop and for various other predefined op types needed for other optimizers or the Jit module. Does it branch, i.e. will it always return op->next or not? Does it need to call PERL_ASYNC_CHECK?

On all unknown ops or ops which need to access lexicals, the current internal B::Stackobj::Padsv lexical stack values need to refreshed, written back from the internal compiler stack to the actual values on the heap, which is PL_curpad[ix]. (sub write_back_lexicals). The same must be done for all stack variables which need to accessed by the next op (sub write_back_stack). Just not for ops, which do not access stack variables.

So there is a lot of theoretical copying - "stack smashing" - going on. But B::CC is cleverly keeping track of the stack areas which need to be written back, so in practice only the really needed values are handled. In practice only numeric and boolean optimizations operate on private c variables on the C stack, rather than on referenced heap values, either on the perl stack or in the curpad. Simple sort callbacks also. So only on unboxed numbers we need to copy the values back and force, before and after, as B::CC inlines most of these ops.

2. Benchmarks

I'll take a benchmark in which Perl is very slow compared to other scripting languages, and which does a lot of arithmetic. Because I expect the B::CC type optimizer to kick in, unboxing all the numbers, and inling most of the arithmetic.

nbody performs a simple N-body simulation of the Jovian planets. Perl is currently by far the slowest scripting language for nbody, 26 min compared to 9-18 min for ruby, php or python with n=50,000,000.

perl = perl5.14.2-nt (non-threaded, -Os -msse4.2 -march=corei7)

$ time perl ../shootout/bench/nbody/nbody.perl 50000

-0.169075164
-0.169057007

 real   0m1.305s
 user   0m1.300s
 sys    0m0.000s

Compiled:

$ perlcc --time -r -O -S -O1 --Wb=-fno-destruct,-Uwarnings,-UB,-UCarp,-DOscpSqlm,-v \
              ../shootout/bench/nbody/nbody.perl 50000

script/perlcc: c time: 0.171225
script/perlcc: cc time: 0.984996
-0.169075214
-0.169078108
script/perlcc: r time: 0.600024

So we get a 2x times faster run-time, with a little bit of different results and a lot of interesting command line options.

--time prints the B::CC time as 'c time', the gcc and ld time as 'cc time', and the run-time as 'r time'. 0.625202s vs. 1.305s in pure perl. Even gcc plus ld with -OS is faster than perl. And B::CC's optimizer is also real fast here in this simple example.

-r runs the compiled program with the rest of the perlcc arguments

-O compiles with B::CC, not with the non-optimizing B::C

-S keeps the C source, to be able to inspect the generated optimized C code.

-O1 adds some minor B::CC optimizations. -O2 is pretty unstable yet, and B::CC proper (-O0) already adds all B::C -O3 optimizations.

--Wb defines further B::CC options

-fno-destruct is a B::C option to skip optree destruction at the very end. It does thread, IO and object destruction in proper order, and of course does object destruction during run-time, but we do not care of memory leaks with normal executables. Process termination does it better and faster than perl. Even daemons are safe to be compiled with -fno-destruct, just not shared libraries.

-U defines packages to be unused. warnings, B, Carp are notorious compiler packages, which are innocently being pulled in, even if you do not use or call them.

B is used by the compiler itself, and since the B maintainer does a terrible job helping the B compiler modules, we have to manually force B get out of our way. warnings and Carp are also magically pulled in by some dependent core modules and cause a lot of startup and memory overhead. These 3 packages are easily skipped with simple programs or benchmarks, in the real world you have to live with multi-megabyte compiled programs. This reflects the reality of the memory perl uses during run-time.

E.g. without -U the numbers are:

cc pp_main
 cc pp_sub_offset_momentum
 cc pp_sub_energy
 cc pp_sub_advance
 Prescan 1 packages for unused subs in main::
 Saving unused subs in main::
 old unused: 1, new: 1
 no %SIG in BEGIN block
 save context:
 cc pp_sub_warnings__register_categories
 cc pp_sub_warnings___mkMask
 Total number of OPs processed: 193
 NULLOP count: 0
 bootstrapping DynaLoader added to xs_init
 no dl_init for B, not marked
 my_perl_destruct (-fcog)
script/perlcc: c time: 0.192175
script/perlcc: cc time: 1.3049
-0.169075214
-0.169078108
script/perlcc: r time: 0.642252

-DOscpSqlm are some debugging options, which add interesting information into the generated C code. B::CC adds debugging output as comments into the C code, to be able to inspect the optimizer result, B::C prints debugging output to STDOUT.

Let's have a look into the code.

cat ../shootout/bench/nbody/nbody.perl

# The Computer Language Shootout
# http://shootout.alioth.debian.org/
#
# contributed by Christoph Bauer
# converted into Perl by Márton Papp
# fixed and cleaned up by Danny Sauer
# optimized by Jesse Millikan

use constant PI            => 3.141592653589793;
use constant SOLAR_MASS    => (4 * PI * PI);
use constant DAYS_PER_YEAR => 365.24;

#  Globals for arrays... Oh well.
#  Almost every iteration is a range, so I keep the last index rather than a count.
my (@xs, @ys, @zs, @vxs, @vys, @vzs, @mass, $last);

sub advance($)
{
  my ($dt) = @_;
  my ($mm, $mm2, $j, $dx, $dy, $dz, $distance, $mag);

  #  This is faster in the outer loop...
  for (0..$last) {
  #  But not in the inner loop. Strange.
    for ($j = $_ + 1; $j < $last + 1; $j++) {
      $dx = $xs[$_] - $xs[$j];
      $dy = $ys[$_] - $ys[$j];
      $dz = $zs[$_] - $zs[$j];
      $distance = sqrt($dx * $dx + $dy * $dy + $dz * $dz);
      $mag = $dt / ($distance * $distance * $distance);
      $mm = $mass[$_] * $mag;
      $mm2 = $mass[$j] * $mag;
      $vxs[$_] -= $dx * $mm2;
      $vxs[$j] += $dx * $mm;
      $vys[$_] -= $dy * $mm2;
      $vys[$j] += $dy * $mm;
      $vzs[$_] -= $dz * $mm2;
      $vzs[$j] += $dz * $mm;
    }

    # We're done with planet $_ at this point
    # This could be done in a seperate loop, but it's slower
    $xs[$_] += $dt * $vxs[$_];
    $ys[$_] += $dt * $vys[$_];
    $zs[$_] += $dt * $vzs[$_];
  }
}

sub energy
{
  my ($e, $i, $dx, $dy, $dz, $distance);

  $e = 0.0;
  for $i (0..$last) {
    $e += 0.5 * $mass[$i] *
          ($vxs[$i] * $vxs[$i] + $vys[$i] * $vys[$i] + $vzs[$i] * $vzs[$i]);
    for ($i + 1..$last) {
      $dx = $xs[$i] - $xs[$_];
      $dy = $ys[$i] - $ys[$_];
      $dz = $zs[$i] - $zs[$_];
      $distance = sqrt($dx * $dx + $dy * $dy + $dz * $dz);
      $e -= ($mass[$i] * $mass[$_]) / $distance;
    }
  }
  return $e;
}

sub offset_momentum
{
  my ($px, $py, $pz) = (0.0, 0.0, 0.0);

  for (0..$last) {
    $px += $vxs[$_] * $mass[$_];
    $py += $vys[$_] * $mass[$_];
    $pz += $vzs[$_] * $mass[$_];
  }
  $vxs[0] = - $px / SOLAR_MASS;
  $vys[0] = - $py / SOLAR_MASS;
  $vzs[0] = - $pz / SOLAR_MASS;
}

# @ns = ( sun, jupiter, saturn, uranus, neptune )
@xs = (0, 4.84143144246472090e+00, 8.34336671824457987e+00, 1.28943695621391310e+01, 1.53796971148509165e+01);
@ys = (0, -1.16032004402742839e+00, 4.12479856412430479e+00, -1.51111514016986312e+01, -2.59193146099879641e+01);
@zs = (0, -1.03622044471123109e-01, -4.03523417114321381e-01, -2.23307578892655734e-01, 1.79258772950371181e-01);
@vxs = map {$_ * DAYS_PER_YEAR}
  (0, 1.66007664274403694e-03, -2.76742510726862411e-03, 2.96460137564761618e-03, 2.68067772490389322e-03);
@vys = map {$_ * DAYS_PER_YEAR}
  (0, 7.69901118419740425e-03, 4.99852801234917238e-03, 2.37847173959480950e-03, 1.62824170038242295e-03);
@vzs = map {$_ * DAYS_PER_YEAR}
  (0, -6.90460016972063023e-05, 2.30417297573763929e-05, -2.96589568540237556e-05, -9.51592254519715870e-05);
@mass = map {$_ * SOLAR_MASS}
  (1, 9.54791938424326609e-04, 2.85885980666130812e-04, 4.36624404335156298e-05, 5.15138902046611451e-05);

$last = @xs - 1;

offset_momentum();
printf ("%.9f\n", energy());

my $n = $ARGV[0];

# This does not, in fact, consume N*4 bytes of memory
for (1..$n){
  advance(0.01);
}

printf ("%.9f\n", energy());

A lot of arithmetic, only three functions, advance is called 50,000 times, the others only once.

The generated C code for some inlined arithmetic looks like:

$ grep -A50 pp_sub_energy nbody.perl.c

static
CCPP(pp_sub_energy)
{
    double rnv0, lnv0, d1_e, d2_i, d3_dx, d4_dy, d5_dz, d6_distance, d11_tmp, d13_tmp,
           d15_tmp, d16_tmp, d18_tmp, d19_tmp, d20_tmp, d22_tmp, d31_tmp, d32_tmp, d33_tmp,
           d34_tmp, d35_tmp, d37_tmp, d38_tmp;
    SV *sv, *src, *dst, *left, *right;
    PERL_CONTEXT *cx;
    MAGIC *mg;
    I32 oldsave, gimme;
    dSP;
    /* init_pp: pp_sub_energy */
    /* load_pad: 39 names, 39 values */
    /* PL_curpad[1] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[1] iv=i1_e nv=d1_e */
    /* PL_curpad[2] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[2] iv=i2_i nv=d2_i */
    /* PL_curpad[3] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[3] iv=i3_dx nv=d3_dx */
    /* PL_curpad[4] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[4] iv=i4_dy nv=d4_dy */
    /* PL_curpad[5] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[5] iv=i5_dz nv=d5_dz */
    /* PL_curpad[6] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[6] iv=i6_distance nv=d6_distance */
    /* PL_curpad[7] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[7] iv=i7_last nv=d7_last */
    /* PL_curpad[8] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[8] iv=i8_tmp nv=d8_tmp */
    /* PL_curpad[9] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[9] iv=i9_tmp nv=d9_tmp */
    /* PL_curpad[10] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[10] iv=i10_tmp nv=d10_tmp */
    /* PL_curpad[11] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[11] iv=i11_tmp nv=d11_tmp */
    /* PL_curpad[12] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[12] iv=i12_tmp nv=d12_tmp */
    /* PL_curpad[13] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[13] iv=i13_tmp nv=d13_tmp */
    /* PL_curpad[14] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[14] iv=i14_tmp nv=d14_tmp */
    /* PL_curpad[15] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[15] iv=i15_tmp nv=d15_tmp */
    /* PL_curpad[16] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[16] iv=i16_tmp nv=d16_tmp */
    /* PL_curpad[17] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[17] iv=i17_tmp nv=d17_tmp */
    /* PL_curpad[18] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[18] iv=i18_tmp nv=d18_tmp */
    /* PL_curpad[19] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[19] iv=i19_tmp nv=d19_tmp */
    /* PL_curpad[20] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[20] iv=i20_tmp nv=d20_tmp */
    /* PL_curpad[21] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[21] iv=i21_tmp nv=d21_tmp */
    /* PL_curpad[22] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[22] iv=i22_tmp nv=d22_tmp */
    /* PL_curpad[23] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[23] iv=i23_tmp nv=d23_tmp */
    /* PL_curpad[24] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[24] iv=i24_tmp nv=d24_tmp */
    /* PL_curpad[25] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[25] iv=i25_tmp nv=d25_tmp */
    /* PL_curpad[26] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[26] iv=i26_tmp nv=d26_tmp */
    /* PL_curpad[27] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[27] iv=i27_tmp nv=d27_tmp */
    /* PL_curpad[28] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[28] iv=i28_tmp nv=d28_tmp */
    /* PL_curpad[29] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[29] iv=i29_tmp nv=d29_tmp */
    /* PL_curpad[30] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[30] iv=i30_tmp nv=d30_tmp */
    /* PL_curpad[31] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[31] iv=i31_tmp nv=d31_tmp */
    /* PL_curpad[32] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[32] iv=i32_tmp nv=d32_tmp */
    /* PL_curpad[33] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[33] iv=i33_tmp nv=d33_tmp */
    /* PL_curpad[34] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[34] iv=i34_tmp nv=d34_tmp */
    /* PL_curpad[35] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[35] iv=i35_tmp nv=d35_tmp */
    /* PL_curpad[36] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[36] iv=i36_tmp nv=d36_tmp */
    /* PL_curpad[37] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[37] iv=i37_tmp nv=d37_tmp */
    /* PL_curpad[38] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[38] iv=i38_tmp nv=d38_tmp */
    /* PL_curpad[39] = Padsv type=T_UNKNOWN flags=VALID_SV|REGISTER|TEMPORARY sv=PL_curpad[39] iv=i39_tmp nv=d39_tmp */
  lab_1fd4ba0:  /* nextstate */
    /* stack =  */
    /* COP (0x1fd4ba0) nextstate [0] */
    /* ../shootout/bench/nbody/nbody.perl:51 */
    TAINT_NOT;
    sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
    FREETMPS;
    /* write_back_stack() 0 called from B::CC::compile_bblock */
  lab_1fd4a10:  /* pushmark */
    /* stack =  */
    /* OP (0x1fd4a10) pushmark [0] */
    /* write_back_stack() 0 called from B::CC::pp_pushmark */
    PUSHMARK(sp);
    /* stack =  */
    /* OP (0x1fd4960) padsv [1] */
    SAVECLEARSV(PL_curpad[1]);
    /* stack = PL_curpad[1] */
    /* OP (0x1fd49c0) padsv [2] */
    SAVECLEARSV(PL_curpad[2]);
    /* stack = PL_curpad[1] PL_curpad[2] */
    /* OP (0x1fd4a40) padsv [3] */
    SAVECLEARSV(PL_curpad[3]);
    /* stack = PL_curpad[1] PL_curpad[2] PL_curpad[3] */
    /* OP (0x1fd4a90) padsv [4] */
    SAVECLEARSV(PL_curpad[4]);
    /* stack = PL_curpad[1] PL_curpad[2] PL_curpad[3] PL_curpad[4] */
    /* OP (0x1fd4990) padsv [5] */
    SAVECLEARSV(PL_curpad[5]);
    /* stack = PL_curpad[1] PL_curpad[2] PL_curpad[3] PL_curpad[4] PL_curpad[5] */
    /* OP (0x1fd4930) padsv [6] */
    SAVECLEARSV(PL_curpad[6]);
    /* stack = PL_curpad[1] PL_curpad[2] PL_curpad[3] PL_curpad[4] PL_curpad[5] PL_curpad[6] */
    /* LISTOP (0x1e99820) list [0] */
    /* list */
    /* write_back_stack() 6 called from B::CC::pp_list */
    EXTEND(sp, 6);
    PUSHs((SV*)PL_curpad[1]);
    PUSHs((SV*)PL_curpad[2]);
    PUSHs((SV*)PL_curpad[3]);
    PUSHs((SV*)PL_curpad[4]);
    PUSHs((SV*)PL_curpad[5]);
    PUSHs((SV*)PL_curpad[6]);
    PP_LIST(1);
    /* write_back_stack() 0 called from B::CC::compile_bblock */
    ...

No interesting code, but you get the idea that the compiler keeps track of all the used lexicals and stack variables and was able to optimize some types of most of the numeric lexicals.

sub energy
{
  my ($e, $i, $dx, $dy, $dz, $distance);
...

E.g. PL_curpad[1] = Padsv type=T_UNKNOWN flags=VALID_SV sv=PL_curpad[1] iv=i1_e nv=d1_e

PL_curpad[1] the first lexical, which is named i1_e for the IV value and $e in the perl code.

type=T_UNKNOWN means that there was no strict type information inferred. T_DOUBLE would have been better as $e is only used as NV and returns the resulting energy. A declaration of my double $e; would have done that.

flags=VALID_SV is also not optimal, |REGISTER|TEMPORARY would be better. iv=i1_e nv=d1_e are the two theoretical dual vars during the life-time in this local function. But only the NV d1_e is used. The IV part i1_e is never used and not declared.

Let's continue to some interesting parts:

lab_1fffd30:    /* nextstate */
/* ../shootout/bench/nbody/nbody.perl:61 */
TAINT_NOT;
sp = PL_stack_base + cxstack[cxstack_ix].blk_oldsp;
FREETMPS;
/* stack =  */
/* OP (0x1fd5260) padsv [3] */
/* stack = PL_curpad[3] */
/* OP (0x1fd5290) padsv [3] */
/* stack = PL_curpad[3] PL_curpad[3] */
/* BINOP (0x1fd51c0) multiply [31] */
d3_dx = SvNV(PL_curpad[3]);
rnv0 = d3_dx; lnv0 = d3_dx; /* multiply */
d31_tmp = lnv0 * rnv0;
/* stack = d31_tmp */
/* OP (0x1fffaf0) padsv [4] */
/* stack = d31_tmp PL_curpad[4] */
/* OP (0x1fffb20) padsv [4] */
/* stack = d31_tmp PL_curpad[4] PL_curpad[4] */
/* BINOP (0x1fffb50) multiply [32] */
d4_dy = SvNV(PL_curpad[4]);
rnv0 = d4_dy; lnv0 = d4_dy; /* multiply */
d32_tmp = lnv0 * rnv0;
/* stack = d31_tmp d32_tmp */
/* BINOP (0x1fffb90) add [33] */
rnv0 = d32_tmp; lnv0 = d31_tmp; /* add */
d33_tmp = lnv0 + rnv0;
/* stack = d33_tmp */
/* OP (0x1fffbd0) padsv [5] */
/* stack = d33_tmp d5_dz */
/* OP (0x1fffc00) padsv [5] */
/* stack = d33_tmp d5_dz d5_dz */
/* BINOP (0x1fffc30) multiply [34] */
rnv0 = d5_dz; lnv0 = d5_dz; /* multiply */
d34_tmp = lnv0 * rnv0;
/* stack = d33_tmp d34_tmp */
/* BINOP (0x1fffc70) add [35] */
rnv0 = d34_tmp; lnv0 = d33_tmp; /* add */
d35_tmp = lnv0 + rnv0;
/* stack = d35_tmp */
/* UNOP (0x1fffcb0) sqrt [6] */
/* write_back_lexicals(0) called from B::CC::default_pp */
sv_setnv(PL_curpad[5], d5_dz);
sv_setnv(PL_curpad[31], d31_tmp);
sv_setnv(PL_curpad[32], d32_tmp);
sv_setnv(PL_curpad[33], d33_tmp);
sv_setnv(PL_curpad[34], d34_tmp);
sv_setnv(PL_curpad[35], d35_tmp);
/* write_back_stack() 1 called from B::CC::default_pp */
EXTEND(sp, 1);
PUSHs((SV*)PL_curpad[35]);
PL_op = (OP*)&unop_list[31];
DOOP(PL_ppaddr[OP_SQRT]);
/* invalidate_lexicals(0) called from B::CC::default_pp */
/* stack =  */

This is part of:

  $distance = sqrt($dx * $dx + $dy * $dy + $dz * $dz);

We see the OP_SQRT as last part, not inlined, and all the simple + and * being unboxed and inlined via tempory variables. What I called stack smashing is write_back_lexicals writing back the nv values of PL_curpad[5] and PL_curpad[31-35], and write_back_stack() PL_curpad[35] as argument for SQRT.

My idea was to calculate directly on the SvNVX(PL_curpad[*]) values, but on second thought I believe copying the values to temporaries, basically in local stack locations or even in registers is faster than doing ptr references to them. Initialising and writing them back seems to be okay and not exaggerated.

So arithmetic optimizations are already pretty good, sqrt could be inlined also, since perl has no bignum promotion, so the big remaining problems are consting, function calls, method calls and stabilize B::CC.

To compare real numbers, 50_000_000 is the argument used at alioth, leading to 26m. My PC is a bit faster, needing 22m13s.

$ time perl5.14.2-nt ../shootout/bench/nbody/nbody.perl 50000
-0.169075164
-0.169096567

real    **0m13.132s**
user    0m13.109s
sys         0m0.000s

Compiled:

perlcc --time -r -O -S -O1 --Wb=-fno-destruct,-Uwarnings,-UB,-UCarp \
       ../shootout/bench/nbody/nbody.perl 50000

script/perlcc: c time: 0.158228
script/perlcc: cc time: 0.98483
-0.169075214
-0.169096616
script/perlcc: r time: **5.992293**

Comparable times with N=50,000,000:

perlcc --time -r -O -S -O1 --Wb=-fno-destruct,-Uwarnings,-UB,-UCarp \
       ../shootout/bench/nbody/nbody.perl 50000000

script/perlcc: c time: 0.19311
script/perlcc: cc time: 0.962425
-0.169075214
-0.169096616
script/perlcc: r time: **591.965999**

Reference value, uncompiled:

time perl5.14.2-nt ../shootout/bench/nbody/nbody.perl 50000000
-0.169075164
-0.169059907

real        22m13.754s
user        22m8.155s
sys         0m1.156s

591.965999s = 9m51.966s vs 22m13.754s

So we bypassed python 3 (18m), php (12min) and ruby 1.9 (23m), but not jruby (9m). jruby uses 585,948KB memory though, and at least php has a better algo.

Function calls and more optimisations are inspected in part 3, hopefully with the binarytrees benchmark. I will also analyse the calls to the sub analyse loop here, as sub analyse can be easily optimized automatically. This function does not throw exceptions, has a prototype defining one argument, has no return value and ignores return values, and does not define any locals. It even can be automatically inlined.

for (1..$n) {
    advance(0.01);
}

The uncompiled, inlined version for sub analyse needs 21m48.015s, 25s less. Compiled and inlined manually: 612.395542s (10m12s), a bit slower than not inlined. So the biggest performance hit is the unoptimized slow AELEM op, which accesses array elements. With an immediate AELEM the run-time should be 8-10x faster, such as the AELEMFAST op, which is already inlined. I'm going for LVAL optimizations in AELEM. Typed arrays would also help a lot here.

Increasing precision

The casual reader might have noticed that the compiler result would not pass the shootout precision tests as it produced an invalid result.

Wanted: +-1e-8 with arg 1000

-0.169075164
-0.169087605

Have: with arg 1000

-0.169075214
-0.169087656

That's not even close, it's a 6 digit precision. The perl afficionado might remember the Config settings nvgformat to print out NV, and d_longdbl to define if long double is defined.

long double is however not used as NV, and worst %g is used as printf format for NV, not %.16g as it should be done to keep double precision. %g is just a pretty result to the casual reader, but not suitable to keep precision, e.g. for Data::Dumper, YAML, JSON, or the B::C or perlito compilers.

So I changed the NV format to %.16g with commit 3fc61aa and the precision went up, passing the nbody shootout test with argument 1000.

New result with arg 1000

-0.169075164
-0.169087605

Exactly the same. Also for other n cmdline arguments.

See part 3 which finds more optimizations, being 2x times faster on top of this.

Yet Another Perl Podcast #2: Новости за прошедший месяц, источники информации на русском, новые cpan модули

События за прошедший месяц ( март - 1.03.11-1.04.11 ):

  • 13-15 мая 2011 - May Perl + Perl Mova в Москве

  • Открыта регистрация на воркшоп Perlburg 2011

  • Алексей Капранов…
  • 2  

    About blogs.perl.org

    blogs.perl.org is a common blogging platform for the Perl community. Written in Perl and offering the modern features you’ve come to expect in blog platforms, the site is hosted by Dave Cross and Aaron Crane, with a design donated by Six Apart, Ltd.