Roles without Moose?

I'm on a new team at the BBC. On the previous team, PIPs, we gathered BBC programme data for television and radio. The rest of the BBC could use PIPs to pull schedules, get information about Doctor Who (note, that's "Doctor", not "Dr."!) or understand how a radio programme is broken down into segments which might be rebroadcast on a different programme. The work was complex, but fun. If our system went down, large parts of the BBC wouldn't be able to update their programme data.

On the new team, Dynamite, it's a different story. If we go down, large parts of the BBC's online presence go down. Ever visit www.bbc.co.uk/iplayer/? That's ours. Given that the BBC is one of the most heavily trafficked web sites in the world, we have to worry about performance. We count milliseconds. As a result, the team I'm on now doesn't use Moose. Ah, but you tell me Moose is fast now! Yes, Moose is fast and most of its performance issues are in the startup, not in the runtime. I'll agree with you on this, but look at this benchmark:

#!/usr/bin/env perl

{
    package Foo::Moose;
    use Moose;
    has bar => (is => 'rw');
    __PACKAGE__->meta->make_immutable;
}
{
    package Foo::Manual;
    sub new { bless {} => shift }
    sub bar {
        my $self = shift;
        return $self->{bar} unless @_;
        $self->{bar} = shift;
    }
}
my $foo1 = Foo::Moose->new;
sub moose {
    $foo1->bar(32);
    my $x = $foo1->bar;
}
my $foo = Foo::Manual->new;
sub manual {
    $foo->bar(32);
    my $x = $foo->bar;
}
use Benchmark 'timethese';

print "Testing Perl $]\n";
timethese(
    1_500_000,
    {
        moose  => \&moose,
        manual => \&manual,
    }
);

Sample output:

Testing Perl 5.010001
Benchmark: timing 1500000 iterations of manual, moose...
    manual:  2 wallclock secs ( 1.86 usr +  0.00 sys =  1.86 CPU) @ 806451.61/s (n=1500000)
     moose:  1 wallclock secs ( 1.93 usr +  0.00 sys =  1.93 CPU) @ 777202.07/s (n=1500000)

No matter how many times I run this, the manual version is only a hair faster than Moose. Of course, we had to avoid constructing the object in this benchmark. Otherwise, we see that object construction in Moose is slow:

Benchmark: timing 1500000 iterations of manual, moose...
    manual:  5 wallclock secs ( 4.43 usr +  0.01 sys =  4.44 CPU) @ 337837.84/s (n=1500000)
     moose:  6 wallclock secs ( 7.40 usr +  0.00 sys =  7.40 CPU) @ 202702.70/s (n=1500000)

(Look at the rate after the @ sign, in iterations per second.)

That's not fair, though, because you construct an object once and then do lots of things with it. That being said, Moose offers so many benefits that our tiny, tiny performance hit is worth it, isn't it? Look at the original code and you'll see that we're not really taking advantage of Moose, so let's add a type check.

{
    package Foo::Moose;
    use Moose;
    has bar => (is => 'rw', isa => 'Int');
    __PACKAGE__->meta->make_immutable;
}

And the benchmark:

Benchmark: timing 1500000 iterations of manual, moose...
    manual:  1 wallclock secs ( 1.88 usr +  0.00 sys =  1.88 CPU) @ 797872.34/s (n=1500000)
     moose:  6 wallclock secs ( 5.14 usr +  0.00 sys =  5.14 CPU) @ 291828.79/s (n=1500000)

Oops. If we actually try to take advantage of the features of Moose, we take a serious performance hit. For most people this won't matter. Ah, but you argue that I should have that type checking and you're right, but in reality, much Perl code deep in a system doesn't have type checking. But let's add a quick check of our own, just to be more fair.

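    # croak is exported by Carp, so the package needs "use Carp;"
    sub bar {
        my $self = shift;
        return $self->{bar} unless @_;
        # =~ binds tighter than +, so this is 0 + (match result)
        croak "Need int, not ($_[0])" unless 0+$_[0] =~ /^\d+$/;
        $self->{bar} = shift;
    }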

That's not a great check, but it's better than many people provide. Here's the benchmark:

Benchmark: timing 1500000 iterations of manual, moose...
    manual:  2 wallclock secs ( 3.35 usr +  0.00 sys =  3.35 CPU) @ 447761.19/s (n=1500000)
     moose:  4 wallclock secs ( 5.20 usr +  0.00 sys =  5.20 CPU) @ 288461.54/s (n=1500000)
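To show why I call that "not a great check", here is a self-contained sketch of the hand-rolled accessor that prints which values it lets through. The package name Foo::Checked and the sample values are made up for illustration; they are not part of the benchmark.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

{
    package Foo::Checked;    # hypothetical name, not from the benchmark
    use Carp 'croak';

    sub new { bless {} => shift }

    sub bar {
        my $self = shift;
        return $self->{bar} unless @_;

        # Same regex as the benchmark: one or more digits and nothing else
        croak "Need int, not ($_[0])" unless $_[0] =~ /^\d+$/;
        $self->{bar} = shift;
    }
}

my $foo = Foo::Checked->new;
for my $value ( 32, '007', -5, '1e3', 'abc' ) {

    # eval traps the croak so we can report the outcome
    my $ok = eval { $foo->bar($value); 1 };
    printf "%-5s %s\n", $value, $ok ? 'accepted' : 'rejected';
}
```

The regex accepts '007' but rejects -5 and '1e3', so negative integers never get through. That is good enough for a benchmark, but it is not what Moose's Int type does.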

Again, with carefully crafted code, we can outperform Moose, but we still don't get Moose's flexibility. This will matter to very few people and unless you have a very clear reason, don't skip Moose just for this. Regrettably, our millisecond response times mean that we have a problem.

That problem, in this case, is multiple inheritance. As with many codebases that evolve over time, lots of programmers have had a chance to "improve" the system and I'm seeing a lot of multiple inheritance. I'm seeing classes which have five parents! Running Class::Sniff over them is showing quite a few issues and it's clear from even a cursory examination that this MI is for sharing behavior.
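Here is a minimal sketch of the kind of trouble I mean. The class and method names are invented for illustration and are not from our codebase. Two parents both provide to_string, and the child silently gets whichever parent is listed first.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

{
    package My::Serializer;    # hypothetical parent
    sub new       { bless {}, shift }
    sub to_string { 'serialized' }
}
{
    package My::Logger;        # hypothetical parent
    sub log_it    { "logged: $_[1]" }
    sub to_string { 'log line' }    # same name, different meaning
}
{
    package Programme;         # hypothetical child

    # -norequire because the parents live in this file
    use parent -norequire, 'My::Serializer', 'My::Logger';
}

my $p = Programme->new;

# Resolved purely by parent order; Perl gives no warning about the clash
print $p->to_string, "\n";
print $p->log_it('hi'), "\n";
```

Swap the order of the parents and to_string quietly changes behavior. With five parents, finding out which one actually answers a method call takes real effort.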

Sharing behavior is exactly what roles are for. So if we're concerned about the overhead of Moose, what options do we have? I've deprecated Class::Trait. Is it time to resurrect (and benchmark) it? Mouse seemed promising, but initial benchmarks with the above code showed its getters/setters running slightly slower than Moose's getters/setters! We can take a performance hit on load, but on runtime, we have to be careful.

Maybe we need a very lightweight:

use role 
  'Does::Serialization',
  'Does::TitleSearch',
  'Does::IdMatching' => { excludes => 'some_method' };

No runtime application would be allowed. There would be no introspection beyond DOES. Multiple "use role" in the same class would fail (this solves a few problems I won't go into now). No Moose, Mouse or anything else would be required. Better suggestions are welcome. I'd guess that I could use Moose without accessors and without inlining constructors and take advantage of roles that way. Sounds better, but more benchmarking is needed.
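To make the idea concrete, here is a rough sketch of what such a module might look like. This is not an existing CPAN module: the role package, the methods in the two roles, and the Programme class are all invented for illustration, and the %INC line is only there so the whole thing runs from a single file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

BEGIN {
    package role;            # hypothetical module; normally role.pm
    use Carp 'croak';
    $INC{'role.pm'} = 1;     # lets "use role" work in a single file

    my %ROLES_FOR;           # class name => { role name => 1 }

    sub import {
        my ( undef, @args ) = @_;
        my $class = caller;
        croak "Multiple 'use role' in $class" if $ROLES_FOR{$class};
        my $roles = $ROLES_FOR{$class} = {};

        my %provided_by;     # method name => role that supplied it
        while (@args) {
            my $role = shift @args;
            my $opts = ref( $args[0] ) eq 'HASH' ? shift @args : {};
            my $ex   = $opts->{excludes};
            my %skip = map { $_ => 1 }
              ref($ex) ? @$ex : defined($ex) ? ($ex) : ();

            no strict 'refs';
            for my $method ( keys %{"${role}::"} ) {
                next unless defined &{"${role}::$method"};
                next if $skip{$method};
                croak "$method conflicts: $provided_by{$method} vs $role"
                  if $provided_by{$method};

                # Copy the sub into the class; no @ISA entry is created
                *{"${class}::$method"} = \&{"${role}::$method"};
                $provided_by{$method} = $role;
            }
            $roles->{$role} = 1;
        }

        # The only introspection: DOES checks composed roles, then isa
        no strict 'refs';
        *{"${class}::DOES"} =
          sub { $roles->{ $_[1] } || $_[0]->isa( $_[1] ) };
    }
}

{
    package Does::Serialization;
    sub serialize {
        my $self = shift;
        return join ',', map { $_ . '=' . $self->{$_} } sort keys %$self;
    }
}
{
    package Does::IdMatching;
    sub matches_id  { $_[0]{id} == $_[1] }
    sub some_method { 'from the role' }
}
{
    package Programme;       # hypothetical consumer
    use role
      'Does::Serialization',
      'Does::IdMatching' => { excludes => 'some_method' };
    sub new { my ( $class, %args ) = @_; bless {%args}, $class }
}

my $p = Programme->new( id => 42, title => 'Doctor Who' );
print $p->serialize, "\n";
print "matches id\n" if $p->matches_id(42);
print "does the role\n" if $p->DOES('Does::IdMatching');
print "some_method was excluded\n" unless $p->can('some_method');
```

The sketch leaves a lot out. It does nothing to stop someone calling role->import by hand at runtime, it has no notion of required methods, and it does not deal with a class that defines a method of the same name itself. What it does show is that flattening methods into a class at compile time needs very little machinery.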

10 Comments

Although we have finally begun using Moose everywhere, for a long time we avoided it in some of our most performance-sensitive code. For roles (which we used heavily) we used Sub::Exporter. It allowed us to share methods in many classes without creating isa relationships.

It also sometimes afforded much more complex behavior, and more cheaply, than Moose does even now. For example, see Mixin::ExtraFields ( http://advent.rjbs.manxome.org/2009-12-22.html ). We had a number of similar libraries that behaved something like parameterized roles, sans Moose.

A call went out several months ago for profiling data from real world applications so we could see where the performance problems were. I don't recall that anybody submitted anything.

While we could optimize based on the test suite, and the test suites of the various MooseX modules, having real world data is the most useful for everyone. Would it be possible to translate some of your code over to something that can be profiled to show specific slowdowns in Moose?

I'm the first (well second, rjbs beat me, and I'm sure stevan would be in line in front of me if possible ... so third) to admit that there are use-cases where Moose is (still) too slow to use for some production environments. I also know that I have access to a production system that handles roughly the same level of traffic as bbc.co.uk and we haven't had issues (that I'm aware of) handling the traffic that I can pin on Moose.

I'm unsure I trust your benchmark at all.

On my system, the loop took 2 seconds or so to run, and showed a lot of variability.

I had a bit of a play, results: git clone http://goatse.co.uk/~bobtfish/Moose-Accessor-Performance.git

Even after increasing the number of iterations by an order of magnitude, in 5 subsequent runs - sometimes the Moose code was faster, sometimes the handwritten code.

So I think this is very much going to depend on your system and your perl.

Have not used it, but can Mouse::XS be a useful choice here?

Interesting post.

Wasn't Mouse deprecated?

Of course, even if Moose is fast now, that doesn't help much with a large complex code base with large complex dependencies and which is itself depended on by other large applications, and that has to have new features added all the time and which was started when Moose was slow. Which is why Mouse::XS is unlikely to help either.

FWIW Mouse now blows the doors off the rest of them. It's likely a result of Mouse::XS. Here are my results:

Testing Perl 5.012002, Moose 1.21, 0.88
Benchmark: timing 1500000 iterations of manual, moose, mouse...
    manual:  3 wallclock secs ( 2.23 usr +  0.00 sys =  2.23 CPU) @ 672645.74/s (n=1500000)
     moose:  3 wallclock secs ( 2.33 usr +  0.01 sys =  2.34 CPU) @ 641025.64/s (n=1500000)
     mouse:  1 wallclock secs ( 0.79 usr +  0.00 sys =  0.79 CPU) @ 1898734.18/s (n=1500000)

I did some further benchmarking. Here it is against a raw hash.

Testing Perl 5.012002, Moose 1.21, Mouse 0.88
Benchmark: timing 6000000 iterations of hash, manual, moose, mouse...
      hash:  1 wallclock secs ( 2.03 usr +  0.01 sys =  2.04 CPU) @ 2941176.47/s (n=6000000)
    manual:  8 wallclock secs ( 8.87 usr +  0.02 sys =  8.89 CPU) @ 674915.64/s (n=6000000)
     moose:  8 wallclock secs ( 9.04 usr +  0.01 sys =  9.05 CPU) @ 662983.43/s (n=6000000)
     mouse:  2 wallclock secs ( 3.22 usr +  0.01 sys =  3.23 CPU) @ 1857585.14/s (n=6000000)
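As a quick sanity check on those rates, here is a small sketch that divides the raw hash rate by each accessor's rate. The numbers are copied from the output above; nothing is measured here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Rates (iterations per second) copied from the benchmark output above
my %rate = (
    hash   => 2941176.47,
    manual => 674915.64,
    moose  => 662983.43,
    mouse  => 1857585.14,
);

# How many times longer does each call take than a raw hash access?
for my $name ( sort { $rate{$b} <=> $rate{$a} } keys %rate ) {
    printf "%-6s %.2fx the time of a raw hash access\n",
      $name, $rate{hash} / $rate{$name};
}
```

Mouse comes out at roughly 1.6 times the cost of a raw hash access, while the hand-written and Moose accessors are both above 4 times.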

Note that Mouse is less than twice as slow as direct hash access. It gets better...

Here it is with an (isa => "Int") type check on the Moose/Mouse accessors. I didn't saddle the hand written ones with their own checks, so it's not entirely fair, but...

Testing Perl 5.012002, Moose 1.21, Mouse 0.88
Benchmark: timing 6000000 iterations of hash, manual, moose, mouse...
      hash:  3 wallclock secs ( 2.15 usr +  0.01 sys =  2.16 CPU) @ 2777777.78/s (n=6000000)
    manual: 10 wallclock secs ( 8.83 usr +  0.01 sys =  8.84 CPU) @ 678733.03/s (n=6000000)
     moose: 22 wallclock secs (20.87 usr +  0.02 sys = 20.89 CPU) @ 287218.76/s (n=6000000)
     mouse:  4 wallclock secs ( 3.13 usr +  0.01 sys =  3.14 CPU) @ 1910828.03/s (n=6000000)

With a type check, Moose slows way down. Mouse remains unaffected. You get better performance than hand rolling, and you get type checking.

Now, I don't know how deep the optimizations go, but I expect them to continue as Mouse proceeds. Why? Moose is really about Class::MOP which is about building an object system. It's about flexibility and good internal design.

Mouse is an implementation of Moose. It's about the implementation. Mouse can do optimizations because it doesn't have to worry about MouseX modules extending it.

Both noble goals. I'm glad I have the choice between performance and extensibility. And I'm glad I can make that on a per class basis.

A comparison with Class::XSAccessor would be interesting; I think that's the fastest known accessor method that actually is a method.

Hey, Schwern! Where are you keeping your benchmark code? All benchmark results should cite code.


About Ovid

Have Perl; Will Travel. Freelance Perl/Testing/Agile consultant. Photo by http://www.circle23.com/. Warning: that site is not safe for work. The photographer is a good friend of mine, though, and it's appropriate to credit his work.