An interesting memory hog

The other day a colleague and I got to trace an interesting memory leak (really more a memory waste than a leak). The process was using tens of gigabytes of RAM, whereas I would not expect it to need more than 3-4 GB.

Call it witchcraft if you like, but we identified the line to blame within the first few minutes of looking at the problem. Unfortunately, we could not convince each other that it was the culprit, and as the problem was only visible in a long-running soak test, we could not justify running one just to check.

Perl's garbage collection works by reference counting and only frees circular references at exit. As we were dealing with a long-running daemon, we started by trying to locate circular references. Inspecting the code gave nothing away, so we decided to use Paul Evans' wonderful Devel::MAT module. Unfortunately, we were not able to locate any circular references that way either.
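For readers who have not run into this before, here is a minimal sketch of what a circular reference does under reference counting, and how weakening one link avoids it. The Node class and make_pair() are made up for illustration; weaken() comes from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;    # bumped every time a Node is actually freed

# Hypothetical class, only here to observe destruction.
package Node {
    sub new     { my ($class) = @_; return bless {}, $class }
    sub DESTROY { $destroyed++ }
}

# Build a parent/child pair that point at each other, then let both
# lexicals go out of scope.
sub make_pair {
    my ($weaken_back_link) = @_;
    my $parent = Node->new;
    my $child  = Node->new;
    $parent->{child} = $child;
    $child->{parent} = $parent;    # this closes the cycle
    weaken( $child->{parent} ) if $weaken_back_link;
    return;
}

make_pair(0);
my $freed_with_cycle = $destroyed;    # 0: the cycle keeps both nodes alive

$destroyed = 0;
make_pair(1);
my $freed_with_weak_link = $destroyed;    # 2: both freed on scope exit

print "strong cycle freed $freed_with_cycle, ",
    "weakened cycle freed $freed_with_weak_link\n";
```

In a short script the first pair is only reclaimed at exit; in a long-running daemon it would simply stay around.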

Finally we decided to look at how Perl sees our code, using B::Deparse and the opcodes shown by B::Concise. Somewhere deep down we started to suspect some Coro magic, though, as you will see later, completely unnecessarily. It was not Coro, and frankly Coro has lots of potential; I believe it should be in Perl's core to assure its future (though I know Marc Lehmann is of a different opinion).
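As a rough sketch of that kind of inspection, B::Deparse can be driven from inside a script as well as from the command line. The sub below is a stand-in borrowed from the cut-down script later in this post, not our production code, and script.pl in the comments is a placeholder file name.

```perl
use strict;
use warnings;
use B::Deparse;

# Stand-in for the code under suspicion.
sub do_thing {
    my ($n) = @_;
    my @list = ("x") x $n;
    my $logline = "Returned " . join(",", map { "$_/" } @list);
    return length $logline;
}

# Ask Perl to turn the compiled sub back into source text.
my $deparsed = B::Deparse->new->coderef2text( \&do_thing );
print "sub do_thing $deparsed\n";

# The command-line equivalents, for a whole file or a single sub:
#   perl -MO=Deparse script.pl
#   perl -MO=Concise,do_thing script.pl
```

The B::Concise output lists the individual ops, which is where you would look for the temporaries that each op keeps.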

Back to the story. By the end of the day we had found nothing and had tested nothing. I suggested commenting out the line we had identified within the very first couple of minutes, which we did, and we left the soak test running overnight. In the morning, much to our surprise, we found it was no longer wasting memory. For interested souls, it was a line similar to the following:

log_debug( 'Returned ' . join(", ", map { "$_" } @list ));

(Yes, ironically, it was an unnecessary call, as log_debug() does nothing in the normal mode we were running in.) Nonetheless, consider the following cut-down dummy script:

#!/usr/bin/perl
use 5.14.0;
use warnings;

use Data::Dumper;
use Devel::MAT::Dumper;

sub log_it
{
    my ($line) = @_;
    return length $line;
}

sub do_thing
{
    my ($n) = @_;
    my @list = ("x") x $n;

    # large memory waste:
    my $logline = "Returned " . join(",", map { "$_/" } @list);
    my $length = log_it($logline);

    return $length;
}

my @results = map { my $ln = $_ * 10000; do_thing($ln); } (1..10);

say Dumper @results;

Devel::MAT::Dumper::dump("/tmp/map-leaker.pmat");

which shows a memory hog in the following Devel::MAT interactive explorer view (sorted by size):

$ pmat-explore-gtk /tmp/map-leaker.pmat

[Screenshot: Selection_054.png]

Interestingly, the following does not end up wasting memory:

#!/usr/bin/perl
use 5.14.0;
use warnings;

use Data::Dumper;
use Devel::MAT::Dumper;

sub log_it
{
    my ($line) = @_;
    return length $line;
}

sub do_thing
{
    my ($n) = @_;
    my @list = ("x") x $n;

    # small:
    my $logline = join(",", map { "$_/" } @list);
    my $length = log_it("Returned " . $logline);

    # large:
    #my $logline = "Returned " . join(",", map { "$_/" } @list);
    #my $length = log_it($logline);

    return $length;
}

my @results = map { my $ln = $_ * 10000; do_thing($ln); } (1..10);

say Dumper @results;

Devel::MAT::Dumper::dump("/tmp/map-leaker.pmat");

[Screenshot: Selection_055.png]

nor does the following:

    # also small:
    my $logline = "Returned " . (my $temp = join(",", map { "$_/" } @list));
    my $length = log_it($logline);

As one might expect, Coro does the right thing by copying padlists around, and hence multiplies the memory waste by, roughly, the number of active coroutines. Consider the following example:

#!/usr/bin/perl
use 5.14.0;
use warnings;

use Coro;
use EV;
use Coro::AnyEvent;
use Data::Dumper;
use Devel::MAT::Dumper;

sub log_it
{
    my ($line) = @_;
    cede;
    return length $line;
}

sub do_thing
{
    my ($n) = @_;
    my @list = ("x") x $n;
    # small:
    #my $logline = join(",", map { "$_/" } @list);
    #my $length = log_it("Returned " . $logline);

    # also small:
    #my $logline = "Returned " . (my $temp = join(",", map { "$_/" } @list));
    #my $length = log_it($logline);

    # large:
    my $logline = "Returned " . join(",", map { "$_/" } @list);
    my $length = log_it($logline);

    return $length;
}

# Either construction shows the problem, but the coro one leaks 10 instances.
my @coros = map { my $ln = $_ * 10000; async { do_thing($ln); }; } (1..10);
my @results = map { $_->join(); } @coros;
# my @results = map { my $ln = $_ * 10000; do_thing($ln); } (1..10);

say Dumper @results;

# Demonstrate that there's one leak per active coro, not one per coro that
# ever existed. So there will still be 10 leaked even though we do another
# 10 iterations here.
my @morecoros = map { my $ln = $_ * 10000; async { do_thing($ln); }; } (11..20);
my @moreresults = map { $_->join(); } @morecoros;
say Dumper @moreresults;

Devel::MAT::Dumper::dump("/tmp/map-leaker.pmat");

and the following outcome:

[Screenshot: Selection_056.png]

It's midnight here and I feel rather tired (as you might tell by how quickly I glossed over the last part), so I will finish here.

RFC Perl for education

For some time now, I have had an idea of ePerl in my head: a subset of Perl, or Perl in a sandbox. You might guess why. Perl is a great language, and backwards compatibility keeps your old code running, even if some of the ancient designs are considered wrong nowadays. That’s all fine, except that it makes Perl unsuitable for education.

In my opinion, to make Perl more acceptable in school and university curricula, we need to sell it to lazy teachers and lecturers, who need something like:

  • Current best practices upfront
  • No backwards compatibility
  • Strict, warnings, utf8 and the newest Perl features on by default
  • Sub signatures and postfix dereferencing on, without experimental warnings
  • Most of the greatest CPAN modules preinstalled, and I am really talking about modules that help beginners, e.g. Devel::REPL, Devel::DidYouMean, Moo, and many others like Mojolicious, Dancer, Catalyst, whatever…
  • Special cases forbidden or removed, like split emulating awk, indirect object notation and many other silly leftovers
  • etc. etc.
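To make the wish list concrete, here is roughly the preamble a beginner has to type today to get those defaults, followed by one of the special cases mentioned above. This is a sketch assuming Perl 5.20 or newer with the core experimental pragma; total() and the sample data are made up.

```perl
use v5.20;       # turns on strict and features such as say
use warnings;
use utf8;
use experimental qw(signatures postderef);    # no experimental warnings

# A sub signature instead of unpacking @_ by hand.
sub total ( $first, @rest ) {
    my $sum = $first;
    $sum += $_ for @rest;
    return $sum;
}

# Postfix dereferencing: $scores->@* instead of @{$scores} or @$scores.
my $scores = [ 3, 4, 5 ];
my $sum    = total( $scores->@* );
say "total: $sum";    # prints "total: 12"

# The awk special case: a single-space string pattern skips leading
# whitespace and splits on runs of whitespace, unlike the regex / /.
my @awk_style = split ' ', '  a b  c';    # ('a', 'b', 'c')
my @literal   = split / /, '  a b  c';    # ('', '', 'a', 'b', '', 'c')
say scalar(@awk_style), ' fields vs ', scalar(@literal), ' fields';
```

Four lines of boilerplate before the first real statement is exactly the kind of thing a teaching dialect could make implicit.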

Other languages break backwards compatibility. That makes current developers angry, but future generations don’t need to care about what happened 10 years ago, or why. Don’t get me wrong, backwards compatibility is superb, but it’s also Perl’s biggest weakness today. Beginners don’t care about the core, nor about how to achieve thing X in Y different ways. Dropping that baggage would also let them learn Modern Perl more quickly and more safely.

What do you think?

P.S.

...While I am writing about Perl5, I believe Perl6 will have the exact same problem...

...and yes.. I am aware of breaking CPAN. Though if it was running in a sandbox, it would still be able to escape and use CPAN modules..

… I believe Perl is a unique language that needs this, because:

  • Modern Perl’s best practices are changing rapidly, but the core lags far behind (due to backwards compatibility, and because a feature must first appear on CPAN)
  • Perl’s “There is more than one way to do it”....

I hate unpacking sub calls with shift

The Perl community has moved away from using special predefined Perl variables such as $(, $), $:, $!, $^H, $/ and many others without explicitly commenting their purpose. So why are we still using shift for sub params? For example:

sub foo {
    my $bar = shift;
}

Why is it still fine within the community to skip the @_? If we promote shift, then let's use pop as well. Why not? It looks nice:

sub foo {
    return pop, shift;
}
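In case it is not obvious what that one-liner does, here is a small sketch with made-up arguments. Both pop and shift default to @_ inside a sub, so it returns the last argument followed by the first.

```perl
use strict;
use warnings;

sub foo {
    return pop, shift;    # implicit @_ twice: last argument, then first
}

my @result = foo( 'first', 'middle', 'last' );
print "@result\n";    # prints "last first"
```

Nothing in the body tells the reader how many arguments foo() expects, which is rather the point.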

Though I am sure someone already uses it. And how about those who use shift at line 100 inside the sub? I hate that. It makes the code really hard to follow: is that the sixth or the seventh unpacked argument? I think it’s bad practice.

I like it when code is consistent and self-documenting. I love it when the very first line inside the sub lists the expected parameters! Just look how beautiful and tidy it looks:

sub foo {
    my ( $foo, $bar, $baz ) = @_;
}

You might think that it is convenient to use shift in cases like:

# EXAMPLE1
sub init {
    shift;
    my %args = @_;
}
sub foo {
    my $bar = shift // 'default';
}

# EXAMPLE2
sub foo { shift->call() }

# EXAMPLE3
sub extends {
    my $meta = shift;
    if ( @_ ) {
        print "foo bar baz";
    }
    return @_;
}

# EXAMPLE4
sub foo {
    new $_[0], shift;
}
sub before {
    Foo::Bar::baz(shift, 'before', \@_);
}

But it’s horrible for newcomers! You are hurting them! What did they do to you?

  • See EXAMPLE1 and imagine you didn't understand ‘shift’. You google ‘Perl shift’ and probably find irrelevant information. Had the author written the @_ explicitly, i.e. 'shift( @_ )', it might have saved you an hour.
  • How about EXAMPLE2? No semicolon, no return, and you can find this in many, many modules out there.
  • How about EXAMPLE3? It removes the first item from @_ and then reuses @_ twice.
  • EXAMPLE4… aghhhh.. mixing the two together..
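For comparison, here is one way the first three examples could look with the parameters unpacked on the first line. This is only a sketch: the Widget class and the return values are invented so that the snippet runs on its own.

```perl
use strict;
use warnings;

# EXAMPLE1, with the invocant and the arguments named up front.
sub init {
    my ( $class, %args ) = @_;
    return \%args;
}

sub foo {
    my ($bar) = @_;
    $bar //= 'default';
    return $bar;
}

# EXAMPLE2, as a method of a hypothetical class, with an explicit return.
package Widget {
    sub new  { my ($class) = @_; return bless {}, $class }
    sub call { return 'called' }

    sub run {
        my ($self) = @_;
        return $self->call();
    }
}

# EXAMPLE3: the leftover arguments get a name instead of reusing @_.
sub extends {
    my ( $meta, @parents ) = @_;
    print "foo bar baz\n" if @parents;
    return @parents;
}

my $default = foo();
my @parents = extends( 'Meta', 'Base::A', 'Base::B' );
my $called  = Widget->new->run;
```

Each sub now says what it expects before it does anything else.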

Admittedly, shift might look tidier than $_[0] when you are after performance and don’t want to assign named variables. But in 99% of cases it doesn’t matter, and if you do need the performance, document the need.

Let's start preparing for Peter Martini’s wonderful new sub signatures and tidy up.
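As a taste of what that tidy-up could look like, here is a sketch of two of the subs above written with signatures. It assumes Perl 5.20 or newer and the core experimental pragma; the argument values are made up.

```perl
use v5.20;
use warnings;
use experimental 'signatures';

# The parameter list and the default value are part of the declaration.
sub foo ( $bar = 'default' ) {
    return $bar;
}

sub init ( $class, %args ) {
    return \%args;
}

my $implicit = foo();           # 'default'
my $explicit = foo('given');    # 'given'
my $undef    = foo(undef);      # undef: the default only fills a missing argument
my $args     = init( 'Some::Class', colour => 'red' );

# Passing too many arguments is a runtime error rather than silently ignored.
my $too_many = eval { foo( 1, 2 ); 1 } ? 'accepted' : 'rejected';
say "extra argument was $too_many";    # prints "extra argument was rejected"
```

Note that this default is not quite the same as 'shift // default': an explicit undef is passed through rather than replaced.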

About vytas

I have been a proud Perl developer since 2014. twitter: https://twitter.com/vytasdauksa