C::Blocks Advent Day 5

This is the C::Blocks Advent Calendar, in which I release a new treat each day about the C::Blocks library. Yesterday I showed how to declare and share C functions, which I think is C::Blocks’ killer feature. Today I will show some simple benchmarks that illustrate the performance of C::Blocks-compiled code.

Sorry that this Fifth Advent entry is being posted on the Seventh day of December! I got some really weird benchmark results and it took a while to really dig through them and figure out what was going on. I’ll have to work double-time for the next few days to get caught back up!

Let’s begin by measuring how long it takes to load C::Blocks. As a point of comparison, a script that does nothing except load strict and warnings would be this:

use strict;
use warnings;

On my machine the bash built-in time usually reports a duration of 5-10ms:

$ time perl test.pl
real    0m0.009s
user    0m0.008s
sys     0m0.000s

If we simply include C::Blocks, the time jumps up a little:

# test.pl
use strict;
use warnings;
use C::Blocks;

$ time perl test.pl

real    0m0.043s
user    0m0.040s
sys     0m0.000s

Simply loading C::Blocks costs about 30ms. Finally, if we load PerlAPI, Types, and include a cblock that modifies a variable, we get some idea for the minimal cost of really using C::Blocks. Here’s the test script:

use strict;
use warnings;
use C::Blocks;
use C::Blocks::PerlAPI;
use C::Blocks::Types qw(uint);

my uint $foo = 0;
cblock {
    $foo = 5;
}

Notice it doesn’t print anything because I don’t want the I/O system to interfere with the timing. Running this I get a running time of no more than 60ms:

$ time perl test.pl

real    0m0.060s
user    0m0.048s
sys     0m0.008s

Thus, the minimal additional cost of doing something useful with C::Blocks (on my machine) should be at least 50ms, five times what it takes to merely load the Perl interpreter.

With these as our baseline, let’s do some more useful benchmarks. Perl is supposed to have perfomant I/O and string manipulation. What is the tradeoff between pure-Perl and C for munging the contents of a text file? We expect the C code to be lengthier, but will it lead to any appreciable performance increase? To find out, I generated text files with lines of random text, some of which included Fred. The program should create a new file with the same contents except Fred is replaced by Barney. Here is the Perl program:

use strict;
use warnings;
open my $out_fh, '>', 'perlout.txt';
while (<>) {
    s/Fred/Barney/g;
    print $out_fh $_;
}

Here is the C::Blocks program:

use strict;
use warnings;
use C::Blocks;
use C::Blocks::Types qw(char_array);
my char_array $filename = $ARGV[0];

cblock {
    FILE * in_fh = fopen($filename, "r");
    FILE * out_fh = fopen("cout.txt", "w");
    char * original = "Fre";

    int match_length = 0;
    int curr_char = fgetc(in_fh);
    while (curr_char != EOF) {
        if (curr_char == original[match_length]) {
            /* found character in sequence */
            match_length++;
        }
        else if (match_length == 3 && curr_char == 'd') {
            /* found full name! print and reset */
            fprintf(out_fh, "Barney");
            match_length = 0;
        }
        else {
            /* incomplete match, print what we've skipped */
            if (match_length) fprintf(out_fh, "%.*s", match_length, original);

            /* just in case we have FFred or FreFred */
            if (curr_char == 'F') match_length = 1;
            else {
                match_length = 0;
                fputc(curr_char, out_fh);
            }
        }

        curr_char = fgetc(in_fh);
    }

    fclose(in_fh);
    fclose(out_fh);
}

Here’s a breakdown of the difference between these two programs. For the Perl script, we have:

  • Lines of code: 7
  • Development time: about 10 seconds
  • Cleverness: none, this is a boringly obvious script
  • Execution time: 80ms to 130ms

As for the C script, we have:

  • Lines of code: 41
  • Development time: 45 minutes, including one failed start and two rounds of testing
  • Cleverness: mildly happy about tracking state with original
  • Execution time: 130ms to 160ms

First the caveats. The execution times are obviously specific to my machine, but the relationship between them is likely to remain the same on other platforms. If this code were compiled using gcc, it would likely run faster because the Tiny C Compiler underlying C::Blocks does not produce lightning fast C code. (We’ll get to that in a second.) Also, I am pretty sure that a more succinct C version could be written, and there’s probably another C implementation of this process that would run faster. Finally, I didn’t really hit noticeable performance issues until I processed 100,000 lines of text. Anything shorter than that was hardly any different from the startup times.

All of those caveats having been said, I doubt that the C::Blocks version can be made to run significantly faster. As such, we are left with the basic conclusion that the Perl version is easier to write, easier to understand, fewer lines of code, and comparable speed. If you are trying to write code that performs lots of I/O and string manipulation, just work with pure Perl.

The conclusion from the previous benchmark is not a surprise because I was playing to Perl’s strengths. One of Perl’s weak spots is fast numerics, like the sort of random number generator work I introduced yesterday. Is C::Blocks significantly faster than Perl? And, does it deliver performance that is comparable to what we would get with an XS module? To find out, I ran the script located at the bottom of this post. The script implements the random number generator in pure Perl, C::Blocks, and Inline::C. In these results, the N = 10 line indicates the number of iterations run through the random number generator in a single function call, so we actually have a sequence of results for a wide range of iteration count:

--- For N = 10 ---
             Rate    Perl CBlocks  Inline
Perl     275691/s      --    -96%    -97%
CBlocks 7126245/s   2485%      --    -24%
Inline  9331219/s   3285%     31%      --

--- For N = 31 ---
             Rate    Perl CBlocks  Inline
Perl      91167/s      --    -97%    -98%
CBlocks 3243566/s   3458%      --    -44%
Inline  5782587/s   6243%     78%      --

--- For N = 100 ---
             Rate    Perl CBlocks  Inline
Perl      28981/s      --    -98%    -99%
CBlocks 1214700/s   4091%      --    -56%
Inline  2752511/s   9398%    127%      --

--- For N = 316 ---
             Rate    Perl CBlocks  Inline
Perl       9221/s      --    -98%    -99%
CBlocks  413537/s   4385%      --    -59%
Inline  1017940/s  10940%    146%      --

--- For N = 1000 ---
            Rate    Perl CBlocks  Inline
Perl      2901/s      --    -98%    -99%
CBlocks 132740/s   4476%      --    -61%
Inline  337317/s  11528%    154%      --

--- For N = 3162 ---
            Rate    Perl CBlocks  Inline
Perl       922/s      --    -98%    -99%
CBlocks  42309/s   4487%      --    -62%
Inline  110277/s  11856%    161%      --

--- For N = 10000 ---
           Rate    Perl CBlocks  Inline
Perl      293/s      --    -98%    -99%
CBlocks 13525/s   4512%      --    -62%
Inline  35223/s  11911%    160%      --

Above N = 10,000, the relative results were essentially unchanged.

The good news is that the C::Blocks implementation consistently beats out the Perl implementation by a significant margin. Looking at the rate, we see the C::Blocks version producing between 25x and 45x more random numbers per second than the Perl implementation. What’s more, the line counts for these two implementations are pretty similar. Switching to a C::Blocks implementation would produce a huge speedup in this portion of code at essentially the same code complexity.

On the other hand, the Inline::C implementation is clearly the fastest implementation. For 10,000 iterations and higher it will produce 2.6 random numbers with each round of the C::Blocks implementation. For some, that factor of 2.6 is crucial, and an XS-based solution is the right tool.

Real code does not boil down to benchmarks like this. The execution of real code is often governed by a number of slow points whose effect is cumulative. Speeding up the worst offender will speed up your program, but then you’ve only removed one of many slow points. While the others remain, making any satisfactory code blazingly fast is a waste of time. If you are considering using using C to speed up some critical piece of code, C::Blocks won’t give you the fastest implementation, but it’ll probably be fast enough.

Today I looked into the performance of C::Blocks for two important circumstances: a task dominated by I/O and text processing, and a task dominated by computation. For text processing Perl is clearly the right tool for the job. For numerical computation, Inline::C is clearly faster, but if the computation is only one bottleneck in a complicated program, the speedup from C::Blocks is likely good enough to move the pain-point to some other part of the code base.

C::Blocks Advent Day 1 2 3 4 5 6 7 8 9 10 11 12 13

P. S. Why was this entry so late? Apart from being pressed for time, I got benchmark results that varied wildly. I’m OK presenting benchmark results that I don’t like: I’m a bit annoyed by that factor of 2.6, but that’s an fair representation of things. I’m not OK with the kind of crazy results I was getting, sometimes showing that the C::Blocks implementation was competitive with Inline::C, and other times showing that it ran at the same speed as the Perl one. After a wide variety of implementations, I came to the conclusion that it was a CPU memory page alignment issue. The latest version on github fixes this and I will release another pre-Beta version soon (but for now, the bed is calling).

# rng.pl
use strict;
use warnings;

use C::Blocks;
use Inline 'C';
use C::Blocks::Types qw(uint);
use Benchmark qw(:hireswallclock cmpthese);

my uint $N;
my $a = 698769069;
my ($x, $y, $z, $c) = (123456789, 362436000, 521288629, 7654321);
my $reps = 10;
for my $log_n (1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5) {
    $N = int(10**$log_n);
    print "--- For N = $N ---\n";
    cmpthese(-1, { Inline => \&Inline_rng, CBlocks => \&c_blocks_rng,
            Perl => \&Perl_rng});
}

sub Perl_rng {
    my $rand;
    for (1 .. $N) {
        my $t;
        $x = 69069*$x+12345;
        $y ^= ($y<<13); $y ^= ($y>>17); $y ^= ($y<<5); 
        $t = $a*$z+$c; $c = ($t>>32);
        $z = $t;
        $rand = $x+$y+$z;
    }
    return $rand;
}

clex {
    /* Note: y must never be set to zero;
     * z and c must not be simultaneously zero */
    unsigned int x = 123456789,y = 362436000,
        z = 521288629,c = 7654321; /* State variables */

    unsigned int KISS() {
        unsigned long long t, a = 698769069ULL;
        x = 69069*x+12345;
        y ^= (y<<13); y ^= (y>>17); y ^= (y<<5); 
        t = a*z+c; c = (t>>32);
        return x+y+(z=t);
    }
}

sub c_blocks_rng {
    my uint $to_return = 0;
    cblock {
        for (int i = 0; i < $N; i++) $to_return = KISS();
    }
    return $to_return;
}

sub Inline_rng {
    inl_rng($N);
}

__END__

__C__

/* Note: y must never be set to zero;
 * z and c must not be simultaneously zero */
static unsigned int x = 123456789,y = 362436000,
    z = 521288629,c = 7654321; /* State variables */

unsigned int inline_KISS() {
    unsigned long long t, a = 698769069ULL;
    x = 69069*x+12345;
    y ^= (y<<13); y ^= (y>>17); y ^= (y<<5); 
    t = a*z+c; c = (t>>32);
    return x+y+(z=t);
}

unsigned int inl_rng(unsigned int N) {
    int i;
    unsigned int to_return;
    for (i = 0; i < N; i++) to_return = inline_KISS();
    return to_return;
}

Leave a comment

About David Mertens

user-pic This is my blog about numerical computing with Perl.