C::Blocks Advent Day 10

This is the C::Blocks Advent Calendar, in which I release a new treat each day about the C::Blocks library. Yesterday I dug into the details of writing a type that can be used with C::Blocks. Today I explain how to use C::Blocks in multithreaded Perl code.

In private correspondence, somebody asked how C::Blocks handles multiple threads. Does it segfault? Does it detect the situation and die? To illustrate what you have to do with C::Blocks to get parallelization to work right, let's start with parallelized Perl code and mutate it. Here's the basic Perl script:

use strict;
use warnings;
use threads;

for my $tid (1 .. 10) {
    async sub {
        sleep 1;
        print "Hello from thread $tid\n";
        threads->exit;
    };
}
print "Back in Perl\n";
$_->join foreach threads->list;
print "All done!\n";

When run we get:

$ perl test.pl
Back in Perl
Hello from thread 1
Hello from thread 2
Hello from thread 3
Hello from thread 4
Hello from thread 5
Hello from thread 6
Hello from thread 7
Hello from thread 8
Hello from thread 9
Hello from thread 10
All done!

Basically, this kicks off 10 threads that each sleep for a second and print a message. The messages happen to print in order, but there's no guarantee of that: if the threads' execution times varied, the order could change. The simplest way to test C::Blocks is to replace the print statement in the async block with a bit of C code. For example:

use strict;
use warnings;
use C::Blocks;
use threads;

for my $tid (1 .. 10) {
    async sub {
        sleep 1;
        cblock {
            printf("Hello from thread %d\n", SvIV($tid));
        }
        threads->exit;
    };
}
print "Back in Perl\n";
$_->join foreach threads->list;
print "All done!\n";

which produces...

$ perl test.pl
Back in Perl
Hello from thread 1
Hello from thread 2
Hello from thread 3
Hello from thread 4
Hello from thread 5
Hello from thread 6
Hello from thread 7
Hello from thread 8
Hello from thread 9
Hello from thread 10
All done!

As you can see, the cblock compiles without a problem, and the code within it runs in each thread without a problem. Everything Just Works. This is something of a surprise, since the Tiny C Compiler that C::Blocks uses to compile this code is not thread-safe.

Why does this work? Recall that the code in cblock, clex, and all the other blocks gets compiled during script compile time. This is almost always a single-threaded process: multiple threads tend to be spawned at runtime, not compile time. This means that the Tiny C Compiler is used in a single-threaded fashion, producing object code and stuffing function pointers in the op tree. Then when threads get spawned, the function pointer addresses for cblocks get copied along with the op tree, so every thread calls the exact same function. Threads within a process share one address space, so calling a function whose object code was generated by another thread is not a problem.
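The compile-time versus runtime distinction can be seen without any C code. Here is a minimal pure-Perl sketch in which a BEGIN block stands in for the work C::Blocks does when it compiles a cblock. The variable names and the compile_report helper are made up for this illustration and are not part of C::Blocks:

```perl
use strict;
use warnings;
use threads;

# Compile-time bookkeeping. The BEGIN block is a stand-in (an assumption
# for illustration) for the compilation C::Blocks performs on a cblock.
our ($compile_count, $compiled_in_tid);
BEGIN {
    $compile_count++;
    $compiled_in_tid = threads->tid;   # tid 0 is the main thread
}

# Hypothetical helper: report what a thread sees at runtime.
sub compile_report {
    return sprintf "tid %d sees %d compilation(s), done in tid %d",
        threads->tid, $compile_count, $compiled_in_tid;
}

# Spawn a few threads at runtime and collect their reports.
my @workers = map { threads->create(\&compile_report) } 1 .. 3;
my @reports = map { $_->join } @workers;
print "$_\n" for @reports;
```

Each thread should report a single compilation that happened in thread 0: the BEGIN block ran once, before any thread existed, and the threads only inherit its result.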

When might things go wrong? The most likely way to seriously break things is to use C::Blocks in a string eval that is evaluated in many threads simultaneously. The Tiny C Compiler's lack of thread safety will probably lead to catastrophic behavior. Even if the compiler does not crash, it'll probably produce invalid machine code. Presumably this could be prevented by guarding use of TCC with a mutex, but this has not yet been implemented.
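Until such a guard exists inside C::Blocks, one could try serializing the string evals at the Perl level. What follows is only a sketch of that idea using lock from threads::shared. The locked_eval helper is hypothetical, plain Perl expressions stand in for strings that would contain a cblock, and I have not tested whether this is enough to keep TCC safe:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# One process-wide lock. Every string eval that might invoke the
# compiler is expected to take it first.
my $compile_lock :shared;

# Hypothetical helper: run a string eval while holding the lock.
sub locked_eval {
    my ($code) = @_;
    lock($compile_lock);          # released when this sub returns
    my $result = eval $code;
    die $@ if $@;
    return $result;
}

# Plain Perl stands in for code that would contain a cblock.
my @workers = map {
    my $n = $_;
    threads->create(sub { locked_eval("$n * $n") });
} 1 .. 4;
my @squares = map { $_->join } @workers;
print "@squares\n";
```

Two caveats: the lock only helps if every string eval goes through the helper, and the evaluated string sees the lexical scope of locked_eval rather than that of its caller.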

Using C::Blocks we can bend the rules of data sharing a bit. In this rather silly bit of code, each thread sets one of the characters in a string that is not shared across threads at the Perl level:

use strict;
use warnings;
use C::Blocks;
use C::Blocks::Types qw(Int char_array);
use threads;

# Store a pointer to the SV's PV slot
clex { char * message_workspace; }
my $message = "Hello, world!!!";
cblock {
    message_workspace = SvPVX($message);
}
print "Message is '$message'\n";

# Overwrite the contents of $message; each thread handles a letter
my @to_replace = ( qw(H a p p y), ' ', qw(h o l i d a y s !) );
for my Int $tid (1 .. length($message)) {
    async sub {
        my char_array $new_letter = $to_replace[$tid-1];
        cblock {
            message_workspace[$tid-1] = *$new_letter;
        }
        threads->exit;
    };
}
$_->join foreach threads->list;
print "Message ended up as '$message'\n";

This works as expected, but the careful observer may be a bit troubled by this approach for two reasons. First, the thread that owns $message must not alter the scalar using regular Perl operations like assignment. If the main thread altered $message, it could invalidate the memory slot pointed to by message_workspace and lead to invalid memory access in the cblock code. Second, the thread that owns $message must not exit before the other working threads. In this case the primary thread outlives all the async threads, so this is not a problem, but in other situations it might be tempting to point message_workspace at a memory slot that will become invalid before all threads finish. This approach works only because I could ensure that the variable lives long enough for each thread to do its work, and that the variable is not modified in a way that would change the allocated memory slot.
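For contrast, here is what "not shared across threads at the Perl level" means in practice. This is a minimal pure-Perl sketch with no C::Blocks involved, and the variable names are only illustrative. A thread that assigns to $message changes its own private copy, and the main thread never sees the change:

```perl
use strict;
use warnings;
use threads;

my $message = "Hello, world!!!";

# The thread works on its own clone of $message.
my $worker = threads->create(sub {
    $message = "Happy holidays!";
    return $message;
});
my $seen_in_thread = $worker->join;

print "Thread saw:  '$seen_in_thread'\n";
print "Main thread: '$message'\n";
```

The main thread's copy should still read "Hello, world!!!", which is why the script above has to reach underneath Perl, through a raw pointer, to let the threads write into a single buffer.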

One way to side-step both of the problems just mentioned is to allocate special memory for the task. Allocated memory that is not tied to a Perl variable cannot be accidentally clobbered by an innocent bit of Perl code. Furthermore, Perl's C API provides functions to allocate and free memory that will survive even if the original thread terminates. To do that, use one of the saveshared* family of functions and deallocate with the usual Safefree:

use strict;
use warnings;
use C::Blocks;
use C::Blocks::Types qw(Int char_array);
use threads;

# Copy the contents of an SV to shared memory
clex { char * message_workspace; }
my $message = "Hello, world!!!";
cblock {
    message_workspace = savesharedsvpv($message);
}
print "Message is '$message'\n";

# Overwrite the contents of the message; each thread handles a character
my @to_replace = ( qw(H a p p y), ' ', qw(h o l i d a y s !) );
for my Int $tid (1 .. length($message)) {
    async sub {
        my char_array $new_letter = $to_replace[$tid-1];
        cblock {
            message_workspace[$tid-1] = *$new_letter;
        }
        threads->exit;
    };
}
$_->join foreach threads->list;

# Copy the updated message back to $message and free shared memory
cblock {
    sv_setpv($message, message_workspace);
    Safefree(message_workspace);
}
print "Message ended up as '$message'\n";

Compared to the previous script, this one merely adds copying the data into shared memory and back. That is unnecessary caution for this script, since we can easily verify that the main thread does not modify $message and that $message outlives all of the working threads. In a larger codebase, however, this approach would be more robust against accidents.

Today I showed that C::Blocks works perfectly fine when working with multiple threads, so long as you avoid simultaneous string evals using C::Blocks code in multiple threads. Operating on shared memory also works fairly well, and judicious use of the saveshared* family of functions can help avoid accidental segfaults or data corruption.



About David Mertens

This is my blog about numerical computing with Perl.