C::Blocks Advent Day 8

This is the C::Blocks Advent Calendar, in which I release a new treat each day about the C::Blocks library. Yesterday I showed one way to build a (mildly) complex data structure, including handling pointers and managing memory. Today I will explain how to tightly control access to pointers using classes and C::Blocks::Object::Magic.

My example code yesterday was heavy on C pointers, which will come as no surprise to anyone who has programmed in C. With C::Blocks::Types::Pointers, managing these pointers was painless, even easy. The cblock line $tail_p = &$head is particularly smooth.

However, that line should also set off alarm bells for anyone who has worked with both Perl and C. The idea of carrying around pointer values in Perl scalars is not the real problem (while it's not the safest option, XS programmers have been doing that for decades with the T_PTR typemap). The problem is in taking the address of $head. Where does $tail_p actually point? It points to the address of the IV slot of $head. Things could quickly go downhill if we cause $head to upgrade its internal memory representation. This is easier to do in Perl than you might realize.

One way to change a variable's internal representation is to use it in a string context, such as printing it. This example shows exactly that:

use strict;
use warnings;
use C::Blocks;
use C::Blocks::Types::Pointers
    void_p => 'void*',
    void_pp => 'void**';

my void_p $address = 0;
my void_pp $ref_to_address = 0;
cblock {
    $ref_to_address = &$address;
    printf("From C, after assignment, address of address is %p; ref_to_address is %p\n",
        &$address, $ref_to_address);
}
print "From Perl, address is $address\n";
cblock {
    printf("From C again, address of address is %p; ref_to_address is %p\n",
        &$address, $ref_to_address);
}

An example of output for this on my machine is:

$ perl test.pl 
From C, after assignment, address of address is 0x12fb8e0; ref_to_address is 0x12fb8e0
From Perl, address is 0
From C again, address of address is 0x12f8090; ref_to_address is 0x12fb8e0

The agreement in the first line shows that ref_to_address has the correct value. By the third line they disagree. I should reiterate that the problem is not with pointers stored in Perl scalars: these are fine and their values persist correctly. The problem is when I try to use C::Blocks::Types::Pointers to manage a pointer to a pointer, and then accidentally upgrade the SV* holding the original pointer. My pointer-to-a-pointer will point to a newly invalid slot in memory that was just returned to the memory pool.

(Note: if I changed the Perl print to a printf instead, it would not upgrade the underlying scalar. If you find yourself regularly using C::Blocks::Types::Pointers, you should make a standard practice of using printf instead of print when printing pointer values.)
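The failure mode above can be mimicked in plain C without any Perl at all. What follows is only a toy model under stated assumptions: toy_scalar, toy_small_body, toy_big_body and the toy_* functions are hypothetical names invented for this sketch, and the layout is deliberately simpler than Perl's real SV internals. It shows the one thing that matters here: when a value's storage is replaced by a larger allocation, an address taken earlier no longer refers to the live slot.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy model only; this is NOT Perl's real SV layout. The value lives in a
 * separately allocated body, and "upgrading" swaps in a larger body. */
typedef struct { long iv; } toy_small_body;
typedef struct { long iv; char *pv; } toy_big_body;

typedef struct {
    void *body;       /* toy_small_body* before upgrade, toy_big_body* after */
    int is_upgraded;
} toy_scalar;

/* Allocate an integer-only scalar. Returns 0 on success, -1 on failure. */
int toy_init(toy_scalar *sv, long value) {
    toy_small_body *b = malloc(sizeof *b);
    if (b == NULL) return -1;
    b->iv = value;
    sv->body = b;
    sv->is_upgraded = 0;
    return 0;
}

/* Address of the integer slot, the analogue of &$address in a cblock. */
long *toy_iv_slot(toy_scalar *sv) {
    if (sv->is_upgraded) return &((toy_big_body *)sv->body)->iv;
    return &((toy_small_body *)sv->body)->iv;
}

/* Give the scalar room for a string, as stringification might.
 * The new body is allocated before the old one is freed, so the
 * integer slot is guaranteed to move. */
int toy_upgrade(toy_scalar *sv) {
    toy_small_body *old_body;
    toy_big_body *new_body;
    if (sv->is_upgraded) return 0;
    old_body = sv->body;
    new_body = malloc(sizeof *new_body);
    if (new_body == NULL) return -1;
    new_body->iv = old_body->iv;
    new_body->pv = NULL;
    sv->body = new_body;
    sv->is_upgraded = 1;
    free(old_body);   /* any pointer saved from toy_iv_slot() now dangles */
    return 0;
}

/* Release whichever body the scalar currently owns. */
void toy_free(toy_scalar *sv) {
    free(sv->body);
    sv->body = NULL;
}
```

Recording (uintptr_t)toy_iv_slot(&sv) before and after toy_upgrade(&sv) gives two different addresses, while the value itself survives the move. A pointer saved before the upgrade plays the role of $ref_to_address: it still holds the old address, which is no longer valid to dereference.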

While there are many other ways to store pointers, the most elegant solution I've seen is XS::Object::Magic. I liked it so much that I ported it to C::Blocks as C::Blocks::Object::Magic. This approach uses Perl Magic (literally) to store pointers. Magic is a mechanism for overriding core behaviors of an individual scalar, array, or hash (such as assignment). It is orthogonal to the object system and does not rely on blessing. Attaching a bit of magic to a Perl variable requires a struct with applicable methods, and an optional pointer to additional information. XS::Object::Magic (and therefore C::Blocks::Object::Magic) stores the pointer by adding magic with no methods (a struct filled with null pointers) and using the pointer slot associated with this null magic to store the desired pointer. Using this approach, the pointer is only accessible from C code, and pointers can be attached to a scalar, an array, or a hash. The last option is particularly nice since it means I can write a hashref-based object with C data safely tucked away.
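The "null magic" idea can be sketched in plain C. This is a toy model, not the Perl API: toy_vtbl, toy_magic, toy_var and the toy_* functions are hypothetical stand-ins for Perl's magic structures and for the attach and get functions, and I am assuming that an entry is recognized by the address of its (empty) method table. The point is that a method table full of null pointers changes no behavior, yet its address still works as a private tag for finding the stored pointer again.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a table of magic methods. */
typedef struct {
    int (*get)(void *var);
    int (*set)(void *var);
    int (*free_fn)(void *var);
} toy_vtbl;

/* One entry in a variable's chain of magic. */
typedef struct toy_magic {
    const toy_vtbl *vtbl;     /* which kind of magic this entry is */
    void *ptr;                /* the optional extra-information slot */
    struct toy_magic *next;
} toy_magic;

/* Hypothetical stand-in for a scalar, array, or hash. */
typedef struct {
    toy_magic *magic;
} toy_var;

/* Magic with no methods: it overrides nothing, so only its address matters. */
static const toy_vtbl null_vtbl = { NULL, NULL, NULL };

/* Attach a C pointer to the variable. Returns 0 on success, -1 on failure. */
int toy_attach_struct(toy_var *var, void *ptr) {
    toy_magic *mg = malloc(sizeof *mg);
    if (mg == NULL) return -1;
    mg->vtbl = &null_vtbl;
    mg->ptr = ptr;
    mg->next = var->magic;
    var->magic = mg;
    return 0;
}

/* Find the pointer stored under the null magic, or NULL if none is attached. */
void *toy_get_struct(const toy_var *var) {
    const toy_magic *mg;
    for (mg = var->magic; mg != NULL; mg = mg->next) {
        if (mg->vtbl == &null_vtbl) return mg->ptr;
    }
    return NULL;
}

/* Drop every magic entry; the attached structs themselves are not freed. */
void toy_clear_magic(toy_var *var) {
    while (var->magic != NULL) {
        toy_magic *next = var->magic->next;
        free(var->magic);
        var->magic = next;
    }
}
```

Nothing in this model exposes the pointer through the variable's ordinary value, which mirrors why the real pointer is reachable only from C code.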

The next three code snippets comprise KISS.pm. It combines a number of concepts I've brought up thus far, so I've broken the code into chunks to illustrate each idea. I start with

# KISS.pm
package KISS;
use strict;
use warnings;
use C::Blocks;
use C::Blocks::Types qw(uint);
use C::Blocks::Object::Magic;

# The KISS random number generator C-side implementation
cshare {
    struct KISS::state {
        unsigned int x, y, z, c;
    };

    /* force xs_object_magic_get_struct_rv to be included in this symbol
     * table, so that imports of KISS get this symbol. */
    void * KISS::ignore_me = &xs_object_magic_get_struct_rv;

    unsigned int KISS::rand(struct KISS::state * s) {
        unsigned long long t, a = 698769069ULL;
        s->x = 69069*s->x+12345;
        s->y ^= (s->y<<13); s->y ^= (s->y>>17); s->y ^= (s->y<<5); 
        t = a*s->z+s->c; s->c = (t>>32);
        return s->x+s->y+(s->z=t);
    }
}

Because this uses C::Blocks and contains a cshare block, this module will provide C code to the lexical contexts where it is used. I use double-colons in my struct and function names to minimize the likelihood of name clashes with other libraries. Also notice the bit about KISS::ignore_me. I have this line to force C::Blocks to copy the symbol xs_object_magic_get_struct_rv into this symbol table. This ensures that any code that uses this module will be able to call that function. I'll cover more about symbol table tricks like this in a later post.
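To look at the generator by itself, here is a standalone sketch in plain C with no C::Blocks involved. The names kiss_state, kiss_seed_default and kiss_rand are stand-ins for this sketch (plain C has no double-colon names), and I have swapped unsigned int for uint32_t so the arithmetic does not depend on the platform's int width. The constants are the ones used in KISS.pm, including the seed values from its constructor.

```c
#include <stdint.h>

/* Generator state, mirroring the four fields of struct KISS::state. */
typedef struct {
    uint32_t x, y, z, c;
} kiss_state;

/* Load the same starting values that KISS.pm's constructor uses. */
void kiss_seed_default(kiss_state *s) {
    s->x = 123456789u;
    s->y = 362436000u;
    s->z = 521288629u;
    s->c = 7654321u;
}

/* Advance the state and return the next 32-bit value. */
uint32_t kiss_rand(kiss_state *s) {
    const uint64_t a = 698769069ULL;
    uint64_t t;

    /* linear congruential step */
    s->x = 69069u * s->x + 12345u;

    /* xorshift step */
    s->y ^= s->y << 13;
    s->y ^= s->y >> 17;
    s->y ^= s->y << 5;

    /* multiply-with-carry step: high word is the new carry, low word is z */
    t = a * s->z + s->c;
    s->c = (uint32_t)(t >> 32);
    s->z = (uint32_t)t;

    /* combine the three streams; the sum wraps modulo 2^32 */
    return s->x + s->y + s->z;
}
```

Calling kiss_rand twice on a state prepared with kiss_seed_default yields 2079675107 and then 4185567647, the same two values that the scripts using KISS.pm print.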

# Also make it possible to use KISS as a cblock type
sub c_blocks_init_cleanup {
    my ($package, $C_name, $sigil_type, $pad_offset) = @_;

    my $init_code = "$sigil_type * SV_$C_name = ($sigil_type*)PAD_SV($pad_offset); "
        . "struct KISS::state * $C_name = xs_object_magic_get_struct_rv(aTHX_ SV_$C_name); ";

    return $init_code;
}

By implementing a function called c_blocks_init_cleanup, KISS can be used as a type for C::Blocks. This means that I can type my KISS $rng, and this type conversion code will be used. In fact, all code written after this function can use the KISS type, even code in the same module. Obviously there's a lot going on in this that is beyond the scope of this treat: I'll cover how to write a type library soon.

# Perl-side constructor. Build an empty hash and attach the
# rng state struct to it.
sub new {
    my $class = shift;
    my $self = bless {}, $class;

    cblock {
        struct KISS::state * state;
        Newx(state, 1, struct KISS::state);
        *state = (struct KISS::state){123456789, 362436000, 521288629, 7654321};
        xs_object_magic_attach_struct(aTHX_ SvRV($self), state);
    }

    return $self;
}

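# Free the state struct. Because $self is declared with the KISS type,
# inside the cblock it is the struct KISS::state pointer, not the object.
sub DESTROY {
    my KISS $self = shift;
    cblock {
        Safefree($self);
    }
}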

# Perl-side method for calling the rng
sub rand {
    my KISS $self = shift;
    my uint $to_return = 0;
    cblock {
        $to_return = KISS::rand($self);
    }
    return $to_return;
}

1;

Finally I get to the Perl code: the new method builds an object including its hidden state struct, DESTROY frees up the allocated memory, and rand gets the next random number. The module uses C::Blocks::Object::Magic and illustrates how to use xs_object_magic_attach_struct to attach the struct to the object. It also uses xs_object_magic_get_struct_rv to get the struct, though you probably missed it because it's buried in the type definition.

And now I can write a script that uses this module:

use strict;
use warnings;
use C::Blocks;
use KISS;

my KISS $rng = KISS->new;
print "rng's first value is ", $rng->rand, "\n";
cblock {
    printf("rng's second value is %u\n", KISS::rand($rng));
}

When run, that script prints:

$ perl test.pl 
rng's first value is 2079675107
rng's second value is 4185567647

This is a short script, and on its surface it looks pretty simple: I create a new KISS random number generator and I use it to produce two random numbers. The remarkable aspect of this script is that I use the same random number generator---even the same variable name $rng---in both Perl and C code. The object underlying $rng fluidly moves between the two contexts because the KISS package provides type information. The Perl-side rand method ultimately invokes KISS::rand, which means that I can generate random numbers with my object in whichever context is more convenient. To accomplish this feat, I wrote a module that provides both a Perl and a C interface to a struct, but even the module was not terribly hard to write.

The short script above does not actually show off the utility of using C::Blocks::Object::Magic. To see that, I need to utilize the fact that the object is a blessed hash:

use strict;
use warnings;
use C::Blocks;
use KISS;

my KISS $rng = KISS->new;
$rng->{name} = 'Gerry';
print "$rng->{name}'s first value is ", $rng->rand, "\n";
cblock {
    printf("rng's second value is %u\n", KISS::rand($rng));
}

Running this produces the following output:

$ perl test.pl 
Gerry's first value is 2079675107
rng's second value is 4185567647

If I decide while developing a library that I need certain information to be available from C then I add it to the struct, but if the information only needs to be available from Perl then I can simply store it in the hash. Having worked with Prima, which uses a different scheme to support hashref-based objects with a core C struct underneath, I have found this to be particularly useful for storing data used by custom handlers. Of course, subclassing would be a more systematic way to achieve the same goal, but either way the hashref is indispensable for storing the data.

A critique of this approach is that the module author must write distinct C and Perl methods (usually with one calling the other). This sort of code duplication will be cumbersome for anything but the smallest of projects. The proper solution to this problem is an object system, something that simultaneously builds both C and Perl methods from a common declaration. Such a system is not yet available. However, C::Blocks is up to the task and in the coming days I will provide a number of treats that go through the capabilities needed to write a proper object system.

Today I gave an example of a Perl class that provides both a C and a Perl interface. In particular, I showed how C::Blocks::Object::Magic makes it easy to have a hashref-based object while safely storing a pointer to an underlying struct for the C-visible state. I glossed over how to write a C::Blocks type, an important detail that I will discuss soon. A proper object system for C::Blocks will require some way to implement inheritance in C. How is this accomplished? These details will be some of the forthcoming treats this Advent.



About David Mertens

This is my blog about numerical computing with Perl.