C::Blocks Advent Day 9

This is the C::Blocks Advent Calendar, in which I release a new treat each day about the C::Blocks library. Yesterday I illustrated C::Blocks::Object::Magic while writing a simple class that had APIs in both Perl and C. Today I dig into one of the keys of yesterday's example: writing a type that can be used with C::Blocks.

When Perl sees code like this:

my Type::Package $some_variable;

it makes a note that the variable name $some_variable has a "type" with information available in Type::Package. I must emphasize that this type information is associated with the variable name itself: nothing special is done to the underlying scalar. If Type::Package sets up a few fields with the fields pragma, then Perl will check the spelling of the hash keys used when dereferencing $some_variable at compile time. C::Blocks uses this type information in an orthogonal way: to produce custom code for marshalling your data between Perl and C.

To get an idea of how this works, consider the short script from yesterday which used the KISS library. If I add the line use C::Blocks::Filter; before the cblock, I end up with this script:

use strict;
use warnings;
use C::Blocks;
use KISS;
use C::Blocks::Filter;

my KISS $rng = KISS->new;
print "rng's first value is ", $rng->rand, "\n";
cblock {
    printf("rng's second value is %u\n", KISS::rand($rng));
}

which produces output like the following when run (I have added whitespace for clarity):

$ perl test.pl 
##################################################
void op_func(C_BLOCKS_THX_DECL) {
        SV * SV__PERL_SCALAR_rng = (SV*)PAD_SV(1);
        struct KISS__state * _PERL_SCALAR_rng
            = xs_object_magic_get_struct_rv(aTHX_ SV__PERL_SCALAR_rng); 

        printf("rng's second value is %u\n", KISS__rand(_PERL_SCALAR_rng));
    }
##################################################
rng's first value is 2079675107
rng's second value is 4185567647

The line containing the printf demonstrates things I've discussed previously. KISS::rand becomes KISS__rand and $rng becomes _PERL_SCALAR_rng.

The interesting bit is what comes before it. I said on day 6 that C::Blocks detects when you use a sigiled variable in your cblock and injects code to transform that to the SV* (or AV* or HV*) underlying the variable. (It's worth pointing out that C::Blocks does not allow sigiled variables in clex, cshare, or csub blocks because there is no way for it to know which PAD to work with. This only works with cblock blocks.) If the variable is typed, and if the type's package contains c_blocks_init_cleanup, C::Blocks will call that method to get the code to use for the transformation.

The C end of the KISS library is built around a struct pointer. The one method, KISS::rand, expects a pointer to a struct. Furthermore, you could directly read or set the state by accessing the x, y, z, and c members of the struct. It makes sense, then, that the type should produce code that would map $rng to the pointer to the struct. This is what it does. First it gets the SV* for $rng from the current PAD, but instead of putting it in _PERL_SCALAR_rng as it would for untyped variables, it puts it in SV__PERL_SCALAR_rng. The variable _PERL_SCALAR_rng is used for the struct KISS__state pointer, which is unpacked with xs_object_magic_get_struct_rv. By the time we reach the printf line, our Perl $rng has been unpacked, and $rng gets transformed into the pointer to the KISS struct, as expected.

Let's look again at the c_blocks_init_cleanup code from KISS.pm:

sub c_blocks_init_cleanup {
    my ($package, $C_name, $sigil_type, $pad_offset) = @_;

    my $init_code = "$sigil_type * SV_$C_name = ($sigil_type*)PAD_SV($pad_offset); "
        . "struct KISS::state * $C_name = xs_object_magic_get_struct_rv(aTHX_ SV_$C_name); ";

    return $init_code;
}

The method is called with four arguments: the package ("KISS"), the gently mangled C variable name ("_PERL_SCALAR_rng"), the sigil type ("SV"), and the pad offset (1). It returns a single string with the initialization code, utilizing string interpolation throughout.

Here is the init/cleanup code for double arrays, from C::Blocks::Types:

package C::Blocks::Type::double_array;
sub data_type { 'double' }
sub c_blocks_init_cleanup {
    my ($package, $C_name, $sigil_type, $pad_offset) = @_;
    my $data_type = $package->data_type;

    my $init_code = join(";\n",
        "$sigil_type * SV_$C_name = ($sigil_type*)PAD_SV($pad_offset)",
        "STRLEN length_$C_name",
        "$data_type * $C_name = ($data_type*)SvPVbyte(SV_$C_name, length_$C_name)",
        "length_$C_name /= sizeof($data_type)",
        '',
    );

    return $init_code;
}

Unlike my example from KISS, this method is written in such a way that it can be used by other type packages, such as float_array and char_array. These packages simply inherit from this package and implement an alternative data_type method. To see an example of this, try:

use strict;
use warnings;
use C::Blocks;
use C::Blocks::Types qw(char_array);

my char_array $string = "Hello!";
cblock {
    printf("From C, %s\n", $string);
}

When run with -MC::Blocks::Filter, I get:

##################################################
void op_func(C_BLOCKS_THX_DECL) {SV * SV__PERL_SCALAR_string = (SV*)PAD_SV(1);
STRLEN length__PERL_SCALAR_string;
char * _PERL_SCALAR_string = (char*)SvPVbyte(SV__PERL_SCALAR_string, length__PERL_SCALAR_string);
length__PERL_SCALAR_string /= sizeof(char);

        printf("From C, %s\n", _PERL_SCALAR_string);
    }
##################################################
From C, Hello!

In this case, quite a bit gets unpacked when using this variable. The original SV* is SV__PERL_SCALAR_string, the character array is _PERL_SCALAR_string, and the length is available as length__PERL_SCALAR_string. In particular, these special variables can be utilized in our cblock code like this:

use strict;
use warnings;
use C::Blocks;
use C::Blocks::Types qw(char_array);

my char_array $string = "Hello!";
cblock {
    printf("The string '%s' is %d characters long\n", $string, (int)length_$string);
}

Notice how length_$string gives the length! For most string operations the length is not crucial because the string ends in a null character. This is not the case for the numerical types: the length is a crucial piece of information needed to process the full contents of the array:

use strict;
use warnings;
use C::Blocks;
use C::Blocks::Types qw(double_array);

my double_array $data = pack('d*', 1 .. 10);
cblock {
    double sum = 0;
    for (int i = 0; i < length_$data; i++) {
        sum += $data[i];
    }
    printf("The sum is %f\n", sum);
}

which produces:

The sum is 55.000000
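The division by sizeof in the generated code is what turns the byte length of the Perl string into an element count. Here is a minimal plain C sketch of that step, with no Perl API involved. The function name sum_packed_doubles and its two arguments are hypothetical stand-ins for the buffer and byte length that SvPVbyte supplies in the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the unpacking that double_array performs.
 * `bytes` plays the role of the scalar's string buffer and `byte_len`
 * the length reported for it; neither comes from C::Blocks itself. */
double sum_packed_doubles(const void *bytes, size_t byte_len) {
    /* A packed buffer of doubles must hold a whole number of elements. */
    assert(byte_len % sizeof(double) == 0);

    /* Reinterpret the bytes as doubles. Like the cast in the generated
     * code, this assumes the buffer is suitably aligned for double. */
    const double *data = (const double *)bytes;

    /* Same conversion as `length_$C_name /= sizeof($data_type)`. */
    size_t n_elements = byte_len / sizeof(double);

    double sum = 0;
    for (size_t i = 0; i < n_elements; i++) {
        sum += data[i];
    }
    return sum;
}
```

Passing a ten-element array holding 1 through 10, with sizeof of that array as the byte length, returns 55, matching the output above. Without the division the loop would run eight times too far on a platform with 8-byte doubles.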

The idea that length_$variable would resolve to a variable with useful information is an unplanned but very useful side-effect of how the code extractor works. For lack of a better name, I've taken to calling these extra bits of information "prefix macros" because the code extractor only properly resolves them when you add on letters prior to the variable name, not after it.

It turns out that the code generator expects either one or two return values from c_blocks_init_cleanup. The first return value is always the initialization code; the optional second return value is any cleanup code. This is useful for basic types, which have to call sv_setiv or similar to ensure that any changes you've made are propagated back to the original SV*. Everything we've seen up to this point has involved pointers to things. Modifying those things leads to the desired side effects directly, so no cleanup was necessary.
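The difference between pointer-style and value-style types can be sketched in plain C. This is only an illustration of the idea, not the code C::Blocks emits: fake_sv is a hypothetical stand-in for a Perl scalar holding an integer, and both function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Perl scalar holding an integer.
 * A real SV is opaque and is read and written through the Perl API. */
typedef struct { long iv; } fake_sv;

/* Pointer-style type: the block works through a pointer into the
 * scalar's storage, so writes land in the original immediately. */
void increment_via_pointer(fake_sv *sv) {
    assert(sv != NULL);
    long *value = &sv->iv;   /* init: unpack a pointer */
    *value += 1;             /* block body */
    /* no cleanup needed */
}

/* Value-style type: the block works on a local copy, so the change
 * is lost unless cleanup code writes it back. */
void increment_via_copy(fake_sv *sv, int run_cleanup) {
    assert(sv != NULL);
    long value = sv->iv;     /* init: copy the value out */
    value += 1;              /* block body modifies only the copy */
    if (run_cleanup) {
        sv->iv = value;      /* cleanup: the role sv_setiv plays */
    }
}
```

Starting from 41, increment_via_copy leaves the stored value at 41 when the cleanup step is skipped and at 42 when it runs, while increment_via_pointer always reaches 42.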

Finally, there is one more trick worth knowing about type handling. Whenever C::Blocks sees a sigiled variable in a cblock it will replace it with the gently mangled name, as we have seen. What if simply using a variable is insufficient? In that case you can resort to using macros. For example, when using C::Blocks::Types::Pointers, you can take the address of a pointer to get something that works (caveats aside). Here is the relevant bit of code from day 7:

my double_LL $head = 0;
my double_LLp $tail_p = 0;
cblock {
    $tail_p = &$head;
}

If $head resolved to a local variable, this would lead to a local address, which would become invalid as soon as we left the block. To get around that, C::Blocks::Types::Pointers actually creates a pointer to the desired pointer type called POINTER_TO_$C_name, pointing to the address of the underlying IV slot in the SV*. It then defines a C macro: #define $C_name (*POINTER_TO_$C_name). This means that whenever you see $head in the cblock, it is ultimately replaced with a pointer de-reference.
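The macro trick is easy to try in plain C. The sketch below is an assumption-laden miniature rather than what C::Blocks::Types::Pointers generates: head_slot and tail_slot are hypothetical stand-ins for the storage inside the two scalars, point_tail_at_head is a made-up wrapper playing the role of the cblock, and the macro names are shortened from the real mangled _PERL_SCALAR_ names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node { double value; struct node *next; } node;

/* Hypothetical stand-ins for the storage inside the two Perl scalars.
 * They outlive the block, unlike a block-local variable would. */
typedef struct { node *slot; } head_slot;    /* role of $head   */
typedef struct { node **slot; } tail_slot;   /* role of $tail_p */

/* Mirrors `$tail_p = &$head;` from the example above. */
void point_tail_at_head(head_slot *h, tail_slot *t) {
    assert(h != NULL && t != NULL);

    /* init: a pointer to each slot, then a macro that dereferences it */
    node **POINTER_TO_head = &h->slot;
#define head (*POINTER_TO_head)
    node ***POINTER_TO_tail_p = &t->slot;
#define tail_p (*POINTER_TO_tail_p)

    /* block body: &head expands to &(*POINTER_TO_head), which is the
     * address of the slot itself rather than of a local variable */
    tail_p = &head;

#undef head
#undef tail_p
}
```

After the call, the tail slot holds the address of the head slot, so writing a node pointer through it changes the head even though the function has already returned. Had head been an ordinary local copy, the stored address would be dangling at that point.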

Today I explained how to create your own types with C::Blocks. When your library provides both a Perl and C interface, types make it possible to flow back and forth between Perl and C code and have your variables resolve to the "right" thing. This lets you concentrate on writing actionable code instead of extracting your data from a Perl SV*.

About David Mertens

This is my blog about numerical computing with Perl.