Learning XS - C data types

Over the past year, I’ve been self-studying XS and have now decided to share my learning journey through a series of blog posts. This ninth post introduces you to C data types and how to expose them in Perl.

How do C data types differ from Perl's?

Firstly, Perl doesn't really have built-in types in the same way as C. When you declare a variable in Perl, you use a sigil ('$', '@', or '%') to indicate whether it's a scalar, array, or hash, rather than specifying an explicit data type as you would in C. A scalar can contain many things including a reference to an array or hash and can change type during runtime. In C you have to be specific when defining variables, not only by choosing the appropriate data type (such as 'int', 'float', or 'char'), but also by considering memory allocation. In Perl, memory management is handled automatically; you don't need to worry about allocating or freeing memory for your variables. In contrast, C requires you to manage memory manually, especially when working with dynamic data structures. This means you must explicitly allocate memory (using functions like 'malloc') and free it when it's no longer needed, which gives you more control but also introduces the risk of memory leaks or other errors if not handled carefully.
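To make the contrast concrete, here is a minimal sketch in plain C (not XS) of the bookkeeping that Perl normally hides. The function name 'make_squares' is made up for illustration; it simply shows an explicit element type, a manual allocation, and the caller's obligation to free the result.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated array holding the
 * squares 0, 1, 4, ... of the first `count` integers, or NULL if
 * the allocation fails. The caller must free() the result. */
int *make_squares(size_t count)
{
    int *squares = malloc(count * sizeof(int));
    if (squares == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        squares[i] = (int)(i * i);
    }
    return squares;
}
```

In Perl the same data would be 'my @squares = map { $_ * $_ } 0 .. 3;' with no allocation or freeing in sight. In C, forgetting the matching 'free(squares)' is a memory leak.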

The table below can be used as a reference guide for C types:

C Data Type | Description | Example Declaration
'char' | Single character (always 1 byte) | 'char c = 'A';'
'int' | Integer (at least 16 bits; size depends on system) | 'int i = 42;'
'short' | Short integer (at least 16 bits) | 'short s = 10;'
'long' | Long integer (at least 32 bits) | 'long l = 1000L;'
'float' | Single-precision floating point | 'float f = 3.14f;'
'double' | Double-precision floating point | 'double d = 2.718;'
'unsigned int' | Integer, non-negative values only | 'unsigned int u = 5;'
'unsigned char' | Char, non-negative values only | 'unsigned char uc = 1;'
'unsigned short' | Short, non-negative values only | 'unsigned short us = 2;'
'unsigned long' | Long, non-negative values only | 'unsigned long ul = 3;'
'void' | No value/type (used for functions) | 'void func(void);'
'struct' | User-defined composite type | 'struct Point { int x; int y; };'
'enum' | User-defined set of named integer constants | 'enum Color { RED, GREEN, BLUE };'
'union' | User-defined type sharing memory | 'union Data { int i; float f; };'
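The three user-defined types at the bottom of the table are the least familiar coming from Perl, so here is a small sketch that reuses the example declarations. The helper function names are invented for this illustration.

```c
#include <stddef.h>

struct Point { int x; int y; };
enum Color { RED, GREEN, BLUE };
union Data { int i; float f; };

/* A struct groups several named fields; here we read both of them. */
int point_sum(struct Point p)
{
    return p.x + p.y;
}

/* Enum constants are plain integers, numbered from 0 unless told otherwise. */
int color_as_int(enum Color c)
{
    return (int)c;
}

/* All members of a union share the same storage, so the union is at
 * least as large as its largest member. */
size_t data_size(void)
{
    return sizeof(union Data);
}
```

With these definitions, 'point_sum' on a point of (3, 4) gives 7, and 'color_as_int(BLUE)' gives 2 because 'RED' starts at 0.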

In Perl, C data types can be represented as scalar values. For the basic types ('char', 'int', 'short', 'long', 'float', 'double'), it is straightforward to convert to and from Perl scalars using the macros demonstrated in previous posts, such as 'SvNV', 'SvPVutf8', and 'newSVnv'. Today we are going to explore how to represent C structs in Perl. Although what I am about to show you works, I personally believe it's better to represent your data in Perl structures, as it is then easier to debug and easier for others to extend your modules.

In the next example we will recreate a toy Magic 8 Ball. We will take an object-oriented approach: a 'new' method instantiates our object, and an 'ask' method lets us ask a question (shake the ball) and receive a message. We will then add two simple accessors that return the last question asked and the answer it received. In our XS approach today we are going to use a struct to store this data. Pure Perl has no struct data type, so there you would implement something like the following:

package Toy::MagicBall;

use 5.006;
use strict;
use warnings;

use Carp;

my @answers = (
    "It is certain.",
    "Ask again later.",
    "Don't count on it.",
    "Yes, definitely.",
    "My reply is no.",
    "Outlook good.",
    "Very doubtful.",
    "Yes.",
    "No.",
    "Cannot predict now."
);

sub new {
    my ($class) = @_;
    my $self = {
        last_answer_index => undef,
        last_question     => '',
    };
    bless $self, $class;
}

sub ask {
    my ($self, $question) = @_;
    croak "You must ask a question" unless defined $question && length $question;
    $self->{last_question} = $question;
    my $idx = int(rand(@answers));
    $self->{last_answer_index} = $idx;
    return $answers[$idx];
}

sub last_question {
    my ($self) = @_;
    return $self->{last_question};
}

sub last_answer {
    my ($self) = @_;
    return defined $self->{last_answer_index}
        ? $answers[ $self->{last_answer_index} ]
        : undef;
}

1;

Okay, so let's create the new distribution 'Toy::MagicBall' using 'module-starter':

module-starter --module="Toy::MagicBall" --author="Your Name" --email="your email"

Update the Makefile.PL to include 'XSMULTI => 1', then replace the contents of lib/Toy/MagicBall.pm with the following:

package Toy::MagicBall;

use 5.006;
use strict;
use warnings;

our $VERSION = '0.01';

require XSLoader;
XSLoader::load('Toy::MagicBall', $VERSION);

1;

Create a new file called lib/Toy/MagicBall.xs and add the following code:

#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

typedef struct {
    int last_answer_index;
    char *last_question;
} MagicBall;

static const char *answers[] = {
    "It is certain.",
    "Ask again later.",
    "Don't count on it.",
    "Yes, definitely.",
    "My reply is no.",
    "Outlook good.",
    "Very doubtful.",
    "Yes.",
    "No.",
    "Cannot predict now."
};

MODULE = Toy::MagicBall  PACKAGE = Toy::MagicBall
PROTOTYPES: DISABLE

What we have added to our normal boilerplate is the definition of a struct called 'MagicBall', which contains two fields, 'last_answer_index' and 'last_question', plus an 'answers' array, which will be used to provide responses to the questions asked. The 'typedef' keyword defines a new type name, 'MagicBall', for the struct that holds the state of our Magic 8 Ball. 'last_answer_index' is an integer that stores the index of the last answer given, and 'last_question' is a pointer to a character string that holds the last question asked. When defining the array, the 'static' keyword indicates that 'answers' is only visible within this file, which is good practice to avoid polluting the global namespace. The 'const' keyword indicates that the strings the array points to cannot be modified through it, which is appropriate since these are fixed responses.
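If structs are new to you, the following plain C sketch (no Perl involved) shows how the 'MagicBall' type is used once it is defined. The two helper functions are hypothetical and exist only for this illustration; the -1 sentinel matches the value the 'new' method in this post stores before any question has been asked.

```c
#include <stddef.h>

typedef struct {
    int last_answer_index;
    char *last_question;
} MagicBall;

/* Put a MagicBall into its "nothing asked yet" state.
 * -1 is used as a sentinel because it is never a valid array index. */
void magic_ball_reset(MagicBall *ball)
{
    ball->last_answer_index = -1;   /* '->' reaches a field through a pointer */
    ball->last_question = NULL;
}

/* Returns 1 if the ball has answered at least one question, else 0. */
int magic_ball_has_answer(const MagicBall *ball)
{
    return ball->last_answer_index >= 0;
}
```

When you hold the struct itself rather than a pointer to it, fields are accessed with a dot instead, for example 'ball.last_answer_index'.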

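As always, let's add a test file for our module. Create a new file called t/01_magic_ball.t and add the following code, keeping 'done_testing' as the last line as we add more tests:

use strict;
use warnings;
use Test::More;

use Toy::MagicBall;

my $toy = Toy::MagicBall->new;

is(ref $toy, 'Toy::MagicBall', 'new returns a Toy::MagicBall object');

done_testing;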

Next, let's add the 'new' method to create a new instance of the 'MagicBall' struct. To do this, add the following below the 'MODULE ... PACKAGE' declaration (after 'PROTOTYPES: DISABLE'):

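SV *
new(pkg, ...)
    SV *pkg
    CODE:
        MagicBall *ball = (MagicBall *)malloc(sizeof(MagicBall));
        if (!ball) {
            croak("Toy::MagicBall: out of memory");
        }
        ball->last_answer_index = -1;
        ball->last_question = NULL;
        /* Store the struct's address in a scalar and bless a reference to it */
        SV *ball_ptr = newSViv(PTR2IV(ball));
        RETVAL = sv_bless(newRV_noinc(ball_ptr), gv_stashsv(pkg, 0));
        /* Seed the C random number generator used by 'ask' */
        srand((unsigned int)time(NULL));
    OUTPUT:
        RETVAL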

This code defines a 'new' method that allocates memory for a 'MagicBall' struct, initialises its fields, and returns a blessed reference to it. 'malloc' allocates the memory for the struct, the 'PTR2IV' macro converts the resulting pointer into an integer ('IV') large enough to hold the address, and 'newSViv' creates a new scalar holding that integer. 'newRV_noinc' then creates a reference to this scalar without incrementing its reference count, and 'sv_bless' blesses that reference into the specified package. Finally, 'srand' seeds C's random number generator so that 'ask' can call 'rand' later.
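The pointer-in-an-integer trick can be tried outside Perl as well. The sketch below is a plain C analogue that uses the standard 'intptr_t' type in place of Perl's 'IV'. It is not how 'PTR2IV' and 'INT2PTR' are implemented, and the function names are made up for illustration.

```c
#include <stdint.h>

typedef struct {
    int last_answer_index;
    char *last_question;
} MagicBall;

/* Store a pointer in an integer type wide enough to hold it. */
intptr_t ball_to_int(MagicBall *ball)
{
    return (intptr_t)ball;
}

/* Recover the original pointer from that integer. */
MagicBall *int_to_ball(intptr_t value)
{
    return (MagicBall *)value;
}
```

The integer is only an address: it is meaningful for as long as the struct it points to is still allocated, which is why the object needs a matching clean-up step.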

If we run 'make' and 'make test' now, we should see the basic test pass, and we have an instantiated object. We cannot do much more validation at this point because we do not have accessors to our struct data yet. If you are interested in seeing what this looks like internally to Perl, you can use 'Data::Dumper' to dump the object:

use Data::Dumper;
warn Dumper($toy);

...

make test

# Output:
bless( do{\(my $o = 4140525456)}, 'Toy::MagicBall' )

Now let's add the 'ask' method to allow us to ask a question and get a response. First add the test:

my %responses = map {
        $_ => 1
} (
        "It is certain.",
        "Ask again later.",
        "Don't count on it.",
        "Yes, definitely.",
        "My reply is no.",
        "Outlook good.",
        "Very doubtful.",
        "Yes.",
        "No.",
        "Cannot predict now."
);

my $result = $toy->ask("Should I learn XS?");
ok($responses{$result}, $result);

The result from our module is random, hence we have to check against all possible responses. To implement the method, add the following code after the 'new' XSUB:

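SV *
ask(self, question)
    SV *self
    SV *question
    CODE:
        MagicBall *ball = INT2PTR(MagicBall*, SvIV(SvRV(self)));
        STRLEN len;
        const char *text = SvPV(question, len);
        if (ball->last_question) {
            free(ball->last_question);
        }
        /* Copy the string: the buffer returned by SvPV belongs to Perl */
        ball->last_question = (char *)malloc(len + 1);
        if (!ball->last_question) {
            croak("Toy::MagicBall: out of memory");
        }
        memcpy(ball->last_question, text, len);
        ball->last_question[len] = '\0';
        /* Parenthesise the element count so it is computed before '%' */
        ball->last_answer_index = (int)(rand() % (sizeof(answers) / sizeof(answers[0])));
        RETVAL = newSVpv(answers[ball->last_answer_index], 0);
    OUTPUT:
        RETVAL

This code retrieves the 'MagicBall' struct from the object reference, frees any previously stored question, and stores a copy of the new one. The copy is important: the buffer returned by 'SvPV' is owned by Perl, so keeping that pointer would leave the struct pointing at memory that Perl may release or reuse, and passing it to 'free' later would corrupt memory. The code then picks a random index into the 'answers' array and returns the matching answer as a new scalar. 'SvRV(self)' dereferences the object to reach the scalar created in 'new', 'SvIV' reads the integer stored in it, and 'INT2PTR' converts that integer back into a 'MagicBall' pointer.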
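The index calculation relies on a common C idiom for counting array elements: divide the size of the whole array by the size of one element. Because '%' and '/' share the same precedence and associate left to right, the division has to be grouped (with parentheses or, as in this sketch, in its own function) so that it is evaluated before the modulo. The shortened 'answers' array and the function names below are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdlib.h>

static const char *answers[] = {
    "It is certain.",
    "Ask again later.",
    "Don't count on it."
};

/* Number of elements in the array: total bytes / bytes per element. */
size_t answer_count(void)
{
    return sizeof(answers) / sizeof(answers[0]);
}

/* Map any unsigned number onto a valid index into `answers`. */
size_t index_for(unsigned int n)
{
    return n % answer_count();
}

/* Pick an answer using the C library's rand(). */
const char *random_answer(void)
{
    return answers[index_for((unsigned int)rand())];
}
```

Note that the 'sizeof' idiom only works where the array itself is in scope. Once an array has been passed to a function as a pointer, 'sizeof' reports the size of the pointer instead.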

Now let's add the accessors for the last question and last answer. First add the tests:

is($toy->last_question, "Should I learn XS?", "Last question is correct");
is($toy->last_answer, $result, "Last answer is correct");

Next add the following code to the XS file:

SV *
last_question(self)
    SV *self
    CODE:
        MagicBall *ball = INT2PTR(MagicBall*, SvIV(SvRV(self)));
        if (ball->last_question) {
            RETVAL = newSVpv(ball->last_question, 0);
        } else {
            RETVAL = newSVpv("", 0);
        }
    OUTPUT:
        RETVAL

SV *
last_answer(self)
    SV *self 
    CODE:
        MagicBall *ball = INT2PTR(MagicBall*, SvIV(SvRV(self)));
        if (ball->last_answer_index >= 0) {
            RETVAL = newSVpv(answers[ball->last_answer_index], 0);
        } else {
            RETVAL = newSVpv("", 0);
        }
    OUTPUT:
        RETVAL

These methods retrieve the last question and last answer from the 'MagicBall' struct. The 'last_question' method checks if the 'last_question' field is not NULL and returns it as a scalar value. If it is NULL, it returns an empty string. The 'last_answer' method checks if the 'last_answer_index' is valid (greater than or equal to 0) and returns the corresponding answer from the 'answers' array; otherwise, it returns an empty string.

Finally, we need to free the memory allocated for the 'MagicBall' struct when the object is destroyed. To do this, we can add a 'DESTROY' method, just as you can in Perl. Perl calls it when the last reference to the object goes away (or, at the latest, during global destruction):

void
DESTROY(self)
    SV *self
    CODE:
        MagicBall *ball = INT2PTR(MagicBall*, SvIV(SvRV(self)));
        if (ball->last_question) {
            free(ball->last_question);
        }
        free(ball);
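The order of the two 'free' calls matters. As a plain C sketch of the same clean-up (the function name 'magic_ball_free' is hypothetical), memory owned by the struct is released first and the struct itself last:

```c
#include <stdlib.h>

typedef struct {
    int last_answer_index;
    char *last_question;
} MagicBall;

/* Release everything a MagicBall owns, then the struct itself.
 * Inner allocations must go first: once the struct is freed,
 * reading ball->last_question would be undefined behaviour. */
void magic_ball_free(MagicBall *ball)
{
    if (ball == NULL) {
        return;
    }
    free(ball->last_question);  /* standard free(NULL) is a no-op */
    free(ball);
}
```

After this call the pointer must not be used again. In the XS module, Perl guarantees that by only calling 'DESTROY' once nothing else refers to the object.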

Now if you run 'make test' again you should see all tests pass, and you have a working XS module that represents a magic 8 ball using a C struct.

I hope you found this post useful, and it has given you a better understanding of how to represent C data types in Perl using XS.

About Robert Acock

I blog about Perl.