C::Blocks Advent Day 7

This is the C::Blocks Advent Calendar, in which I release a new treat each day about the C::Blocks library. Yesterday I illustrated the many ways you generate or modify C source code. Today I discuss basic facilities for handling pointers and building C data structures.

Pointers and data allocation are central to the design of complex data structures. One of the simplest data structures is the linked list. These are useful for storing sequential data when we do not know how many elements we will need at the outset. Using a simple linked list, we can parse the contents of a file and store the results, later running through the data to perform calculations. How would we implement a basic linked list using C::Blocks?

I will begin with the basic C code for this task. The next few chunks of code would go into a clex or cshare block. First, we would declare the linked list structure:

typedef struct double_LL double_LL;
struct double_LL {
    double_LL * next;
    double value;
};

Each element of our linked list will hold a pointer to the next element of the list (which is null if this is the last item in the list), and the actual value of interest. Memory for these will be obtained with Perl's Newx, so we need to Safefree the linked list when we're done. We could implement this using recursion, but I prefer to write it using a loop:

void LL_free(double_LL * curr) {
    while(curr) {
        double_LL * next = curr->next;
        Safefree(curr);
        curr = next;
    }
}

Finally we get to the interesting part, or parts. The insert function needs to take a node in the linked list and a value, and then insert this new value in a new node that follows the current one. It would be nice if this were generic enough that it could handle insertions not at the tail of the list. It would be even nicer if it could allocate a fresh node when passed the null pointer, thus doubling as the list initial allocator. Here is one not-so-good implementation:

double_LL * LL_insert(double_LL * curr, double new_value) {
    double_LL * new_node;
    Newx(new_node, 1, double_LL);
    new_node->value = new_value;
    if (curr == NULL) {
        new_node->next = NULL;
        return new_node;
    }
    new_node->next = curr->next;
    curr->next = new_node;
    return new_node;
}

This function is not ideal because it has to handle an edge condition. It seems like it would almost be better to have a separate constructor, rather than trying to have the insertion method do double-duty. Also, its use is a bit odd:

double_LL * head = NULL;
double_LL * tail = NULL;

/* first entry has to be special cased */
tail = head = LL_insert(NULL, first_value);
/* all others use this insertion idiom */
tail = LL_insert(tail, second_value);
tail = LL_insert(tail, third_value);
...

We could instead use a shorter and more consistent function. Instead of taking (and returning) a double_LL*, this would work with pointers to double_LL*. The extra level of abstraction actually makes things much more consistent:

double_LL ** LL_insert(double_LL ** next_addr, double new_value) {
    double_LL * old_next = *next_addr;
    Newx(*next_addr, 1, double_LL);
    (*next_addr)->value = new_value;
    (*next_addr)->next = old_next;
    return &((*next_addr)->next);
}

To use this, we would set up a tail pointer that initially points to the address of head:

double_LL * head = NULL;
double_LL ** tail = &head;

tail = LL_insert(tail, first_value);
tail = LL_insert(tail, second_value);
tail = LL_insert(tail, third_value);
...

This is really nice for a number of reasons. First, because this insertion function takes a double pointer, the first insertion will update head to point to the first element of the list, but none of the others will change it. Second, head will be NULL if we happen to make zero insertions. Third, if the insertions are performed in the body of a loop, our loop will be more concise because it does not need to watch out for or do anything special for the first entry. Although it is a bit harder to grok, I would argue that this insertion function is better than the previous one.

To get an idea of how I might use this linked list, here is a script that parses my kernel log collecting the times at which kernel events were logged. Notice that I use C::Blocks::Types::Pointers to declare pointer types to facilitate this work. This module is not yet distributed on CPAN, but is in the github repository.

use strict;
use warnings;
use C::Blocks;
use C::Blocks::PerlAPI; # explicit for Newx and Safefree
use C::Blocks::Types qw(double uint);
use C::Blocks::Types::Pointers
    double_LL => 'double_LL*',
    double_LLp => 'double_LL**';

# Basic linked list.
clex {
    ... struct and functions from above ...
}

# Set up Perl variables to hold the pointers to the linked list head and tail.
my double_LL $head = 0;
my double_LLp $tail_p = 0;
cblock {
    $tail_p = &$head;
}

# scan through kernel log and calculate the time between events. I'll assume
# that the path is given in @ARGV so I can just use the diamond operator.
while (my $line = <>) {
    if ($line =~ /kernel: \[(\d+\.\d+)\]/) {
        my double $time = $1;
        cblock {
            $tail_p = LL_insert($tail_p, $time);
        }
    }
}

# Calculate the average time between log entries
my double $avg = 0;
my uint $N_skips = 0;
cblock {
    if ($head != 0) {
        double sum = 0;
        unsigned int N = 0;
        double prev_time = $head->value;
        double_LL * curr = $head->next;
        while (curr != NULL) {
            double diff = curr->value - prev_time;
            if (diff < 100) {
                sum += diff;
                N++;
            }
            else {
                $N_skips++;
            }
            prev_time = curr->value;
            curr = curr->next;
        }
        $avg = sum / N;
    }
}
print "Average time between close log entries is $avg seconds; skipped $N_skips separated entries\n";

# All done, free up the linked list
cblock {
    LL_free($head);
}

While that's a lot of code, notice that after the clex block it breaks basically into four units: set up $head and $tail_p, scan through the log file, calculate and print the statistics, and free up the memory. Thanks to C::Blocks::Types::Pointers, I was able to set up double_LL as a pointer to the struct of the same name, and double_LLp as a pointer to that pointer. This makes it possible to dereference $head->value without getting warnings, or to call LL_insert without warnings.

I hope you can see my point through the silliness of the example. Obviously it's not hard to write a pure Perl script that performs the same calculation with about half as many lines of code. If you do so, you'll see it runs 1x to 2x faster than this C example. My point isn't that you'd use this linked list to perform this calculation. Rather this illustrates how C::Blocks makes it possible to define C data structures. Typing facilities like C::Blocks::Types and C::Blocks::Types::Pointers minimize the boiler plate you would need to perform operations on those data structures. All of these lower the barrier to entry to using C for complex algorithms and data structures.

EDIT: After some thought, I made a slight revision to how C::Blocks::Types::Pointers works so that you can now get the address of a pointer by simply saying &$thing.

C::Blocks Advent Day 1 2 3 4 5 6 7 8 9 10 11 12 13

Leave a comment

About David Mertens

user-pic This is my blog about numerical computing with Perl.