Writing XS Like a Pro - The INTERFACE Keyword

Like many Perl hackers with a C background, I did my fair share of XS programming. But mostly, I contributed bug fixes and other minor changes. In the rare cases where I had to add a new XSUB, I typically used another XSUB from the same project as a template. Unfortunately, this kind of cargo-cult programming is common among XS authors.

When I wrote my first public XS module, CommonMark, from scratch, I decided to read the perlxs documentation front to back to get a better understanding of the features XS provides. It paid off: there were many things I had missed, and some of them can be extremely helpful. So I'm taking the opportunity to share what I learned with my fellow XS writers in a series of two or three posts.

The typical XSUB you'll encounter has either PREINIT (or sometimes INIT), CODE, and OUTPUT sections, or PREINIT and PPCODE sections if multiple values are returned. But an XSUB can be much simpler. In fact, it can contain no code at all. Here's an example of a simple accessor:

cmark_node*
cmark_iter_get_node(cmark_iter *iter)

That's it. An XSUB can contain nothing more than a C function declaration. Some things to take away from this example:

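  • The parameter list can contain both types and parameter names, so there's no need to declare the parameter types separately. A trailing semicolon may also be added. Unfortunately, all parameters must be on a single line. For longer lists, you might want to use the traditional K&R-like style, where the parenthesized list contains only the names and each parameter's type is declared on its own indented line below.
  • For data types that Perl's default typemap doesn't know about, such as cmark_node*, this only works with a custom typemap. More on that in the next part.
  • You'll probably want to use the PREFIX keyword on the MODULE line to strip a common prefix from the C function names before they're exposed to Perl.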
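PREFIX goes on the MODULE line, for example MODULE = CommonMark  PACKAGE = CommonMark::Iterator  PREFIX = cmark_iter_, which would expose cmark_iter_get_node to Perl as get_node in that package (the package name is only illustrative). The rule itself is simple. Here's a minimal C sketch of it, assuming, as perlxs describes, that the prefix is removed only when the C name actually starts with it. perl_name_for is a hypothetical helper, not part of xsubpp:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modeling the PREFIX rule (not xsubpp's actual code):
 * if the C function name starts with the prefix, the Perl-visible name is
 * the remainder; otherwise the name is left unchanged. */
static const char *perl_name_for(const char *c_name, const char *prefix) {
    size_t len;

    assert(c_name != NULL && prefix != NULL);
    len = strlen(prefix);
    if (len > 0 && strncmp(c_name, prefix, len) == 0)
        return c_name + len;   /* strip the prefix */
    return c_name;             /* no match: keep the full C name */
}
```

With this rule, a single PREFIX = cmark_iter_ covers every function in the section, and functions that don't share the prefix keep their original names.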

So what happens behind the scenes? Unless a CODE or PPCODE section is provided, xsubpp generates a call to the C function with the given arguments and assigns its return value to the special RETVAL variable, which is then automatically returned to Perl.

Most of the time, of course, things aren't that simple, and your XSUB must contain additional code. But before falling back to a CODE section, you should consider whether it's enough to decorate the auto-generated function call with an INIT or POSTCALL section. These sections are executed before and after the function call, respectively. If you need to declare local variables, you can use a PREINIT section. (INIT sections are sometimes seen together with CODE sections. I consider this bad practice: declarations belong in the PREINIT section, and all other initialization code can go into the CODE section.)
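To make the ordering concrete, here's a simplified C model of the wrapper xsubpp generates when PREINIT, INIT, and POSTCALL sections decorate the auto-generated call. It's a sketch, not real xsubpp output: lib_add and model_xsub are hypothetical, plain ints stand in for the Perl stack and the typemap conversions, and the order buffer only exists so the sequence of steps can be checked.

```c
#include <assert.h>
#include <string.h>

/* Test scaffolding: records which steps ran, in order. */
static char order[64];

static void note(const char *step) {
    if (order[0] != '\0')
        strcat(order, ",");
    strcat(order, step);
}

/* Hypothetical C library function being wrapped. */
static int lib_add(int a, int b) {
    note("CALL");
    return a + b;
}

/* Simplified model of the wrapper generated for
 *
 *     int
 *     lib_add(int a, int b)
 *
 * with PREINIT, INIT and POSTCALL sections. The real wrapper reads the
 * arguments from ST(0) and ST(1) via the typemap and pushes RETVAL back
 * onto the Perl stack; plain ints stand in for both here. */
static int model_xsub(int st0, int st1) {
    int a = st0;      /* parameter declarations and INPUT conversion */
    int b = st1;
    int RETVAL;       /* declared because the return type isn't void */
    int limit;        /* PREINIT: extra local variables */

    order[0] = '\0';  /* scaffolding: start a fresh trace */

    /* INIT: runs before the call */
    limit = 100;
    note("INIT");

    /* the auto-generated call */
    RETVAL = lib_add(a, b);

    /* POSTCALL: runs after the call, with RETVAL available */
    if (RETVAL > limit)
        RETVAL = limit;
    note("POSTCALL");

    /* OUTPUT: RETVAL would be converted to an SV and returned to Perl */
    return RETVAL;
}
```

Calling model_xsub(70, 50) runs INIT, then the call, then POSTCALL, which caps RETVAL at 100. Note how limit is set in INIT and used in POSTCALL, the same pattern the move_node example below uses for old_parent.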

The benefit of having an auto-generated function call becomes clear once you know about the INTERFACE keyword. This little-known feature makes it possible to share a single XSUB implementation among multiple C functions with the same interface. Accessors are a simple example:

const char*
interface_get_utf8(cmark_node *node)
INTERFACE:
    cmark_node_get_type_string
    cmark_node_get_literal
    cmark_node_get_title
    cmark_node_get_url
    cmark_node_get_fence_info

This short piece of code adds five getters returning a string. The name of the interface, interface_get_utf8 in the example, is only used to identify the interface internally. It won't be visible to outside code. A more complex example with INIT and POSTCALL sections:

NO_OUTPUT int
interface_move_node(cmark_node *node, cmark_node *other)
PREINIT:
    cmark_node *old_parent;
    cmark_node *new_parent;
INIT:
    old_parent = cmark_node_parent(other);
INTERFACE:
    cmark_node_insert_before
    cmark_node_insert_after
    cmark_node_prepend_child
    cmark_node_append_child
POSTCALL:
    if (!RETVAL) {
        croak("%s: invalid operation", GvNAME(CvGV(cv)));
    }
    new_parent = cmark_node_parent(other);
    S_transfer_refcount(aTHX_ old_parent, new_parent);

Only a single XSUB is needed for what would otherwise be four almost identical variations. Not only does this result in much more maintainable code, it also reduces the binary size considerably. If you're curious how interfaces are implemented internally, take a look at the generated C code: it creates one CV per listed C function, all sharing the same XSUB, and stores a pointer to the respective C function in the xcv_xsubany slot of each CV.
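That mechanism can be modeled in a few lines of plain C. In this sketch, node_t, its getters, cv_t, and call_method are all hypothetical stand-ins for the cmark API and perl's internals. In perl, the per-CV slot is a general-purpose union, so the function pointer has to be cast; a typed field keeps the sketch simple. What matters is that every entry point shares interface_get_utf8 and only the stored function pointer differs:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node type and getters standing in for the cmark API. */
typedef struct {
    const char *literal;
    const char *title;
    const char *url;
} node_t;

static const char *node_get_literal(node_t *n) { return n->literal; }
static const char *node_get_title(node_t *n)   { return n->title; }
static const char *node_get_url(node_t *n)     { return n->url; }

/* All getters share this signature, which is what INTERFACE requires. */
typedef const char *(*getter_fn)(node_t *);

/* Simplified model of a CV: the Perl-visible name, the shared XSUB, and
 * a per-CV slot holding the C function pointer (xcv_xsubany in perl). */
typedef struct cv cv_t;
struct cv {
    const char *name;
    const char *(*xsub)(const cv_t *, node_t *);
    getter_fn any;
};

/* The single shared implementation: fetch this CV's C function, call it. */
static const char *interface_get_utf8(const cv_t *cv, node_t *node) {
    const char *RETVAL;

    assert(cv != NULL && cv->any != NULL);
    RETVAL = cv->any(node);
    return RETVAL;
}

/* One entry per listed function, all pointing at the same XSUB. */
static const cv_t getters[] = {
    { "get_literal", interface_get_utf8, node_get_literal },
    { "get_title",   interface_get_utf8, node_get_title   },
    { "get_url",     interface_get_utf8, node_get_url     },
};

/* Look up a CV by name and invoke it through its shared XSUB. */
static const char *call_method(const char *name, node_t *node) {
    size_t i;

    for (i = 0; i < sizeof getters / sizeof getters[0]; i++) {
        if (strcmp(getters[i].name, name) == 0)
            return getters[i].xsub(&getters[i], node);
    }
    return NULL;  /* no such method */
}
```

call_method("get_title", &n) goes through the same interface_get_utf8 as every other getter. Adding a getter means adding one table entry, much like adding one line to the INTERFACE list.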

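The interface_move_node example above also uses the NO_OUTPUT keyword, which is useful if you let xsubpp generate your C function call but want to ignore the return value. RETVAL is still set and can be inspected in a POSTCALL section, which is how the example checks for errors.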
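Here's a similar C model of interface_move_node, showing how INIT, POSTCALL, the CV name in the error message, and NO_OUTPUT fit together. Everything in it is hypothetical: node_t only has a parent pointer and a refcount, node_append_child is a toy operation, and transfer_refcount merely stands in for S_transfer_refcount by moving one count from the old parent to the new one. Since C has no croak, the model records the message and returns early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tree node: just a parent pointer and a refcount. */
typedef struct node node_t;
struct node {
    node_t *parent;
    int refcount;
};

/* Hypothetical operation with the shared interface int f(node_t *, node_t *):
 * makes `other` a child of `node`, returns 0 on failure. */
static int node_append_child(node_t *node, node_t *other) {
    if (node == other)
        return 0;
    other->parent = node;
    return 1;
}

/* Simplified CV: the name GvNAME(CvGV(cv)) would give, plus the C function. */
typedef struct {
    const char *name;
    int (*any)(node_t *, node_t *);
} cv_t;

static char last_error[128];

/* Hypothetical stand-in for S_transfer_refcount. */
static void transfer_refcount(node_t *from, node_t *to) {
    if (from == to)
        return;
    if (from != NULL)
        from->refcount--;
    if (to != NULL)
        to->refcount++;
}

/* Model of the shared XSUB. NO_OUTPUT: RETVAL is computed and usable in
 * POSTCALL, but nothing is returned to the caller (void here). */
static void interface_move_node(const cv_t *cv, node_t *node, node_t *other) {
    int RETVAL;
    node_t *old_parent;  /* PREINIT */
    node_t *new_parent;

    assert(cv != NULL && cv->any != NULL);

    /* INIT: remember where `other` lived before the move */
    old_parent = other->parent;

    /* the auto-generated call, through the C function stored in this CV */
    RETVAL = cv->any(node, other);

    /* POSTCALL */
    if (!RETVAL) {
        /* croak() would unwind here and never return */
        snprintf(last_error, sizeof last_error, "%s: invalid operation",
                 cv->name);
        return;
    }
    new_parent = other->parent;
    transfer_refcount(old_parent, new_parent);
}
```

Because the error message is built from the CV's name, each Perl method reports its own name even though all four share one implementation.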


About Nick Wellnhofer

I'm a freelancer looking for contract work in Munich, Germany, or remotely. Check out my profiles on MetaCPAN, GitHub, LinkedIn and StackOverflow.