XS bits: Overloaded interfaces

When writing Perl, people often create hybrid interfaces that accept either a reference to an array or hash, a string, or a reference to a string. The Perl code to do appropriate conversion behind the scenes is usually trivial. Some even use this to overload their interface to do something entirely unrelated depending on the type passed in. However much one might loathe such interfaces, when replacing Perl code with XS, one usually has to reproduce the properties of the original. That is what this entry is about.
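To show how little Perl such a conversion usually takes, here is a minimal sketch of a hypothetical normalize_source() helper. It accepts a plain string, a reference to a string, or a reference to an array of strings, and always hands back a flat list. The function name and the set of accepted types are assumptions made for this example. They are not taken from any particular module.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: accepts a string, a reference to a string,
# or a reference to an array of strings, and always returns a list.
sub normalize_source {
  my ($arg) = @_;
  my $type = ref($arg);
  return ($arg)  if $type eq '';        # plain string, passed through
  return ($$arg) if $type eq 'SCALAR';  # reference to a string, dereferenced
  return @$arg   if $type eq 'ARRAY';   # reference to an array, flattened
  croak("Unsupported argument type '$type'");
}

my $code = 'print 1;';
my @lines = normalize_source(\$code);   # ('print 1;')
```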

A rather reasonable example of such a hybrid interface is PPI::Document, whose constructor accepts either a string (interpreted as a file name) or a reference to a scalar (holding the code itself as a string). While (different) named arguments would have been clearer to a casual reader of the resulting code, this particular overloading is a generally reasonable optimization.

The straightforward (and least error-prone) way to provide an overloaded interface is to keep the interface in Perl and just call XSUBs from there using a simpler, more XS-friendly interface. If I were replacing said constructor of PPI::Document, I could do something like the following pseudo-code:

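package PPI::Document;
use Carp qw(croak);

sub new {
  my $class  = shift;
  my $source = shift;
  if (not ref($source)) {
    # A plain string is interpreted as a file name
    return _xs_new_from_file($class, $source);
  }
  elsif (ref($source) eq 'SCALAR') {
    # A scalar reference holds the document's source code
    return _xs_new_from_string($class, $source);
  }
  else {
    croak("PPI::Document->new: unsupported source argument");
  }
}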

But if the code in question is already in a tight loop, or you are a tad crazy, you may want to have this logic in XS as well. This is one way to do it:

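void
new(class, source)
    SV* class;
    SV* source;
  PREINIT:
    SV* inner;
  PPCODE:
    if (!SvROK(source))
      /* not a reference: treat source as a file name */
      mXPUSHs( _new_document_from_file(class, source) );
    else {
      inner = SvRV(source);
      /* reference to a plain scalar: it holds the code itself */
      if (SvTYPE(inner) <= SVt_PVMG)
        mXPUSHs( _new_document_from_string(class, inner) );
      else
        croak("PPI::Document->new: unsupported source argument");
    }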

I'll pull that apart in detail for an XS beginner in a moment. The key bits for the interface are !SvROK(source), which tests whether the source SV is a reference at all, and SvTYPE(inner) <= SVt_PVMG, which ensures that the dereferenced SV is a scalar (and not an array, etc.).

While the test for an SV being a reference is fairly common and simple, the test for being a scalar reference is slightly more obscure. It grabs the type (an enum value) of the SV using SvTYPE() and checks whether it is less than or equal to SVt_PVMG, the type of a scalar that can carry magic or a blessing. The reason this less-than-or-equal comparison works lies in the order of the SV types: all types up to and including SVt_PVMG happen to be scalars. Above it, you'll find more complicated things such as arrays, hashes, code references, etc., along with a few scalar-like types such as globs, regexps, and lvalues, which this test therefore rejects. Using this construct, you could easily add cases that test for array or hash references by comparing (equality!) with SVt_PVAV or SVt_PVHV respectively.
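For comparison, the same dispatch including the extra array and hash cases can be sketched in pure Perl. This is a minimal sketch with a hypothetical describe_source() function whose return strings are made up for illustration. It uses reftype() from Scalar::Util rather than ref(), because reftype() reports the underlying container type and ignores blessing. That is roughly what SvTYPE(SvRV(source)) sees in XS. The correspondence isn't exact: for example, reftype() reports a reference to a reference as REF, while the XS scalar test accepts it.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(reftype);

# Hypothetical dispatcher mirroring the XS checks from the text.
sub describe_source {
  my ($source) = @_;
  my $type = reftype($source);                     # undef for non-references
  return 'file name'        if not defined $type;  # like !SvROK(source)
  return 'code in a string' if $type eq 'SCALAR';  # roughly SvTYPE(inner) <= SVt_PVMG
  return 'array of lines'   if $type eq 'ARRAY';   # like SvTYPE(inner) == SVt_PVAV
  return 'hash of options'  if $type eq 'HASH';    # like SvTYPE(inner) == SVt_PVHV
  croak("Unsupported reference type '$type'");
}

my $code = 'print "hello\n";';
print describe_source('foo.pl'), "\n";                # file name
print describe_source(\$code), "\n";                  # code in a string
print describe_source(['line 1', 'line 2']), "\n";    # array of lines
print describe_source(bless {}, 'Some::Class'), "\n"; # hash of options
```

With ref() instead of reftype(), the blessed hash in the last call would be reported as Some::Class and fall through to the croak.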

Now, the example punts on one bit of the PPI::Document->new() interface: you can call new() without arguments to receive an empty document. Optional parameters to an XSUB aren't particularly complicated once you understand how parameters are actually passed inside perl, but that is a topic for another post.

Here are a few random notes that may or may not help XS beginners understand the code:

  • The XSUB is declared void and the actual C code is inside an XS section named PPCODE. This tells the XSUB compiler that we will (if necessary) manage returning values via the argument stack ourselves.
  • This is done with the mXPUSHs() macro, which takes an SV (assumed here to be returned by the _new_document_from_* functions) and pushes it onto the argument stack. The X indicates that it will extend the stack if necessary. The s suffix indicates that we're pushing a ready-made SV. The m prefix means that the macro will mortalize the SV, which roughly amounts to marking it as a temporary that perl frees later on; values returned via the argument stack are normally mortal. I'm going through all bits of this macro because there are a ton of variants in the API (XPUSHs, mPUSHs, mXPUSHi, and so on), which become moderately obvious once you understand the naming conventions.




About Steffen Mueller

Physicist turned software developer. Working on Infrastructure Development at Booking.com. Apparently the only person who thinks the perl internals aren't nearly as bad as people make them out to be. See also: the Booking.com tech blog, CPAN.