Premium XS Integration, Pt 2

This is a continuation of a series of articles about how to write XS libraries that are more convenient and foolproof for the Perl users, while not blocking them from using the actual C API.

If you spot anything wrong, or want to contribute suggestions, open an issue at the GitHub repo

Wrapping Transient Objects

One frequent and difficult problem you will encounter when writing XS wrappers around a C library is what to do when the C library exposes a struct which the user needs to see, but the lifespan of that struct is controlled by something other than the reference the user is holding onto.

For example, consider the Display and Screen structs of libX11. When you connect to an X server, the library gives you a Display pointer. Within that Display struct are Screen structs. Some of the X11 API uses those Screen pointers as parameters, and you need to expose them in the Perl interface. But, if you call XCloseDisplay on the Display pointer those Screen structs get freed, and now accessing them will crash the program. The Perl user might still be holding onto a X11::Xlib::Screen Perl object, so how do you stop them from crashing the program when they check an attribute of that object?

Indirect References

For the case of X11 Screens there was an easy workaround: The Screen structs are numbered, and a pair of (Display, ScreenNumber) can refer to the Screen struct without needing the pointer to it. Because the Perl Screen object references the Perl Display object, the methods of Screen can check whether the display is closed before resolving the pointer to a Screen struct, and die with a useful message instead of a crash.

From another perspective, you can think of them like symlinks. You reference one Perl object which has control over its own struct’s lifecycle and then a relative path from that struct to whatever internal data structure you’re wrapping with the current object.

While this sounds like a quick solution, there’s one other detail to worry about: cyclical references. If the sub-object is referring to the parent object, and the parent refers to a collection of sub-objects, Perl will never free these objects. For the case of X11 Screens, the list of screen structs is known at connection-time and is almost always just one Screen, and doesn’t change at runtime. [1] An easy solution for a case like this is to have a strong reference from Display to Screen, and weak references (Scalar::Util::weaken) from Screen to Display, and create all the Screen objects as soon as the Display is connected.

1) this API is from an era before people thought about connecting new monitors while the computer was powered up, and these days can more accurately be thought of as a list of graphics cards rather than “screens”

Lazy Cache of Wrapper Objects

If the list of Screens were dynamic, or if I just didn’t want to allocate them all upfront for some reason, another approach is to wrap the C structs on demand. You could literally create a new wrapper object each time they access the struct, but you’d probably want to return the same Perl object if they access two references to the same struct. One way to accomplish this is with a cache of weak references.

In Perl it would look like:

package MainObject {
  use Moo;
  use Scalar::Util 'weaken';

  has is_closed         => ( is => 'rwp' );

  # MainObject reaches out to invalidate all the SubObjects
  sub close($self) {
    ...
    $self->_set_is_closed(1);
  }

  has _subobject_cache => ( is => 'rw', default => sub {+{}} );

  sub _new_cached_subobject($self, $ptr) {
    my $obj= $self->_subobject_cache->{$ptr};
    unless (defined $obj) {
      $obj= SubObject->new(main_ref => $main, data_ptr => $ptr);
      weaken($self->_subobject_cache->{$ptr}= $obj);
    }
    return $obj;
  }

  sub find_subobject($self, $search_key) {
    my $data_ptr= _xs_find_subobject($self, $search_key);
    return $self->_new_cached_subobject($data_ptr);
  }
}

package SubObject {
  use Moo;

  has main_ref => ( is => 'ro' );
  has data_ptr => ( is => 'ro' );

  sub method1($self) {
    # If main is closed, stop all method calls
    croak "Object is expired"
      if $self->main_ref->is_closed;
    ... # operate on data_ptr
  }

  sub method2($self) {
    # If main is closed, stop all method calls
    croak "Object is expired"
      if $self->main_ref->is_closed;
    ... # operate on data_ptr
  }
}

Now, the caller of find_subobject gets a SubObject, and it has a strong reference to MainObject, and MainObject’s cache holds a weak reference to the SubObject. If we call that same method again with the same search key while the first SubObject still exists, we get the same Perl object back. As long as the user holds onto the SubObject, the MainObject won’t expire, but the SubObjects can get garbage collected as soon as they aren’t needed.

One downside of this exact design is that every method of SubObject which uses data_ptr will need to first check that main_ref isn’t closed (like shown in method1). If you have frequent method calls and you’d like them to be a little more efficient, here’s an alternate version of the same idea:

package MainObject {
  ...

  # MainObject reaches out to invalidate all the SubObjects
  sub close($self) {
    ...
    $_->data_ptr(undef)
      for grep defined, values $self->_subobject_cache->%*;
  }

  ...
}

package SubObject {
  ...

  sub method1($self) {
    my $data_ptr= $self->data_ptr
      // croak "SubObject belongs to a closed MainObject";
    ... # operate on data_ptr
  }

  sub method2($self) {
    my $data_ptr= $self->data_ptr
      // croak "SubObject belongs to a closed MainObject";
    ... # operate on data_ptr
  }

  ...
}

In this pattern, the sub-object doesn’t need to consult anything other than its own pointer before getting to work, which comes in really handy with the XS Typemap. The sub-object also doesn’t need a reference to the main object (unless you want one to prevent the main object from getting freed while a user holds SubObjects) so this design is a little more flexible. The only downside is that closing the main object takes a little extra time as it invalidates all of the SubObject instances, but in XS that time won’t be noticeable.

Lazy Cache of Wrapper Objects, in XS

So, what does the code above look like in XS? Here we go…

/* First, the API for your internal structs */

struct MainObject_info {
  SomeLib_MainObject *obj;
  HV *wrapper;
  HV *subobj_cache;
  bool is_closed;
};

struct SubObject_info {
  SomeLib_SubObject *obj;
  SomeLib_MainObject *parent;
  HV *wrapper;
};

struct MainObject_info*
MainObject_info_create(HV *wrapper) {
  struct MainObject_info *info= NULL;
  Newxz(info, 1, struct MainObject_info);
  info->wrapper= wrapper;
  return info;
}

void MainObject_info_close(struct MainObject_info* info) {
  if (info->is_closed) return;
  /* All SubObject instances are about to be invalid */
  if (info->subobj_cache) {
    HE *pos;
    hv_iterinit(info->subobj_cache);
    while (pos= hv_iternext(info->subobj_cache)) {
      /* each value of the hash is a weak reference,
         which might have become undef at some point */
      SV *subobj_ref= hv_iterval(info->subobj_cache, pos);
      if (subobj_ref && SvROK(subobj_ref)) {
        struct SubObject_info *s_info =
          SubObject_from_magic(SvRV(subobj_ref), 0);
        if (s_info) {
          /* it's an internal piece of the parent, so
             no need to call a destructor here */
          s_info->obj= NULL;
          s_info->parent= NULL;
        }
      }
    }
  }
  SomeLib_MainObject_close(info->obj);
  info->obj= NULL;
  info->is_closed= true;
}

void MainObject_info_free(struct MainObject_info* info) {
  if (info->obj)
    MainObject_info_close(info);
  if (info->subobj_cache)
    SvREFCNT_dec((SV*) info->subobj_cache);
  /* The lifespan of 'wrapper' is handled by perl,
   * probably in the process of getting freed right now.
   * All we need to do is delete our struct.
   */
  Safefree(info);
}

The gist here is that MainObject has a set of all SubObject wrappers which are still held by the Perl script, and during “close” (which, in this hypothetical library, invalidates all SubObject pointers) it can iterate that set and mark each wrapper as being invalid.

The Magic setup for MainObject goes just like in the previous article:

static int MainObject_magic_free(pTHX_ SV* sv, MAGIC* mg) {
  MainObject_info_free((struct MainObject_info*) mg->mg_ptr);
}
static MAGIC MainObject_magic_vtbl = {
  ...
};

struct MainObject_info *
MainObject_from_magic(SV *objref, int flags) {
  ...
}

The destructor for the magic will call the destructor for the info struct. The “frommagic” function instantiates the magic according to ‘flags’, and so on.

Now, the Magic handling for SubObject works a little differently. We don’t get to decide when to create or destroy SubObject, we just encounter these pointers in the return values of the C library functions, and need to wrap them in order to show them to the perl script.

/* Return a new ref to an existing wrapper, or
 * create a new wrapper and cache it.
 */
SV * SubObject_wrap(SomeLib_SubObject *sub_obj) {
  /* If your library doesn't have a way to get the main object
   * from the sub object, this gets more complicated.
   */
  SomeLib_MainObject *main_obj= SomeLib_SubObject_get_main(sub_obj);
  SV **subobj_entry= NULL;
  SubObject_info *s_info= NULL;
  HV *wrapper= NULL;
  SV *objref= NULL;
  MAGIC *magic;

  /* lazy-allocate the cache */
  if (!main_obj->subobj_cache) {
    main_obj->subobj_cache= newHV();

  /* See if the SubObject has already been wrapped.
   * Use the pointer as the key
   */
  subobj_entry= hv_fetch(
    main_obj->subobj_cache,
    &sub_obj, sizeof(void*), 1
  );
  if (!subobj_entry)
    croak("lvalue hv_fetch failed"); /* should never happen */

  /* weak references may have become undef */
  if (*subobj_entry && SvROK(*subobj_entry))
    /* we can re-use the existing wrapper */
    return newRV_inc( SvRV(*subobj_entry) );

  /* Not cached. Create the struct and wrapper. */
  Newxz(s_info, 1, struct SubObject_info);
  s_info->obj= sub_obj;
  s_info->wrapper= newHV();
  s_info->parent= main_obj;
  objref= newRV_noinc((SV*) s_info->wrapper);
  sv_bless(objref, gv_stashpv("YourProject::SubObject", GV_ADD));

  /* Then attach the struct pointer to its wrapper via magic */
  magic= sv_magicext((SV*) s_info->wrapper, NULL, PERL_MAGIC_ext,
      &SubObject_magic_vtbl, (const char*) s_info, 0);
#ifdef USE_ITHREADS
  magic->mg_flags |= MGf_DUP;
#else
  (void)magic; // suppress warning
#endif

  /* Then add it to the cache as a weak reference */
  *subobj_entry= sv_rvweaken( newRV_inc((SV*) s_info->wrapper) );

  /* Then return a strong reference to it */
  return objref;
}

Again, this is roughly equivalent to the Perl implementation of new_cached_subobject above.

Now, when methods are called on the SubObject wrapper, we want to throw an exception if the SubObject is no longer valid. We can do that in the function that the Typemap uses:

struct SubObject_info *
SubObject_from_magic(SV *objref, int flags) {
  struct SubObject_info *ret= NULL;

  ... /* inspect magic */

  if (flags & OR_DIE) {
    if (!ret)
      croak("Not an instance of SubObject");
    if (!ret->obj)
      croak("SubObject belongs to a closed MainObject");
  }
  return ret;
}

Now, the Typemap:

TYPEMAP
struct MainObject_info *   O_SomeLib_MainObject_info
SomeLib_MainObject*        O_SomeLib_MainObject
struct SubObject_info *    O_SomeLib_SubObject_info
SomeLib_SubObject*         O_SomeLib_SubObject

INPUT
O_SomeLib_MainObject_info
  $var= MainObject_from_magic($arg, OR_DIE);

INPUT
O_SomeLib_MainObject
  $var= MainObject_from_magic($arg, OR_DIE)->obj;

INPUT
O_SomeLib_SubObject_info
  $var= SubObject_from_magic($arg, OR_DIE);

INPUT
O_SomeLib_SubObject
  $var= SubObject_from_magic($arg, OR_DIE)->obj;

OUTPUT
O_SomeLib_SubObject
  sv_setsv($arg, sv_2mortal(SubObject_wrap($var)));

This time I added an “OUTPUT” entry for SubObject, because we can safely wrap any SubObject pointer that we see in any of the SomeLib API calls, and get the desired result.

There’s nothing stopping you from automatically wrapping MainObject pointers with an OUTPUT typemap, but that’s prone to errors because sometimes an API returns a pointer to the already-existing MainObject, and you don’t want perl to put a second wrapper on the same MainObject. This problem doesn’t apply to SubObject, because we re-use any existing wrapper by checking the cache. (of course, you could apply the same trick to MainObject and have a global cache of all the known MainObject instances, and actually I do this in X11::Xlib)

But in general, for objects like MainObject I prefer to special-case my constructor (or whatever method initializes the instance of SomeLib_MainObject) with a call to _from_magic(..., AUTOCREATE) on the INPUT typemap rather than returning the pointer and letting Perl’s typemap wrap it on OUTPUT.

After all that, it pays off when you add a bunch of methods in the rest of the XS file.

Looking back to the find_subobject method of the original Perl example, all you need in the XS is basically the prototype for that function of SomeLib:

SomeLib_SubObject *
find_subobject(main, search_key)
  SomeLib_MainObject *main
  char *key

and XS translation handles the rest!

Reduce Redundancy in your Typemap

I should mention that you don’t need a new typemap INPUT/OUTPUT macro for every single data type. The macros for a typemap provide you with a $type variable (and others, see perldoc xstypemap) which you can use to construct function names, as long as you name your functions consistently. If you have lots of different types of sub-objects, you could extend the previous typemap like this:

TYPEMAP
struct MainObject_info *    O_INFOSTRUCT_MAGIC
SomeLib_MainObject*         O_LIBSTRUCT_MAGIC

struct SubObject1_info *    O_INFOSTRUCT_MAGIC
SomeLib_SubObject1*         O_LIBSTRUCT_MAGIC_INOUT

struct SubObject2_info *    O_INFOSTRUCT_MAGIC
SomeLib_SubObject2*         O_LIBSTRUCT_MAGIC_INOUT

struct SubObject3_info *    O_INFOSTRUCT_MAGIC
SomeLib_SubObject3*         O_LIBSTRUCT_MAGIC_INOUT

INPUT
O_INFOSTRUCT_MAGIC
  $var= @{[ $type =~ / (\w+)/ ]}_from_magic($arg, OR_DIE);

INPUT
O_LIBSTRUCT_MAGIC
  $var= @{[ $type =~ /_(\w*)/ ]}_from_magic($arg, OR_DIE)->obj;

INPUT
O_LIBSTRUCT_MAGIC_INOUT
  $var= @{[ $type =~ /_(\w*)/ ]}_from_magic($arg, OR_DIE)->obj;

OUTPUT
O_LIBSTRUCT_MAGIC_INOUT
  sv_setsv($arg, sv_2mortal(@{[ $type =~ /_(\w*)/ ]}_wrap($var)));

Of course, you can choose your function names and type names to fit more conveniently into these patterns.

Finding the MainObject for a SubObject

Now, you maybe noticed that I made the convenient assumption that the C library has a function that looks up the MainObject of a SubObject:

SomeLib_MainObject *main= SomeLib_SubObject_get_main(sub_obj);

That isn’t always the case. Sometimes the library authors assume you have both pointers handy and don’t bother to give you a function to look one up from the other.

The easiest workaround is if you can assume that any function which returns a SubObject also took a parameter of the MainObject as an input. Then, just standardize the variable name given to the MainObject and use that variable name in the typemap macro.

OUTPUT
O_SomeLib_SubObject
  sv_setsv($arg, sv_2mortal(SubObject_wrap(main, $var)));

This macro blindly assumes that “main” will be in scope where the macro gets expanded, which is true for my example:

SomeLib_SubObject *
find_subobject(main, search_key)
  SomeLib_MainObject *main
  char *key

But, what if it isn’t? What if the C API is basically walking a linked list, and you want to expose it to Perl in a way that the user can write:

for (my $subobj= $main->first; $subobj; $subobj= $subobj->next) {
  ...
}

The problem is that the “next” method is acting on one SubObject and returning another SubObject, with no reference to “main” available.

Well, if a subobject wrapper exists, then it knows the main object, so you just need to look at that SubObject info’s pointer to parent (the MainObject) and make that available for the SubObject’s OUTPUT typemap:

SomeLib_SubObject *
next(prev_obj_info)
  struct SubObject_info *prev_obj_info;
  INIT:
    SomeLib_MainObject *main= prev_obj_info->parent;
  CODE:
    RETVAL= SomeLib_SubObject_next(prev_obj_info->obj);
  OUTPUT:
    RETVAL

So, now there is a variable ‘main’ in scope when it’s time for the typemap to construct a wrapper for the SomeLib_SubObject.

Conclusion

In Perl, the lifespan of objects is nicely defined: the destructor runs when the last reference is lost, and you use a pattern of strong and weak references to control the order the destructors run. In C, the lifespan of objects is dictated by the underlying library, and you might need to go to some awkward lengths to track which ones the Perl user is holding onto, and then flag those objects when they become invalid. While somewhat awkward, it’s very possible thanks to weak references and hashtables keyed on the C pointer address, and the users of your XS library will probably be thankful when they get a useful error message about violating the lifecycle of objects, instead of a mysterious segfault.

Premium XS Integration, Pt 1

Intro

There are several competing philosophies for wrapping external C libraries. One is that the XS module should hide all the details of the library and provide a clean “Perlish interface”. The opposite extreme is that the external C functions should be exposed to Perl using an extremely minimal XS layer, or the Foreign Function Interface (FFI) and all the logic for working with the library should be written in Perl.

I advocate something in the middle. I think that a good interface should expose as much of the low-level as possible (to make the most usage of that library possible by other Perl modules) while “padding the sharp edges” so that it is difficult for Perl-side usage to crash the program. Higher level features can be provided in addition to the low level API via XS, Perl modules, or both.

If you consider that the average C library is an awkward mess of state machines and lightly-enforced state requirements that will segfault if not carefully obeyed, wrapping that nicely for the Perl developer is going to require a lot of data translation and runtime sanity checks. If you skip those runtime sanity checks in your wrapper library, it drags down the efficiency of your subsequent Perl development to the level of C development, which is to say, sitting around scratching your head for hours wondering why the program keeps segfaulting. (or attaching gdb to your debug build of perl) If you write those runtime checks in Perl, like with the FFI approach, your runtime performance can suffer significantly. If you write those runtime checks in XS, you can actually do quite a lot of them before there’s any notable decrease in the performance of the script.

Meanwhile, C code runs an order of magnitude faster than Perl opcodes, so if you’re going to require the end user to use a compiled module already, I feel it makes sense to put as much of the higher-level routines into XS as you have time for. But, the higher level routines shouldn’t be at the expense of the lower-level ones, or else you limit what people can do with the library.

This guide will explain all the tricks I know to write safe, fast, convenient, and powerful XS libraries.

(If you spot anything wrong, or want to contribute suggestions, open an issue at the GitHub repo

Binding Objects

One of the first things you’ll need to do for any C library which allocates “objects” is to bind them to a matching Perl object, usually a blessed scalar ref or hash ref. (The C language doesn’t have official objects of course, but a library often allocates a struct or opaque pointer with a lifespan and a destructor function that they expect you to call when you’re done with it, which is the same theme as an object.)

If you read through the common tutorials, you’ll probably see a recipe like

SV*
new(class, some_data)
  SV *class;
  IV some_data;
  INIT:
    LibWhaever_obj *obj;
  CODE:
    obj= LibWhaever_create(some_data);
    if (!obj) croak("LibWhaever_create failed");
    RETVAL= (SV*) newRV_noinc(newSViv((IV)obj));
    sv_bless(RETVAL,
             gv_stashpv("YourProject::LibWhatever", GV_ADD));
  OUTPUT:
    RETVAL

void
DESTROY(self)
  SV *self;
  INIT:
    LibWhaever_obj *obj;
  PPCODE:
    obj= (LibWhaever_obj*) SvIV(SvRV(self));
    LibWhaever_destroy(obj);
    XSRETURN(0);

This is about the least effort/overhead you can have for binding a C data structure to a Perl blessed scalar ref, and freeing it when the Perl object goes out of scope. (you can also move some of this code to the typemap, but I’ll come back to that later)

I don’t like this pattern for several reasons:

  • If someone passes the object to Storable’s dclone, it happily makes a copy of your scalar ref and then when the first object goes out of scope it runs the destructor, and the other object is now referring to freed memory and will probably segfault during its next use.
  • When you create a new thread in a threaded Perl, it clones objects, creating the same bug.
  • The pointer is stored as an integer visible to Perl, and could get altered by sloppy/buggy Perl code, and then you get a segfault.
  • A user could subclass the XS object, and write their own DESTROY method that forgets to call $self->SUPER::DESTROY, leaking the C object.
  • Sloppy/buggy Perl code could re-bless the class, also bypassing the DESTROY call.
  • Sloppy/buggy Perl code could call DESTROY on something which isn’t the blessed scalar-ref containing a valid pointer.

While most of these scenarios shouldn’t happen, if by unfortunate circumstances they do happen, someone loses a bunch of hours debugging it, especially if they aren’t the XS author and don’t know about these pitfalls.

Magic

A much more reliable way to link the C structs to the Perl blessed refs is through Perl’s “magic” system. Magic is the name for essentially a pointer within the SV/AV/HV of your object which points to a linked list of C metadata. This metadata describes various things, like operator-overloading or ties or other low-level Perl features. One type of magic is reserved for “extensions” (that’s you!)

There is a fair amount of effort and boilerplate to set up magic on your objects, but consider these benefits:

  • You are guaranteed that only the object your C code created will carry the pointer to your C struct, and no sloppy/buggy Perl-level operations can break that.
  • If the magic-attached pointer isn’t present, you can cleanly die with an error message to the user that somehow they have called your XS method on something that isn’t your object.
  • Your C-function destructor is described by the magic metadata, and does not rely on a DESTROY Perl method. This also makes destruction faster if Perl doesn’t need to call a Perl-level DESTROY function.
  • Magic can be applied equally to any type of ref, so you can use one pattern for whatever you are blessing, or even let the user choose what kind of ref it will be.
  • You can even use Moo or Moose to create the object, then attach your magic to whatever ref the object system created.
  • You get a callback when a new Perl thread starts and attempts to clone your object. (letting you clone it, or throw an exception that it can’t be cloned which is at least nicer to the user than a segfault would be)

With that in mind, lets begin suffering through the details.

Defining Magic

Magic is described with “struct MGVTBL”:

static int
YourProject_LibWhatever_magic_free(pTHX_ SV* sv, MAGIC* mg) {
  LibWhatever_obj *obj= (LibWhatever_obj*) mg->mg_ptr;
  LibWhatever_destroy(obj);
}

#ifdef USE_ITHREADS
static int
YourProject_LibWhatever_magic_dup(pTHX_ MAGIC *mg,
  CLONE_PARAMS *param)
{
  croak("This object cannot be shared between threads");
  return 0;
};
#else
#define YourProject_LibWhatever_magic_dup 0
#endif

// magic table for YourProject::LibWhatever
static MGVTBL YourProject_LibWhatever_magic_vtbl= {
  0, /* get */
  0, /* set */
  0, /* length */
  0, /* clear */
  YourProject_LibWhatever_magic_free, /* free */
#ifdef MGf_COPY
  0, /* copy magic to new variable */
#endif
#ifdef MGf_DUP
  YourProject_LibWhatever_magic_dup /* dup for new threads */
#endif
#ifdef MGf_LOCAL
  ,0 /* local */
#endif
};

You only need one static instance for each type of magic your module creates. It’s just metadata telling Perl how to handle your particular type of extension magic. The ifdefs are from past versions of the struct that had fewer fields, though if your module is requiring Perl 5.8 you can assume ‘copy’ and ‘dup’ exist, and from 5.10 ‘local’ always exists as well.

Next, the recipe to attach it to a new Perl object:

SV * my_wrapper(LibWhatever_obj *cstruct) {
  SV *obj, *objref;
  MAGIC *magic;
  obj= newSV(0); // or newHV() or newAV()
  objref= newRV_noinc(obj);
  sv_bless(objref, gv_stashpv("YourProject::LibWhatever", GV_ADD));
  magic= sv_magicext(
    obj,               // the inner SV/AV/HV, not the ref to it
    NULL,
    PERL_MAGIC_ext,                      // "extension magic"
    &YourProject_LibWhatever_magic_vtbl, // show perl your functions
    (const char*) cstruct,               // your custom pointer
    0);
#ifdef USE_ITHREADS
  magic->mg_flags |= MGf_DUP;
#else
  (void)magic; // suppress warning
#endif
  return objref;
}

The key there is ‘sv_magicext’. Note that you’re applying it to the thing being referred to, not the scalar ref that you use for the call to sv_bless. The messy ifdef part is due to the ‘dup’ field of the magic table only being used when perl was compiled with threading support. The reference to YourProject_LibWhatever_magic_vtbl is both an instruction for Perl to know what functions to call, but also a unique value used to identify your extension magic from anyone else’s.

To read your pointer back from an SV provided to you, the recipe is:

LibWhatever_obj* YourProject_LibWhatever_from_magic(SV *objref) {
  SV *sv;
  MAGIC* magic;

  if (SvROK(objref)) {
    sv= SvRV(objref);
    if (SvMAGICAL(sv)) {
      // Iterate magic attached to this scalar to find our vtable
      for (magic= SvMAGIC(sv); magic; magic = magic->mg_moremagic)
        if (magic->mg_type == PERL_MAGIC_ext
         && magic->mg_virtual == &YourProject_LibWhatever_magic_vtbl)
          // If found, the mg_ptr points to the fields structure.
          return (LibWhatever_obj*) magic->mg_ptr;
    }
  }
  return NULL;
}

This might look a little expensive, but there is likely only one type of magic on your object, so the loop exits on the first iteration, and all you did was “SvROK”, “SvRV”, “SvMAGICAL”, and “SvMAGIC” followed by two comparisons. It’s actually quite a bit faster than verifying the inheritance of the blessed package name.

So there you go - you can now attach your C structs with magic.

In the comments, Leon T. points out that you should really be using mg_findext:

magic= mg_findext(sv, PERL_MAGIC_ext, &YourProject_LibWhatever_magic_vtbl);
if (magic)
  return (LibWhatever_obj*) magic->mg_ptr;

He’s right, you should… but iterating the linked list without a function call will be a tiny bit faster. :-)

Convenience via Typemap

In a typical wrapper around a C library, you’re going to be writing a lot of methods that need to call YourProject_LibWhatever_from_magic on the first argument. To make that easier, lets move this decoding step to the typemap.

Without a typemap:

IV
method1(self, param1)
  SV *self
  IV param1
  INIT:
    LibWhatever_obj *obj= YouProject_LibWhatever_from_magic(self);
  CODE:
    if (!obj) croak("Not an instance of LibWhatever");
    RETVAL= LibWhatever_method1(obj, param1);
  OUTPUT:
    RETVAL

With a typemap entry like:

TYPEMAP
LibWhatever_obj*        O_LibWhatever_obj

INPUT
O_LibWhatever_obj
  $var= YourProject_LibWhatever_from_magic($arg);
  if (!$var) croak("Not an instance of LibWhatever");

the XS method becomes

IV
method1(obj, param1)
  LibWhatever_obj *obj
  IV param1
  CODE:
    RETVAL= LibWhatever_method1(obj, param1);
  OUTPUT:
    RETVAL

If you have some functions that take an optional LibWhatever_obj pointer, try this trick:

typedef LibWhatever_obj Maybe_LibWhatever_obj;

...

void
show(obj)
  Maybe_LibWhatever_obj *obj
  PPCODE:
    if (obj) {
      printf("...", LibWhatever_get_attr1(obj));
    }
    else {
      printf("NULL");
    }

TYPEMAP
LibWhatever_obj*        O_LibWhatever_obj
Maybe_LibWhatever_obj*  O_Maybe_LibWhatever_obj

INPUT
O_LibWhatever_obj
  $var= YourProject_LibWhatever_from_magic($arg);
  if (!$var) croak("Not an instance of LibWhatever");

INPUT
O_Maybe_LibWhatever_obj
  $var= YourProject_LibWhatever_from_magic($arg);

If you want to save a bit of compiled .so file size, you can move the error message into the ‘from_magic’ function, with a flag:

#define OR_DIE 1

LibWhatever_obj*
YourProject_LibWhatever_from_magic(SV *objref, int flags) {
  SV *sv;
  MAGIC* magic;

  if (SvROK(objref)) {
    sv= SvRV(objref);
    if (SvMAGICAL(sv)) {
      // Iterate magic attached to this scalar to find our vtable
      for (magic= SvMAGIC(sv); magic; magic = magic->mg_moremagic)
        if (magic->mg_type == PERL_MAGIC_ext
         && magic->mg_virtual == &YourProject_LibWhatever_magic_vtbl)
          // If found, the mg_ptr points to the fields structure.
          return (LibWhatever_obj*) magic->mg_ptr;
    }
  }
  if (flags & OR_DIE)
    croak("Not an instance of LibWhatever");
  return NULL;
}

TYPEMAP
LibWhatever_obj*        O_LibWhatever_obj
Maybe_LibWhatever_obj*  O_Maybe_LibWhatever_obj

INPUT
O_LibWhatever_obj
  $var= YourProject_LibWhatever_from_magic($arg, OR_DIE);

INPUT
O_Maybe_LibWhatever_obj
  $var= YourProject_LibWhatever_from_magic($arg, 0);

You can play further games with this, like automatically initializing the SV to become one of your blessed objects if it wasn’t defined, in the style of Perl’s open my $fh, ..., or maybe an option to add the magic to an existing object created by a pure-perl constructor. Do whatever makes sense for your API.

More Than One Pointer

In all the examples so far, I’m storing a single pointer to a type defined in the external C library being wrapped. Chances are, though, you need to store more than just that one pointer.

Imagine a poorly-written C library where you need to call SomeLib_create to get the object, then a series of SomeLib_setup calls before any other function can be used, then if you want to call SomeLib_go you have to first call SomeLib_prepare or else it segfaults. You could track these states in Perl variables in a hash ref, but it would just be easier if they were all present in a local C struct of your creation.

So, rather than attaching a pointer to the library struct with magic, you can attach your own allocated struct, and your struct can have a pointer to all the library details. For extra convenience, your struct can also have a pointer to the Perl object which it is attached to, which lets you access that object from other methods you write which won’t have access to the Perl stack.

struct YourProject_objinfo {
  SomeLib_obj *obj;
  HV *wrapper;
  bool started_setup, finished_setup;
  bool did_prepare;
};

struct YourProject_objinfo*
YourProject_objinfo_create(HV *wrapper) {
  struct YourProject_objinfo *objinfo= NULL;
  Newxz(objinfo, 1, struct YourProject_objinfo);
  objinfo->wrapper= wrapper;
  /* other setup here ... */
  return objinfo;
}

void
YourProject_objinfo_free(struct YourProject_objinfo *objinfo) {
  if (objinfo->obj) {
    SomeLib_obj_destroy(objinfo->obj);
  }
  /* other cleanup here ... */
  Safefree(objinfo);
}

static int YourProject_objinfo_magic_free(pTHX_ SV* sv, MAGIC* mg) {
  YourProject_objinfo_free(
    (struct YourProject_objinfo *) mg->mg_ptr);
}

One other thing that has changed from the previous scenario is that you can allocate this struct and attach it to the object whenever you want, instead of waiting for the user to call the function that creates the instance of SomeLib_obj. This gives you more flexible ways to deal with creation of the magic.

Here’s a pattern I like:

#define OR_DIE 1
#define AUTOCREATE 2

struct YourProject_objinfo*
YourProject_objinfo_from_magic(SV *objref, int flags) {
  SV *sv;
  MAGIC* magic;

  if (!sv_isobject(objref))
    /* could also check 'sv_derived_from' here, but that's slow */
    croak("Not an instance of YourProject");

  sv= SvRV(objref);
  if (SvMAGICAL(sv)) {
    /* Iterate magic attached to this scalar to find our vtable */
    for (magic= SvMAGIC(sv); magic; magic = magic->mg_moremagic)
      if (magic->mg_type == PERL_MAGIC_ext
       && magic->mg_virtual == &YourProject_objinfo_magic_vtbl)
        /* If found, the mg_ptr points to the fields structure. */
        return (struct YourProject_objinfo*) magic->mg_ptr;
  }
  if (flags & AUTOCREATE) {
    struct YourProject_objinfo *ret;
    if (SvTYPE(sv) != SVt_PVHV)
      croak("Expected blessed hashref");
    ret= YourProject_objinfo_create((HV*)sv);
    magic= sv_magicext(sv, NULL, PERL_MAGIC_ext,
      &YourProject_objinfo_magic_vtbl, (const char*) ret, 0);
#ifdef USE_ITHREADS
    magic->mg_flags |= MGf_DUP;
#else
    (void)magic; // suppress warning
#endif
    return ret;
  }
  if (flags & OR_DIE)
    croak("Not an initialized instance of YourProject");
  return NULL;
}

typedef struct YourProject_objinfo Maybe_YourProject_objinfo;
typedef struct YourProject_objinfo Auto_YourProject_objinfo;

Then in the typemap:

TYPEMAP
struct YourProject_objinfo*  O_YourProject_objinfo
Maybe_YourProject_objinfo*   O_Maybe_YourProject_objinfo
Auto_YourProject_objinfo*    O_Auto_YourProject_objinfo

INPUT
O_YourProject_objinfo
  $var= YourProject_objinfo_from_magic($arg, OR_DIE);

INPUT
O_Maybe_YourProject_objinfo
  $var= YourProject_objinfo_from_magic($arg, 0);

INPUT
O_Auto_YourProject_objinfo
  $var= YourProject_objinfo_from_magic($arg, AUTOCREATE);

(I should note here that you don’t need a new typemap macro for each additional type, you can use the ‘$type’ variable (holding the C type being converted) to create generic rules for multiple types. See next article for an example.)

Then use it in your XS methods to conveniently implement your sanity checks for this annoying C library:

# This is called by the pure-perl constructor, after blessing the hashref
void
_init(objinfo, param1, param2)
  Auto_YourProject_objinfo* objinfo
  IV param1
  IV param2
  PPCODE:
    if (objinfo->obj)
      croak("Already initialized");
    objinfo->obj= SomeLib_create(param1, param2);
    if (!objinfo->obj)
      croak("SomeLib_create failed: %s", SomeLib_get_last_error());
    XSRETURN(0);

bool
_is_initialized(objinfo)
  Maybe_YourProject_objinfo* objinfo
  CODE:
    RETVAL= objinfo != NULL && objinfo->obj != NULL;
  OUTPUT:
    RETVAL

void
setup(objinfo, key, val)
  struct YourProject_objinfo* objinfo
  const char *key
  const char *val
  PPCODE:
    if (objinfo->finished_setup)
      croak("Cannot call 'setup' after 'prepare'");
    if (!SomeLib_setup(objinfo->obj, key, val))
      croak("SomeLib_setup failed: %s", SomeLib_get_last_error());
    objinfo->setup_started= true;
    XSRETURN(0);

void
prepare(objinfo)
  struct YourProject_objinfo* objinfo
  PPCODE:
    if (!objinfo->started_setup)
      croak("Must call setup at least once before 'prepare'");
    objinfo->finished_setup= true;
    if (!SomeLib_prepare(objinfo->obj))
      croak("SomeLib_prepare failed: %s", SomeLib_get_last_error());
    objinfo->did_prepare= true;
    XSRETURN(0);

void
somelib_go(objinfo)
  struct YourProject_objinfo* objinfo
  PPCODE:
    if (!objinfo->did_prepare)
      croak("Must call 'prepare' before 'go'");
    if (!SomeLib_go(objinfo->obj))
      croak("SomeLib_go failed: %s", SomeLib_get_last_error());
    XSRETURN(0);

Like how clean the XS methods got?

Conclusion

When you use the pattern above, your module becomes almost foolproof against misuse. You provide helpful errors for the Perl coder to guide them toward correct usage of the library with easy-to-understand errors (well, depending on how much effort you spend on that) and they don’t have to pull their hair out trying to log all the API calls and compare to the C library documentation to figure out which one happened in the wrong order resulting in a mysterious crash.

The code above is all assuming that the C library is providing objects whose lifespan you are in control of. Many times, the objects from a C library will have some other lifespan that the user can’t directly control with the Perl objects. I’ll cover some techniques for dealing with that in the next article.

The Quickest Way to Set Up HTTPS

I registered on blogs.perl.org today so that I could comment on posts about object systems. However, the very first thing I encountered was a password page with NO SSL. So, even though I have a ton to say about object systems, my first blog post will instead be about setting up SSL.

(I’m aware that this is a “legacy server problem” but I also recently learned that it doesn’t matter with traefik.)

In this grand year of 2021 you can add SSL to any site, on any architecture, for free, by adding 3 files to your server, making one small config change to Apache, and running a service. We are truly living in the future.

traefik

is the first file. It comes from https://github.com/traefik/traefik/releases, and there is one for any architecture, for instance:

The archive contains one binary, named ‘traefik’. It is a universal Linux static binary and does not depend on any library in the host system. Traefik is a reverse proxy, with lots of good defaults, and lots of features, most of which this guide is ignoring. The feature that we are going for is the automatic LetsEncrypt support built into traefik.

Put this file somewhere like /usr/local/bin/traefik

wget https://github.com/traefik/traefik/releases/download/v2.5.4/traefik_v2.5.4_linux_amd64.tar.gz
tar -xzf traefik_v2.5.4_linux_amd64.tar.gz
mkdir -p /usr/local/bin
mv traefik /usr/local/bin/

traefik.toml

is the second file. You can actually configure traefik with yaml or json as well, but I happen to have .toml files on hand, and toml is a little less likely to get whitespace-dammaged during copy/paste.


[entryPoints.http]
  address = ":80"
[entryPoints.https]
  address = ":443"
[entryPoints.traefik]
  address = "localhost:9999"
#[api]
#  insecure = true
#  dashboard = true
#  debug = true
[providers.file]
  directory = "/etc/traefik/conf"
  watch = true
[certificatesResolvers.le.acme]
  email = "hostmaster@perl.org"
  storage = "/etc/traefik/acme.json"
  caServer = "https://acme-v02.api.letsencrypt.org/directory"
  #caServer = "https://acme-staging-v02.api.letsencrypt.org/directory"
[certificatesResolvers.le.acme.httpChallenge]
  entryPoint = "http"

Put this at /etc/traefik/traefik.toml

blogs.perl.org.toml

is the third file. This describes how traefik should proxy your back-end service. It goes into a different config file because it is part of the “dynamic config” rather than the “static config”. You can update any of the files in the dynamic config on the fly and traefik will pick up the changes automatically without any signaling or restart.


#[http.middlewares.https_redirect.redirectScheme]
#  scheme = "https"

[http.routers.blogs]
  entryPoints = ["http"]
#  middlewares = ["https_redirect"]
  service = "blogs"
  rule = "Host(`blogs.perl.org`)"

[http.routers.blogstls]
  entryPoints = ["https"]
  service = "blogs"
  rule = "Host(`blogs.perl.org`)"
  [http.routers.blogstls.tls]
    certResolver = "le"
    [[http.routers.blogstls.tls.domains]]
      main = "blogs.perl.org"

[http.services.blogs.loadBalancer]
  passHostHeader = true
  [[http.services.blogs.loadBalancer.servers]]
    url = "http://localhost:8080/"

Put this at /etc/traefik/conf/blogs.perl.org.toml

Apache Config Change

Next, you need apache to listen on a different port than 80. Why does traefik need 80? because the LetsEncrypt registration requires challenges to be found at port 80, and Traefik is automatically creating these responses. The configs above assume apache is moved to port 8080 on localhost.

Apache configurations vary widely per Linux distribution, so I can’t really guess at the location of these files, but if you search for

egrep -Ri '(listen|virtualhost).*80' /etc/apache*

you should see it. Simply change all occurrences of :80 to localhost:8080 and restart apache.

Running Traefik

This is another varies-by-distro situation. You want to run Traefik as a service at startup.

With SysV init, this means creating an init script. Traefik does not provide one (as traefik is typically run inside docker) but github user yaxin-cn shared one.

With systemd, you need a service file. The traefik project provides one. There is also a nice write-up of the steps by github user ubergesundheit.

Debugging

If all goes as planned, you should suddenly be able to access https://blogs.perl.org, and “just work”. Since that seldom happens, you’ll see above in traefik.toml where I commented out the “[api]” keys. Un-comment those, and now you can access traefik’s dashboard on localhost:9999. To reach that, you likely need an ssh tunnel:

ssh -L9999:localhost:9999 blogs.perl.org

Now you can browse to localhost:9999 and see traefik’s interpretation of the live state of your config files. Tinker with the configs until all errors are resolved.

As a final consideration, there is a commented-out middleware in blogs.perl.org.toml, which redirects http to https. Once you have https fully working, you can un-comment that to push everyone over to SSL. You might decide not to do that for some reason, but keep in mind that Google gives a boost to sites that force SSL, making the blogs more visible.

Thanks for reading! And thanks for hosting a community forum!

About Nerdvana

user-pic I like code, and code that writes code, and code that writes code that writes code. So I especially like Perl.