Learning XS - Closures
Over the past year, I’ve been self-studying XS and have now decided to share my learning journey through a series of blog posts. This tenth post introduces you to what I call closures in XS.
Firstly lets start with the basics of programming and Scope?
Scope refers to the visibility and lifetime of variables and functions in a program. In Perl, scope is determined by where variables are declared and how they are used. Subroutines have access only to variables within their own scope, and this depends on where those variables are declared. In contrast, C is a statically typed language and handles scope differently, variables and functions have fixed types and their visibility is determined by their declaration context, such as within a function, file, or globally. This difference in how scope is managed makes certain Perl features, like closures, more challenging to implement directly in C.
Closures?
In Perl, a closure is an anonymous subroutine that captures variables from its surrounding lexical scope. This means the subroutine "remembers" the values of those variables even after the outer scope has finished executing. A simple example of a closure in perl is the following:
sub closure {
my ($self) = @_;
return sub {
my ($x) = @_;
return $self + $x;
};
}
Now, as mentioned, C works differently closures aren’t built into the language. While you can use libraries like 'Block' or roll your own closure like mechanism in C, it's difficult. They are also not built into XS however during my learning journey I’ve discovered a straightforward approach to achieving closure like behaviour, which I’ll now share in this post.
Today we are going to reimplement the following perl code:
package Pet::Cat;
use parent 'Exporter';
our @EXPORT = qw/cat/;
sub cat {
my ($name, $hunger, $happiness) = @_;
$hunger ||= 5;
$happiness ||= 5;
return sub {
my $action = @_;
if ($action) {
if ($action eq 'feed') {
$hunger = $hunger > 0 ? $hunger - 1 : 0;
return "$name purrs contentedly. Hunger is now $hunger.";
} elsif ($action eq 'play') {
$happiness++;
return "$name chases a laser pointer! Happiness is now $happiness.";
} else {
$happiness--;
$hunger++;
}
}
return "$name curls up and waits for attention.";
}
}
1;
You would then call this module like:
use Pet::Cat;
my $simba = cat("Simba");
say $simba->('feed'); # Simba purrs contentedly. Hunger is now 4
say $simba->('play'); # Simba chases a laser pointer! Happiness is now 6
say $simba->('fetch'); # Simba curls up and waits for attention
Today so I can try to demonstrate how to achieve the above in XS I will implement the relevant XSUBs in multiple steps. First lets create a new distribution using module starter:
module-starter --module="Pet::Cat" --author="Your Name" --email="your email"
Update the Makefile.PL to include XSMULTI then open lib/Pet/Cat.pm and fix the boilerplat code:
package Pet::Cat;
use 5.006;
use strict;
use warnings;
our $VERSION = '0.01';
require XSLoader;
XSLoader::load("Pet::Cat", $VERSION);
1;
Now create the XS file lib/Pet/Cat.xs and add the following code:
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
MODULE = Pet::Cat PACKAGE = Pet::Cat
PROTOTYPES: DISABLE
We will start today by implementing a very simple closure that simply returns the passed in name of our cat first lets add a temporary test we will later remove:
use Test::More;
use Pet::Cat;
my $simba = Pet::Cat::cat("Simba");
is(ref($simba), 'CODE', 'cat returns a code reference');
is($simba->(), 'Simba', 'TEMPORARY TEST: cat closure returns the cat name');
done_testing();
To implement this we will need to create two new XSUbs in lib/Pet/Cat.xs, one the entry point 'cat' which returns the second 'cat_closure' as a 'anonymous subroutine' or 'closure'. The code to pass the test is as follows:
SV *
cat_closure(...)
CODE:
SV * self = CvXSUBANY(cv).any_ptr;
RETVAL = self;
OUTPUT:
RETVAL
CV *
cat(name, ...)
SV * name
CODE:
CV * kitty = newXS(NULL, XS_Pet__Cat_cat_closure, __FILE__);
CvXSUBANY(kitty).any_ptr = (void *)name;
SvREFCNT_inc(kitty);
RETVAL = kitty;
OUTPUT:
RETVAL
To explain this code above we have two XSUBs called 'cat_closure' and 'cat', cat creates an anonymous code reference using newXS, which as mentioned in previous posts is a way to create a new XSUB in XS, by passing the first argument as NULL this makes it anonymous without a fixed namespace. We then use 'CvXSUBANY' which is a macro that allow us to access the XSUBANY union associated with the CV, all complicated stuff... then we use 'any_ptr' which is used to store an arbitrary C pointer inside an XSUB’s context
.
Now we can run our test and it should pass:
perl Makefile.PL
make
make test
Next we will need to modify what we store in 'any_pty' as we have to keep track of additional parameters, the hunger and happiness of our cat. We will do this by using a HV, update the 'new' XSUB to the following:
CV *
cat(name, ...)
SV * name
CODE:
CV * kitty = newXS(NULL, XS_Pet__Cat_cat_closure, __FILE__);
HV * stash = newHV();
hv_store(stash, "name", 4, newSVsv(name), 0);
hv_store(stash, "happiness", 9, items > 1 ? newSVsv(ST(1)) : newSViv(5), 0);
hv_store(stash, "hunger", 6, items > 2 ? newSVsv(ST(2)) : newSViv(5), 0);
CvXSUBANY(kitty).any_ptr = (void *)stash;
SvREFCNT_inc(kitty);
RETVAL = kitty;
OUTPUT:
RETVAL
Then modify the 'cat_closure' XSUB to retrieve the values from the HV.
SV *
cat_closure(...)
CODE:
HV * self = CvXSUBANY(cv).any_ptr;
SV * name = *hv_fetch(self, "name", 4, 0);
RETVAL = name;
OUTPUT:
RETVAL
Now we can run our test again and it should still pass. Next lets remove the temporary test and add the real functionality to our closure. We will modify the 'cat_closure' XSUB to handle the actions of feeding and playing.
is($simba->('feed'), 'Simba purrs contentedly. Hunger is now 4');
is($simba->('play'), 'Simba chases a laser pointer! Happiness is now 6');
is($simba->('fetch'), 'Simba curls up and waits for attention');
To implement this we will modify the 'cat_closure' XSUB to handle the actions:
SV *
cat_closure(...)
CODE:
STRLEN retlen;
HV * self = CvXSUBANY(cv).any_ptr;
SV * name = *hv_fetch(self, "name", 4, 0);
SV * happiness = *hv_fetch(self, "happiness", 9, 0);
SV * hunger = *hv_fetch(self, "hunger", 6, 0);
RETVAL = newSVpvf("%s curls up and waits for attention", SvPV(name, retlen));
if (items > 0 && SvOK(ST(0)) && SvTYPE(ST(0)) == SVt_PV) {
const char *action = SvPV(ST(0), retlen);
if (strcmp(action, "feed") == 0) {
SvREFCNT_dec(RETVAL);
int hunger_val = SvIV(hunger);
if (hunger_val > 0) hunger_val--;
sv_setiv(hunger, hunger_val);
RETVAL = newSVpvf("%s purrs contentedly. Hunger is now %d", SvPV(name, retlen), hunger_val);
} else if (strcmp(action, "play") == 0) {
SvREFCNT_dec(RETVAL);
int happiness_val = SvIV(happiness);
sv_setiv(happiness, ++happiness_val);
RETVAL = newSVpvf("%s chases a laser pointer! Happiness is now %d", SvPV(name, retlen), happiness_val);
} else {
int happiness_val = SvIV(happiness);
int hunger_val = SvIV(hunger);
sv_setiv(happiness, --happiness_val);
sv_setiv(hunger, ++hunger_val);
}
}
OUTPUT:
RETVAL
This code retrieves the name, happiness, and hunger values from the HV, then checks if an action is provided. If the action is 'feed', it decreases hunger and returns a message. If the action is 'play', it increases happiness and returns a message. If an unrecognized action is provided, it decreases happiness and increases hunger, returning a default message. If undef is passed or any param not a string it just returns the default message. The only new function you may have not seen in the previous posts is 'sv_setiv' we use it to update the integer internal value of the SVs.
Now make test and your tests should pass.
Finally to complete our module we need to add an import XSUB to allow the use of the 'cat' function in a more Perl-like way. Add the following code to lib/Pet/Cat.xs:
void
import(...)
CODE:
char *pkg = HvNAME((HV*)CopSTASH(PL_curcop));
int name_len = strlen(pkg) + 6;
char *name = (char *)malloc(name_len);
snprintf(name, name_len, "%s::cat", pkg);
newXS(name, XS_Pet__Cat_cat, __FILE__);
And that completes our XS module. Today we have learned how to implement closure-like behaviour in XS by using a hash to store the state of our cat, allowing us to maintain the values of hunger and happiness across calls. This approach mimics the closure functionality found in Perl, enabling us to create a simple yet effective way to manage scope in XS.
I hope you found this post helpful. If you have any questions or suggestions for future posts, feel free to reach out.
Another interesting post.
where is this HV released? It won't be released when the CV is destroyed unless you also
CvREFCOUNTED_ANYSV_on(cv)
which requires 5.38 (and technically, storing in any_sv instead of any_ptr).Nothing releases
name
here. You might find the Perl_form() API useful here.I've noticed in your XS posts you most just use raw SVs, have you used typemaps at all?
Is this the correct way to free the HV?
```
void
euthanase(kitty)
CV * kitty
CODE:
SvREFCNT_dec(CvXSUBANY(kitty).any_ptr);
```
That should work, you might want to set any_ptr to NULL.
An option to automatically manage the lifetime of your
self
object would be to store it in MAGIC. See https://metacpan.org/release/PEVANS/Future-XS-0.13/source/hax/cv_set_anysv_refcounted.c.inc for an example for code that uses CvREFCOUNTED_ANYSV_on() when available and magic when it isn't (thanks toddr who brought this up in IRC)I agree with Tony, using typemaps can make handling C datatypes much easier.
There are SV*/CV* memory and refcnt leaks in this code, see https://www.nntp.perl.org/group/perl.perl5.porters/2025/06/msg270057.html
"XSANY" macro, which is better written as "CvXSUBANY(kitty).any_ptr" or "CvXSUBANY(kitty)" is the correct way to attach and store random private data to a CV*, aslong as the private data's type is <= sizeof(void*). A example of what can not be stuffed inside CvXSUBANY(), is a type NV or a type double, since those are min 8/10/12/16 max bytes long depending various interp build config options. And remember not everyone has a 64 bit CPU yet. The actual things "any_*" union members that can be stuffed in that field are from this struct.
But remember, the Perl interp/API, sees the CvXSUBANY() field as a plain old dumb integer, and will never deref it, or dtor that "dumb integer" that was put there by the end user's XS module. The XS author must dtor the opaque ptr sized integer inside CvXSUBANY(), themselves, somehow, some way. Most often ways of doing this is to add SvRMG/random magic, to the CV*, with a svt_free() and a svt_dup() vtable method.
The svt_dup() C89/C99 method must be there if its a PERL_IMPLICIT_CONTEXT/PERL_MULTIPLICITY enable perl, since a SV*/CV*/HV*/AV*/GV* ptr (unique addr) from 1 ithread, can never ever ever, appear in a 2nd ithread's "C level" "view of the world" for any reason. Either the svt_dup() method wipes the ptr (and whatever mental model obj/resource it represents) from the new ithread's VM state, or somehow "clones it", "forks it", "mutex lock shares it", "serialized read/writes to it" so it appears on a PP level in the 2nd ithread, but I and the Perl API guarantee that "resource" has a different SV* addr in ithread #2, than it does in ithread #1.
Other 2nd strategy is to bless the CV* or SV*, (a temporary or permanent outer SV* ROK/SVRV required for the bless). Later on a PP sub DESTROY {} or an XS sub DESTROY {} will wipe and free whatever opaque resource that void * inside CvXSUBANY() represents.
Me personally, I'm not a fan of flag "CvREFCOUNTED_ANYSV_on(cv)", yes it works, but that flag is unusable for a CPAN distributed module because it still has new car smell and the fenders and doors of the car still have anti scratch shrink wrap on them.
"requires >= 5.38" means the feature is useless for CPAN, and only usable for private commercial/business/employer XS code.
Nitpicks:
never hand count a string with your finger or cursor unless there is no other tool in the world available
Perl API has this call for "" string literals that auto calcs the length for you of your string literal
Next:
Use type "const char *" and "SvPV_const()" macro where-ever possible if nothing is supposed to write to the returned buffer. Don't ask why, complicated story.
Also macros "SvPV_nolen(sv)" and "SvPV_nolen_const(sv)" exist. Now you don't need to waste 4/8 bytes of space on the C stack, for that Size_t var you will never use.
If you have a SV*, and you are calling any of Perl's "***pvf()" functions, DO NOT use "%s" and "SvPV****()". Perl's "****pvf()" calls ARE NOT #define macros or thin wrappers to your OS's libc.so.
Perl's "****pvf()" calls implement their own unique "printf()" engine from scratch, that has nothing to do with your OS's libc.
Pass that SV* directly to newSVpvf() with the correct flag. Don't unwrap it to a null terminated C string for no good reason.
Some code examples, that correctly uses Perl's "printf()" engine:
croak("Unrecognized signal name \"%" SVf "\"", SVfARG(*mark));
if (parno != logical_parno) sv_catpvf(sv, "/%" UVuf.(UV)parno);
SV **name= av_fetch_simple(name_list, parno, 0 );
if (name) sv_catpvf(sv, " '%" SVf "'", SVfARG(*name));
if (namok && argok) {
SV* sv = Perl_newSVpvf(aTHX_ "%" SVf "(%" SVf ")", SVfARG(*namsvp), SVfARG(*argsvp));
PUSHs(sv_2mortal(sv));
}
Next:
Never write "ST(0)" or "ST(1)" or "ST(1234)" more than once. "ST(1234)" is a very complicated macro internally, it does not expand to:
sum_perl_array_ptr_in_a_c_auto_var[1234] = ;
It shouldve looked like this
if (items > 0 && ((sv = ST(0)), SvOK(sv)) && SvTYPE(sv) == SVt_PV) {
const char *action = SvPV(sv, retlen);
But ^^^^ above still is wrong/bizarre.
"SvTYPE(sv) == SVt_PV" means absolutely nothing, if you want to know, is this SV* a string. The only purpose of a "SvTYPE(sv) == SVt_PV" style check, is protection against macros SvLEN() and SvCUR() SEGVing on your, since you received a SV* without any SV body struct (SVt_NULL), or the SV body struct is so tiny, C struct fields SvLEN() and SvCUR() don't exist, or you are reading uninited memory, or you are reading the SvCUR() value of a totally different random SV*.
"SvTYPE(sv) == SVt_PV" also will insta-break and act weird. The safety test for being able to safely read SvLEN() and SvCUR(), is "SvTYPE(sv) >= SVt_PV", because you could have a SVt_PVIV SVt_PVNV SVt_PVMG SVt_PVLV, all of which can very nicely be holding a SvPV()/SvPOK() string for you.
Next:
Violates SV Get Magic rules.
It should look like
"SvPV_nomg_const()" with its "nomg" suffix, is because you don't want to fire that Pure Perl tie(TIESCALAR), or XS/C-level, GETMAGIC getter method 2x in a row, if that getter method actually exists. Waste of CPU time, and academically illogical to whatever package/module/PP class/sub, created that tied scalar, and passed it to you, for it to get its sub FETCH {} getter method called 2x in a row, on "1 line of code". Its more of a waste of CPU for that getter method to execute 2x, rather than some SW tech/IT/engineering argument of "its dangerous" or "may SEGV" or "undefined behavior".
Next:
why was "RETVAL = newSVpvf("%s curls up and waits for attention", SvPV(name, retlen));" created, if 2 out of 3, or 2 out 4, control flow paths, don't want to use that SVPV* , and immediately dtor it and create a SVPV* with useful content?
Next:Why fetch SV * happiness and SV * hunger, " if(items == 0) "? b/c your code for " if(items == 0) ", only wants hash key "SV * name = *hv_fetch(self, "name", 4, 0);", to create the RETVAL. SV * happiness and SV * hunger never get used on the " if(items == 0) " branch/control flow path.
Next:A type "IV" != to ISO C type "int". An IV can be 32 bit, 64 bits, and with a GCC/Clang nightly/alpha build, 128 bits. Do not mix Perl API types, and ISO C types unless you know what you are doing. This code as written, is a truncation, and sign extend minefield of bugs.
Also, to decrease variable liveness, and smaller machine code, don't make the C compiler, save var "int hunger_val" to C stack memory, to preserve it around the " sv_setiv(happiness, --happiness_val);" function call.
I would've written that block as the follow, (note C99 code, not C89 compliant): Or in C89-ese:Next:
Depending on how you, or any author wants to design their module, this is personal preference, but my opinion is
should be, or add a check right before "if (items > 0" that looks like this:
You don't want some user passing arg #2 or arg #3 thinking arg #2 or #3 will do something, when arg #2 and #3 won't do anything in real life and are a bug to write in the first place.