Learning XS - Overloading
Over the past year, I’ve been self-studying XS and have now decided to share my learning journey through a series of blog posts. This fourth post introduces you to overloading operators in XS.
Firstly, what is an operator in Perl?
An operator in Perl is a special symbol or keyword that performs an action on one or more values, called operands. For example, '+' is an addition operator, and '*' is a multiplication operator. Operators allow you to perform calculations, comparisons, and other operations in your code.
So... overloading?
In Perl, you can overload operators for your own classes. This means that when you use an operator like '+', '-', or '*' with objects of your class, you can define how those operators behave by extending or rewriting their functionality.
Perl allows you to overload many operators, some of which include:
Operator Symbol | Description |
---|---|
'+' | Addition |
'-' | Subtraction |
'*' | Multiplication |
'/' | Division |
'%' | Modulus |
'**' | Exponentiation |
'<<' | Left bitshift |
'>>' | Right bitshift |
'x' | String/array repetition |
'.' | String concatenation |
'<' | Numeric less than |
'<=' | Numeric less or equal |
'>' | Numeric greater than |
'>=' | Numeric greater or equal |
'==' | Numeric equality |
'!=' | Numeric inequality |
'<=>' | Numeric comparison |
'lt' | String less than |
'le' | String less or equal |
'gt' | String greater than |
'ge' | String greater or equal |
'eq' | String equality |
'ne' | String inequality |
'cmp' | String comparison |
'bool' | Boolean context |
'""' | String context |
'0+' | Numeric context |
'++' | Increment |
'--' | Decrement |
'abs' | Absolute value |
'neg' | Negation |
'not' | Logical not |
'~' | Bitwise not |
'atan2' | Arctangent |
'cos' | Cosine |
'sin' | Sine |
'exp' | Exponential |
'log' | Logarithm |
'sqrt' | Square root |
'${}' | Dereference as scalar |
'@{}' | Dereference as array |
'%{}' | Dereference as hash |
'&{}' | Dereference as code |
'*{}' | Dereference as glob |
'fallback' | Fallback for unknown ops |
Now, XS also supports overloading of operators, allowing you to implement custom behaviour for Perl operators at the C level. This enables you to write high-performance classes in XS that behave just like native Perl objects when used with operators.
In today's example, let's journey back to ancient Rome and explore how numbers were represented by creating an object that transparently handles math with Roman numerals. We will be porting the following perl code:
package Roman::Numeral;
use strict;
use warnings;
use Carp;
use Scalar::Util qw(looks_like_number);
use overload
'+' => 'add',
'-' => 'subtract',
'*' => 'multiply',
'/' => 'divide',
'==' => 'num_eq',
'!=' => 'num_ne',
'<' => 'num_lt',
'>' => 'num_gt',
'<=' => 'num_le',
'>=' => 'num_ge',
'""' => 'as_string',
fallback => 1;
sub new {
my ($class, $value) = @_;
my $num = _normalize($value);
croak "Value must be a positive integer or valid Roman numeral" unless defined $num && $num > 0;
bless { value => $num }, $class;
}
sub as_number {
my ($self) = @_;
return $self->{value};
}
sub as_string {
my ($self) = @_;
return _to_roman($self->{value});
}
sub add {
my ($self, $other, $swap) = @_;
my $sum = $self->{value} + _normalize($other);
return Roman::Numeral->new($sum);
}
sub subtract {
my ($self, $other, $swap) = @_;
my $diff = $self->{value} - _normalize($other);
croak "Result must be positive" if $diff < 0;
return Roman::Numeral->new($diff);
}
sub multiply {
my ($self, $other, $swap) = @_;
my $prod = $self->{value} * _normalize($other);
return Roman::Numeral->new($prod);
}
sub divide {
my ($self, $other, $swap) = @_;
my $div = int($self->{value} / _normalize($other));
return Roman::Numeral->new($div);
}
sub num_eq { $_[0]->{value} == _normalize($_[1]) }
sub num_ne { $_[0]->{value} != _normalize($_[1]) }
sub num_lt { $_[0]->{value} < _normalize($_[1]) }
sub num_gt { $_[0]->{value} > _normalize($_[1]) }
sub num_le { $_[0]->{value} <= _normalize($_[1]) }
sub num_ge { $_[0]->{value} >= _normalize($_[1]) }
sub _normalize {
my ($v) = @_;
if (ref $v) {
return $v->{value} if ref($v) eq 'Roman::Numeral';
die "Invalid parameter";
}
return $v if looks_like_number($v) && $v >= 0;
return _from_roman($v);
}
sub _to_roman {
my ($n) = @_;
my @letters = (
[1000, "M"], [900, "CM"], [500, "D"], [400, "CD"],
[100, "C"], [90, "XC"], [50, "L"], [40, "XL"],
[10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"]
);
my $roman = "";
for my $pair (@letters) {
my ($val, $sym) = @$pair;
while ($n >= $val) {
$roman .= $sym;
$n -= $val;
}
}
return $roman;
}
sub _from_roman {
my ($s) = @_;
my %vals = (
M => 1000, D => 500, C => 100, L => 50,
X => 10, V => 5, I => 1
);
$s = uc($s // "");
die "Invalid Roman numeral" unless $s =~ /^[MDCLXVI]+$/;
my $total = 0;
my $prev = 0;
for my $c (split //, reverse $s) {
my $val = $vals{$c};
if ($val < $prev) {
$total -= $val;
} else {
$total += $val;
}
$prev = $val;
}
return $total;
}
You can then call it like this:
my $a = Roman::Numeral->new('XIV'); # 14
my $b = Roman::Numeral->new(6); # VI
print $a + $b; # XX (20)
Let’s start by creating a new distribution 'Roman::Numeral'. We again will use 'Module::Starter' to create the distribution structure:
module-starter --module="Roman::Numeral" --author="Your Name" --email="your email"
This will create a directory structure similar to the previous posts, with a 'lib' directory containing the 'Roman/Numeral.pm' file. Ensure you have updated the 'Makefile.PL' to include XSMULTI and then edit the 'Roman/Numeral.pm' file to include the following.
package Roman::Numeral;
use 5.006;
use strict;
use warnings;
our $VERSION = '0.01';
require XSLoader;
XSLoader::load('Roman::Numeral', $VERSION);
1;
Now create the XS file 'Roman/Numeral.xs' in the same directory with the following boilerplate code:
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
MODULE = Roman::Numeral PACKAGE = Roman::Numeral
PROTOTYPES: DISABLE
FALLBACK: TRUE
Okay so now we have the basic structure set up, and our make, make test, works. The only difference between this and the other posts is the 'FALLBACK: TRUE' which like when overloading in perl ensures any unimplemented operator not overloaded on our package will fallback to its defaults. With testing today our module is slightly more sophisticated so instead of a single test file lets use two, first lets test basic functionality, so create a file 't/01-basic.t' with the following code:
use Test::More;
use Roman::Numeral;
my %map = (
1 => "I",
2 => "II",
3 => "III",
4 => "IV",
5 => "V",
6 => "VI",
7 => "VII",
8 => "VIII",
9 => "IX",
10 => "X",
11 => "XI",
50 => "L",
51 => "LI",
100 => "C",
101 => "CI",
500 => "D",
501 => "DI",
1000 => "M",
1001 => "MI"
);
for (sort { $a <=> $b } keys %map) {
my $num = Roman::Numeral->new($_);
is($num->{value}, $_, "value $_");
}
done_testing();
With the failing test in place, we need to implement a reusable 'new' method to instantiate our object this should accept a string that contains a roman numeral or a modern day integer and return a class, we are thinking ahead and making it reusable as 'new' is called multiple times from within the perl example. To do this we will create two functions in the XS file, one a XSUB 'new' which will be accessible from Perl, and the other 'new_numeral' a reusable C function that will be used internally to create the object. Update your 'Roman/Numeral.xs' file with this code:
SV * new_numeral(SV *pkg, int num) {
dTHX;
if (SvTYPE(pkg) != SVt_PV) {
char * name = HvNAME(SvSTASH(SvRV(pkg)));
pkg = newSVpv(name, strlen(name));
}
HV *hash = newHV();
hv_store(hash, "value", 5, newSViv(num), 0);
return sv_bless(newRV_noinc((SV*)hash), gv_stashsv(pkg, 0));
}
MODULE = Roman::Numeral PACKAGE = Roman::Numeral
PROTOTYPES: DISABLE
FALLBACK: TRUE
SV *
new(pkg, numeral)
SV *pkg
SV *numeral
CODE:
RETVAL = new_numeral(pkg, SvIV(numeral));
OUTPUT:
RETVAL
This code defines a new function 'new' that takes a package name and a numeral, it simply calls our reusable new_numeral c function. In this function we check if the package is a valid string, if not we have a blessed object so we extract the name from the stash. We then create a new hash to hold our object data and store the value as an integer in the hash with the key "value". Finally, we bless the hash reference into the specified package and return it.
Now our simple test should pass lets extend to test the parsing of roman numerals when instantiating the object.
$num = Roman::Numeral->new($map{$_});
is($num->{value}, $_, "roman value $_");
To handle the parsing of Roman numerals, we will create two new c functions 'normalise' and 'from_roman' in the XS file. The 'normalise' function will check if the input is a Roman numeral or an integer, and convert it to an integer if necessary. The 'from_roman' function will convert a Roman numeral string to its integer value. Update your 'Roman/Numeral.xs' file with the following code above the 'new_numeral' function:
int from_roman (SV *numeral) {
dTHX;
STRLEN retlen;
char *roman = SvPV(numeral, retlen);
int num = 0;
int prev_value = 0;
for (STRLEN i = retlen; i-- > 0;) {
int value = 0;
switch (toupper(roman[i])) {
case 'I': value = 1; break;
case 'V': value = 5; break;
case 'X': value = 10; break;
case 'L': value = 50; break;
case 'C': value = 100; break;
case 'D': value = 500; break;
case 'M': value = 1000; break;
default: croak("Invalid Roman numeral character: %c", roman[i]);
}
if (value < prev_value) {
num -= value;
} else {
num += value;
}
prev_value = value;
}
return num;
}
int normalise (SV *numeral) {
dTHX;
if (!SvOK(numeral)) {
croak("Invalid parameter undef");
}
if (SvROK(numeral)) {
if (SvTYPE(SvRV(numeral)) != SVt_PVHV || !hv_exists((HV*)SvRV(numeral), "value", 5)) {
croak("Invalid parameter don't have a value");
}
return SvIV(*hv_fetch((HV*)SvRV(numeral), "value", 5, 0));
}
if (looks_like_number(numeral)) {
int num = SvIV(numeral);
if (num < 0) {
croak("Invalid parameter negative number");
}
return num;
}
return from_roman(numeral);
}
This code defines the 'from_roman' function, which converts a Roman numeral string to its integer value by iterating through each character in the string, checking its value and performing the relevant calculation. The 'normalise' function checks if the input is a reference to a hash, a number, or a Roman numeral string, and returns the corresponding integer value.
Now we can update our 'new' function to use the 'normalise' function to convert the input numeral to an integer, we just need to replace one line:
RETVAL = new_numeral(pkg, normalise(numeral));
With that in place, our 'new' function can now handle both Roman numerals and integers. make test should now pass, next lets write a quick accessor to access the value of the numeral as an integer. Add this test in the for loop:
is($num->as_number, $_, "as_number $_");
And then implement the 'as_number' method in the XS file:
SV *
as_number(self)
SV *self
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
SV *value_sv = *hv_fetch((HV*)SvRV(self), "value", 5, 0);
SvREFCNT_inc(value_sv);
RETVAL = value_sv;
OUTPUT:
RETVAL
Now we have a method 'as_number', this method checks if the object is a valid Roman::Numeral object and then fetches the 'value' key from the internal hash, returning it as a scalar reference. We are ready to start overloading, lets start with stringification, we will implement the 'as_string' method which will convert the internal integer value to a Roman numeral string. We will start by extending our basic test file.
is($num->as_string, $map{$_}, "as_string $map{$_}");
is("$num", $map{$_}, "stringification $map{$_}");
Then update your 'Roman/Numeral.xs' file with the following 'roman_map' array/struct to store the mapping from number to numeral and a 'to_roman' c function. Insert this code above the 'from_roman' defintion:
const struct {
int value;
const char *symbol;
} roman_map[] = {
{1000, "M"}, {900, "CM"}, {500, "D"}, {400, "CD"},
{100, "C"}, {90, "XC"}, {50, "L"}, {40, "XL"},
{10, "X"}, {9, "IX"}, {5, "V"}, {4, "IV"}, {1, "I"}
};
char *to_roman(int num) {
dTHX;
int maxlen = num * 4 + 1;
char *roman = (char *)malloc(maxlen);
char *ptr = roman;
int i = 0;
while (num > 0) {
while (num >= roman_map[i].value) {
ptr += sprintf(ptr, "%s", roman_map[i].symbol);
num -= roman_map[i].value;
}
i++;
}
*ptr = '\0';
return roman;
}
And this "as_string" XSUB below the 'new' XSUB definition:
SV *
as_string(self, ...)
SV *self
OVERLOAD: \"\" #"
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
SV *value_sv = *hv_fetch((HV*)SvRV(self), "value", 5, 0);
char *roman = to_roman(SvIV(value_sv));
SV *roman_sv = newSVpv(roman, strlen(roman));
free(roman);
RETVAL = roman_sv;
OUTPUT:
RETVAL
Your make test should pass again, to explain the code, firstly we define a 'roman_map' array that contains pairs of a integer value and its corresponding Roman numeral symbol. The 'to_roman' function dynamically allocates memory based upon the integer that is passed in we use this algorithm because the max size a Roman numeral string can be is four times the integer value plus one for the null terminator. The function then iterates through the 'roman_map', appending the appropriate Roman numeral symbols to a dynamically allocated string until the entire integer has been converted. The 'sprintf' function is used to write the Roman numerals, it returns the number of characters written. By adding this to ptr, the pointer now points just after the newly written string. This is useful when building up a string in a buffer, so the next write does not overwrite the previous one.
For the 'as_string' XSUB you will see a new 'OVERLOAD' directive which tells XS that this function is an overload for the stringification operator '""' (this #" is just so your editor doesn't get confused with the \"\" syntax). The function checks if the self is a valid hash reference, fetches the 'value' key and converts it to a Roman numeral string using the 'to_roman' function, and returns it as a new scalar value. We remember to free the dynamically allocated memory for the Roman numeral string after creating the scalar reference to avoid memory leaks.
With that working, lets start on the math operators, we will start with addition, so lets add a new test file 't/02-math.t' with the following code:
use Test::More;
use Roman::Numeral;
my $a = Roman::Numeral->new(10);
my $b = Roman::Numeral->new(2);
is($a->add($b), 'XII', '10 + 2 = 12 = X11');
is($a + $b, 'XII', '10 + 2 = 12 = X11');
done_testing();
To implement the addition operator, we will create an 'add' method in the XS file. This method will take another Roman numeral object or a number, normalise it, and return a new Roman numeral object with the sum. Update your 'Roman/Numeral.xs' file with the following code:
SV *
add(self, numeral, ...)
SV *self
SV *numeral
OVERLOAD: +
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
RETVAL = new_numeral(self, normalise(self) + normalise(numeral));
OUTPUT:
RETVAL
This code defines the 'add' method, which checks if the self object is a valid reference, normalises both the self and the numeral parameters to their integer values, adds them together, and returns a new Roman::Numeral object with the result. The OVERLOAD is set to '+' to indicate that this method is an overload for the addition operator.
Then we run make test, and it should pass. Now we can implement the other math operators in a similar way. Here is some tests for the subtraction, multiplication, and division operators:
is($a->subtract($b), 'VIII', '10 - 2 = 8 = VIII');
is($a - $b, 'VIII', '10 - 2 = 8 = VIII');
is($a->subtract($b)->as_number, 8, '10 - 2 = 8 = VIII');
is($a->multiply($b)->as_number, 20);
is($a->multiply($b), 'XX', '10 * 2 = 20 = XX');
is($a * $b, 'XX', '10 * 2 = 20 = XX');
is($a->divide($b), 'V', '10 / 2 = 5 = V');
is($a / $b, 'V', '10 / 2 = 5 = V');
and the corresponding XS code for these operators:
SV *
subtract(self, numeral, ...)
SV *self
SV *numeral
OVERLOAD: -
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
int self_value = normalise(self);
int numeral_value = normalise(numeral);
if (self_value < numeral_value) {
croak("Cannot subtract larger numeral from smaller one");
}
RETVAL = new_numeral(self, self_value - numeral_value);
OUTPUT:
RETVAL
SV *
multiply(self, numeral, ...)
SV *self
SV *numeral
OVERLOAD: *
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
RETVAL = new_numeral(self, normalise(self) * normalise(numeral));
OUTPUT:
RETVAL
SV *
divide(self, numeral, ...)
SV *self
SV *numeral
OVERLOAD: /
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
int self_value = normalise(self);
int numeral_value = normalise(numeral);
if (numeral_value <= 0) {
croak("Cannot divide by zero");
}
RETVAL = new_numeral(self, self_value / numeral_value);
OUTPUT:
RETVAL
These methods follow the same pattern as the 'add' method, checking if the self object is valid, normalising the values, performing the operation, and returning a new Roman::Numeral object with the result. The subtraction and divide also have additional validation to prevent negative values or invalid operations.
The final part left to implement is the numeric comparison operators. We will implement the 'num_eq', 'num_ne', 'num_lt', 'num_gt', 'num_le', and 'num_ge' methods and overloads all in one go. Here are the tests for these operators:
my $c = Roman::Numeral->new(10);
is($a == $c, 1);
is($a == $b, '');
is($a != $b, 1);
is($a != $c, '');
is($a > $b, 1);
is($b > $a, '');
is($b < $a, 1);
is($a < $b, '');
is($a >= $c, 1);
is($b >= $c, '');
is($a <= $c, 1);
is($a <= $b, '');
The code will follow a similar pattern to the previous methods, but this time doing comparisons. Update your 'Roman/Numeral.xs' file with the following code:
SV *
num_eq(self, numeral, ...)
SV *self
SV *numeral
OVERLOAD: ==
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
RETVAL = (normalise(self) == normalise(numeral)) ? &PL_sv_yes : &PL_sv_no;
OUTPUT:
RETVAL
SV *
num_ne(self, numeral, ...)
SV *self
SV *numeral
OVERLOAD: !=
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
RETVAL = (normalise(self) != normalise(numeral)) ? &PL_sv_yes : &PL_sv_no;
OUTPUT:
RETVAL
SV *
num_lt(self, numeral, ...)
SV *self
SV *numeral
OVERLOAD: <
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
RETVAL = (normalise(self) < normalise(numeral)) ? &PL_sv_yes : &PL_sv_no;
OUTPUT:
RETVAL
SV *
num_gt(self, numeral, ...)
SV *self
SV *numeral
OVERLOAD: >
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
RETVAL = (normalise(self) > normalise(numeral)) ? &PL_sv_yes : &PL_sv_no;
OUTPUT:
RETVAL
SV *
num_le(self, numeral, ...)
SV *self
SV *numeral
OVERLOAD: <=
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
RETVAL = (normalise(self) <= normalise(numeral)) ? &PL_sv_yes : &PL_sv_no;
OUTPUT:
RETVAL
SV *
num_ge(self, numeral, ...)
SV *self
SV *numeral
OVERLOAD: >=
CODE:
if (!SvROK(self) || SvTYPE(SvRV(self)) != SVt_PVHV) {
croak("self must be a Roman::Numeral object");
}
RETVAL = (normalise(self) >= normalise(numeral)) ? &PL_sv_yes : &PL_sv_no;
OUTPUT:
RETVAL
For each XSUB we are first checking if the self object is a valid hash reference, then we normalise both the self and numeral parameters to their integer values. We then perform the comparison and return a reference to either PL_sv_yes or PL_sv_no, which are predefined scalar variables in Perl representing true and false.
With all the operators implemented, we can now run our tests and see if everything works as expected.
That concludes todays journey into overloading operators in XS. We have created a Roman numeral class that can handle addition, subtraction, multiplication, division, and numeric comparisons using Perl's operator overloading feature. This allows us to work with Roman numerals in a way that feels natural and intuitive, just like the olden days.
If you have any questions or suggestions, feel free to leave a comment below. I hope you found this post helpful and informative. Happy coding!
Leave a comment