Rewriting Gumbo Binding - A GPTrixie 'demo'
I originally wanted to write a short history of GPTrixie, but it would probably be boring, and you can look at the commit history to get an idea of how it evolved. Instead, we will see how to rewrite my Gumbo binding using GPTrixie.
What is Gumbo?
Gumbo is a standalone C99 library that parses HTML5. It is heavily tested and is a project endorsed by Google. See Gumbo on GitHub.
What is GPTrixie?
GPTrixie is a tool that extracts definitions from a C header and transforms them into their Perl 6 NativeCall counterparts. This description is partially false, since it actually extracts the C definitions from an XML file produced by GCCXML. Parsing C is something a compiler like clang or GCC does far better than I could with my poor compiler knowledge. Anyway, you can find it at GPTrixie on GitHub, or just install it with panda install App::GPTrixie
Be careful with GCCXML: a project named CastXML is supposed to replace it. It is based on clang/LLVM, but sadly it does not support modern C (C99 or C11, for example). Some distributions install CastXML in place of gccxml (Debian, for example, but there you still get gccxml as a gccxml.real binary). You can set a GPT_GCCXML environment variable to point to the right gccxml.
GPTrixie stands for The Great and Powerful Trixie. She is a fictional character from the show My Little Pony, where she is a stage magician. If you don't like the name, I invite you to find me one that is less boring than NativeCall Generator :)
What will we rewrite, and what is the goal of this post?
Obviously, writing a tool that also writes the logic behind the library is not realistically possible. The goal of this post is to rewrite the Gumbo::Binding component of my Gumbo binding module. It holds the definitions of the Gumbo functions and structures.
Anyone who has written a binding for a C library knows how tedious it can be. It is easy to overlook a field in a structure or to write the wrong type (think of all the pre-Christmas bindings with lots of int instead of int32, leading to weird bugs on 32-bit x86). My manually written Binding.pm6 took a lot of trial and error, and it is not complete.
Originally I wrote GPTrixie as a tool that you run over a header, then copy and paste the output into your file and change what does not suit you (because it can't know whether a char * is a string, a buffer or raw data, for example). Now the goal is to have a file holding some options and specifications that make GPTrixie generate a 'ready to use' Binding.pm6 file.
First steps
I recommend going through a few steps with gptrixie before writing the configuration file.
First, run gptrixie on your header file without options. It will give you something like:
root@testperl6:~/piko/cpp/gptrixie# perl6 -I lib bin/gptrixie /usr/local/include/gumbo.h
Calling GCCXML : gccxml /usr/local/include/gumbo.h -fxml=plop.xml[]
Parsing the XML file
Doing magic
Times -- gccxml: 0.7278153 sec; xml parsing: 3.4344711 sec; magic: 0.3243911
Number of things founds
-Types: 93
-Structures: 10
-Unions: 1
-Enums: 6
-Functions: 12
-Variables: 4
-Files: 3
The last part is interesting. Let me explain a bit. The number of types is the number of C types found. The value is not really relevant, because something like const char * in C generates three types (char is one, pointer to char is another, and the const qualifier on char * is a third). Structures/Unions/Enums/Functions/Variables are self-explanatory (Variables are the exported extern variables).
The number of files is important to consider. Let's use the --list-files option to get the list.
f0 : /usr/share/gccxml-0.9/GCC/4.9/gccxml_builtins.h - Functions(0), Enums(0), Structures(0)
f1 : /usr/local/include/gumbo.h - Functions(12), Enums(6), Structures(10)
f2 : /usr/lib/gcc/i586-linux-gnu/4.9/include/stddef.h - Functions(0), Enums(0), Structures(0)
f0, f1 and f2 are the ids used by gccxml and gptrixie. The first file is irrelevant. We can see that gumbo.h holds the interesting stuff. stddef.h is here because gumbo uses the C99 bool type. In the case of Gumbo we are lucky, because a single file (gumbo.h) holds everything we need. This is not necessarily the case with every library. Here is the output for mysql.h:
f0 : /usr/share/gccxml-0.9/GCC/4.9/gccxml_builtins.h - Functions(0), Enums(0), Structures(0)
f1 : /usr/include/mysql/mysql.h - Functions(104), Enums(5), Structures(16)
f2 : /usr/include/i386-linux-gnu/bits/time.h - Functions(0), Enums(0), Structures(1)
f3 : /usr/include/i386-linux-gnu/sys/types.h - Functions(0), Enums(0), Structures(0)
f4 : /usr/include/i386-linux-gnu/bits/pthreadtypes.h - Functions(0), Enums(0), Structures(5)
f5 : /usr/include/mysql/typelib.h - Functions(7), Enums(0), Structures(1)
f6 : /usr/include/i386-linux-gnu/bits/byteswap.h - Functions(0), Enums(0), Structures(0)
f7 : /usr/include/i386-linux-gnu/bits/sigset.h - Functions(0), Enums(0), Structures(1)
f8 : /usr/include/i386-linux-gnu/sys/select.h - Functions(2), Enums(0), Structures(1)
f9 : /usr/include/mysql/my_list.h - Functions(7), Enums(0), Structures(1)
f10 : /usr/include/mysql/mysql_com.h - Functions(30), Enums(6), Structures(6)
f11 : /usr/include/i386-linux-gnu/bits/types.h - Functions(0), Enums(0), Structures(1)
f12 : /usr/include/time.h - Functions(0), Enums(0), Structures(1)
f13 : /usr/include/mysql/my_alloc.h - Functions(0), Enums(0), Structures(2)
f14 : /usr/include/i386-linux-gnu/bits/select2.h - Functions(0), Enums(0), Structures(0)
f15 : /usr/include/mysql/mysql_time.h - Functions(0), Enums(1), Structures(1)
f16 : /usr/include/i386-linux-gnu/sys/sysmacros.h - Functions(3), Enums(0), Structures(0)
f17 : /usr/lib/gcc/i586-linux-gnu/4.9/include/stddef.h - Functions(0), Enums(0), Structures(0)
enum GumboAttributeNamespaceEnum is export (
    GUMBO_ATTR_NAMESPACE_NONE => 0,
    GUMBO_ATTR_NAMESPACE_XLINK => 1,
    GUMBO_ATTR_NAMESPACE_XML => 2,
    GUMBO_ATTR_NAMESPACE_XMLNS => 3
);
class GumboText is repr('CStruct') is export {
    has Str $.text; # const char* text
    HAS GumboStringPiece $.original_text; # GumboStringPiece original_text
    HAS GumboSourcePosition $.start_pos; # GumboSourcePosition start_pos
}
#-From /usr/local/include/gumbo.h:104
#/**
# * Compares two GumboStringPieces, and returns true if they're equal or false
# * otherwise.
# */
#bool gumbo_string_equals(
#    const GumboStringPiece* str1, const GumboStringPiece* str2);
sub gumbo_string_equals(Pointer[GumboStringPiece] $str1 # const GumboStringPiece*
                       ,Pointer[GumboStringPiece] $str2 # const GumboStringPiece*
                       ) is native(LIB) returns bool is export { * }
    module-name => 'Gumbo::Binding',
    env-name => 'PERL6_GUMBOLIB',
    clib-name => 'gumbo',
    clib-abiversion => v1,
    merge-struct-typedef => True,
    files => ['gumbo.h'],
    exclude-enums => ['GumboTag']
);
Some options are self-explanatory. merge-struct-typedef makes gptrixie replace a structure name with its associated typedef (and remove the typedef from its known types), which fixes the GumboInternalNode issue. exclude-enums excludes the listed enums.
env-name, clib-name and clib-abiversion are used to define the LIB constant. env-name is the name of an environment variable that lets a user manually specify the library file to use.
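To make the role of these three options concrete, here is a minimal sketch of how a LIB constant could be built from them. It is not the code GPTrixie actually emits: pick-library is a hypothetical helper introduced for illustration, and the native sub uses untyped Pointer arguments to stay self-contained.

```perl6
use NativeCall;

# Hypothetical helper (not part of GPTrixie): choose the library to load.
# An explicit file given through the environment variable wins; otherwise
# fall back to the library name and ABI version, which NativeCall expands
# to a platform-specific file name.
sub pick-library(%env) {
    %env<PERL6_GUMBOLIB> // ('gumbo', v1)
}

# A constant is evaluated at compile time, so the environment variable
# must already be set when this file is compiled.
constant LIB = pick-library(%*ENV);

# Declaring a native sub does not load the library; that only happens
# on the first call.
sub gumbo_string_equals(Pointer $str1, Pointer $str2)
    is native(LIB) returns bool { * }

say pick-library(%(PERL6_GUMBOLIB => '/opt/gumbo/libgumbo.so'));
# prints /opt/gumbo/libgumbo.so
```

Without the environment variable, LIB holds the (name, version) pair and NativeCall is left to work out the actual file name for the platform.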
You can run gptrixie --gptfile gumbo.gpt /usr/local/include/gumbo.h and it will generate a Gumbo-Binding.pm6 file. At this point we should have a usable file.
Fixing the Pointer[MyStruct] issue, or a glimpse of GPTrixie internals
This is not something an option can fix, so we need to modify how gptrixie generates the Perl 6 string representing a type in the default (and only) generator. The default generator is dumb (that's its name). But it looks like an easy fix: when we find a pointer to a structure, we generate the type as we would for the structure itself and ignore the pointer.
Let's have a look at the existing code and how it turns a pointer to char into a Str.
return 'Str' if ($t.ref-type ~~ FundamentalType and $t.ref-type.name eq 'char') ||
  ($t.ref-type ~~ QualifiedType and $t.ref-type.ref-type ~~ FundamentalType and $t.ref-type.ref-type.name eq 'char');
It looks a bit lengthy because it handles two cases: char * and const char *. GPTrixie keeps types much as they appear in gccxml: a complex type is generally a combination of types. FundamentalType covers char, int, void...; QualifiedType covers const.
The fix looks quite easy: when we find a pointer to a structure, we just call the function itself on the structure type. Sadly, it translates into something more complicated because of typedef and const.
return resolve-type($t.ref-type, $cpt + 1) if ($t.ref-type ~~ StructType) ||
  ($t.ref-type ~~ QualifiedType and $t.ref-type.ref-type ~~ StructType) ||
  ($t.ref-type ~~ TypeDefType and $t.ref-type.ref-type ~~ StructType);
That seems to be enough for this.
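The two rules above can be tried in isolation with a small stand-alone sketch. The classes below are simplified stand-ins that only borrow the names used in the snippets (GPTrixie's real classes carry more data), PointerType is an assumed name for the pointer node, and this resolve-type is a reduced version without the $cpt counter.

```perl6
# Simplified stand-ins for the type nodes; not GPTrixie's real classes.
class FundamentalType { has Str $.name }                   # char, int, void...
class StructType      { has Str $.name }
class QualifiedType   { has $.ref-type }                   # const
class TypeDefType     { has Str $.name; has $.ref-type }
class PointerType     { has $.ref-type }                   # assumed name

sub is-char($t) { $t ~~ FundamentalType && $t.name eq 'char' }

sub resolve-type($t) {
    given $t {
        when FundamentalType { return $t.name }
        when StructType      { return $t.name }
        when TypeDefType     { return $t.name }
        when QualifiedType   { return resolve-type($t.ref-type) }
        when PointerType {
            my $r = $t.ref-type;
            # char * and const char * become Str.
            return 'Str' if is-char($r)
                || ($r ~~ QualifiedType && is-char($r.ref-type));
            # A pointer to a struct (plain, const or behind a typedef)
            # is generated like the struct itself.
            return resolve-type($r) if $r ~~ StructType
                || ($r ~~ QualifiedType && $r.ref-type ~~ StructType)
                || ($r ~~ TypeDefType   && $r.ref-type ~~ StructType);
            # Anything else stays a typed pointer.
            return 'Pointer[' ~ resolve-type($r) ~ ']';
        }
    }
}

# const char * is three nested types: pointer -> const -> char.
my $char       = FundamentalType.new(name => 'char');
my $const-char = QualifiedType.new(ref-type => $char);
say resolve-type(PointerType.new(ref-type => $const-char));   # prints Str

# const GumboStringPiece * is now generated as the struct itself.
my $piece = StructType.new(name => 'GumboStringPiece');
my $const-piece = QualifiedType.new(ref-type => $piece);
say resolve-type(PointerType.new(ref-type => $const-piece));  # prints GumboStringPiece
```

In this sketch a fundamental type resolves to its C name unchanged; mapping it to a NativeCall type such as int32 is left out to keep the example short.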
Using the file in Gumbo
It's now time to use the generated file to replace my manually written binding file. I just copied the Gumbo-Binding.pm6 file in place of lib/Gumbo/Binding.pm6. The first step is to change the names of the Gumbo types in the files that use the binding, since I used camel case when writing the binding by hand. The second is to remove some nativecast calls, because I had declared some functions as returning Pointer instead of the proper structs.
Let's run a test. After fixing some other compilation errors it finally runs, but it segfaults. After some investigation, it appears that exporting a cglobal does not work. The output generated by gptrixie does not work, and even after changing the assignment (=) to a binding (:=) it gives me an (Any) value for the variable. Looking back at the old binding, I had already run into this issue, as the commented-out code shows, and I have to use cglobal directly in the code that uses the binding.
This made me change the file generated by GPTrixie, so that I can reuse the LIB variable that defines how to find the Gumbo library.
our $GUMBO_LIB is export = LIB;
You may notice that I used a $ sigil rather than a constant: currently rakudo misapplies the is export trait on a constant and produces an error.
Conclusion
Generating the Binding/Raw file for the C library you want to use with GPTrixie works quite well. I am surprised it worked with so little additional work on my side. One can argue that gumbo is an easy case, with only one header file and no weird outside definitions, but I think it's a good example and also a nice validation for me that GPTrixie works the way I imagined it.
Does it end here? Well, for the Gumbo binding, probably. But there is still a lot of work to do in GPTrixie: the part that generates the file is a bad copy and paste of the main code that uses the DumbGenerator to produce output, and I probably want to write a generator that allows finer control over the Perl 6 code generated for functions and structures. I also want ways for the generator to do extra work, like my addition of $GUMBO_LIB, so that I never have to touch the generated file.