Segfault Fixing for Dummies
Today i tried to rebase my dev branch of OpenGL.pm on the latest changes and found that a segfault in OpenGL::Array was assumed to be fixed, despite still being alive and well. With nothing (haha) better to do i decided to poke at it and see if i could fix it. I foregrounded #xs on irc.perl.org in my IRC client and set to work. My first stumbling block came surprisingly soon:
How to print debug messages in XS?
Since i can't just step through XS in the perl debugger, i had to debug by printing to the command line. However, the XS in OpenGL.pm seems to swallow all output to STDOUT. Leont suggested it might be the test harness, but at that point i wasn't even running in a test harness. Even with bulk88 weighing in, nobody could say why it does that. The solution, however, was fairly easy:
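In XS, croak() and warn() are provided, and both take printf-style format strings. warn() prints its message to STDERR and carries on, while croak() dies with the message.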
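As a rough illustration of that style of debugging, here is a sketch in plain C that logs the address of every allocation and release so they can be compared afterwards. The helper names traced_malloc and traced_free are made up for this example and are not part of OpenGL.pm. It writes to stderr with fprintf so that it compiles outside of perl; inside XS the same format strings would be handed to warn() instead.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate memory and log the size and address.
   In XS, the fprintf(stderr, ...) call would be a warn(...) call. */
void *traced_malloc(size_t size, const char *label)
{
    void *ptr = malloc(size);
    fprintf(stderr, "malloc %s: %lu bytes at %p\n",
            label, (unsigned long)size, ptr);
    return ptr;
}

/* Hypothetical helper: log the address about to be released, then free it.
   Comparing this line with the matching malloc line shows whether the
   pointer changed or is being freed twice. */
void traced_free(void *ptr, const char *label)
{
    fprintf(stderr, "free %s: %p\n", label, ptr);
    free(ptr);
}
```

If the malloc and free lines show the same address exactly once each, the pointer itself is fine and the problem has to be somewhere else.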
Adding a whole lot of debug statements like that, i narrowed it down to an offending free() causing the crash. However it wasn't a double free and the address being freed was still the same as given by the original malloc. This means that something else was writing outside its allocated memory. So, the next problem:
How to validate memory accesses of a C program on Windows?
Did i mention yet that i'm on Windows? For those of you who don't already know that, now you know. :)
Now, manually figuring out which parts of a C program are misbehaving in their memory accesses is very tedious. Luckily there are automated tools, like valgrind. Valgrind has no Windows port, but some googling turned up a number of Windows alternatives, the first of which, DrMemory, turned out to be instrumental in solving this issue.
Before i could get going with it, though, i had to build the XS code with debug symbols embedded. Leont was kind enough to point me towards various possibilities, with the following being the correct addition to the WriteMakefile() arguments in Makefile.PL for gcc on Windows:
OPTIMIZE => '-ggdb3'
With that done, DrMemory provided me with extremely useful output:
Error #7: UNADDRESSABLE ACCESS: writing 0x032e9780-0x032e9784 4 byte(s)
# 0 dll.exp.dll!rpn_push [d:\cpan\OpenGL/pogl_rpn.xs:627]
# 1 dll.exp.dll!rpn_exec [d:\cpan\OpenGL/pogl_rpn.xs:694]
# 2 dll.exp.dll!XS_OpenGL__Array_calc [d:\cpan\OpenGL/pogl_rpn.xs:1023]
On a hunch i guessed that this write to memory that had not been allocated was screwing things up for the free() calls occurring later on. After that came a lot of misguided bumbling and time-wasting attempts to understand why some parts of the code did what they did. In the end, a relatively simple fix was to add a check to the push that croaks whenever a write outside of the allocated memory is attempted, which pinpointed the places where the actual allocation needed to be fixed.
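The idea of a checked push can be sketched roughly like this. The struct layout, the field names and the function name demo_push are assumptions made for illustration, and the real rpn_push in pogl_rpn.xs may look quite different. The sketch also reports the problem on stderr and returns 0, where the XS version would croak() with the same kind of message.

```c
#include <stdio.h>

/* Hypothetical stack layout, not the actual structure from pogl_rpn.xs. */
typedef struct {
    float *values;   /* backing storage for the stack */
    int    capacity; /* number of slots that were allocated */
    int    count;    /* number of slots currently in use */
} demo_stack;

/* Push a value, refusing to write past the end of the allocation.
   Returns 1 on success and 0 if the stack is already full.
   In XS the failure branch would call croak() instead of returning. */
int demo_push(demo_stack *stack, float value)
{
    if (stack->count >= stack->capacity) {
        fprintf(stderr, "push out of bounds: count=%d capacity=%d\n",
                stack->count, stack->capacity);
        return 0;
    }
    stack->values[stack->count++] = value;
    return 1;
}
```

With a check like this in place, the failure shows up at the first bad push rather than at some unrelated free() much later, which is what makes it useful for finding allocations that are too small.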