Switch lots of things on at once

Many people already have code like

use strict;
use warnings;

and so on at the top of each script they write. For scripts which I don't intend to publish anywhere, I have a module (which I accidentally called Z, not knowing there was already a module of the same name on CPAN) which switches on lots of things at once just by saying

use Z;

The top bit goes like this:

package Z;
use warnings;
use strict;
use utf8;

use Carp;
use Deploy 'do_system';
use File::Slurper qw!read_text write_text read_lines!;
use FindBin '$Bin';
use Getopt::Long;
use Table::Readable ':all';
use v5.32;
no warnings qw(experimental::signatures);
use feature qw(signatures);

So far that is all fairly standard stuff, but Z also imports all of the above things into the calling script, using the @EXPORT variables of the above modules:

our $VERSION = '0.01';

require Exporter;
our @ISA = qw(Exporter);

our @EXPORT = (
    # ... the list of functions to re-export ...
);

This requires a special import method:

sub import
{
    my ($class) = @_;

    strict->import ();
    utf8->import ();
    warnings->import ();
    warnings->unimport (qw(experimental::signatures));
    feature->import ('signatures');

    Carp->import ();
    File::Slurper->import (qw!read_text write_text!);
    FindBin->import ('$Bin');
    Getopt::Long->import ();
    Deploy->import ('do_system');
    Table::Readable->import (':all');

    Z->export_to_level (1);
}

To save another bit of boilerplate I also have

binmode STDOUT, ":encoding(utf8)";

at the end of the module.

This is for personal convenience, so it's not something I would use publicly, but people who want to save themselves a bit of boilerplate might find it useful as a model for making their own "personal module".

Relatively easy ways to catch memory errors

If you're using XS and C in your Perl module, you might have to worry about memory errors. There are of course tools like valgrind or the compiler sanitizers which you can use:

valgrind perl -I blib/lib -I blib/arch ./mytestscript

or if your problems are even more serious ones like segmentation fault errors you can run Perl under a debugger like gdb:

gdb perl
(gdb) run -I blib/lib -I blib/arch ./mytestscript

However, it might be worth noting some "easy" ways to catch memory errors which actually catch a lot of things.

The first trick is setting the pointer to 0 after freeing it:

 free (result);
 result = 0;

This prevents you from accidentally using the pointer again after it's been freed. It doesn't catch a double free, though: since free (0) is defined to do nothing, freeing the zeroed pointer a second time is silently ignored rather than reported.

Another way which catches quite a lot of careless mistakes is counting mallocs and frees:

 n_mallocs++;
 x = malloc (100);
 n_mallocs++;
 y = malloc (300);

 n_mallocs--;
 free (x);

then at the end of the program:

 if (n_mallocs != 0) {
      fprintf (stderr, "n_mallocs = %d\n", n_mallocs);
 }

I was talking to a friend who works as a surgical assistant about something I saw on "Tomorrow's World", where they had a very complicated machine for making sure that surgical equipment doesn't get left inside people's bodies. It used all kinds of pattern recognition to identify instruments as they were checked out of the sterilizer and then returned. I was quite surprised when she told me that the way they actually do it is simply to count how many instruments have been issued and then count how many have come back to the sterilizer. Just like counting the mallocs, it's a much simpler but equally effective way of detecting that something hasn't been returned.

For example, in this module I made the n_mallocs counter part of the structure which contains the JSON information. In this module I put the counter and the memory allocation inside a macro, which is probably a good way to make sure you don't forget to increment it. And if you do forget an increment or a decrement, the check will trip during testing, and you'll quickly find out.

There are lots of other things you can do, such as using a memory sanitizer, but these easy methods actually catch a lot of the problems.

I failed to pause before blogging

I got this email from PAUSE just now:

Failed: PAUSE indexer report BKB/Go-Tokenize-0.01.tar.gz

     module : switch

It looks like it doesn't like this line of code containing Go keywords, presumably because the PAUSE indexer scans files for package declarations and read "package switch" as a declaration of a Perl package called switch:

chan         else         goto         package      switch

Gzip::Zopfli - another compression module

Following on from the Gzip::Libdeflate I mentioned before, I also made this: Gzip::Zopfli

It is based on the Zopfli gzip compression library from Google Research.

Both Zopfli and libdeflate seem to excel at compressing JSON files compared to ordinary zlib compression, sometimes with an improvement of up to 30%. Here are some results of this script on random JSON files:

uncompressed  gzip -9  libdeflate  zopfli
        6194      698         703     682
        3809     1184         977     953
        1446      494         486     485
     1572152   824430      710102  707844
       28061     5080        4554    4546
      565082   125097      118708  118494
      317448   164741      140087  139829
     2069690  1062866      865985  866129

All sizes are in bytes.

Zopfli almost always wins by a small margin over libdeflate.

People who want to try comparing the three things might find this repository useful:

At the time of publishing this blog post, the scripts will require a little editing to work.

New compression module Gzip::Libdeflate

I've turned the libdeflate compression library into a CPAN module:


This is the gzip compression format, but with a more modern implementation.

It's supposed to be much faster, and to compress better, than zlib (the original gzip library).

On some files I'm getting compression as much as 30% better.

So far I haven't tested whether or not it is faster.

See the module above for links to the original library and so on.

Since this is an early release, it's very likely indeed that bugs in the module are my fault rather than any problem with libdeflate, so please report them to me.