Virtual Spring Cleaning (part 11 of XX) - in which I determine the type of yak wool I need

In what is a somewhat mild case of yak-shaving, I need a portable way of identifying file types in several situations. For example, when indexing files for Dancer::SearchApp or when handling user uploads, I want to somewhat reliably identify various files. I want to determine their type without trusting what the user tells me they are, and I also want to send the appropriate Content-Type header when serving files again.

There are various incarnations of magic-based file identification, but they either rely on a few hardcoded magic numbers or on a directory tree laid out according to the Freedesktop.org base directory specification. Neither is desirable for me when I want to deploy a self-contained application onto a server.

For some desktop Linux distributions, it might be desirable to fall back to the shared-mime-info package and File::MimeInfo, but I develop mostly on Windows, and the last release of the shared-mime-info package there is version 0.80 (while current is 1.70), so I don't want to go there either. Also, I don't want to share the setup for one application with the whole machine it runs on. The C software that converts the XML file of the database into a decoder is built with autotools, and autotools on Windows only works if you buy into the whole Cygwin idea. I prefer MinGW, ideally without all the shell programs that autotools presumes every system has.

The sliver of light in this whole situation is that the good folks at Freedesktop.org distribute their database as a single XML file, and Perl is quite good at reading XML and parsing the commands therein. So I wrote a simple MIME handler that uses the database in the XML file from Freedesktop.org to identify file types without unpacking the XML into more files. Because you may want to add your own file types without uploading their identifications to the global database, the module supports multiple databases and will return either the first match or all matches according to their priority.
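To make the "first match or all matches according to their priority" idea concrete, here is a minimal sketch in plain Perl. It is not the code of the module: the rule list, the priorities and the names match_all and match_first are made up for illustration, and the real database also knows offset ranges, masks and nested matches.

```perl
use strict;
use warnings;

# Hypothetical rules for illustration only. The x-myapp-bundle type is
# invented; it shares the ZIP signature but has a higher priority.
my @rules = (
    { mime_type => 'image/png',                  priority => 50, offset => 0, magic => "\x89PNG\r\n\x1a\n" },
    { mime_type => 'application/zip',            priority => 40, offset => 0, magic => "PK\x03\x04" },
    { mime_type => 'application/x-myapp-bundle', priority => 60, offset => 0, magic => "PK\x03\x04" },
);

# Return the MIME types of all rules whose magic bytes match,
# highest priority first.
sub match_all {
    my ($data) = @_;
    return
        map  { $_->{mime_type} }
        sort { $b->{priority} <=> $a->{priority} }
        grep {
            # Skip rules that would read past the end of the data
            length($data) >= $_->{offset} + length($_->{magic})
                and substr($data, $_->{offset}, length $_->{magic}) eq $_->{magic}
        }
        @rules;
}

# Return only the best match, or undef if nothing matched.
sub match_first {
    my ($data) = @_;
    my ($first) = match_all($data);
    return $first;
}

my $zip_like = "PK\x03\x04" . ("\0" x 16);
print "all:   ", join(", ", match_all($zip_like)), "\n";
print "first: ", match_first($zip_like), "\n";
```

Sorting by priority before picking the first element means that a more specific custom rule can win over a generic one from the global database, which is the behaviour I want for my own file types.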

So I've written and released MIME::Detect. The usage is pretty simple, quoting the SYNOPSIS section:

  my $mime = MIME::Detect->new();

  for my $file (@ARGV) {
    print sprintf "%s: %s\n", $file, $_->mime_type
        for $mime->mime_types($file);
  };

$file can be a filename or a file handle. In-memory data needs to be passed in as an in-memory filehandle.
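For in-memory data, a sketch could look like the following. Opening a reference to a scalar is standard Perl; the sample bytes are just the PNG signature plus padding that I made up, and the call to MIME::Detect is guarded so that the script still runs where the module is not installed.

```perl
use strict;
use warnings;

# Data that only exists in memory, for example the body of an upload.
# These bytes are the eight-byte PNG signature followed by padding.
my $data = "\x89PNG\r\n\x1a\n" . ("\0" x 32);

# Perl can open a scalar reference like a file; binmode keeps the
# bytes untouched.
open my $fh, '<', \$data
    or die "Cannot open in-memory filehandle: $!";
binmode $fh;

# MIME::Detect comes from CPAN, so only use it when it is available.
if (eval { require MIME::Detect; 1 }) {
    my $mime = MIME::Detect->new();
    print "in-memory data: ", $_->mime_type, "\n"
        for $mime->mime_types($fh);
} else {
    print "MIME::Detect is not installed, skipping detection\n";
}
```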

The module also provides easy support for adding your own custom rules, which I consider important for tighter detection of file types:

  my $mime = MIME::Detect->new(
      files => ['myapp/mime/custom.xml'],
  );
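What such a custom file could contain is sketched below. The element names follow the shared-mime-info XML format, but the type application/x-myapp, the magic string MYAPP1 and the glob pattern are invented for this example. The script writes the rules to a temporary file and, if MIME::Detect is installed, passes that file in via the files option. I make no claim here about what gets printed or about how the custom file is combined with the bundled database.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# A hypothetical rule: data starting with "MYAPP1" is application/x-myapp.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
  <mime-type type="application/x-myapp">
    <comment>MyApp data file (made-up example type)</comment>
    <magic priority="60">
      <match type="string" offset="0" value="MYAPP1"/>
    </magic>
    <glob pattern="*.myapp"/>
  </mime-type>
</mime-info>
XML

# Write the rules to a temporary directory that is removed at exit.
my $dir    = tempdir(CLEANUP => 1);
my $custom = File::Spec->catfile($dir, 'custom.xml');
open my $out, '>', $custom or die "Cannot write $custom: $!";
print {$out} $xml;
close $out or die "Cannot close $custom: $!";

# MIME::Detect comes from CPAN, so only use it when it is available.
if (eval { require MIME::Detect; 1 }) {
    my $mime   = MIME::Detect->new( files => [$custom] );
    my $sample = "MYAPP1 some payload";
    open my $fh, '<', \$sample or die "Cannot open sample data: $!";
    binmode $fh;
    print "sample: ", $_->mime_type, "\n"
        for $mime->mime_types($fh);
} else {
    print "MIME::Detect is not installed, wrote $custom only\n";
}
```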

The latest freedesktop.org release of the database is distributed unmodified with the module as an XML file, as the GPL permits. The Perl source itself is not based on any of the C programs and is thus licensed under the Artistic License.

This removes one of the blockers preventing me from releasing my user upload handler.

About Max Maischein

I'm the Treasurer for the Frankfurt Perlmongers e.V. I have organized Perl events including 9 German Perl Workshops and one YAPC::Europe.