Virtual Spring Cleaning (part 11 of XX) - in which I determine the type of yak wool I need
In what is a somewhat mild case of yak-shaving, I need a portable way of identifying file types in several situations. For example, when indexing files for Dancer::SearchApp or when handling user uploads, I want to identify files reasonably reliably, without trusting what the user tells me they are. I also want to send the appropriate Content-Type header when serving the files again.
There are various incarnations of the magic file identification, but they either rely on a few hardcoded magic numbers or rely on a directory tree corresponding to the Freedesktop.org base directory specification. Neither is desirable for me when I want to deploy a self-contained application onto a server.
For some desktop-Linux distributions, it might be desirable to fall back to the shared-mime-info package and File::MimeInfo. But I develop mostly on Windows, and the last release of the shared-mime-info package there is version 0.80 (while current is 1.70), so I don't want to go there either. Also, I don't want to share the setup for one application with the whole machine it runs on.
The C software that converts the XML file of the database into a decoder is built with autotools, and autotools on Windows only works if you buy into the whole Cygwin idea. I prefer MinGW, ideally without all the shell programs that autotools presumes every system has.
The sliver of light in this whole situation is that the good folks at Freedesktop.org distribute their database as a single XML file, and Perl is quite good at reading XML and parsing the commands therein. So I wrote a simple MIME handler that uses the database in the XML file from Freedesktop.org to identify file types without unpacking the XML into more files. Since you may want to add your own file types without uploading their identifications to the global database, the module supports multiple databases and will return either the first match or all matches, ordered by their priority.
So I've written and released MIME::Detect. The usage is pretty simple, quoting the SYNOPSIS section:
    my $mime = MIME::Detect->new();

    for my $file (@ARGV) {
        print sprintf "%s: %s\n", $file, $_->mime_type
            for $mime->mime_types($file);
    };
$file can be a filename or a file handle. In-memory data needs to be passed in as an in-memory filehandle.
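For data that never touches the disk, such as an upload held in a scalar, the in-memory filehandle can be created with Perl's three-argument open on a scalar reference. This is a minimal sketch: the sample bytes in $data are made up for illustration (a PNG signature plus padding), and the guard around loading MIME::Detect is only there so the script also runs where the module is not installed.

```perl
use strict;
use warnings;

# Illustrative data only: the 8-byte PNG signature followed by padding.
my $data = "\x89PNG\x0d\x0a\x1a\x0a" . ( "\0" x 32 );

# Open a read-only filehandle on the scalar instead of on a file on disk.
open my $fh, '<', \$data
    or die "Cannot open in-memory filehandle: $!";
binmode $fh;

if ( eval { require MIME::Detect; 1 } ) {
    my $mime = MIME::Detect->new();

    # mime_types() accepts the handle in place of a filename.
    print $_->mime_type, "\n" for $mime->mime_types($fh);
}
else {
    print "MIME::Detect is not installed, skipping detection\n";
}
```

The handle behaves like any other read handle, so the same code path serves both files on disk and uploaded data.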
The module also provides easy support for adding your own custom rules, which I consider important for tighter detection of file types:
    my $mime = MIME::Detect->new(
        files => ['myapp/mime/custom.xml'],
    );
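What such a custom rules file might contain can be sketched as follows. The element names (mime-info, mime-type, magic, match, glob) and the namespace come from the Freedesktop.org shared-mime-info format; the type name application/x-myapp-data, the "MYAPP" magic string, the *.myapp glob, and the temporary file location are all hypothetical and only serve the illustration. The sketch assumes that MIME::Detect reads such a minimal file the same way it reads the bundled database, and it makes no claim about how the custom file is combined with the bundled one.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical rule: data starting with the bytes "MYAPP", or files named
# *.myapp, should be identified as application/x-myapp-data.
my $rules = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
  <mime-type type="application/x-myapp-data">
    <comment>MyApp data file</comment>
    <magic priority="60">
      <match type="string" offset="0" value="MYAPP"/>
    </magic>
    <glob pattern="*.myapp"/>
  </mime-type>
</mime-info>
XML

# Write the rules to a throwaway directory that is removed at exit.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/custom.xml";
open my $out, '>', $path or die "Cannot write '$path': $!";
print {$out} $rules;
close $out or die "Cannot close '$path': $!";

if ( eval { require MIME::Detect; 1 } ) {
    my $mime = MIME::Detect->new( files => [$path] );

    # Made-up payload that starts with the hypothetical magic string.
    my $data = "MYAPP\x01\x02 sample payload";
    open my $fh, '<', \$data or die "Cannot open in-memory filehandle: $!";
    binmode $fh;
    print $_->mime_type, "\n" for $mime->mime_types($fh);
}
else {
    print "MIME::Detect is not installed, skipping detection\n";
}
```

Keeping the rules in a file next to the application means they travel with the deployment instead of living in a machine-wide database.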
The latest Freedesktop.org release of the database is distributed unmodified with the module as an XML file, as the GPL permits. The Perl source itself is not based on any of the C programs and is thus licensed under the Artistic License.
This removes one of the blockers preventing me from releasing my user upload handler.