My Favorite Modules: PerlIO::via

OK, I confess: PerlIO::via is not a module that I use every day. It allows you, easily and with minimal code, to modify an I/O stream before it gets to the reader of the stream, or after the writer has written it. All you do is write (say) My::Module conforming to the parts of the PerlIO::via interface you need, and specify it in the second argument of open() or binmode() as ':via(My::Module)'. How cool is that? And how cool is a language that lets you do that with a minimum of fuss, bother, and code?

I encountered this when trying to modify (OK, hack) the behavior of a large and complex hunk of Perl not under my control. Rummaging around in this turned up the fact that all file input went through a single module/object, which had an open() method. I realized if I could insert my own PerlIO layer into the input stream, I would have control over what the victim host code saw.

In the true spirit of the Conan the Barbarian school of programming ("Bash it until it submits!") I wrote a PerlIO::via module whose import() method monkey-patched the open() to insert my layer into the stack. All I had to do was launch the host code with -MMy::Module and the dirty deed was done.
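A minimal sketch of that trick follows. Everything named here is an assumption made for illustration: Host::Reader stands in for the host code's file-reading class, its open() is assumed to return a plain file handle, and the wrapper pushes the layer with binmode() after the original open() returns. The real host code will differ in all of these details.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the host code's file-reading class.
package Host::Reader;

sub open {
    my ( $class, $path ) = @_;
    CORE::open( my $fh, '<', $path )
        or die "Cannot open $path: $!";
    return $fh;
}

# Hypothetical layer module. In real use this would live in
# My/Module.pm and be loaded with -MMy::Module.
package My::Module;

sub import {
    my $orig = \&Host::Reader::open;
    no warnings 'redefine';
    # Wrap the host's open() so every handle it returns gets our layer.
    *Host::Reader::open = sub {
        my $fh = $orig->( @_ );
        binmode $fh, ':via(My::Module)'
            or die "Cannot push layer: $!";
        return $fh;
    };
    return;
}

sub PUSHED {
    my ( $class ) = @_;
    return bless {}, $class;
}

sub FILL {
    my ( $self, $fh ) = @_;
    defined( my $line = <$fh> )
        or return;
    $line =~ s/secret/XXXXXX/g;    # arbitrary rewrite, for the demo only
    return $line;
}

package main;

My::Module->import;    # what -MMy::Module would do at load time

my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} "the secret plan\n";
close $out or die "close: $!";

my $fh   = Host::Reader->open( $path );
my $line = <$fh>;
close $fh;
print $line;
```

The host code never learns that its input was rewritten: it calls the same open() it always did, and reads from the handle as usual.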

If you read the PerlIO::via documentation you see a whole host of methods you can provide. All I wanted to do was modify the input stream, and that can be done by implementing just two or three:

You will have to provide PUSHED(), which is called when your layer is pushed onto the I/O stack. That is, when someone specifies it in the second argument of open() or binmode(). This is called as a static method, and given a fopen()-style mode string (e.g. 'r', 'w', or what have you) and the already-opened handle, which represents the layer below. This method needs to instantiate and return an object of the given class. Depending on your needs, this can be as simple as

sub PUSHED {
    my ( $class ) = @_;
    return bless {}, $class;
}

You have a couple options for how to get the input, but I opted for FILL(). This is called as a method, and passed a file handle which is open to the next layer down in the PerlIO stack. This would look something like:

sub FILL {
    my ( $self, $fh ) = @_;
    defined( my $data = <$fh> )
        or return;

    # Do your worst to the $data

    return $data;
}
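To see those two methods working together, here is a small self-contained sketch. The package name My::Numbered and the line-numbering behavior are made up for the example; the only point is that the object returned by PUSHED() is the $self that FILL() gets, so it can carry state from one call to the next.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical layer that prefixes each input line with its number.
package My::Numbered;

sub PUSHED {
    my ( $class ) = @_;
    return bless { line => 0 }, $class;
}

sub FILL {
    my ( $self, $fh ) = @_;
    defined( my $data = <$fh> )
        or return;                 # undef tells PerlIO we hit end of file
    return sprintf '%d: %s', ++$self->{line}, $data;
}

package main;

# Write a small scratch file to read back through the layer.
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} "alpha\n", "beta\n";
close $out or die "close: $!";

open my $in, '<:via(My::Numbered)', $path
    or die "Cannot open $path: $!";
my @lines = <$in>;
close $in;

print @lines;    # prints "1: alpha" and "2: beta", one per line
```

The package is defined in the same file here only to keep the example short; a layer kept in its own .pm file has to be loaded before it is named in open() or binmode().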

A few paragraphs back I said "two or three" methods. For a while I was content with the above two. But then I realized that the caller was getting back bytes, even when the file was opened with :encoding(...) specified in a lower layer and my FILL() method preserved the character nature of the data. Wrestling with this finally drove me back to the documentation, where I found the UTF8() method.

The UTF8() method is optional, and is called (if it exists) right after PUSHED(). Its first argument is interpreted as a Boolean, and is true if the next-lower layer provides characters rather than bytes. The returned value tells PerlIO whether your layer provides characters (if true) or bytes (if false). A minimal-but-sufficient implementation is

sub UTF8 {
    my ( undef, $below_flag ) = @_;
    return $below_flag;
}

Caveat: If you apply the encoding and your layer in the same operation (e.g. binmode $fh, ':encoding(utf-8):via(My::Module)';), the UTF8() method will not see a true value of $below_flag. There are two ways of dealing with this:

  • Apply your PerlIO::via layer in a separate call to binmode(), or
  • Specify an explicit :utf8 after your layer (that is, binmode $fh, ':encoding(utf-8):via(My::Module):utf8';).
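Both workarounds can be tried with a sketch like the following. My::Pass is a hypothetical pass-through layer invented for this example, and the scratch file holds one line containing a single non-ASCII character. I print the length of what comes back, since a character count that matches the original string is the sign that the reader got characters rather than bytes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical pass-through layer, just enough to exercise UTF8().
package My::Pass;

sub PUSHED {
    my ( $class ) = @_;
    return bless {}, $class;
}

sub FILL {
    my ( $self, $fh ) = @_;
    return scalar <$fh>;           # undef at end of file
}

sub UTF8 {
    my ( undef, $below_flag ) = @_;
    return $below_flag;            # characters in, characters out
}

package main;

# Scratch file: "caf", e-acute, newline (five characters).
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out, ':encoding(UTF-8)';
print {$out} "caf\x{e9}\n";
close $out or die "close: $!";

# Workaround 1: push the layer in a separate binmode() call.
open my $fh1, '<:encoding(UTF-8)', $path
    or die "Cannot open $path: $!";
binmode $fh1, ':via(My::Pass)'
    or die "Cannot push layer: $!";
my $line1 = <$fh1>;
close $fh1;

# Workaround 2: one layer string, with an explicit :utf8 at the end.
open my $fh2, '<:encoding(UTF-8):via(My::Pass):utf8', $path
    or die "Cannot open $path: $!";
my $line2 = <$fh2>;
close $fh2;

printf "separate binmode: %d characters\n", length $line1;
printf "explicit :utf8:   %d characters\n", length $line2;
```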

This is already a longer note than I like, but I have to say something about :utf8. The current documentation calls it a pseudo-layer. What it really is is a bit on the layer below, telling PerlIO that the layer it applies to provides characters rather than bytes on input, or accepts characters on output. Around Perl 5.8 or 5.10 there was a fair amount of misunderstanding about what :utf8 did, and there was actually core Perl documentation that said (or seemed to say) that you did UTF-8 I/O by specifying this layer. Most such instances of :utf8 in the core documentation have been replaced by :encoding(utf-8) but there may still be some :utf8 in outlying regions of the documentation.

By using :utf8 in the second example above, what I am telling Perl is that :via(My::Module) produces decoded output. It does, because the layer below it (:encoding(utf-8)) does, and :via(My::Module) preserves this property. Without the :encoding(utf-8) below it, it would be an error to tell PerlIO that :via(My::Module) produced characters unless My::Module did the decoding itself.

If you want to see what layers are in effect on file handle $fh, you can call PerlIO::get_layers( $fh ). This returns a list of layer names without the leading colon. The list will include utf8 as a separate entry, maybe more than once if more than one layer has that bit set.
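A quick way to try this is sketched below. It opens an empty scratch file (an assumption of the example, to keep it self-contained) with an encoding layer and prints whatever layers are reported. The exact list depends on your platform and Perl build, so I make no promise about the full output.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# An empty scratch file is enough; we only want to inspect the layers.
my ( $out, $path ) = tempfile( UNLINK => 1 );
close $out or die "close: $!";

open my $fh, '<:encoding(UTF-8)', $path
    or die "Cannot open $path: $!";
my @layers = PerlIO::get_layers( $fh );
close $fh;

print join( ' ', @layers ), "\n";
```

Look for an entry beginning with encoding( followed by a separate utf8 entry; the latter is the bit discussed above, reported as if it were a layer of its own.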

Previous entries in this series:

  1. if
  2. diagnostics
  3. Term::ReadLine::Perl
  4. re
  5. Devel::NYTProf
  6. Errno
  7. Time::Piece
  8. filetest
  9. File::stat

About Tom Wyant

I blog about Perl.