Introducing Unixish

Unixish is a simple framework for creating data transformation routines that can be applied to arrays or streams. The routines can also be run from the command line, much like ordinary Unix utilities. In fact, the Data::Unixish distribution comes with clones of several Unix utilities, such as cat, shuf, head, tail, wc, sort, yes, and rev.

Creating a data transformation routine, called a dux function (dux is short for Data::Unixish), is easy enough. Let's say you want a function called revline that reverses the characters of each line of text. You write a function called revline and place it in the Data::Unixish::revline package.

package Data::Unixish::revline;
sub revline {
    my %args = @_;
    my ($in, $out) = ($args{in}, $args{out});
    while (my ($index, $item) = each @$in) {
        $item = reverse($item) if defined($item);
        push @$out, $item;
    }
    [200, "OK"];
}
1;

Several things to note here. First, you accept arguments via a hash. The input and output are located in $args{in} and $args{out}, respectively. Command-line options will also be received as function arguments, hence the hash, since a rather complex utility can potentially accept many command-line options.

Second, you must always use each() to iterate over the array elements, instead of for(), grep(), or map(). Through the magic of tie(), the array can actually be a stream (a long or even infinite one), and Perl 5's for/grep/map would slurp all of it into memory. each() fetches one element at a time without slurping.
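To see why each() avoids slurping, here is a minimal sketch of a tied array that reads one line at a time from a filehandle. The LineStream class is hypothetical and written only for illustration; it is not the tie class Data::Unixish uses internally. It relies on each() asking the tied array for its size (FETCHSIZE) and then for a single element (FETCH) on every iteration, so at most one line is held at a time.

```perl
use v5.12;      # each() on arrays needs Perl 5.12 or later
use warnings;

{
    # Hypothetical tie class: presents a filehandle as a read-only array.
    package LineStream;

    sub TIEARRAY {
        my ($class, $fh) = @_;
        # read:    number of lines read so far
        # pending: true while the last line read has not been fetched yet
        return bless { fh => $fh, read => 0, current => undef, pending => 0 }, $class;
    }

    # Called by each() before every element; reads ahead by at most one line.
    sub FETCHSIZE {
        my ($self) = @_;
        if (!$self->{pending}) {
            my $line = readline($self->{fh});
            if (defined $line) {
                chomp $line;
                $self->{current} = $line;
                $self->{pending} = 1;
                $self->{read}++;
            }
        }
        return $self->{read};
    }

    # Hands out the line that FETCHSIZE just read; the index is ignored.
    sub FETCH {
        my ($self, $index) = @_;
        $self->{pending} = 0;
        return $self->{current};
    }
}

package main;

my $text = "one\ntwo\nthree\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
tie my @in, 'LineStream', $fh;

my @seen;
while (my ($index, $item) = each @in) {
    push @seen, scalar reverse $item;
}
print "@seen\n";
```

A for loop over the same tied array would need the full element list up front, which is exactly what a stream cannot provide cheaply.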

Third, you must add to the output using push() instead of assigning to it directly, e.g. $out = [1, 2, 3]. The output array might also be a stream, and such an assignment would only point $out at a new array that the caller never sees.
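The output side can be sketched the same way. The PrintSink class below is a hypothetical tie class (not part of Data::Unixish) whose PUSH method writes each item to a filehandle as soon as it arrives, so nothing accumulates in memory.

```perl
use v5.12;
use warnings;

{
    # Hypothetical tie class: an "array" that prints whatever is pushed onto it.
    package PrintSink;

    sub TIEARRAY {
        my ($class, $fh) = @_;
        return bless { fh => $fh, count => 0 }, $class;
    }

    # push() on the tied array lands here.
    sub PUSH {
        my ($self, @items) = @_;
        for my $item (@items) {
            print { $self->{fh} } ($item // ''), "\n";
            $self->{count}++;
        }
        return $self->{count};
    }

    sub FETCHSIZE { return $_[0]{count} }
}

package main;

my $buffer = '';
open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";
tie my @sink, 'PrintSink', $fh;
my $out = \@sink;

for my $word (qw(one two three)) {
    push @$out, scalar reverse $word;   # goes straight to the filehandle
}
close $fh;
print $buffer;
```

Writing $out = [...] inside the function would replace the reference with a plain array, and PUSH would never be called.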

Lastly, you return [200, "OK"] to signify success, like in HTTP.

After you write the function, you write metadata for it, giving a summary and describing its arguments (if any). The metadata follows the Rinci specification.

use Data::Unixish::Util qw(%common_args);
our %SPEC;
$SPEC{revline} = {
    v => 1.1,
    summary => 'Reverse the characters of each line of text',
    args => {
        %common_args,
    },
};

As you can see, the metadata contains the function summary and lists the function arguments. It can be used to construct a command-line interface.

Here's another example from the head function which has an argument:

$SPEC{head} = {
    v => 1.1,
    summary => 'Output the first items of data',
    args => {
        %common_args,
        items => {
            summary => 'Number of items to output',
            schema=>['int*' => {default=>10}],
            tags => ['main'],
            cmdline_aliases => { n=>{} },
        },
    },
    tags => [qw/filtering/],
};
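For illustration, a function body matching this metadata might look like the sketch below. The name head_sketch is hypothetical and this is not necessarily how the shipped head function is written; it only shows how the items argument arrives in the same hash as in and out.

```perl
use v5.12;
use warnings;

# Sketch of a head-like dux function; head_sketch is a hypothetical name.
sub head_sketch {
    my %args = @_;
    my ($in, $out) = ($args{in}, $args{out});
    my $n = $args{items} // 10;   # same default as in the schema

    while (my ($index, $item) = each @$in) {
        last if $index >= $n;     # stop early instead of consuming the rest
        push @$out, $item;
    }
    return [200, "OK"];
}

my @first;
head_sketch(in => [1 .. 20], out => \@first, items => 3);
print "@first\n";   # prints "1 2 3"
```

One caveat with this pattern: leaving an each() loop early does not reset the array's iterator, so a later each() on the same plain array would resume where this one stopped.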

After all the above is done, you can start using your dux function. You can, of course, use the function directly like any other Perl function:


require Data::Unixish::revline;
my $input = ["one", "two", "three"];
my $output = [];
Data::Unixish::revline::revline(in=>$input, out=>$output);
# you get the output in $output: ["eno", "owt", "eerht"]

But for convenience, Data::Unixish provides a set of XduxY routines that apply a dux function to some form of input and return some form of output. The X prefix is one of a/f/l and determines whether the routine accepts an arrayref, a file(handle), or a list as input. The Y suffix is one of a/f/l/c and determines whether the routine returns an arrayref, a filehandle, or a list, or calls a callback.

Examples:

use Data::Unixish qw(:all);
my @out = lduxl('revline', "one", "two", undef, 3); # => ("eno", "owt", undef, "3")
my $out = lduxa('sort', 7, 2, 4, 1); # => [1, 2, 4, 7]
$out = aduxa('sort', [7, 2, 4, 1]); # => [1, 2, 4, 7]
my $res = fduxl('wc', "file.txt");    # => "12\n234\n2093" # like wc's output
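Conceptually, the array and list variants are thin wrappers that set up in and out and then call the dux function. The sketch below is a simplification under stated assumptions: aduxa_sketch and lduxl_sketch are hypothetical names, and they take a code reference rather than a function name, so nothing here reflects the real module's internals.

```perl
use v5.12;
use warnings;

# A dux-style function, same shape as revline in this article.
sub revline {
    my %args = @_;
    my ($in, $out) = ($args{in}, $args{out});
    while (my ($index, $item) = each @$in) {
        $item = reverse($item) if defined($item);
        push @$out, $item;
    }
    return [200, "OK"];
}

# Hypothetical wrapper: arrayref in, arrayref out.
sub aduxa_sketch {
    my ($func, $in) = @_;
    my @out;
    my $res = $func->(in => $in, out => \@out);
    die "dux function failed: $res->[0] $res->[1]" unless $res->[0] == 200;
    return \@out;
}

# Hypothetical wrapper: list in, list out, built on the one above.
sub lduxl_sketch {
    my ($func, @items) = @_;
    return @{ aduxa_sketch($func, \@items) };
}

my @out = lduxl_sketch(\&revline, "one", "two", undef, 3);
```

The status check in aduxa_sketch is also where the [200, "OK"] return value earns its keep: the wrapper can tell success from failure without inspecting the output.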

If you choose filehandle output, a child process will be forked to run your dux function on demand as the filehandle is read from.


my $fh = fduxf([trunc => {width=>80, ansi=>1, mb=>1}], \*STDIN);
say while <$fh>;
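The forking behavior can be approximated with Perl's built-in forking open. This is only a sketch of the general technique on a Unix-like system, with the transformation inlined; how Data::Unixish actually sets up its child process is an assumption left unverified here.

```perl
use v5.12;
use warnings;

# open with '-|' forks; the child's STDOUT becomes the parent's read handle.
my $pid = open(my $fh, '-|');
die "Cannot fork: $!" unless defined $pid;

if ($pid == 0) {
    # Child: produce the transformed items, one per line.
    for my $item (qw(one two three)) {
        my $reversed = reverse $item;
        print "$reversed\n";
    }
    exit 0;
}

# Parent: read the child's output line by line.
my @lines;
while (my $line = <$fh>) {
    chomp $line;
    push @lines, $line;
}
close $fh or warn "Child exited with status $?";
print "@lines\n";
```

Because the two processes are connected by a pipe, a child that produces output faster than the parent reads it will block once the pipe buffer fills, which gives the on-demand feel.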

You can also run your dux function from the command line, using the dux program provided by the App::dux distribution.

% ls -l | dux wc -l
12

So in short, the advantage of using this framework is reusability: you only have to write one routine that can be applied to various forms of input and produces various types of output.

The current drawback is speed: it's not very fast because of all the tie() abstraction. If you want to process millions of items or more, you might want to write the routine in more direct, low-level Perl. I first created the framework for Text::ANSITable to format table rows and columns, which typically won't number in the thousands or millions.
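To get a rough feel for the tie() overhead on your own machine, a small benchmark along these lines may help. It uses the core Benchmark module and Tie::StdArray (from the core Tie::Array module) as a stand-in for a tied stream, which is an assumption: the tie classes used by Data::Unixish do more work per element, so treat the numbers as a lower bound on the gap.

```perl
use v5.12;
use warnings;
use Benchmark qw(cmpthese);
use Tie::Array;   # provides Tie::StdArray

my @plain = (1 .. 10_000);
tie my @tied, 'Tie::StdArray';
@tied = @plain;

# Same each() loop for both; only the kind of array differs.
sub sum_each {
    my ($in) = @_;
    my $sum = 0;
    while (my ($index, $item) = each @$in) {
        $sum += $item;
    }
    return $sum;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    plain => sub { sum_each(\@plain) },
    tied  => sub { sum_each(\@tied) },
});
```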

About Steven Haryanto

A programmer (mostly Perl 5 nowadays). My CPAN ID: SHARYANTO. I'm sedusedan on perlmonks. My twitter is stevenharyanto (but I don't tweet much). Follow me on github: sharyanto.