Perl Weekly Challenge 032: Frequency Table & ASCII Bar Chart

Frequency Table

Create a script that either reads standard input or one or more files specified on the command-line. Count the number of times and then print a summary, sorted by the count of each entry.

The original title of the task was “Count instances”, but I’ve always known the output as the “frequency table”. For years, I’ve used the following bash script to produce it:

#! /bin/bash
cat "$@" | sort | uniq -c | sort -n

The first element in the pipeline is cat. It outputs all the files given to it as arguments, or the standard input if there are no arguments. sort sorts the output so duplicate lines are adjacent, which is needed for uniq. Its -c argument means “count”: it prepends the number of occurrences to each line. The final sort is invoked with -n for “numerical”, i.e. it sorts the output by the number of occurrences.

Creating a similar table in Perl is a FAQ. We store each line in a hash, incrementing the corresponding value while reading the input line by line.

If we look carefully at the assignment of the task, though, we can notice that the output should be formatted differently: the numbers should go last and the columns should be aligned. Also, there’s the extra credit which we definitely want.

We’ll use printf to format the columns. In order to align them, we need to find the longest string in the first column. max from List::Util can find that if applied to the strings mapped to their lengths.

For the extra credit, we just need to add a command line option. This is trivial with Getopt::Long; and to create CSV output, we’ll use Text::CSV_XS.

Note that the script works correctly even for UTF-8 encoded input thanks to open.

#! /usr/bin/perl
use warnings;
use strict;

use Getopt::Long;
use Text::CSV_XS qw{ csv };
use List::Util qw{ max };

use open IO => ':encoding(UTF-8)', ':std';

GetOptions(csv => \ my $csv_output);

my %count;
chomp, ++$count{$_} while <>;

if ($csv_output) {
    csv(in => [
        map [$_, $count{$_} ],
        sort { $count{$b} <=> $count{$a} }
        keys %count]);

} else {
    my $max_length = max(map length, keys %count);
    printf "%${max_length}s %d\n", $_, $count{$_}
        for sort { $count{$a} <=> $count{$b} } keys %count;
}

ASCII Bar Chart

Write a function that takes a hashref where the keys are labels and the values are integer or floating point values. Generate a bar graph of the data and display it to stdout.

I went the OO way for this one. We need to find the longest title (similarly to task 1) to align the graph, it should be possible to specify the width of the graph (and if it wasn’t specified, it should be possible to retrieve the terminal width from the system). I also added the option to specify the separator between the legend and the graph.

I used Moo as the OO framework. You can see that there is only one required constructor argument data, we can also specify the separator which otherwise defaults to the vertical bar; and if we don’t specify the width, it’s taken from the environment variable $COLUMNS (*nix specific and unfortunately not exported by default, so if you want to display a terminal-wide graph, you have to invoke it as COLUMNS=$COLUMNS pwc032-2.pl). The width is a lazy attribute which means if it needs to be computed, the computation doesn’t happen until we ask for its value.

The object keeps two internal values: the bar width and maximal title length. Both of them are lazy; the maximal length is computed the same way as in task 1, while the bar width is simply the width of the graph minus the longest title’s length minus the length of the separator.

The object has only one public method generate which prints the graph. You can specify how to sort the lines as the argument to the method, it sorts by keys (i.e. titles) by default. The particular sorting methods are private to the class and the right one is selected from a dispatch table.

I used Function::Parameters to be able to specify the types of the object attributes and to save some typing (no my $self anywhere!).

The BUILD method is used to check whether the graph fits into the given width.

You can also see namespace::clean used, it prevents SortBy, max, HashRef, etc. from leaking into the namespace of the class (without it, you can call $chart->SortBy, even if it makes no sense).

The formatting of the graph is performed by printf similarly to task 1. Note the use of the repetition operator x.

#! /usr/bin/perl
{   package My::BarChart;
    use Moo;

    use Function::Parameters;
    use Types::Standard qw{ HashRef Num Str Enum };
    use List::Util qw{ max };

    sub SortBy { Enum[qw[ labels values labels_desc values_desc ]] }

    use namespace::clean;

    has width     => (is => 'lazy', isa => Num);
    has data      => (is => 'ro',   isa => HashRef[Num], required => 1);
    has separator => (is => 'ro',   isa => Str, default => ' | ');

    has _bar_width  => (is => 'lazy', isa => Num);
    has _max_length => (is => 'lazy', isa => Num);

    method generate (SortBy $sort_by = 'keys') {
        my $data = $self->data;
        my $max = max(values %$data);

        my $sort = {labels      => \&_by_key,
                    values      => \&_by_value,
                    labels_desc => \&_by_key_desc,
                    values_desc => \&_by_value_desc}->{$sort_by};

        for my $key (sort { $self->$sort } keys %$data) {
            printf '%' . $self->_max_length . "s%s%s\n",
                $key,
                $self->separator,
                '#' x ($self->_bar_width / $max * $data->{$key});
        }
    }

    method BUILD ($) {
        die "Chart is too wide.\n" if $self->_bar_width <= 0;
    }

    method _build_width () { $ENV{COLUMNS} || qx{tput cols} || 80 }

    method _build__max_length () { max(map length, keys %{ $self->data }) }

    method _build__bar_width () {
        $self->width - $self->_max_length - length $self->separator
    }

    method _by_key        () { $a cmp $b }
    method _by_key_desc   () { $b cmp $a }
    method _by_value      () { $self->data->{$a} <=> $self->data->{$b} }
    method _by_value_desc () { $self->data->{$b} <=> $self->data->{$a} }
}

use warnings;
use strict;
use feature qw{ say };

my $data = { apple => 3, cherry => 2, banana => .5 };
my $chart = 'My::BarChart'->new(data => $data);

say $chart->generate('labels');
say $chart->generate('values');
say $chart->generate('values_desc');

Leave a comment

About E. Choroba

user-pic I blog about Perl.