Perl Weekly Challenge 32: Word Histogram and ASCII Bar Chart

These are some answers to the Week 32 of the Perl Weekly Challenge organized by Mohammad S. Anwar.

Spoiler Alert: This weekly challenge deadline is due in a few days (November 3, 2019). This blog post offers some solutions to this challenge, please don’t read on if you intend to complete the challenge on your own.

This week, both tasks were contributed by Neil Bowers.

Task # 1: Word Histogram

Create a script that either reads standard input or one or more files specified on the command-line. Count the number of times and then print a summary, sorted by the count of each entry.

So with the following input in file example.txt:

apple
banana
apple
cherry
cherry
apple

the script would display something like:

apple     3
cherry    2
banana    1

For extra credit, add a -csv option to your script, which would generate:

apple,3
cherry,2
banana,1

In other words, we’re supposed to make an histogram reporting the frequency of various words in the input.

Word Histogram in Perl 5

It is just a matter of collecting the frequency data in a %histo hash and then sort and printing the values.

#!/usr/bin/perl
use strict;
use warnings;

my %histo;
while (<>) {
    chomp;
    s/\s+$//;
    next if /^\s*$/; # skipping empty lines
    $histo{$_}++;
}
for my $key (sort { $histo{$b} <=> $histo{$a} } keys %histo) {
    printf "%-20s%d\n", $key, $histo{$key};
}

We can use the shell echo command to pipe some values into the script standard input:

$ echo 'apple
> banana
> apple
> cherry
> cherry
> apple' | perl word_histo.pl
apple               3
cherry              2
banana              1

CSV Extra Credit

For the extra credit, I’ll first create an actual word_test.txt test file:

$ cat word_test.txt
apple
banana
apple
cherry
cherry
apple

The new script will take a first parameter (0 or 1) to determine if the output should be fixed columns as before or CSV. The other parameter(s) will be one or several word files. And, for such a simple CSV, I’ll just do it manually and simply use two different formatting strings for printf. I know some people are adamant on using a CSV module, not me (well, at least, not for such a simple case): no one needs a 40-ton truck to deliver a book shipped by Amazon somewhere in a suburb or in the countryside.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my %histo;
my $csv_flag = shift;

while (<>) {
    chomp;
    s/\s+$//;
    next if /^\s*$/; # skipping empty lines
    $histo{$_}++;
}
my $format = $csv_flag ? "%s,%d\n" : "%-20s%d\n";

for my $key (sort { $histo{$b} <=> $histo{$a} } keys %histo) {
    printf $format, $key, $histo{$key};
}

Now, running the script with two file parameters (actually twice the name of the word_test.txt input file just created), once with the first parameter set to a false value (0) and once with it set to a true value (1) to get CSV output:

$ perl word_histo2.pl 0  word_test.txt word_test.txt
apple               6
cherry              4
banana              2

$ perl word_histo2.pl 1  word_test.txt word_test.txt
apple,6
cherry,4
banana,2

Word Histogram in Perl 6/Raku

In Perl 6/Raku, rather than using a hash, we can use a Bag to directly assign frequencies when reading the input files. Sorting the histogram can also be made simpler than in Perl 5: if the code block or subroutine passed to the sort built-in funtion accepts only one parameter, then this code block is not a comparison block or subroutine, but a code object implementing the transformation to be applied to the items to be sorted before using the default cmp comparison routine. Here, the {$histo{$_}} code block passed to the sortfunction, says that sort should apply the standard cmp comparison routine to the values associated with the keys that we want to sort. In addition, the P6/Raku cmp comparison routine and sort built-in function are clever enough to see that they are comparing integers and should therefore apply a numeric sort to the data.

use v6;

sub MAIN (*@files) {
    my $histo = @files.map({.IO.lines}).Bag;
    for reverse sort {$histo{$_}}, keys $histo -> $key {
        printf "%-20s%d\n", $key, $histo{$key};
    }
}

If you prefer another syntax, the histo assignment at the beginning could also be written as follows:

my $histo = (map {.IO.lines}, @files).Bag;

This program correctly takes into account the files passed as a parameter. For example, passing twice the same word_test.txt input file as before in P5:

$ perl6 word_histo.p6 word_test.txt word_test.txt
apple               6
cherry              4
banana              2

Note that there is (in Perl 6.d) a new IO.CatHandle class for seamlessly gathering input from multiple files, but this did not appear to be necessary here.

Extra Credit: CSV Option

For the extra credit, we could just try to add an optional parameter (with a default value), a Boolean flag to decide whether we want CSV output, and then change the printf formatting string according to its value:

sub MAIN (Int $csv = 0, *@files) { # ...

but that does not really work as expected when no CSV flag argument is passed to the program. It appears that an optional parameter cannot be put before a mandatory one. Changing the order of the arguments does not help, because an optional parameter cannot be put after variadic parameters.

We will therefore use two multi MAIN subroutines to handle cases where a Boolean flag is passed and those where there no such flag. These MAIN subroutines will populate a $*format dynamic scope variable and then call the same readfiles subroutine:

use v6;
subset file of Str where *.IO.f;

multi sub MAIN (Int $csv, *@files where all(@files) ~~ file) {
    my $*format = $csv ?? "%s,%d\n" !! "%-20s%d\n";
    readfiles @files;
}
multi sub MAIN (*@files where all(@files) ~~ file) {
    my $*format = "%-20s%d\n";
    readfiles @files;
}
sub readfiles (@files) {
    my $histo = @files.map({.IO.lines}).Bag;
    for reverse sort {$histo{$_}}, keys $histo -> $key {
        printf $*format, $key, $histo{$key};
    }
}

This can be tested with or without a CSV flag:

$ perl6 word_histo2.p6 word_test.txt word_test.txt
apple               6
cherry              4
banana              2

$ perl6 word_histo2.p6 1 word_test.txt word_test.txt
apple,6
cherry,4
banana,2

$ perl6 word_histo2.p6 0 word_test.txt word_test.txt
apple               6
cherry              4
banana              2

Using Named Parameters

So, now, this works properly, but we may still have a slight problem if the name of the file passed as an argument is a plain integer: this would become ambiguous, and the Moar virtual machine might not be able to tell an integer from a file. This can be solved with named arguments for the Boolean CSV flag, which will also make it possible to write shorter and simpler code:

use v6;
subset file of Str where *.IO.f;

sub MAIN (Bool :$csv, *@files where all(@files) ~~ file) {
    my $format = $csv ?? "%s,%d\n" !! "%-20s%d\n";
    my $histo = @files.map({.IO.lines}).Bag;
    for reverse sort {$histo{$_}}, keys $histo -> $key {
        printf $format, $key, $histo{$key};
    }
}

Now, we need to pass a --csv argument when calling the program at the CLI in order to activate the CSV format:

$ perl6 word_histo2.p6 word_test.txt word_test.txt
apple               6
cherry              4
banana              2

$ perl6 word_histo2.p6 --csv word_test.txt word_test.txt word_test.txt
apple,9
cherry,6
banana,3

Task 2: ASCII Bar Chart

Write a function that takes a hashref where the keys are labels and the values are integer or floating point values. Generate a bar graph of the data and display it to stdout.

The input could be something like:

$data = { apple => 3, cherry => 2, banana => 1 };
generate_bar_graph($data);

And would then generate something like this:

 apple | ############
cherry | ########
banana | ####

If you fancy then please try this as well: (a) the function could let you specify whether the chart should be ordered by (1) the labels, or (2) the values.

ASCII Bar Chart in Perl 5

There is really nothing complicated in generating the bars of the chart: we just need to use the x string repetition operator with the fruit values. However, I would like to standardize somehow the size of output, irrespective of the absolute values. For this, the program loops over the hash a first time to collect the minimum and maximum values, and computes a scaling factor as 10 / ($max - $min), and then uses that $scale_factor for standardizing the length of the bars, so that the bar graph has about the same size for values of 4, 6, and 9 as for values of 40, 60 and 90. The hard coded value of 10 arbitrarily chosen here simply means that the spread between the smallest and the largest value will be represented by 10 units (concretely, 10 # characters). I could have chosen another value, but I wanted the bar graphs to keep relatively small to make sure they remain correctly formatted within the limited page width of this blog post.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

sub generate_chart {
    my %data = %{$_[0]};
    my ($max, $min);
    for my $key (keys %data) {
        ($max, $min) = ($data{$key}, $data{$key}) unless defined $max;
        $max = $data{$key} if $data{$key} > $max;
        $min = $data{$key} if $data{$key} < $min;
    }
    my $scale_factor = 10 / ($max - $min);
    for my $key (sort { $data{$b} <=> $data{$a} } keys %data) {
        printf "%15s | %s\n", $key, "#" x ($data{$key} * $scale_factor);
    }
}
my $data = { apple => 30, cherry => 60, banana => 10, pear => 40 };
generate_chart $data;

This generates the following output:

$ perl ascii_chart.pl
         cherry | ############
           pear | ########
          apple | ######
         banana | ##

With this scaling, if we now use the following hashref with all values ten times smaller:

my $data = { apple => 3, cherry => 6, banana => 1, pear => 4 };

the program generates more or less the same output (there may occasionally be some slight rounding differences):

$ perl ascii_chart.pl
         cherry | ############
           pear | ########
          apple | ######
         banana | ##

Ordering the Bar Chart in Accordance with Labels or Values

For ordering the bar chart in accordance with labels or values, we could simply have two for loops for printing out the bar chart and use one or the other loop depending on the ordering we selected. But that’s code repetition, and I don’t like to do that for numerous reasons, the main one being that, sometimes in the future, a person in charge of maintaining the project is likely to modify one of the loops and forget to modify the other accordingly. So let’s try to keep it DRY (“don’t repeat yourself”).

The solution suggested here uses a $sort_routine variable containing an anonymous subroutines as a first argument to the sortbuilt-in function. Depending on the $sort_type (label or value) passed to the program, the $sort_routine code reference will contain different code (value will be the default if no argument is passed):

    my $sort_routine = ($sort_type =~ /val/i) 
        ? sub { $data{$b} <=> $data{$a} } 
        : sub {$a cmp $b };

The full program now looks like this:

use strict;
use warnings;
use feature qw/say/;

sub generate_chart {
    my %data = %{$_[0]};
    my $sort_type = $_[1] // "val";
    my ($max, $min);
    for my $key (keys %data) {
        ($max, $min) = ($data{$key}, $data{$key}) unless defined $max;
        $max = $data{$key} if $data{$key} > $max;
        $min = $data{$key} if $data{$key} < $min;
    }
    my $scale_factor = 10 / ($max - $min);
    my $sort_routine = ($sort_type =~ /val/i) 
        ? sub { $data{$b} <=> $data{$a} } 
        : sub {$a cmp $b }; 
    for my $key (sort $sort_routine keys %data) {
        printf "%15s | %s\n", $key, "#" x ($data{$key} * $scale_factor);
    }
}
my $sort_type = shift;
my $data = { cherry => 20, apple => 30,  banana => 10, pear => 25};
generate_chart $data, $sort_type;

This is the output generated with various arguments passed to the program:

$ perl ascii_chart2.pl
          apple | ###############
           pear | ############
         cherry | ##########
         banana | #####

$ perl ascii_chart2.pl value
          apple | ###############
           pear | ############
         cherry | ##########
         banana | #####

$ perl ascii_chart2.pl label
          apple | ###############
         banana | #####
         cherry | ##########
           pear | ############

ASCII Bar Chart in Perl 6/Raku

For a start, we will use the same approach as in Perl 5 to scale the bar graph:

use v6;

sub generate_chart (%data) {
    my ($max, $min);
    for keys %data -> $key {
        ($max, $min) = (%data{$key}, %data{$key}) unless defined $max;
        $max = %data{$key} if %data{$key} > $max;
        $min = %data{$key} if %data{$key} < $min;
    }
    my $scale_factor = 10 / ($max - $min);
    for sort { %data{$^b} <=> %data{$^a} }, keys %data -> $key {
        printf "%15s | %s\n", $key, "#" x (%data{$key} * $scale_factor);
    }
}
my $data = { apple => 3, cherry => 6, banana => 1, pear => 4 };
generate_chart $data;

Note that Perl 6/Raku has no real difference between hashes and hash references in most cases.

This produces output similar to what we had in P5:

$ perl6 ascii_chart.p6
         cherry | ############
           pear | ########
          apple | ######
         banana | ##

Ordering the Bar Chart in Accordance with Labels or Values

As in Perl 5, we will use anonymous code references as the first argument to the sort built-in function.

use v6;

sub generate_chart (%data) {
    my ($max, $min);
    for keys %data -> $key {
        ($max, $min) = (%data{$key}, %data{$key}) unless defined $max;
        $max = %data{$key} if %data{$key} > $max;
        $min = %data{$key} if %data{$key} < $min;
    }
    my $scale_factor = 10 / ($max - $min);
    my &sort_routine = ($*sort-type ~~ m:i/val/) 
        ?? { %data{$^b} <=> %data{$^a} } 
        !! {$^a cmp $^b }; 
    for sort &sort_routine, keys %data -> $key {
        printf "%15s | %s\n", $key, "#" x (%data{$key} * $scale_factor);
    }
}
sub MAIN (Str $*sort-type) {
    my $data = { apple => 3, cherry => 6, banana => 1, pear => 4 };
    generate_chart $data;
}

This works as expected:

$ perl6 ascii_chart2.p6 val
         cherry | ############
           pear | ########
          apple | ######
         banana | ##

$ perl6 ascii_chart2.p6 lab
          apple | ######
         banana | ##
         cherry | ############
           pear | ########

Wrapping up

The next week Perl Weekly Challenge is due to start soon. If you want to participate in this challenge, please check https://perlweeklychallenge.org/ and make sure you answer the challenge before 23:59 BST (British summer time) on Sunday, November 10, 2019. And, please, also spread the word about the Perl Weekly Challenge if you can.

Leave a comment

About laurent_r

user-pic I am the author of the "Think Perl 6" book (O'Reilly, 2017) and I blog about the Perl 5 and Raku programming languages.