Load a list of lines into an array (easily)

This blog post covers a task my colleagues often ask me about: repeating a dynamic string for each token in a list, putting an "or", an "and" or an "=" in between, and finishing the line cleanly.

I like to use Perl's __DATA__ token at the end of my scripts for this. The strength of the __DATA__ token is that it makes it possible to « "embed" a file inside a Perl program then read it from the DATA filehandle ». It saves you from creating and opening a real file, and it is very handy for quick prototypes and tests.

#!/usr/bin/env perl
use strict;
use warnings;

# Your script here

# Everything below the __DATA__ line is no longer code:
# it is what the DATA filehandle will read
__DATA__
a
lot
lot
of
stuff
here
...

A common practice is to load this data into an array by reading from the DATA filehandle:

my @lines = <DATA>;

But each value would then keep its trailing newline, which you obviously don't want. I have used two solutions for this. Here is the first one:
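To make the problem visible, here is a minimal sketch that compares the raw lines with their chomped version. It assumes an in-memory filehandle ($fh, a name chosen for this illustration) standing in for DATA so that the snippet runs on its own; reading from <DATA> should behave the same way.

```perl
use strict;
use warnings;

# Assumption: an in-memory filehandle plays the role of DATA here
my $text = "a\nlot\nof\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @raw   = <$fh>;    # list context: one element per line, newline included
my @clean = @raw;
chomp @clean;         # chomp on an array strips the newline of every element

# Compare the length of each element before and after chomp
printf "raw:   %s\n", join ',', map { length } @raw;      # raw:   2,4,3
printf "clean: %s\n", join ',', map { length } @clean;    # clean: 1,3,2
```

Each raw element is one character longer than its clean counterpart: that extra character is the newline.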

my @lines;
push @lines,
  split while <DATA>;

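This is quite readable and self-explanatory (remember that Perl reads like a natural language: it was created by a linguist). A bare split works on $_ and splits on whitespace, so the trailing newline disappears along the way. Feel free to comment if something is unclear so that I can improve the post.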
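For readers who find the one-liner above too terse, here is a rough long-hand equivalent. It is only a sketch: the in-memory filehandle $fh and the @words variable are assumptions made for the illustration, with $fh standing in for DATA.

```perl
use strict;
use warnings;

# Assumption: an in-memory filehandle plays the role of DATA here
my $text = "a\nlot of\nstuff\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @lines;
while ( defined( $_ = <$fh> ) ) {    # what "... while <DATA>" does implicitly
    my @words = split ' ', $_;       # what a bare "split" does: split $_ on whitespace
    push @lines, @words;
}

print join( '|', @lines ), "\n";     # a|lot|of|stuff
```

Note the side effect: a line containing several words ("lot of") yields several array elements, which is fine for one-token-per-line data but worth keeping in mind otherwise.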

Ok, I have to admit a little secret:

push my @lines,
  split while <DATA>;

... without the pre-declaration of @lines does the same. I had to double-check that it worked but, as often with Perl, when you spontaneously think of something silly, it actually works naturally (I have to admit it sometimes looks like a miracle).

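If you want unique values (you surely do), one way is to use uniq from the core module List::Util. Apply it to the whole array once it is loaded: calling uniq on each split would only remove duplicates found within a single line.

use List::Util qw(uniq);

push my @lines,
  split while <DATA>;
@lines = uniq @lines;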

As always, there is another way to do it, and this is my second solution:

chomp( my @lines = uniq <DATA> );

I actually prefer this list context solution for its brevity. I am not sure which one is more readable, and it is good to pick the readable way.
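If the List::Util at hand is too old to export uniq (as far as I know it appeared in version 1.45), the classic %seen idiom gives the same result and keeps the first occurrence of each value. A minimal sketch, where the @lines content is hard-coded for the illustration:

```perl
use strict;
use warnings;

# Sample values, as they would look once read from DATA and chomped
my @lines = ( 'a', 'lot', 'lot', 'of', 'stuff', 'here', '...' );

# Keep a value only the first time it is seen
my %seen;
my @unique = grep { !$seen{$_}++ } @lines;

print join( ' ', @unique ), "\n";    # a lot of stuff here ...
```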

Let's say you want to generate a series of conditions joined by "or" for your colleagues or customers. We are actually doing some super advanced language generation here:

#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(uniq);

chomp( my @lines = uniq <DATA> );

for ( @lines ) {
  # $_ is the current loop element
  print generate_string( $_ );
  # $lines[-1] is the last array element
  if ( $_ ne $lines[-1] ) {
    print ' or ';
  } else {
    print "\n";
  }
}

sub generate_string {
  return 'line == "' . shift . '"';
}

__DATA__
a
lot
lot
of
stuff
here
...


$ perl lines.pl
line == "a" or line == "lot" or line == "of" or line == "stuff" or line == "here" or line == "..."

Lots of other solutions exist. Have a look at Perl one-liners, which teach a lot more about this kind of practice.

The number of cool things you can do inside this loop is infinite, from log parsing to code generation or data munging, thanks to the kindness of Perl.
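One of those other solutions, sketched here as an illustration rather than a recommendation, replaces the loop and its "last element" test with map and join. The @lines content is hard-coded so that the snippet stands alone, and generate_string is rewritten with a named parameter.

```perl
use strict;
use warnings;

# Sample values, already unique and chomped
my @lines = ( 'a', 'lot', 'of', 'stuff', 'here', '...' );

# Build one condition for one value
sub generate_string {
    my ($value) = @_;
    return qq{line == "$value"};
}

# join only puts the separator between elements, so no special case is needed
my $condition = join ' or ', map { generate_string($_) } @lines;

print "$condition\n";
```

Since join never adds a trailing separator, this version also behaves correctly when the list contains repeated values.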

Note

I wrote this because my memory is awful and I was tired of always searching for the exact syntax of the __DATA__ token to array process. I hope it will help all kinds of people, including me when I type it into a search engine.

while loops that have an index

Perl has a syntax that lets you write a while loop over an array without having to increment an index explicitly with an i++. It is made possible by the each function, which works on arrays since Perl 5.12.

Let's demonstrate this in a simple test that checks that an array and an array reference contain the same things:

# t/01_foo_order.t
use v5.18;
use Test::More tests => 3;

my $events_arr_ref = get_events();
my @expected_events = ('foo', 'bar', 'baz');

# each returns the index and the value of the next array element
while ( my ( $i, $event ) = each( @$events_arr_ref )) {
  is $event,
    $expected_events[$i],
    "Array element [ $i ] value is $expected_events[$i]";
}

done_testing();

sub get_events {
  return [ 'foo', 'bar', 'baz' ];
}

Let's execute our test:

$ prove -v t/01_foo_order.t
1..3
ok 1 - Array element [ 0 ] value is foo
ok 2 - Array element [ 1 ] value is bar
ok 3 - Array element [ 2 ] value is baz
ok
All tests successful.
Files=1, Tests=3,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.07 cusr  0.00 csys =  0.10 CPU)
Result: PASS

while ( my ( $i, $event ) = each( @$events_arr_ref )) {} makes it possible to iterate over the $events_arr_ref array reference and, for each element found, to initialize $i and $event with the right values.

This is much the same as a for loop, except that you don't have to increment the index yourself. Because each keeps an internal iterator on the array, it is best used when you want to iterate over the whole array: leaving the loop early leaves the iterator halfway through.

I use it quite often, and it can be handy if you want to avoid $_. Just yet another TIMTOWTDI...
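To make the comparison concrete, here is a small sketch that walks the same array both ways and collects "index:value" pairs. The @events, @via_each and @via_for names are made up for this illustration.

```perl
use v5.12;    # each on arrays needs Perl 5.12 or later
use warnings;

my @events = ( 'foo', 'bar', 'baz' );

# while + each: the index and the value are handed over together
my @via_each;
while ( my ( $i, $event ) = each @events ) {
    push @via_each, "$i:$event";
}

# for over the indices: the value has to be looked up explicitly
my @via_for;
for my $i ( 0 .. $#events ) {
    push @via_for, "$i:$events[$i]";
}

print "@via_each\n";    # 0:foo 1:bar 2:baz
print "@via_for\n";     # 0:foo 1:bar 2:baz
```

If a while/each loop has to be left early, the each documentation says that calling keys on the array resets the iterator, so the next loop starts from the first element again.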

A concise mtime sorted directory listing application

Today we will focus on a simple task: listing the files contained in a directory, sorting them by modification time (see mtime) and displaying the result as a JSON array.

We are going to use Mojo::File for the file system part and Mojolicious::Lite to expose this data through a simple but effective JSON API.

About Sébastien Feugère

I have been a Perl culture enthusiast since 2011. Currently working at a French opinion institute, I try to build tools using the Perl toolkit, but also Debian, LXC and a bit of JavaScript. Other hobbies: blogger of dreams about art school (FR); professional can opener of a teenage cat; psychotherapist for sad computers, would wrap them in blankets and make them tea; translator of a book about net.art (FR). Go to my Gitlab account to discover more.