Load a list of lines into an array (easily)

This blog post describes a task my colleagues often ask about: repeating a string pattern for each value in a list, adding some or, and or = in between, and handling the last element cleanly.

I like to use Perl's __DATA__ token at the end of my scripts for this. The strength of the __DATA__ token is that it makes it possible to « "embed" a file inside a Perl program then read it from the DATA filehandle ». It saves you from creating and opening a real file and is very handy for quick prototypes and tests.

#!/usr/bin/env perl
use strict;
use warnings;

# Your script here

# __DATA__ marks the end of the code;
# everything below it is data
__DATA__
a
lot
lot
of
stuff
here
...

A common practice is to load this data into an array by reading from the DATA filehandle:

my @lines = <DATA>;

But the values would include the trailing newlines, which you obviously don't want. I have used two solutions for this:

my @lines;
push @lines,
  split while <DATA>;

This is quite readable and self-explanatory (remember that Perl borrows a lot from natural language: it was created by a linguist). Feel free to comment if something is unclear so I can improve the post.
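To spell out what the split version does, here is a longer sketch of the same loop. It assumes each data line holds one or more whitespace-separated values, and the name $line is only illustrative. With no arguments, split works on $_ and splits on whitespace, which is also what discards the trailing newline.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @lines;
while ( my $line = <DATA> ) {
  # split ' ' skips leading whitespace and drops the trailing
  # newline, so no chomp is needed; a blank line adds nothing
  push @lines, split ' ', $line;
}

print scalar(@lines), " values: @lines\n";

__DATA__
a
lot
lot
of
stuff
here
...
```

Run as is, this prints "7 values: a lot lot of stuff here ...". Note that a line such as "lot of" would yield two values, which may or may not be what you want.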

Ok, I have to admit a little secret:

push my @lines,
  split while <DATA>;

... without the pre-declaration of @lines does the same. I had to double-check that it worked, but as often with Perl, when you spontaneously think of something silly, it actually works naturally (I have to admit it sometimes looks like a miracle).

If you want unique values (you surely do), one way is to use the core module List::Util:

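use List::Util qw(uniq);

my @lines;
push @lines, split while <DATA>;
# dedupe once everything is loaded: uniq inside
# the loop would only see one line at a time
@lines = uniq @lines;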

As always, there is more than one way to do it:

chomp( my @lines = uniq <DATA> );

I actually prefer this list-context solution for its shortness. I am not sure which one is more readable, though, and it is good to choose the readable way.
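For what it is worth, uniq keeps the first occurrence of each value and preserves the original order, so the array still reads like the data section. A small sketch to check this for yourself (the version number in the use line reflects that uniq was added in List::Util 1.45; whether your installation is recent enough is an assumption):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);

# Read every line, drop duplicates, then strip the newlines
chomp( my @lines = uniq <DATA> );

print join( ',', @lines ), "\n";    # a,lot,of,stuff,here,...

__DATA__
a
lot
lot
of
stuff
here
...
```

One caveat: uniq runs before chomp here, so a last line without a trailing newline would not be recognised as a duplicate of an earlier, newline-terminated line.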

Let's say you want to generate a series of or conditions for your colleagues or customers. We are actually doing a super advanced language generation thing here:

#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(uniq);

chomp( my @lines = uniq <DATA> );

for ( @lines ) {
  # $_ is the current loop element
  print generate_string( $_ );
  # $lines[-1] is the last array element; comparing
  # by value is safe because uniq removed duplicates
  if ( $_ ne $lines[-1] ) {
    print ' or ';
  } else {
    print "\n";
  }
}

sub generate_string {
  return 'line == "' . shift . '"';
}

__DATA__
a
lot
lot
of
stuff
here
...


$ perl lines.pl
line == "a" or line == "lot" or line == "of" or line == "stuff" or line == "here" or line == "..."
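If the last-element test in the loop feels fragile, one possible alternative is to build all the fragments first and let join place the separators. This is only a sketch of that idea, not a replacement for the version above; it keeps the generate_string name but writes it with a named parameter ($value is illustrative).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(uniq);

chomp( my @lines = uniq <DATA> );

# Build every fragment first, then let join put ' or ' between
# them, so the last element needs no special handling
my $condition = join ' or ', map { generate_string($_) } @lines;
print "$condition\n";

sub generate_string {
  my ($value) = @_;
  return qq{line == "$value"};
}

__DATA__
a
lot
lot
of
stuff
here
...
```

The printed line should match the output of the loop version. Because join never inspects the values, this variant would also behave correctly if duplicates were kept.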

Lots of other solutions exist; have a look at Perl one-liners, which teach a lot more about this kind of practice.

The number of cool things you can do inside this loop is endless, from log parsing to code generation or data munging, thanks to the kindness of Perl.

Note

I wrote this because my memory is awful and I was tired of always searching for the exact syntax to load the __DATA__ section into an array. I hope it will help all kinds of people, including me when I type it into a search engine.

About Sébastien Feugère

I have been a Perl culture enthusiast since 2011. Currently working at a French opinion institute, I try to build tools using the Perl toolkit, but also Debian, LXC and a bit of JavaScript. Other hobbies: blogger of dreams about art school (FR); professional can opener of a teenage cat; psychotherapist for sad computers, would wrap them in blankets and make them tea; translator of a book about net.art (FR). Go to my Gitlab account to discover more.